Image caption generation is a challenging task at the intersection of computer vision and natural language processing that aims to generate meaningful captions for a given image. This paper proposes an image caption generator that accepts an image as input and produces an English sentence describing the image's content. The system uses a pre-trained deep Convolutional Neural Network (CNN) to extract high-level visual features from the input image, which are then processed by a Long Short-Term Memory (LSTM) network to generate coherent and contextually relevant captions. The model is trained on the Flickr8K dataset, which contains 8,000 images, each paired with five human-annotated reference captions. The model is evaluated using standard metrics such as BLEU and METEOR to assess the accuracy and fluency of the generated captions.
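
The abstract does not fix a specific architecture, so the following is only a minimal sketch of one common CNN-encoder / LSTM-decoder ("merge") captioning setup in Keras. The choice of InceptionV3 as the pre-trained CNN, the vocabulary size, embedding dimension, and maximum caption length are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a CNN-encoder / LSTM-decoder captioning model (Keras).
# Assumptions (not from the paper): InceptionV3 encoder, 5000-word vocabulary,
# captions padded to 34 tokens, 256-dimensional embeddings and LSTM state.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 5000   # assumed vocabulary size
MAX_LEN = 34        # assumed maximum caption length (tokens)
FEAT_DIM = 2048     # InceptionV3 globally pooled feature size

# Pre-trained CNN used offline to extract a 2048-d feature vector per image.
cnn_encoder = InceptionV3(weights='imagenet', include_top=False, pooling='avg')

# --- Image branch: project the extracted CNN features to the LSTM dimension ---
image_input = Input(shape=(FEAT_DIM,))
img_dense = Dense(256, activation='relu')(Dropout(0.5)(image_input))

# --- Text branch: embed the partial caption and run it through an LSTM ---
caption_input = Input(shape=(MAX_LEN,))
emb = Embedding(VOCAB_SIZE, 256, mask_zero=True)(caption_input)
lstm_out = LSTM(256)(Dropout(0.5)(emb))

# --- Merge both branches and predict the next word of the caption ---
merged = add([img_dense, lstm_out])
decoder = Dense(256, activation='relu')(merged)
output = Dense(VOCAB_SIZE, activation='softmax')(decoder)

caption_model = Model(inputs=[image_input, caption_input], outputs=output)
caption_model.compile(loss='categorical_crossentropy', optimizer='adam')
caption_model.summary()
```

At inference time, a caption would be generated word by word: the image features and the tokens produced so far are fed back into the model, and the most probable next word is appended until an end-of-sequence token or the maximum length is reached.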