ABSTRACT

Image caption generation is a challenging problem that arises in many applications. The task is to generate a descriptive sentence for a given image. It is complex because the model must learn from datasets to recognize and describe the objects, activities, and places that an image contains. Humans can caption a small set of images quite easily but cannot keep up as the number of images grows, which makes automatic captioning an interesting challenge for deep learning. Many existing implementations are complex, and some require multiple passes. Deep neural networks can simplify the task. The Show and Tell algorithm, which incorporates long short-term memory (LSTM) networks, proves beneficial in simplifying caption generation.