ABSTRACT
This paper surveys recent advances in image description techniques, covering a wide range of methodologies from classical approaches to cutting-edge deep learning techniques. The study begins by introducing the essential principles and challenges of image description, such as bridging the semantic gap and accounting for subjective interpretation. It then examines older approaches such as handcrafted feature extraction and rule-based captioning systems, highlighting their benefits and drawbacks. Finally, the study investigates contemporary advances in multimodal image description, which combines textual and visual modalities to produce richer, more contextually relevant descriptions.
