Scholarly open access journals, Peer-reviewed, and Refereed Journals, Impact factor 8.14 (Calculate by google scholar and Semantic Scholar | AI-Powered Research Tool) , Multidisciplinary, Monthly, Indexing in all major database & Metadata, Citation Generator, Digital Object Identifier(DOI)
Image captioning is an essential task at the intersection of computer vision and natural language processing, in which a computer algorithm automatically generates a textual description of an image. It has attracted considerable attention owing to its use in domains such as assistive technologies, content management, and human-computer interaction. This paper presents a smart image caption generator that combines deep learning algorithms with effective feature extraction methods to produce accurate and coherent captions.
The proposed intelligent image caption generator uses the CLIP model to extract high-quality semantic features from the image, and a Transformer decoder generates the captions from those features. The system is trained on the TextCaps dataset, which pairs images with their corresponding captions, allowing the model to learn the relationship between visual content and text. Beam search decoding is used to improve the quality of the generated captions: instead of greedily committing to the single most likely word at each step, it keeps several candidate word sequences in parallel and selects the highest-scoring one.
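The abstract does not state the beam width, whether length normalization is applied, or how the decoder's scores are exposed, so the following is only a minimal sketch of generic beam search under those unknowns. It assumes a hypothetical callback, `next_token_logprobs`, that stands in for the CLIP-conditioned Transformer decoder and returns log-probabilities for the next token given the tokens generated so far. The toy probability table (`TOY_TABLE`, `toy_decoder`) is invented purely for illustration and is not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the Transformer decoder: maps the tokens generated
# so far to log-probabilities of the next token. In the real system this would
# also be conditioned on the CLIP image features.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def beam_search(next_token_logprobs: NextTokenFn, beam_width: int = 3,
                max_len: int = 10, bos: str = "<bos>",
                eos: str = "<eos>") -> List[str]:
    """Return the highest-scoring token sequence, without BOS/EOS markers."""
    # Each entry is (cumulative log-probability, token sequence).
    beams: List[Tuple[float, List[str]]] = [(0.0, [bos])]
    finished: List[Tuple[float, List[str]]] = []

    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates: List[Tuple[float, List[str]]] = []
        for score, seq in beams:
            for token, logp in next_token_logprobs(seq).items():
                candidates.append((score + logp, seq + [token]))

        # Keep only the beam_width best expansions; completed ones are set aside.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break

    # If no sequence emitted EOS within max_len, fall back to the live beams.
    pool = finished or beams
    if not pool:
        return []
    _, best_seq = max(pool, key=lambda c: c[0])
    return [t for t in best_seq if t not in (bos, eos)]


# Toy next-token distributions (illustrative only, not from the paper).
TOY_TABLE = {
    ("<bos>",): {"a": 0.5, "the": 0.4, "<eos>": 0.1},
    ("<bos>", "a"): {"man": 0.4, "dog": 0.3, "<eos>": 0.3},
    ("<bos>", "the"): {"dog": 0.9, "<eos>": 0.1},
}


def toy_decoder(prefix: List[str]) -> Dict[str, float]:
    # Any prefix not in the table simply ends the caption.
    probs = TOY_TABLE.get(tuple(prefix), {"<eos>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


if __name__ == "__main__":
    print(beam_search(toy_decoder, beam_width=1))  # greedy decoding
    print(beam_search(toy_decoder, beam_width=2))
```

With `beam_width=1` the search reduces to greedy decoding and returns `['a', 'man']` (probability 0.5 × 0.4 = 0.2), whereas `beam_width=2` keeps "the" alive after the first step and finds `['the', 'dog']` (0.4 × 0.9 = 0.36). This sketch sums raw log-probabilities without length normalization, which tends to favor shorter captions; whether the paper's implementation normalizes by length is not stated.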
To make the system easy to use and deployable in the real world, a web application is built with Streamlit that lets users either upload images or take pictures with their device cameras. Additional features such as activity detection and text-to-speech further improve the usability and accessibility of the system. The proposed method demonstrates that combining CLIP for feature extraction with a Transformer decoder for caption generation is effective.
Index Terms—Image Captioning, CLIP, Transformer Decoder, Beam Search, Deep Learning, TextCaps, Streamlit, Action Detection, Text-to-Speech
"CLIP-Based Image Caption Generation Using Transformer Decoder", International Journal for Research Trends and Innovation (www.ijrti.org), ISSN: 2456-3315, Vol. 11, Issue 4, pp. b97-b100, April 2026. Available: http://www.ijrti.org/papers/IJRTI2604149.pdf
ISSN: 2456-3315 | Impact Factor: 8.14 (calculated by Google Scholar) | Established: 2016