Scholarly open access, peer-reviewed, refereed journal. Impact factor 8.14 (calculated by Google Scholar and Semantic Scholar). Multidisciplinary, monthly. Indexed in all major databases and metadata services, with citation generator and Digital Object Identifier (DOI).
Direct speech-to-image translation is a challenging task in artificial intelligence, with applications ranging from aiding the visually impaired to enhancing human-computer interaction. This paper proposes a novel approach that uses Convolutional Neural Networks (CNNs) to directly translate spoken descriptions into corresponding images. The system first converts the speech input into text using automatic speech recognition (ASR), then employs a CNN-based architecture to generate images from the extracted textual features. The proposed architecture comprises convolutional layers for feature extraction, followed by deconvolutional (transposed convolution) layers for image reconstruction. To improve the fidelity of the generated images, attention mechanisms and adversarial training are integrated into the network. Additionally, transfer learning may be employed to leverage pre-trained CNN models for better generalization and performance.
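The data flow described in the abstract (transcript, then text features, then convolution, then transposed convolution) can be illustrated with a small pure-Python sketch. This is not the paper's model: the ASR step is assumed to have already produced a transcript, and the hashed word grid, the fixed kernels, and the names `text_features`, `conv2d`, `conv_transpose2d`, and `speech_to_image` are all hypothetical stand-ins for learned components. Attention and adversarial training are omitted.

```python
"""Toy data flow: transcript -> text features -> conv -> transposed conv.

Illustrative only. A real system would use learned embeddings and
trained kernels; everything here is a fixed, hypothetical stand-in.
"""
import hashlib


def text_features(transcript, size=6):
    """Hash each word into a size x size grid (stand-in for learned text features)."""
    grid = [[0.0] * size for _ in range(size)]
    for word in transcript.lower().split():
        h = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
        grid[(h // size) % size][h % size] += 1.0
    return grid


def conv2d(x, k):
    """Valid 2D cross-correlation with stride 1 (the feature-extraction step)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def conv_transpose2d(x, k, stride=2):
    """Transposed convolution: each input value scatters a scaled copy of
    the kernel into the output, which upsamples the feature map."""
    kh, kw = len(k), len(k[0])
    oh = (len(x) - 1) * stride + kh
    ow = (len(x[0]) - 1) * stride + kw
    out = [[0.0] * ow for _ in range(oh)]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += v * k[a][b]
    return out


def relu(x):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def speech_to_image(transcript):
    """Map an ASR transcript (assumed given) to a small 2D 'image'."""
    feats = text_features(transcript)                 # 6 x 6 grid
    mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # fixed, not learned
    encoded = relu(conv2d(feats, mean_kernel))         # 4 x 4 feature map
    up_kernel = [[1.0, 0.5], [0.5, 0.25]]              # fixed, not learned
    return conv_transpose2d(encoded, up_kernel)        # 8 x 8 output


if __name__ == "__main__":
    image = speech_to_image("a small red bird sitting on a branch")
    print(len(image), "x", len(image[0]))
```

With stride `s` and kernel size `k`, the transposed convolution maps an input of side `n` to a side of `(n - 1) * s + k`, which is why the 4 x 4 feature map becomes an 8 x 8 output here.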
Keywords:
speech recognition, CNN, direct speech translation, image generation
Cite Article:
"Direct Speech to Image Translation unsing CNN", International Journal for Research Trends and Innovation (www.ijrti.org), ISSN:2455-2631, Vol.9, Issue 4, page no.887 - 895, April-2024, Available :http://www.ijrti.org/papers/IJRTI2404123.pdf
ISSN:
2456-3315 | IMPACT FACTOR: 8.14 Calculated By Google Scholar| ESTD YEAR: 2016