Multimodal-AI-App-using-Llava-7B

A multimodal AI app built with Llava 7B and Gradio: an AI voice assistant that combines the multimodal LLM "Llava" with Whisper.

Description:

This project builds a voice assistant from two models: the multimodal LLM "Llava 1.5 7B", which handles image and text understanding, and OpenAI's Whisper, which converts speech to text.
Both models are integrated into a Gradio app, and the gTTS library converts the assistant's text answers to speech.
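
The flow described above (speech in, image question answered, speech out) can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the helper names (`run_assistant`, `build_llava_prompt`, `make_whisper_transcriber`, `make_llava_describer`, `gtts_speak`, `build_app`), the `llava-hf/llava-1.5-7b-hf` checkpoint, the `base` Whisper size, and the `USER: <image> ... ASSISTANT:` prompt template are all assumptions. The Llava call also assumes a `transformers` version that exposes Llava through the `image-to-text` pipeline.

```python
from typing import Callable, Tuple


def build_llava_prompt(question: str) -> str:
    # Assumed chat template for Llava 1.5 checkpoints; adjust for your model.
    return f"USER: <image>\n{question}\nASSISTANT:"


def run_assistant(
    audio_path: str,
    image_path: str,
    transcribe: Callable[[str], str],
    describe: Callable[[str, str], str],
    speak: Callable[[str], str],
) -> Tuple[str, str, str]:
    """Chain speech-to-text, image/text understanding, and text-to-speech."""
    question = transcribe(audio_path).strip()
    if not question:
        # Fall back to a generic request when nothing was transcribed.
        question = "Describe this image."
    answer = describe(image_path, question)
    speech_path = speak(answer)
    return question, answer, speech_path


def make_whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    import whisper  # openai-whisper package, imported lazily

    model = whisper.load_model(model_name)  # load once, reuse per request
    return lambda audio_path: model.transcribe(audio_path)["text"]


def make_llava_describer(
    model_id: str = "llava-hf/llava-1.5-7b-hf",  # assumed checkpoint name
    max_new_tokens: int = 200,
) -> Callable[[str, str], str]:
    from PIL import Image
    from transformers import pipeline

    # Assumption: this transformers version serves Llava via "image-to-text".
    pipe = pipeline("image-to-text", model=model_id)

    def describe(image_path: str, question: str) -> str:
        outputs = pipe(
            Image.open(image_path),
            prompt=build_llava_prompt(question),
            generate_kwargs={"max_new_tokens": max_new_tokens},
        )
        # The generated text echoes the prompt; keep only the reply part.
        return outputs[0]["generated_text"].split("ASSISTANT:")[-1].strip()

    return describe


def gtts_speak(text: str, out_path: str = "response.mp3") -> str:
    from gtts import gTTS

    gTTS(text=text, lang="en").save(out_path)
    return out_path


def build_app(transcribe, describe, speak):
    import gradio as gr

    def handler(audio_path, image_path):
        return run_assistant(audio_path, image_path, transcribe, describe, speak)

    return gr.Interface(
        fn=handler,
        inputs=[gr.Audio(type="filepath"), gr.Image(type="filepath")],
        outputs=[
            gr.Textbox(label="Transcribed question"),
            gr.Textbox(label="Answer"),
            gr.Audio(label="Spoken answer"),
        ],
    )
```

To try it, something like `build_app(make_whisper_transcriber(), make_llava_describer(), gtts_speak).launch()` should start the interface, provided the `openai-whisper`, `transformers`, `Pillow`, `gTTS`, and `gradio` packages are installed. Passing the three stages in as callables keeps the model loading out of `run_assistant`, so the chaining logic can be exercised with lightweight stand-ins.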

Distributed under the MIT License. See LICENSE for more information.
