Multimodal AI App using Llava 7B and Gradio. Building an AI Voice Assistant App using Multimodal LLM "Llava" and Whisper.
- This project builds a voice assistant that uses the multimodal LLM "Llava 1.5 7B" for image and text understanding and OpenAI's Whisper model for speech-to-text conversion.
- These models are integrated in a Gradio app, with the gTTS library providing text-to-speech so that the assistant can speak its answers.
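
The flow described above (speech in, Whisper transcript, Llava answer about an image, gTTS speech out, all behind a Gradio UI) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `VoiceAssistant`, `whisper_transcribe`, `gtts_speak`, and `build_ui` are hypothetical names, the Whisper model size `"base"` and the output file `reply.mp3` are assumptions, and the Llava step is left as a callable you supply. Heavy libraries are imported lazily so the wiring can be tried with stand-in functions first.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class VoiceAssistant:
    """Hypothetical glue object: chains the three model steps."""
    transcribe: Callable[[str], str]      # audio path -> question text
    describe: Callable[[str, str], str]   # (image path, question) -> answer text
    speak: Callable[[str, str], str]      # (answer text, output path) -> audio path

    def run(self, audio_path: str, image_path: str,
            out_path: str = "reply.mp3") -> Tuple[str, str, str]:
        question = self.transcribe(audio_path)        # speech-to-text
        answer = self.describe(image_path, question)  # multimodal LLM step
        reply_audio = self.speak(answer, out_path)    # text-to-speech
        return question, answer, reply_audio


def whisper_transcribe(audio_path: str) -> str:
    """Speech-to-text with the openai-whisper package ("base" is an assumed size)."""
    import whisper
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]


def gtts_speak(text: str, out_path: str) -> str:
    """Text-to-speech with gTTS; writes an MP3 and returns its path."""
    from gtts import gTTS
    gTTS(text=text, lang="en").save(out_path)
    return out_path


def build_ui(assistant: VoiceAssistant):
    """Wrap the assistant in a Gradio interface (audio + image in, text + audio out)."""
    import gradio as gr
    return gr.Interface(
        fn=lambda audio, image: assistant.run(audio, image),
        inputs=[gr.Audio(type="filepath"), gr.Image(type="filepath")],
        outputs=[gr.Textbox(label="Transcript"),
                 gr.Textbox(label="Answer"),
                 gr.Audio(label="Spoken reply")],
    )
```

To try the wiring without downloading any model, pass simple stand-ins, for example `VoiceAssistant(transcribe=lambda p: "what is this?", describe=lambda img, q: "a cat", speak=lambda t, o: o)`. With real models, you would pass `whisper_transcribe`, your own Llava wrapper, and `gtts_speak`, then call `build_ui(assistant).launch()`.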
Distributed under the MIT License. See LICENSE for more information.