This project aims to create a realistic, low-latency chatbot that functions as an AI sales assistant for Nooks, an AI-powered sales development platform. The chatbot responds when the user falls silent for some time, simulating a natural conversation flow.
https://www.loom.com/share/ce9d6443c02e4ae19e5ca3994b58e3df
The current implementation is relatively slow to respond to the user - the goal is to make it faster.
The system consists of three main components:
- Speech-to-text (STT) using AssemblyAI's hosted API for real-time transcription
- A sales chatbot powered by OpenAI's GPT-4 model (note: not GPT-4o, which is faster but less accurate in some cases)
- Text-to-speech (TTS) using ElevenLabs for voice output
The chatbot listens to user input, transcribes it in real-time, and generates a response when the user stops speaking. The AI's response is then converted to speech and played back to the user.
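Concretely, the loop looks roughly like this (a minimal sketch; `transcribe_stream`, `generate_reply`, and `speak` are illustrative names, not the repo's actual functions):

```python
SILENCE_TIMEOUT_S = 1.0  # illustrative: respond once the user has been quiet this long

def conversation_loop(transcribe_stream, generate_reply, speak):
    utterance = []
    for event in transcribe_stream():          # STT events from AssemblyAI
        if event.text:                         # user is still talking
            utterance.append(event.text)
        if event.silence_duration >= SILENCE_TIMEOUT_S and utterance:
            reply = generate_reply(" ".join(utterance))  # GPT-4 call
            speak(reply)                                 # ElevenLabs TTS playback
            utterance.clear()
```

Everything happens strictly in sequence, which is why the end-to-end delay is roughly the silence timeout plus the full GPT-4 call plus the full TTS call.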
Assume that you are not allowed to modify the services used (you must use AssemblyAI's hosted model for STT, OpenAI's GPT-4 for the chatbot, and ElevenLabs with this voice setting for TTS). How would you modify the code to lower the chatbot's latency so it responds faster?
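One well-known way to do that without changing providers is to overlap the stages instead of running them in sequence: stream GPT-4 tokens as they arrive and hand each completed sentence to TTS immediately, so speech synthesis starts before the full reply exists. A hedged sketch using the official `openai` Python client; `speak_sentence` stands in for whatever entry point `lib/elevenlabs_tts.py` actually exposes:

```python
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def stream_reply(messages, speak_sentence):
    """Speak each sentence of the GPT-4 reply as soon as it is complete."""
    stream = client.chat.completions.create(
        model="gpt-4", messages=messages, stream=True
    )
    buffer = ""
    for chunk in stream:
        buffer += chunk.choices[0].delta.content or ""
        # Flush on sentence boundaries so TTS can start before GPT-4 finishes.
        while match := re.search(r"[.!?]\s", buffer):
            speak_sentence(buffer[: match.end()].strip())
            buffer = buffer[match.end():]
    if buffer.strip():
        speak_sentence(buffer.strip())
```

This hides most of the LLM latency behind TTS playback; the remaining fixed cost is GPT-4's time-to-first-token plus ElevenLabs' time-to-first-audio-byte.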
Your solution will be evaluated based on:
- Reduction in overall latency
- Maintenance of conversation quality and realism (i.e., the chatbot doesn't interrupt the human speaker while they're in the middle of speaking; see the endpointing sketch after this list)
- Code quality and clarity of explanation
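For the non-interruption criterion, a common pattern is a resettable silence timer: every new transcript pushes the response deadline back, and the bot only speaks once the deadline expires untouched. A sketch (the 0.8s threshold is illustrative, not taken from this repo):

```python
import threading

class EndpointDetector:
    """Call `on_end_of_turn` only after `timeout_s` of uninterrupted silence."""

    def __init__(self, on_end_of_turn, timeout_s=0.8):
        self._on_end_of_turn = on_end_of_turn
        self._timeout_s = timeout_s
        self._timer = None

    def on_transcript(self, text):
        # Fresh speech cancels any pending response,
        # so the bot never talks over the user.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._timeout_s, self._on_end_of_turn)
        self._timer.start()
```

Tuning `timeout_s` is the latency/realism trade-off in miniature: shorter means faster replies but more risk of cutting the speaker off.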
- Review the existing code in `main.py`, `lib/sales_chatbot.py`, and `lib/elevenlabs_tts.py`
- Install the requirements by running `pip install -r requirements.txt` (or use a virtual environment if you prefer)
- Set your OpenAI, AssemblyAI, and ElevenLabs API keys in `.env` - you should have received them via email (see the key-loading sketch after this list)
- Run the current implementation to understand its behavior by running `python3 main.py`
- Begin your optimization process. Document your changes and reasoning in this README.md file when done.
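If the key wiring is unclear, it is typically just `python-dotenv` plus environment variables; the variable names below are assumptions, so check the code for the exact ones:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # loads key=value pairs from .env into the process environment

# Assumed variable names; verify against the repo's code.
openai_key = os.getenv("OPENAI_API_KEY")
assemblyai_key = os.getenv("ASSEMBLYAI_API_KEY")
elevenlabs_key = os.getenv("ELEVENLABS_API_KEY")
```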
If you're getting stuck with installation issues, we offer an alternative Poetry-based installation method.
- Install Poetry
- Install all requirements by running `poetry install`
- Run the current implementation by running `poetry run python3 main.py`
Good luck!
Right now a lot of the latency comes from external services (TTS, LLM inference, STT). An easy way to reduce latency would be to use local models. For example you could use:
- NeMo ASR for STT
- Llama for the chatbot
- Bark or Tortoise for TTS

Try building a version of this chatbot that is local-only and see what speedup you achieve!
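As a rough starting point, a local-only turn might look like the sketch below. This is an untested illustration: it assumes `nemo_toolkit`, `llama-cpp-python`, `bark`, and `sounddevice` are installed, the model names and paths are placeholders, and NeMo's `transcribe` return format varies between versions.

```python
import nemo.collections.asr as nemo_asr
import sounddevice as sd
from bark import SAMPLE_RATE, generate_audio, preload_models
from llama_cpp import Llama

# Placeholder model choices; pick whatever fits your hardware.
asr = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_small")
llm = Llama(model_path="models/llama-model.gguf")  # hypothetical local path
preload_models()  # Bark fetches its weights on first use

def local_turn(wav_path: str) -> None:
    # Local STT (return type differs across NeMo versions, hence str()).
    text = asr.transcribe([wav_path])[0]
    # Local LLM inference via llama.cpp bindings.
    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": str(text)}]
    )["choices"][0]["message"]["content"]
    # Local TTS: Bark returns a numpy float array at SAMPLE_RATE.
    audio = generate_audio(reply)
    sd.play(audio, SAMPLE_RATE)
    sd.wait()
```

Note that Bark and Tortoise are themselves fairly slow without a GPU, so measure before assuming local means faster.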