winterdelta/swift-vad-stt-for-athens

w∆ | Winterdelta

Changes in this repo relative to ai-ng / swift

  • voice-ai / swift is now uninterruptible:
    • Once the AI has started speaking, it continues until it has finished.
    • This helps conversational fluency a little.
    • The conversation should feel less 'jittery': ambient background noise, for example, no longer causes it to stop or start speaking unexpectedly.
    • Overall, it is easier to listen to and easier to chat with.
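One way to picture the uninterruptible behaviour is a small gate that drops user speech segments while the assistant is still speaking. This is a minimal sketch, not the repository's actual code: the class `UninterruptibleGate`, its methods, and the `Float32Array` segment type are all hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: none of these names come from the repository.
type AssistantState = "idle" | "speaking";

class UninterruptibleGate {
  private state: AssistantState = "idle";

  // Call when synthesized audio starts playing.
  startSpeaking(): void {
    this.state = "speaking";
  }

  // Call when playback has finished.
  finishSpeaking(): void {
    this.state = "idle";
  }

  // Forwards the segment to onAccepted unless the assistant is speaking.
  // Returns true if the segment was accepted, false if it was dropped.
  handleUserSpeech(
    segment: Float32Array,
    onAccepted: (segment: Float32Array) => void,
  ): boolean {
    if (this.state === "speaking") {
      return false; // ignore background noise and interruptions mid-reply
    }
    onAccepted(segment);
    return true;
  }
}
```

With a gate like this, a detected speech segment that arrives between `startSpeaking()` and `finishSpeaking()` is simply discarded, so stray noise cannot cut the reply short.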

Swift is a fast AI voice assistant.

  • Groq is used for fast inference of OpenAI Whisper (for transcription) and Meta Llama 3 (for generating the text response).
  • Cartesia's Sonic voice model is used for fast speech synthesis, which is streamed to the frontend.
  • VAD is used to detect when the user is talking, and run callbacks on speech segments.
  • The app is a Next.js project written in TypeScript and deployed to Vercel.
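The idea of running callbacks on speech segments can be sketched as follows. This is a simplified illustration, not the VAD used by the app: it assumes a hypothetical upstream model that already yields one speech probability per audio frame, and the function name `detectSpeechSegments`, the `threshold`, and the `minSilenceFrames` parameter are assumptions made for this example.

```typescript
// Hypothetical sketch of segment detection over per-frame speech probabilities.
interface SpeechSegment {
  startFrame: number; // first frame classified as speech (inclusive)
  endFrame: number;   // frame after the last speech frame (exclusive)
}

function detectSpeechSegments(
  probabilities: number[],
  onSpeechEnd: (segment: SpeechSegment) => void,
  threshold = 0.5,       // assumed cutoff for "this frame is speech"
  minSilenceFrames = 2,  // assumed silence run that closes a segment
): void {
  let start = -1;  // -1 means no segment is currently open
  let silence = 0; // consecutive non-speech frames inside an open segment

  for (let i = 0; i < probabilities.length; i++) {
    if (probabilities[i] >= threshold) {
      if (start === -1) start = i;
      silence = 0;
    } else if (start !== -1) {
      silence++;
      if (silence >= minSilenceFrames) {
        // Close the segment at the last speech frame, excluding the silence.
        onSpeechEnd({ startFrame: start, endFrame: i - silence + 1 });
        start = -1;
        silence = 0;
      }
    }
  }

  // Flush a segment that is still open when the input ends.
  if (start !== -1) {
    onSpeechEnd({ startFrame: start, endFrame: probabilities.length - silence });
  }
}
```

Short pauses below `minSilenceFrames` stay inside one segment, so a brief hesitation does not split an utterance. In an app like this one, the callback would be the natural place to send the segment's audio on for transcription.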

Thank you to the teams at Groq and Cartesia for providing access to their APIs for this demo!

Developing

  • Clone the repository.
  • Copy .env.example to .env.local and fill in the environment variables.
  • Run pnpm install to install dependencies.
  • Run pnpm dev to start the development server.