v0.8 is a major release of the framework, featuring significant reliability improvements to VoiceAssistant. This update includes a few breaking API changes that will impact the way you build your agents. We strive to minimize breaking changes, and will stabilize the API as we approach version 1.0.
entrypoint_fnc is now a parameter in WorkerOptions. Previously, you were required to explicitly accept the job.
We've removed the namespace option in order to simplify the registration process. In future versions, it'll be possible to provide an explicit agent_name to launch multiple kinds of agents for each room.
You now need to call await ctx.connect() to initiate the connection to the room. This allows for pre-connect setup (such as callback registration) to avoid race conditions.
The above changes are reflected in the following minimal example:
```python
from livekit.agents import JobContext, WorkerOptions, cli


async def job_entrypoint(ctx: JobContext):
    await ctx.connect()
    # your logic here
    ...


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(entrypoint_fnc=job_entrypoint)
    )
```
The VoiceAssistant API remains mostly unchanged, despite significant improvements to functionality and internals. However, there have been changes to its configuration:
- Removed
  - base_volume
  - debug
  - sentence_tokenizer, word_tokenizer, hyphenate_word
- Changed
  - transcription-related options are grouped within the transcription param
```python
class VoiceAssistant(utils.EventEmitter[EventTypes]):
    def __init__(
        self,
        *,
        vad: vad.VAD,
        stt: stt.STT,
        llm: LLM,
        tts: tts.TTS,
        chat_ctx: ChatContext | None = None,
        fnc_ctx: FunctionContext | None = None,
        allow_interruptions: bool = True,
        interrupt_speech_duration: float = 0.6,
        interrupt_min_words: int = 0,
        preemptive_synthesis: bool = True,
        transcription: AssistantTranscriptionOptions = AssistantTranscriptionOptions(),
        will_synthesize_assistant_reply: WillSynthesizeAssistantReply = _default_will_synthesize_assistant_reply,
        plotting: bool = False,
        loop: asyncio.AbstractEventLoop | None = None,
    ) -> None:
        ...
```
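For illustration, here's a minimal sketch of constructing the assistant with the grouped transcription options. The plugin instances (my_vad, my_stt, my_llm, my_tts) and the room variable are placeholders, not exact plugin APIs:

```python
# A minimal sketch: plugin instances and `room` are placeholders.
assistant = VoiceAssistant(
    vad=my_vad,
    stt=my_stt,
    llm=my_llm,
    tts=my_tts,
    # transcription-related options are now grouped under a single param
    transcription=AssistantTranscriptionOptions(),
)
assistant.start(room)
```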
The LLM class has been restructured to enhance ergonomics and improve function calling support.
Function calling has received a complete overhaul in v0.8.0. The primary breaking change is that function calls are no longer automatically invoked when iterating the LLM stream; LLMStream.execute_functions needs to be called instead. (VoiceAssistant handles this automatically.)
Previously, LLM.chat() was an async method that returned an LLMStream (which itself was an AsyncIterable). To improve consistency and reduce confusion, LLM.chat() is now synchronous, while still returning the same AsyncIterable LLMStream.
```python
chat_ctx = llm.ChatContext()
chat_ctx.append(role="user", text="user message")
stream = llm_plugin.chat(chat_ctx=chat_ctx)
```
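Building on the snippet above, here's a hedged sketch of consuming the stream and invoking function calls explicitly. The fnc_ctx argument and the chunk handling are assumptions for illustration:

```python
# Sketch only: `fnc_ctx` and the chunk handling are illustrative assumptions.
stream = llm_plugin.chat(chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)

async for chunk in stream:
    ...  # consume streamed output as before

# v0.8: function calls collected during iteration are no longer invoked
# automatically; trigger them explicitly (VoiceAssistant does this for you).
stream.execute_functions()
await stream.aclose()
```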
Previously, to communicate to an STT provider that you had sent enough input to generate a response, you could call push_frame(None) to coax it into producing a transcript. In v0.8.0 that API has been removed and replaced with flush().
end_input signals to the STT provider that the input is complete and no additional input will follow. Previously, this was done using aclose(wait=True).
The wait arg of aclose has been removed in favor of SpeechStream.end_input (see above). Now, if you call aclose() without first calling end_input(), the in-flight request is cancelled.
```python
stt_stream = my_stt_instance.stream()

async for ev in audio_stream:
    stt_stream.push_frame(ev.frame)
    # optionally flush when enough frames have been pushed
    stt_stream.flush()

stt_stream.end_input()
await stt_stream.aclose()
```
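For completeness, a hedged sketch of reading results back, assuming SpeechEvent still exposes type and alternatives as in prior releases. In practice, pushing frames and reading events run as separate asyncio tasks:

```python
# Sketch only: event fields are assumed from prior releases.
async for ev in stt_stream:
    if ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
        print(ev.alternatives[0].text)
```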
The SynthesizedAudio dataclass has gone through a major change:
```python
# New SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
    request_id: str
    """Request ID (one segment could be made up of multiple requests)"""
    segment_id: str
    """Segment ID, each segment is separated by a flush"""
    frame: rtc.AudioFrame
    """Synthesized audio frame"""
    delta_text: str = ""
    """Current segment of the synthesized audio"""
```

```python
# Old SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
    text: str
    data: rtc.AudioFrame
```
The SynthesisEvent has been removed entirely. All occurrences of it have been replaced with SynthesizedAudio.
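As a hedged sketch, consuming the stream under the new dataclass might look like the following. The handle_frame helper is hypothetical, and tts_stream refers to a stream like the one created in the example below:

```python
# Sketch only: `handle_frame` is a hypothetical helper.
current_segment: str | None = None

async for audio in tts_stream:  # yields SynthesizedAudio in v0.8
    if audio.segment_id != current_segment:
        # a new segment began (e.g. after a flush())
        current_segment = audio.segment_id
    handle_frame(audio.frame)  # `frame` replaces the old `data` field
    # `delta_text` carries the incremental text, replacing the old full `text`
```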
- flush(): similar to the STT changes, this coaxes the TTS provider into generating a response. The SynthesizedAudio response will have a new segment_id after each call to flush().
- end_input(): similar to the STT changes, this replaces aclose(wait=True).
- aclose(): similar to the STT changes, the wait arg has been removed.
```python
tts_stream = my_tts_instance.stream()

tts_stream.push_text("This is the first sentence")
tts_stream.flush()
tts_stream.push_text("This is the second sentence")
tts_stream.end_input()

await tts_stream.aclose()
```
The same changes made to STT and TTS have also been made to VAD:
```python
vad_stream = my_vad_instance.stream()

async for ev in audio_stream:
    vad_stream.push_frame(ev.frame)
    # optionally flush when enough frames have been pushed
    vad_stream.flush()

vad_stream.end_input()
await vad_stream.aclose()
```