> **Note:** For llamaspeak version 2 with multimodal support, see the `local_llm` container.
- Talk live with LLMs using NVIDIA Riva ASR and TTS!
- Requires the `riva-server` and `text-generation-webui` containers to be running
First, follow the steps from the `riva-client:python` package to run and test the Riva server:
- Start the Riva server on your Jetson by following `riva_quickstart_arm64`
- Run some of the Riva ASR examples to confirm that ASR is working: https://github.com/nvidia-riva/python-clients#asr
- Run some of the Riva TTS examples to confirm that TTS is working: https://github.com/nvidia-riva/python-clients#tts
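For example, a quick end-to-end check of both services from the command line might look something like this (the script names come from the python-clients repo, but treat the exact flags, paths, and device indices as placeholders and confirm them with `--help`):

```bash
# from a checkout of https://github.com/nvidia-riva/python-clients
# (paths may differ if you run these from the riva-client:python container)

# list the available microphones, then transcribe live audio against the local Riva server
python3 scripts/asr/transcribe_mic.py --list-devices
python3 scripts/asr/transcribe_mic.py --input-device 24 --sample-rate-hz 16000

# synthesize a short phrase to a wav file to confirm TTS
python3 scripts/tts/talk.py --text "Hello, this is a test of Riva TTS." --output /tmp/riva_tts_test.wav
```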
You can also see this helpful video and guide from JetsonHacks for setting up Riva: Speech AI on Jetson Tutorial
Next, start `text-generation-webui` (version 1.7) with the `--api` flag and load your chat model of choice through its web UI on port 7860:
```bash
./run.sh --workdir /opt/text-generation-webui $(./autotag text-generation-webui:1.7) \
  python3 server.py --listen --verbose --api \
    --model-dir=/data/models/text-generation-webui
```
> **Note:** launch the `text-generation-webui:1.7` container to maintain API compatibility.
Alternatively, you can manually specify the model that you want to load without needing to use the web UI:
```bash
./run.sh --workdir /opt/text-generation-webui $(./autotag text-generation-webui:1.7) \
  python3 server.py --listen --verbose --api \
    --model-dir=/data/models/text-generation-webui \
    --model=llama-2-13b-chat.Q4_K_M.gguf \
    --loader=llamacpp \
    --n-gpu-layers=128 \
    --n_ctx=4096 \
    --n_batch=4096 \
    --threads=$(($(nproc) - 2))
```
See here for command-line arguments: https://github.com/oobabooga/text-generation-webui/tree/main#basic-settings
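Before starting llamaspeak, it can be worth confirming that the API is actually responding. As a rough sketch (this assumes the legacy blocking API from that era of text-generation-webui, which listened on port 5000 with a `/api/v1/generate` endpoint; check your server log for the actual ports):

```bash
# sanity-check the text-generation-webui API (legacy blocking API assumed on port 5000)
curl -s http://localhost:5000/api/v1/generate \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "Hello, my name is", "max_new_tokens": 16}'
```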
Browsers require HTTPS in order to access the client's microphone, so you'll need to create a self-signed SSL certificate and key:
```bash
$ cd /path/to/your/jetson-containers/data
$ openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes -subj '/CN=localhost'
```
You'll want to place these in your `jetson-containers/data` directory, because this gets automatically mounted into the containers under `/data` and will keep your SSL certificate persistent across container runs. When you first navigate your browser to a page that uses these self-signed certificates, it will issue a warning since they don't originate from a trusted authority.
You can choose to override this, and the warning won't reappear until you change certificates or your device's hostname/IP changes.
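If you want to double-check what was generated, `openssl` can print the certificate's subject and validity window:

```bash
# confirm the subject and expiration of the self-signed certificate
openssl x509 -in /path/to/your/jetson-containers/data/cert.pem -noout -subject -dates
```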
To run the llamaspeak chat server with its default arguments and the SSL keys you generated, start it like this:
```bash
./run.sh --env SSL_CERT=/data/cert.pem --env SSL_KEY=/data/key.pem $(./autotag llamaspeak)
```
See `chat.py` for command-line options that can be changed. For example, to enable `--verbose` or `--debug` logging:
```bash
./run.sh --workdir=/opt/llamaspeak \
  --env SSL_CERT=/data/cert.pem \
  --env SSL_KEY=/data/key.pem \
  $(./autotag llamaspeak) \
  python3 chat.py --verbose
```
If you're having issues getting audio or responses from the web client, enable debug logging to check the message traffic.
The default port is `8050`, but that can be changed with the `--port` argument. You can then navigate your browser to `https://HOSTNAME:8050`.
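For example, to serve on a different port (8443 here is just an arbitrary choice), pass `--port` through to `chat.py` the same way as the logging flags above:

```bash
# run llamaspeak on port 8443 instead of the default 8050
./run.sh --workdir=/opt/llamaspeak \
  --env SSL_CERT=/data/cert.pem \
  --env SSL_KEY=/data/key.pem \
  $(./autotag llamaspeak) \
  python3 chat.py --port 8443

# then navigate to https://HOSTNAME:8443 instead
```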
CONTAINERS
| llamaspeak | |
|---|---|
| Builds | |
| Requires | L4T >=34.1.0 |
| Dependencies | build-essential python riva-client:python numpy |
| Dockerfile | Dockerfile |
| Images | dustynv/llamaspeak:r35.2.1 (2023-09-07, 5.0GB)<br>dustynv/llamaspeak:r35.3.1 (2023-08-29, 5.0GB)<br>dustynv/llamaspeak:r35.4.1 (2023-12-05, 5.0GB) |
CONTAINER IMAGES
| Repository/Tag | Date | Arch | Size |
|---|---|---|---|
| dustynv/llamaspeak:r35.2.1 | 2023-09-07 | arm64 | 5.0GB |
| dustynv/llamaspeak:r35.3.1 | 2023-08-29 | arm64 | 5.0GB |
| dustynv/llamaspeak:r35.4.1 | 2023-12-05 | arm64 | 5.0GB |
Container images are compatible with other minor versions of JetPack/L4T:
• L4T R32.7 containers can run on other versions of L4T R32.7 (JetPack 4.6+)
• L4T R35.x containers can run on other versions of L4T R35.x (JetPack 5.1+)
RUN CONTAINER
To start the container, you can use the `run.sh`/`autotag` helpers or manually put together a `docker run` command:
```bash
# automatically pull or build a compatible container image
./run.sh $(./autotag llamaspeak)

# or explicitly specify one of the container images above
./run.sh dustynv/llamaspeak:r35.4.1

# or if using 'docker run' (specify image and mounts/etc)
sudo docker run --runtime nvidia -it --rm --network=host dustynv/llamaspeak:r35.4.1
```
`run.sh` forwards arguments to `docker run` with some defaults added (like `--runtime nvidia`, mounting a `/data` cache, and detecting devices).

`autotag` finds a container image that's compatible with your version of JetPack/L4T - either locally, pulled from a registry, or by building it.
To mount your own directories into the container, use the `-v` or `--volume` flags:
```bash
./run.sh -v /path/on/host:/path/in/container $(./autotag llamaspeak)
```
To launch the container running a command, as opposed to an interactive shell:
```bash
./run.sh $(./autotag llamaspeak) my_app --abc xyz
```
You can pass any options to `run.sh` that you would to `docker run`, and it'll print out the full command that it constructs before executing it.
BUILD CONTAINER
If you use `autotag` as shown above, it'll ask to build the container for you if needed. To manually build it, first do the system setup, then run:
```bash
./build.sh llamaspeak
```
The dependencies from above will be built into the container, and it'll be tested during the build. See `./build.sh --help` for build options.