The Extras project is discontinued and won't receive any new updates or modules. The vast majority of modules are available natively in the main SillyTavern application. You may still install and use it, but don't expect immediate support if you run into issues.
- April 24 2024 - The project is officially discontinued.
- November 20 2023 - The project is relicensed as AGPLv3 to comply with the rest of ST organization policy. If you have any concerns about that, please raise a discussion in the appropriate channel.
- November 16 2023 - Requirement files were remade from scratch to simplify the process of local installation.
  - Removed requirements-complete.txt; please use requirements.txt instead.
  - Unpinned the versions of all requirements unless strictly necessary.
  - Coqui TTS requirements moved to requirements-coqui.txt.
- July 25 2023 - Extras now requires Python 3.11 to run; some of the new modules are incompatible with old Python 3.10 installs. To migrate using conda, please remove the old environment using
conda remove --name extras --all
and reinstall using the instructions below.
A set of APIs for various SillyTavern extensions.
You need to run the latest version of SillyTavern. Grab it here: How to install, Git repository
All modules, except for Stable Diffusion, run on the CPU by default. However, they can alternatively be configured to use CUDA (with the --cuda command line option). When running all modules simultaneously, expect approximately 6 GB of RAM usage. Loading Stable Diffusion adds another couple of GB.
Some modules can be configured to use CUDA separately from the rest (e.g. the --talkinghead-gpu and --coqui-gpu command line options). This is useful in low-VRAM setups, such as on a gaming laptop.
Try it on Colab (it will give you a link to the Extras API):
Colab link: https://colab.research.google.com/github/SillyTavern/SillyTavern/blob/release/colab/GPU.ipynb
Documentation: https://docs.sillytavern.app/
- The default requirements.txt installs PyTorch with CUDA support.
- If you run on an AMD GPU, use the requirements-rocm.txt file instead.
- If you run on Apple Silicon (ARM series), use the requirements-silicon.txt file instead.
- If you want to use Coqui TTS, install requirements-coqui.txt after choosing the requirements from the list above.
- If you want to use RVC, install requirements-rvc.txt after choosing the requirements from the list above.
- BE WARNED THAT:
  - The Coqui package is extremely unstable and may break other packages or not work at all in your environment.
  - It's not really worth it.
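The rules above can be sketched as a small helper that picks the requirement files in install order. The function names and the "cuda"/"rocm"/"silicon" platform keys are hypothetical. Only the file names come from this README.

```python
def requirement_files(platform: str = "cuda", coqui: bool = False, rvc: bool = False) -> list[str]:
    """Return the requirement files to install, in the order they should be installed."""
    # Hypothetical platform keys; the file names are the ones listed above.
    base_files = {
        "cuda": "requirements.txt",             # default: PyTorch with CUDA
        "rocm": "requirements-rocm.txt",        # AMD GPUs
        "silicon": "requirements-silicon.txt",  # Apple Silicon
    }
    if platform not in base_files:
        raise ValueError(f"unknown platform: {platform!r}")
    files = [base_files[platform]]
    # Optional extras are installed on top of the base requirements.
    if coqui:
        files.append("requirements-coqui.txt")
    if rvc:
        files.append("requirements-rvc.txt")
    return files


def pip_commands(files: list[str]) -> list[str]:
    """Turn requirement file names into pip commands."""
    return [f"pip install -r {name}" for name in files]
```

For example, pip_commands(requirement_files("rocm", rvc=True)) yields the ROCm install followed by the RVC one.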
If installation fails with the following error:
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects
Installing the chromadb package requires one of the following:
- Have Visual C++ build tools installed: https://visualstudio.microsoft.com/visual-cpp-build-tools/
- Install hnswlib from conda:
conda install -c conda-forge hnswlib
❗ IMPORTANT! The chromadb package is used only by the chromadb module for the old Smart Context extension, which is deprecated. You will likely not need it.
You must specify a list of module names to run with the --enable-modules command line option (caption is provided as an example). See the Modules section.
- Open the Colab link
- Select the desired "extra" options and start the cell
- Wait for it to finish
- Get the API URL from the Colab output under the ### SillyTavern Extensions LINK ### title
- Start SillyTavern with extensions support: set enableExtensions to true in config.conf
- Navigate to the SillyTavern extensions menu, put in the API URL, and tap "Connect" to load the extensions
Some folks in the community have had success running Extras on their phones via Ubuntu on Termux. This project wasn't made with mobile support in mind, so this guide is provided strictly for your information: https://rentry.org/STAI-Termux#downloading-and-running-tai-extras
We will NOT provide any support for running Extras on Android. Direct all your questions to the creator of the guide linked above.
PREREQUISITES
- Install Miniconda: https://docs.conda.io/en/latest/miniconda.html
- (Important!) Read how to use Conda: https://conda.io/projects/conda/en/latest/user-guide/getting-started.html
- Install git: https://git-scm.com/downloads
EXECUTE THESE COMMANDS ONE BY ONE IN THE CONDA COMMAND PROMPT.
TYPE/PASTE EACH COMMAND INTO THE PROMPT, HIT ENTER AND WAIT FOR IT TO FINISH!
- Before the first run, create an environment (let's call it extras):
conda create -n extras
- Now activate the newly created env
conda activate extras
- Install Python 3.11
conda install python=3.11
- Install the required system packages
conda install git
- Clone this repository
git clone https://github.com/SillyTavern/SillyTavern-extras
- Navigate to the freshly cloned repository
cd SillyTavern-extras
- Install the project requirements
pip install -r requirements.txt
- Run the Extensions API server
python server.py --enable-modules=caption,summarize,classify
- Copy the Extras server API URL listed in the console window after it finishes loading up. On local installs, this defaults to http://localhost:5100.
- Open your SillyTavern config.conf file (located in the base install folder), and look for the line "const enableExtensions". Make sure that line has "= true", and not "= false".
- Start your SillyTavern server
- Open the Extensions panel (via the 'Stacked Blocks' icon at the top of the page), paste the API URL into the input box, and click "Connect" to connect to the Extras extension server.
- To run again, simply activate the environment and run these commands. Be sure to add the additional options for server.py (see below) that your setup requires.
conda activate extras
python server.py
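If you prefer to script the config.conf change, here is a minimal sketch. It assumes the line looks like const enableExtensions = false; and that config.conf sits in the SillyTavern base folder. enable_extensions is a hypothetical helper, not part of either project.

```python
import re

# Hypothetical pattern: matches e.g. "const enableExtensions = false;" at the start of a line
# and captures everything up to the value, so only "false" gets replaced.
_ENABLE_EXTENSIONS = re.compile(r"^(\s*const\s+enableExtensions\s*=\s*)false\b", re.MULTILINE)


def enable_extensions(config_text: str) -> str:
    """Return config_text with enableExtensions switched from false to true."""
    return _ENABLE_EXTENSIONS.sub(r"\1true", config_text)


# Usage (assumes config.conf is in the current directory):
# from pathlib import Path
# path = Path("config.conf")
# path.write_text(enable_extensions(path.read_text(encoding="utf-8")), encoding="utf-8")
```

Anchoring the regex on the variable name leaves every other setting in the file untouched. It is also a no-op if extensions are already enabled.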
Installation requirements for Talkinghead changed in January 2024. The live mode, i.e. the talkinghead module that powers the Talkinghead mode of Character Expressions, no longer needs any additional packages.
However, a manual poser app has been added, serving two purposes. First, it is a GUI editor for the Talkinghead emotion templates. Second, it can batch-generate static emotion sprites from a single Talkinghead image. The latter is useful if you want the convenience of AI-powered posing (e.g. if you make new characters often), but don't want to run the live mode.
The manual poser app, and only that app, still requires the installation of an additional package that is not installed automatically due to incompatibility with Colab. If you want to be able to use the manual poser app, then run this after you have installed other requirements:
conda activate extras
pip install wxpython==4.2.1
The installation of the wxpython package can easily take half an hour on a fast CPU, as it needs to compile a whole GUI toolkit.
More information about Talkinghead can be found in its full documentation.
- Install Python 3.11: https://www.python.org/downloads/release/python-3114/
- Install git: https://git-scm.com/downloads
- Clone the repo:
git clone https://github.com/SillyTavern/SillyTavern-extras
cd SillyTavern-extras
- Run
python -m pip install -r requirements.txt
- Run
python server.py --enable-modules=caption,summarize,classify
- Get the API URL. It defaults to http://localhost:5100 if you run locally.
- Start SillyTavern with extensions support: set enableExtensions to true in config.conf
- Navigate to the SillyTavern extensions menu, put in the API URL, and tap "Connect" to load the extensions
Name | Used by | Description |
---|---|---|
caption | Image captioning | |
chromadb | Smart Context | Vector storage server |
classify | Character Expressions | Text sentiment classification |
coqui-tts | Coqui TTS server | |
edge-tts | Microsoft Edge TTS client | |
embeddings | Vector Storage | The Extras vectorization source |
rvc | Real-time voice cloning | |
sd | Stable Diffusion image generation (remote A1111 server by default) | |
silero-tts | Silero TTS server | |
summarize | Summarize | The Extras API backend |
talkinghead | Character Expressions | AI-powered character animation (see full documentation) |
websearch | Websearch | Google or DuckDuckGo search using Selenium headless browser |
- Character Expressions can connect to two Extras modules, classify and talkinghead.
  - classify updates the expression of the AI character's avatar automatically based on text sentiment analysis.
  - talkinghead provides AI-powered character animation. It also takes its expression from the Extras classify module.
  - To use Talkinghead, Extensions ⊳ Character Expressions ⊳ Local server classification in the ST GUI must be off, and classify must be enabled in Extras.
- Smart Context is deprecated; superseded by Vector Storage.
  - The embeddings module makes the ingestion performance comparable with ChromaDB, as it uses the same vectorization backend.
  - Vector Storage does not use other Extras modules.
- Summarize: the Main API is generally more capable, as it uses your main LLM to perform the summarization.
  - The summarize module is only used when you summarize with the Extras API. It uses a specialized BART summarization model, with a context size of 1024.
Flag | Description |
---|---|
--enable-modules | Required option. Which modules to enable. Expects a comma-separated list of module names. Ordering does not matter. See Modules. Example: --enable-modules=caption,sd |
--port | Specify the port on which the application is hosted. Default: 5100 |
--listen | Host the app on the local network |
--share | Share the app via a Cloudflare tunnel |
--secure | Adds API key authentication requirements. Highly recommended when paired with --share! |
--cpu | Run the models on the CPU instead of CUDA. Enabled by default. |
--mps or --m1 | Run the models on Apple Silicon. Only for M1 and M2 processors. |
--cuda | Use CUDA (GPU+VRAM) to run modules if it is available. Otherwise, falls back to using the CPU. |
--cuda-device | Specifies a CUDA device to use. Defaults to cuda:0 (first available GPU). |
--talkinghead-gpu | Use CUDA (GPU+VRAM) for Talkinghead. Highly recommended, 10-30x FPS increase in animation. |
--talkinghead-model | Load a specific variant of the THA3 AI poser model for Talkinghead. Default: auto (which is separable_half on GPU, separable_float on CPU). |
--talkinghead-models | If the THA3 AI poser models are not yet installed, downloads and installs them. Expects a HuggingFace model ID. Default: OktayAlpk/talking-head-anime-3 |
--coqui-gpu | Use the GPU for Coqui TTS (if available). |
--coqui-model | If provided, downloads and preloads a Coqui TTS model. Default: none. Example: tts_models/multilingual/multi-dataset/bark |
--summarization-model | Load a custom summarization model. Expects a HuggingFace model ID. Default: Qiliang/bart-large-cnn-samsum-ChatGPT_v3 |
--classification-model | Load a custom sentiment classification model. Expects a HuggingFace model ID. Default (6 emotions): nateraw/bert-base-uncased-emotion. Another solid option (28 emotions): joeddav/distilbert-base-uncased-go-emotions-student. For Chinese: touch20032003/xuyuan-trial-sentiment-bert-chinese |
--captioning-model | Load a custom captioning model. Expects a HuggingFace model ID. Default: Salesforce/blip-image-captioning-large |
--embedding-model | Load a custom text embedding (vectorization) model. Both the embeddings and chromadb modules use this. The backend is sentence_transformers, so check there for info on supported models. Expects a HuggingFace model ID. Default: sentence-transformers/all-mpnet-base-v2 |
--chroma-host | Specifies a host IP for a remote ChromaDB server. |
--chroma-port | Specifies an HTTP port for a remote ChromaDB server. Default: 8000 |
--sd-model | Load a custom Stable Diffusion image generation model. Expects a HuggingFace model ID. Default: ckpt/anything-v4.5-vae-swapped. Must have the VAE pre-baked in PyTorch format or the output will look drab! |
--sd-cpu | Force the Stable Diffusion generation pipeline to run on the CPU. SLOW! |
--sd-remote | Use a remote SD backend. Supported APIs: sd-webui |
--sd-remote-host | Specify the host of the remote SD backend. Default: 127.0.0.1 |
--sd-remote-port | Specify the port of the remote SD backend. Default: 7860 |
--sd-remote-ssl | Use SSL for the remote SD backend. Default: False |
--sd-remote-auth | Specify the username:password for the remote SD backend (if required) |
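Below is a sketch that assembles a server.py command line from a few of the flags above. server_args is a hypothetical helper. It covers only --enable-modules, --port, --cuda, --listen and --secure, and it relies on the documented default port of 5100.

```python
import shlex


def server_args(modules: list[str], port: int = 5100, cuda: bool = False,
                listen: bool = False, secure: bool = False) -> list[str]:
    """Build an argv list for server.py from a subset of the documented flags."""
    if not modules:
        raise ValueError("--enable-modules is required; pass at least one module")
    args = ["python", "server.py", "--enable-modules=" + ",".join(modules)]
    if port != 5100:  # only pass --port when it differs from the documented default
        args.append(f"--port={port}")
    if cuda:
        args.append("--cuda")
    if listen:
        args.append("--listen")
    if secure:
        args.append("--secure")
    return args


if __name__ == "__main__":
    # Prints: python server.py --enable-modules=caption,summarize,classify
    print(shlex.join(server_args(["caption", "summarize", "classify"])))
```

Returning a list rather than a string lets you pass it straight to subprocess.run without shell quoting issues.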
If you get the following error when running the coqui-tts module on an M1 Mac:
ImportError: dlopen(/Users/user/.../lib/python3.11/site-packages/MeCab/_MeCab.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '__ZN5MeCab11createModelEPKc'
Do the following:
- Install homebrew: https://brew.sh/
- Build and install the mecab package
brew install --build-from-source mecab
ARCHFLAGS='-arch arm64' pip install --no-binary :all: --compile --use-pep517 --no-cache-dir --force mecab-python3
❗ IMPORTANT! ChromaDB is used only by the chromadb module for the old Smart Context extension, which is deprecated. You will likely not need it.
ChromaDB is a blazing fast, open source database used for long-term memory when chatting with characters. It can be run in-memory or on a local server on your LAN.
NOTE: You should NOT run ChromaDB on a cloud server. There are no methods for authentication (yet), so unless you want to expose an unauthenticated ChromaDB to the world, run this on a local server in your LAN.
Run the Extras server with the chromadb module enabled (recommended).
Use this if you want to use ChromaDB with docker or host it remotely. If you don't know what that means and only want to use ChromaDB with ST on your local device, use the 'in-memory' instructions instead.
Prerequisites: Docker, Docker compose (make sure you're running in rootless mode with the systemd service enabled if on Linux).
Steps:
- Run git clone https://github.com/chroma-core/chroma chromadb and cd chromadb
- Run docker-compose up -d --build to build ChromaDB. This may take a long time depending on your system.
- Once the build process is finished, ChromaDB should be running in the background. You can check with the command docker ps
- On your client machine, specify your local server IP in the --chroma-host argument (e.g. --chroma-host=192.168.1.10)
If you are running ChromaDB on the same machine as SillyTavern, you will have to change the port of one of the services. To do this for ChromaDB:
- Run docker ps to get the container ID and then docker container stop <container ID>
- Enter the ChromaDB git repository with cd chromadb
- Open docker-compose.yml and look for the line starting with uvicorn chromadb.app:app
- Change the --port argument to whatever port you want.
- Look for the ports category and change the occurrences of 8000 to the port you chose in the previous step.
- Save and exit. Then run docker-compose up --detach
- On your client machine, make sure to specify the --chroma-port argument (e.g. --chroma-port=<your-port-here>) along with the --chroma-host argument.
This section is developer documentation, containing usage examples of the API endpoints.
This is kept up to date on a best-effort basis, but it may lag behind the code. When in doubt, refer to the actual source code.
GET /api/modules
None
{"modules":["caption", "classify", "summarize"]}
POST /api/caption
{ "image": "base64 encoded image" }
{ "caption": "caption of the posted image" }
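Building the caption request body is mostly a matter of base64-encoding the image bytes. This sketch assumes the server expects plain base64 (no "data:image/...;base64," prefix) in a JSON body. caption_payload and request_caption are hypothetical helpers.

```python
import base64
import json
import urllib.request


def caption_payload(image_bytes: bytes) -> dict:
    """JSON body for POST /api/caption (assumption: plain base64, no data-URL prefix)."""
    return {"image": base64.b64encode(image_bytes).decode("ascii")}


def request_caption(image_bytes: bytes, base_url: str = "http://localhost:5100") -> str:
    """Send an image to a running Extras server and return its caption."""
    req = urllib.request.Request(
        f"{base_url}/api/caption",
        data=json.dumps(caption_payload(image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["caption"]
```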
POST /api/summarize
{ "text": "text to be summarized", "params": {} }
{ "summary": "summarized text" }
Name | Default value |
---|---|
temperature | 1.0 |
repetition_penalty | 1.0 |
max_length | 500 |
min_length | 200 |
length_penalty | 1.5 |
bad_words | ["\n", '"', "*", "[", "]", "{", "}", ":", "(", ")", "<", ">"] |
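This sketch builds a request body for /api/summarize that sends only the parameters you want to change. It assumes the server fills any omitted parameter from the defaults in the table above. The validation (unknown names, min_length above max_length) is an illustrative client-side check, not documented server behavior.

```python
import json

# Defaults from the table above (bad_words omitted for brevity).
SUMMARIZE_DEFAULTS = {
    "temperature": 1.0,
    "repetition_penalty": 1.0,
    "max_length": 500,
    "min_length": 200,
    "length_penalty": 1.5,
}
ALLOWED_PARAMS = set(SUMMARIZE_DEFAULTS) | {"bad_words"}


def summarize_body(text: str, **overrides) -> str:
    """JSON body for POST /api/summarize that sends only the overridden params."""
    unknown = set(overrides) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unknown summarization params: {sorted(unknown)}")
    # Client-side sanity check on the values the server would end up using.
    effective = {**SUMMARIZE_DEFAULTS, **overrides}
    if effective["min_length"] > effective["max_length"]:
        raise ValueError("min_length must not exceed max_length")
    return json.dumps({"text": text, "params": overrides})
```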
POST /api/classify
{ "text": "text to classify sentiment of" }
{
"classification": [
{
"label": "joy",
"score": 1.0
},
{
"label": "anger",
"score": 0.7
},
{
"label": "love",
"score": 0.6
},
{
"label": "sadness",
"score": 0.5
},
{
"label": "fear",
"score": 0.4
},
{
"label": "surprise",
"score": 0.3
}
]
}
NOTES
- Sorted by descending score order
- List of categories defined by the classification model
- Values range from 0.0 to 1.0
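To pick the character's expression from a classification result, you can simply take the highest score. In this sketch, the "neutral" fallback and the min_score threshold are hypothetical choices. Note that the default 6-emotion model has no "neutral" label.

```python
def top_emotion(response: dict, fallback: str = "neutral", min_score: float = 0.0) -> str:
    """Pick the highest-scoring label from a /api/classify response."""
    scores = response.get("classification", [])
    if not scores:
        return fallback
    # The list is documented as sorted, but max() avoids relying on that.
    best = max(scores, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else fallback
```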
POST /api/image
{ "prompt": "prompt to be generated", "sampler": "DDIM", "steps": 20, "scale": 6, "model": "model_name" }
{ "image": "base64 encoded image" }
NOTES
- Only the "prompt" parameter is required
- Both "sampler" and "model" parameters only work when using a remote SD backend
GET /api/image/models
{ "models": [list of all available model names] }
GET /api/image/samplers
{ "samplers": [list of all available sampler names] }
GET /api/image/model
{ "model": "name of the current loaded model" }
POST /api/image/model
{ "model": "name of the model to load" }
{ "previous_model": "name of the previous model", "current_model": "name of the newly loaded model" }
POST /api/tts/generate
{ "speaker": "speaker voice_id", "text": "text to narrate" }
WAV audio file.
GET /api/tts/speakers
[
{
"name": "en_0",
"preview_url": "http://127.0.0.1:5100/api/tts/sample/en_0",
"voice_id": "en_0"
}
]
GET /api/tts/sample/<voice_id>
WAV audio file.
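Both TTS endpoints above return a WAV file. This sketch calls /api/tts/generate with the standard library and checks the RIFF/WAVE header before the bytes are saved. The helper names are hypothetical. The header check is a generic WAV property, not anything specific to Extras.

```python
import json
import urllib.request


def is_wav(data: bytes) -> bool:
    """True if data starts with a RIFF/WAVE header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def generate_speech(text: str, speaker: str, base_url: str = "http://localhost:5100") -> bytes:
    """POST /api/tts/generate and return the WAV bytes."""
    req = urllib.request.Request(
        f"{base_url}/api/tts/generate",
        data=json.dumps({"speaker": speaker, "text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()
    if not is_wav(audio):
        raise ValueError("expected a WAV file from /api/tts/generate")
    return audio
```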
POST /api/embeddings/compute
This is a vectorization source (text embedding provider) for the Vector Storage built-in extension of ST.
If you have many text items to vectorize (e.g. chat history, or chunks for RAG ingestion), send them in all at once. This allows the backend to batch the input, allocating the available compute resources efficiently, and thus running much faster (compared to processing a single item at a time).
The embeddings are always normalized.
For one text item:
{ "text": "The quick brown fox jumps over the lazy dog." }
For multiple text items, just put them in an array:
{ "text": ["The quick brown fox jumps over the lazy dog.",
"Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
...] }
When the input was one text item, returns one vector (the embedding of that text item) as an array:
{ "embedding": [numbers] }
When the input was multiple text items, returns multiple vectors (one for each input text item) as an array of arrays:
{ "embedding": [[numbers],
[numbers], ...] }
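Because the embeddings are normalized, cosine similarity reduces to a plain dot product. The sketch below builds one batched request and converts both response shapes into a list of vectors. The helper names are hypothetical.

```python
import json
import math


def embeddings_body(texts: list[str]) -> str:
    """One request for many texts, so the server can batch them."""
    return json.dumps({"text": texts})


def as_vectors(response: dict) -> list[list[float]]:
    """One input yields a single vector, many inputs yield a list; always return a list."""
    emb = response["embedding"]
    return [emb] if emb and isinstance(emb[0], (int, float)) else emb


def cosine(a: list[float], b: list[float]) -> float:
    # The embeddings are normalized, so the dot product already is the cosine similarity.
    return math.fsum(x * y for x, y in zip(a, b))
```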
POST /api/chromadb
{
"chat_id": "chat1 - 2023-12-31",
"messages": [
{
"id": "633a4bd1-8350-46b5-9ef2-f5d27acdecb7",
"date": 1684164339877,
"role": "user",
"content": "Hello, AI world!",
"meta": "this is meta"
},
{
"id": "8a2ed36b-c212-4a1b-84a3-0ffbe0896506",
"date": 1684164411759,
"role": "assistant",
"content": "Hello, Hooman!"
}
]
}
{ "count": 2 }
POST /api/chromadb/query
{
"chat_id": "chat1 - 2023-12-31",
"query": "Hello",
"n_results": 2
}
[
{
"id": "633a4bd1-8350-46b5-9ef2-f5d27acdecb7",
"date": 1684164339877,
"role": "user",
"content": "Hello, AI world!",
"distance": 0.31,
"meta": "this is meta"
},
{
"id": "8a2ed36b-c212-4a1b-84a3-0ffbe0896506",
"date": 1684164411759,
"role": "assistant",
"content": "Hello, Hooman!",
"distance": 0.29
}
]
POST /api/chromadb/purge
{ "chat_id": "chat1 - 2023-04-12" }
GET /api/edge-tts/list
[{"Name": "Microsoft Server Speech Text to Speech Voice (af-ZA, AdriNeural)", "ShortName": "af-ZA-AdriNeural", "Gender": "Female", "Locale": "af-ZA", "SuggestedCodec": "audio-24khz-48kbitrate-mono-mp3", "FriendlyName": "Microsoft Adri Online (Natural) - Afrikaans (South Africa)", "Status": "GA", "VoiceTag": {"ContentCategories": ["General"], "VoicePersonalities": ["Friendly", "Positive"]}}]
POST /api/edge-tts/generate
{ "text": "Text to narrate", "voice": "af-ZA-AdriNeural", "rate": 0 }
MP3 audio file.
GET /api/coqui-tts/load
- _model (string, required): The name of the Coqui TTS model to load.
- _gpu (string, optional): Use the GPU to load the model.
- _progress (string, optional): Show a progress bar in the terminal.
{ "_model": "tts_models--en--jenny--jenny\\model.pth" }
{ "_gpu": "False" }
{ "_progress": "True" }
"Loaded"
GET /api/coqui-tts/list
["tts_models--en--jenny--jenny\\model.pth", "tts_models--en--ljspeech--fast_pitch\\model_file.pth", "tts_models--en--ljspeech--glow-tts\\model_file.pth", "tts_models--en--ljspeech--neural_hmm\\model_file.pth", "tts_models--en--ljspeech--speedy-speech\\model_file.pth", "tts_models--en--ljspeech--tacotron2-DDC\\model_file.pth", "tts_models--en--ljspeech--vits\\model_file.pth", "tts_models--en--ljspeech--vits--neon\\model_file.pth.tar", "tts_models--en--multi-dataset--tortoise-v2", "tts_models--en--vctk--vits\\model_file.pth", "tts_models--et--cv--vits\\model_file.pth.tar", "tts_models--multilingual--multi-dataset--bark", "tts_models--multilingual--multi-dataset--your_tts\\model_file.pth", "tts_models--multilingual--multi-dataset--your_tts\\model_se.pth"]
GET /api/coqui-tts/multspeaker
{"0": "female-en-5", "1": "female-en-5\n", "2": "female-pt-4\n", "3": "male-en-2", "4": "male-en-2\n", "5": "male-pt-3\n"}
GET /api/coqui-tts/multlang
{"0": "en", "1": "fr-fr", "2": "pt-br"}
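/api/coqui-tts/load is a GET endpoint, so its parameters most likely travel in the query string. This sketch assumes that, and it also assumes the booleans are spelled "True"/"False" as in the load examples above. coqui_load_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def coqui_load_url(model: str, gpu: bool = False, progress: bool = True,
                   base_url: str = "http://localhost:5100") -> str:
    """Build the GET URL for /api/coqui-tts/load (assumption: query-string parameters)."""
    # urlencode percent-escapes characters such as the backslash in Windows-style model paths.
    query = urlencode({"_model": model, "_gpu": str(gpu), "_progress": str(progress)})
    return f"{base_url}/api/coqui-tts/load?{query}"
```

Model names can come straight from the /api/coqui-tts/list output.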
POST /api/edge-tts/generate
{
"text": "Text to narrate",
"speaker_id": "0",
"mspker": null,
"language_id": null,
"style_wav": null
}
MP3 audio file.
POST /api/talkinghead/load
A FormData with files, with an image file in a field named "file". The posted file should be a PNG image in RGBA format. Optimal resolution is 512x512. See the talkinghead README for details.
'http://localhost:5100/api/talkinghead/load'
'OK'
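The talkinghead load endpoint expects multipart/form-data rather than JSON. The standard library has no multipart encoder, so this sketch builds the body by hand with a single "file" field. multipart_png and its default filename are hypothetical.

```python
import uuid


def multipart_png(png_bytes: bytes, filename: str = "avatar.png") -> tuple[bytes, str]:
    """Encode png_bytes as multipart/form-data with one field named "file".

    Returns (body, content_type), ready to pass to urllib.request.Request.
    """
    boundary = uuid.uuid4().hex  # random, so it cannot plausibly occur inside the PNG data
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: image/png\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + png_bytes + tail, f"multipart/form-data; boundary={boundary}"
```

To send it, use urllib.request.Request("http://localhost:5100/api/talkinghead/load", data=body, headers={"Content-Type": content_type}, method="POST").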
POST /api/talkinghead/load_emotion_templates
{"anger": {"eyebrow_angry_left_index": 1.0,
...},
"curiosity": {"eyebrow_lowered_left_index": 0.5895,
...},
...}
For details, see Animator.load_emotion_templates in talkinghead/tha3/app/app.py. This is essentially the format used by talkinghead/emotions/_defaults.json.
Any emotions NOT supplied in the posted JSON will revert to server defaults. In any supplied emotion, any morph NOT supplied will default to zero. This allows making the templates shorter.
To reset all emotion templates to their server defaults, send a blank JSON.
"OK"
POST /api/talkinghead/load_animator_settings
{"target_fps": 25,
"breathing_cycle_duration": 4.0,
"postprocessor_chain": [["bloom", {}],
["chromatic_aberration", {}],
["vignetting", {}],
["translucency", {"alpha": 0.9}],
["alphanoise", {"magnitude": 0.1, "sigma": 0.0}],
["banding", {}],
["scanlines", {"dynamic": true}]],
...}
For a full list of supported settings, see animator_defaults and Animator.load_animator_settings, both in talkinghead/tha3/app/app.py.
Particularly for "postprocessor_chain", see talkinghead/tha3/app/postprocessor.py. The postprocessor applies pixel-space glitch artistry, which can e.g. make your talkinghead look like a sci-fi hologram (the above example does this). The postprocessing filters are applied in the order they appear in the list.
To reset all animator/postprocessor settings to their server defaults, send a blank JSON.
"OK"
GET /api/talkinghead/start_talking
'http://localhost:5100/api/talkinghead/start_talking'
"talking started"
GET /api/talkinghead/stop_talking
'http://localhost:5100/api/talkinghead/stop_talking'
"talking stopped"
POST /api/talkinghead/set_emotion
Available emotions: see talkinghead/emotions/*.json. An emotion must be specified, but if it is not available, this operation defaults to "neutral", which must always be available. This endpoint is the backend behind the /emote slash command in talkinghead mode.
{"emotion_name": "curiosity"}
'http://localhost:5100/api/talkinghead/set_emotion'
"emotion set to curiosity"
GET /api/talkinghead/result_feed
Animated transparent image, each frame a 512x512 PNG image in RGBA format.
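You can model the merge rules described for /api/talkinghead/load_emotion_templates on the client, e.g. to preview what the server will end up using. This is a sketch of the documented behavior, not the server's code. It approximates the full set of morphs by the morphs that appear in the defaults, and the morph names in the test values are made up.

```python
def effective_templates(defaults: dict[str, dict[str, float]],
                        posted: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Client-side model of the documented merge rules (not the server's implementation).

    - emotions missing from `posted` keep the server default;
    - in a posted emotion, any morph not listed becomes zero.
    An empty `posted` therefore yields the defaults, matching "send a blank JSON to reset".
    """
    morphs = {m for template in defaults.values() for m in template}  # approximation
    result = {}
    for emotion, default_template in defaults.items():
        if emotion in posted:
            result[emotion] = {m: posted[emotion].get(m, 0.0) for m in morphs}
        else:
            result[emotion] = dict(default_template)
    return result
```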
POST /api/websearch
Available engines: google (default), duckduckgo
{ "query": "what is beauty?", "engine": "google" }
{ "results": "that would fall within the purview of your conundrums of philosophy", "links": ["http://example.com"] }
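This sketch builds a /api/websearch request and flattens the response into text, e.g. for inclusion in a prompt. The helper names and the output layout are hypothetical. The two engine names come from this README.

```python
import json

ENGINES = {"google", "duckduckgo"}


def websearch_body(query: str, engine: str = "google") -> str:
    """JSON body for POST /api/websearch."""
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {sorted(ENGINES)}")
    return json.dumps({"query": query, "engine": engine})


def format_results(response: dict) -> str:
    """Join the result text and the source links into one block of text."""
    links = "\n".join(f"- {url}" for url in response.get("links", []))
    return f"{response['results']}\n{links}" if links else response["results"]
```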