2.2.4 Backend: TabbyAPI
Handle: tabbyapi
URL: http://localhost:33931
An OpenAI-compatible exllamav2 API that's both lightweight and fast
- Supports the same set of models as exllamav2
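Once the service is running, you can sanity-check the endpoint above. A minimal sketch, assuming the default port shown and that TabbyAPI's health route is reachable without an API key:
# Print the URL Harbor assigned to the service
harbor url tabbyapi
# Probe the API (the /health path is an assumption based on TabbyAPI's routes)
curl http://localhost:33931/health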
Harbor integrates with the HuggingFaceDownloader CLI, which can be used to download models for the TabbyAPI service.
# [Optional] Look up models on the HF Hub
harbor hf find exl2
# [Optional] If pulling from a closed or gated repo,
# pre-configure the HF access token
harbor hf token <your-token>
# 1. Download the desired model, using the "user/repo" specifier
# Note the "./hf" directory set as the download location - this is
# where the HuggingFace cache is mounted for the downloader CLI
harbor hf dl -m Annuvin/gemma-2-2b-it-abliterated-4.0bpw-exl2 -s ./hf
harbor hf dl -m bartowski/Phi-3.1-mini-4k-instruct-exl2 -s ./hf -b 8_0
# 2. Set the model to run
# Use the same specifier as for the downloader
harbor tabbyapi model Annuvin/gemma-2-2b-it-abliterated-4.0bpw-exl2
harbor tabbyapi model bartowski/Phi-3.1-mini-4k-instruct-exl2
# 3. Start the service
harbor up tabbyapi
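After starting, it can help to confirm that the container came up and that the model actually loaded. A small sketch using Harbor's generic status and log commands:
# Check that the container is running
harbor ps
# Follow the TabbyAPI logs and watch for the model load messages
harbor logs tabbyapi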
Models can also be downloaded via harbor hf download and then referenced by the full path of their snapshot in the HuggingFace cache:
# Download with a model specifier
harbor hf download ChenMnZ/Mistral-Large-Instruct-2407-EfficientQAT-w2g64-GPTQ
# With a specific revision
harbor hf download turboderp/Llama-3.1-8B-Instruct-exl2 --revision 6.0bpw
# Grab the actual name of the cache folder
harbor find ChenMnZ
# Set the model to run
harbor config set tabbyapi.model.specifier /hub/models--ChenMnZ--Mistral-Large-Instruct-2407-EfficientQAT-w2g64-GPTQ/snapshots/f46105941fa36d2663f77f11840c2f49a69d6681/
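Before restarting the service, you may want to read the option back to verify the path. A sketch assuming harbor config get mirrors the setter used above and that harbor restart recreates the service:
# Print the currently configured model specifier
harbor config get tabbyapi.model.specifier
# Restart TabbyAPI so it picks up the new model
harbor restart tabbyapi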
TabbyAPI exposes an OpenAI-compatible API and can be used with related services directly.
# [Optional] Pull the tabbyapi images
harbor pull tabbyapi
# Start the service
harbor up tabbyapi
# [Optional] Set additional arguments
harbor tabbyapi args --log-prompt true
# See TabbyAPI docs
harbor tabbyapi docs
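Since the API is OpenAI-compatible, any OpenAI client or plain curl can talk to it directly. A minimal sketch of a chat completion request; the model name and the Authorization header are assumptions that depend on your TabbyAPI configuration:
# Minimal OpenAI-style chat completion request
# Replace the key (if auth is enabled) and the model name with your own
curl http://localhost:33931/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-tabbyapi-key>" \
  -d '{
    "model": "gemma-2-2b-it-abliterated-4.0bpw-exl2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'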
Harbor will mount a few volumes for the TabbyAPI container:
- Host HuggingFace cache - /models/hf
- llama.cpp cache - /models/llama.cpp
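To see what the container actually has in these mounts, you can list them from inside the service. A sketch assuming harbor exec passes the command through to the running container:
# List the mounted HuggingFace cache from inside the TabbyAPI container
harbor exec tabbyapi ls /models/hf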