2.2.4 Backend: TabbyAPI
Handle: tabbyapi
URL: http://localhost:33931
An OpenAI-compatible exllamav2 API that's both lightweight and fast
- Supports the same set of models as exllamav2
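Once the service is running, you can sanity-check the endpoint above. A minimal sketch, assuming the default port shown and that TabbyAPI's health route is reachable without an API key:
# Print the URL Harbor assigned to the service
harbor url tabbyapi
# Probe the API (the /health path is an assumption based on TabbyAPI's routes)
curl http://localhost:33931/health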
Harbor integrates with the HuggingFaceDownloader CLI, which can be used to download models for the TabbyAPI service.
# [Optional] Look up models on the HF Hub
harbor hf find exl2
# [Optional] If pulling from a closed or gated repo,
# pre-configure the HF access token
harbor hf token <your-token>
# 1. Download the desired model, using the "user/repo" specifier
# Note the "./hf" directory set as the download location - this is
# where the HuggingFace cache is mounted for the downloader CLI
harbor hf dl -m Annuvin/gemma-2-2b-it-abliterated-4.0bpw-exl2 -s ./hf
harbor hf dl -m bartowski/Phi-3.1-mini-4k-instruct-exl2 -s ./hf -b 8_0
# 2. Set the model to run
# Use the same specifier as for the downloader
harbor tabbyapi model Annuvin/gemma-2-2b-it-abliterated-4.0bpw-exl2
harbor tabbyapi model bartowski/Phi-3.1-mini-4k-instruct-exl2
# 3. Start the service
harbor up tabbyapi
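After starting, it can help to confirm that the container came up and that the model actually loaded. A small sketch using Harbor's generic status and log commands:
# Check that the container is running
harbor ps
# Follow the TabbyAPI logs and watch for the model load messages
harbor logs tabbyapi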
Models can also be downloaded via harbor hf download and then referenced by the full path of their snapshot in the HuggingFace cache:
# Download with a model specifier
harbor hf download ChenMnZ/Mistral-Large-Instruct-2407-EfficientQAT-w2g64-GPTQ
# With a specific revision
harbor hf download turboderp/Llama-3.1-8B-Instruct-exl2 --revision 6.0bpw
# Grab the actual name of the cache folder
harbor find ChenMnZ
# Set the model to run
harbor config set tabbyapi.model.specifier /hub/models--ChenMnZ--Mistral-Large-Instruct-2407-EfficientQAT-w2g64-GPTQ/snapshots/f46105941fa36d2663f77f11840c2f49a69d6681/
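Before restarting the service, you may want to read the option back to verify the path. A sketch assuming harbor config get mirrors the setter used above and that harbor restart recreates the service:
# Print the currently configured model specifier
harbor config get tabbyapi.model.specifier
# Restart TabbyAPI so it picks up the new model
harbor restart tabbyapi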
TabbyAPI exposes an OpenAI-compatible API and can be used with related services directly.
# [Optional] Pull the tabbyapi images
harbor pull tabbyapi
# Start the service
harbor up tabbyapi
# [Optional] Set additional arguments
harbor tabbyapi args --log-prompt true
# See TabbyAPI docs
harbor tabbyapi docs
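Since the API is OpenAI-compatible, any OpenAI client or plain curl can talk to it directly. A minimal sketch of a chat completion request; the model name and the Authorization header are assumptions that depend on your TabbyAPI configuration:
# Minimal OpenAI-style chat completion request
# Replace the key (if auth is enabled) and the model name with your own
curl http://localhost:33931/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-tabbyapi-key>" \
  -d '{
    "model": "gemma-2-2b-it-abliterated-4.0bpw-exl2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'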
Harbor will mount a few volumes for the TabbyAPI container:
- Host HuggingFace cache - /models/hf
- llama.cpp cache - /models/llama.cpp
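To see what the container actually has in these mounts, you can list them from inside the service. A sketch assuming harbor exec passes the command through to the running container:
# List the mounted HuggingFace cache from inside the TabbyAPI container
harbor exec tabbyapi ls /models/hf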