Support Ollama embeddings
imartinez committed Mar 1, 2024
1 parent 274c386 commit f6ff280
Showing 9 changed files with 72 additions and 49 deletions.
11 changes: 6 additions & 5 deletions fern/docs/pages/installation/concepts.mdx
@@ -40,20 +40,21 @@ In order to run PrivateGPT in a fully local setup, you will need to run the LLM,
### Vector stores
The vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default.
### Embeddings
For local embeddings you need to install the 'embeddings-huggingface' extra dependencies. It will use Huggingface Embeddings.

Note: Ollama will support Embeddings in the short term for easier installation, but it does not as of today.
For local Embeddings there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'embeddings-huggingface' option in PrivateGPT, which will use local HuggingFace embeddings.

In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
In order for HuggingFace embeddings to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
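
If you go with the recommended Ollama option instead, the embeddings model is pulled through Ollama and selected via settings rather than placed in the `models` folder. A minimal sketch of the relevant keys, following the `settings-ollama.yaml` introduced in this commit:

```yaml
embedding:
  mode: ollama

ollama:
  embedding_model: nomic-embed-text   # pulled beforehand with `ollama pull nomic-embed-text`
  api_base: http://localhost:11434    # default local Ollama endpoint
```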

### LLM
For local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (leverages the Metal GPU), but it can be tricky on certain Linux and Windows distributions, depending on the GPU. In the installation document you'll find guides and troubleshooting.

In order for local LLM to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
In order for the LlamaCPP-powered LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
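
With the recommended Ollama option, the LLM is instead pulled through Ollama (e.g. `ollama pull mistral`) and selected via settings. A minimal sketch, based on the `settings-ollama.yaml` shipped in this commit:

```yaml
ollama:
  llm_model: mistral                  # any model available in your local Ollama
  api_base: http://localhost:11434    # default local Ollama endpoint
```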
50 changes: 18 additions & 32 deletions fern/docs/pages/installation/installation.mdx
@@ -44,11 +44,12 @@ poetry install --extras "<extra1> <extra2>..."
Where `<extra>` can be any of the following:

- ui: adds support for UI using Gradio
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running, requires Ollama running locally
- llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
- llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
- llms-openai: adds support for OpenAI LLM, requires OpenAI API key
- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
- embeddings-ollama: adds support for Ollama Embeddings, requires Ollama running locally
- embeddings-huggingface: adds support for local Embeddings using HuggingFace
- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
- embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
@@ -78,21 +79,29 @@ set PGPT_PROFILES=ollama
make run
```

### Local, Ollama-powered setup
### Local, Ollama-powered setup - RECOMMENDED

The easiest way to run PrivateGPT fully locally is to depend on Ollama for the LLM. Ollama provides a local LLM that is easy to install and use.
**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama provides local LLMs and Embeddings that are super easy to install and use, abstracting away the complexity of GPU support. It's the recommended setup for local development.

Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.

Once done, you can install PrivateGPT with the following command:
After the installation, make sure the Ollama desktop app is closed.

Install the models to be used: the default `settings-ollama.yaml` is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB). Therefore:

```bash
poetry install --extras "ui llms-ollama embeddings-huggingface vector-stores-qdrant"
ollama pull mistral
ollama pull nomic-embed-text
```

We are installing the "embeddings-huggingface" dependency to support local embeddings, because Ollama doesn't support embeddings just yet. But they're working on it!
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
Now, start the Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
```bash
poetry run python scripts/setup
ollama serve
```

Once done, in a different terminal, you can install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```

Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.
@@ -101,7 +110,7 @@
PGPT_PROFILES=ollama make run
```

PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, different Ollama port, etc.)
PrivateGPT will use the existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.).
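
For instance, switching to a different LLM or a non-default Ollama port could look roughly like this (illustrative values; the key names come from the `OllamaSettings` changes in this commit):

```yaml
ollama:
  llm_model: llama2-uncensored        # any model you have pulled with `ollama pull`
  embedding_model: nomic-embed-text
  api_base: http://localhost:11435    # illustrative non-default port
```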

The UI will be available at http://localhost:8001

@@ -128,29 +137,6 @@ PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file

The UI will be available at http://localhost:8001

### Local, Llama-CPP powered setup

If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:

```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```

In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```

Once installed, you can run PrivateGPT with the following command:

```bash
PGPT_PROFILES=local make run
```

PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.

The UI will be available at http://localhost:8001

### Non-Private, OpenAI-powered test setup

If you want to test PrivateGPT with OpenAI's LLM and Embeddings (taking into account that your data will go to OpenAI!), you can run the following command:
17 changes: 16 additions & 1 deletion poetry.lock

Some generated files are not rendered by default.

15 changes: 15 additions & 0 deletions private_gpt/components/embedding/embedding_component.py
@@ -57,6 +57,21 @@ def __init__(self, settings: Settings) -> None:

                openai_settings = settings.openai.api_key
                self.embedding_model = OpenAIEmbedding(api_key=openai_settings)
            case "ollama":
                try:
                    from llama_index.embeddings.ollama import (  # type: ignore
                        OllamaEmbedding,
                    )
                except ImportError as e:
                    raise ImportError(
                        "Local dependencies not found, install with `poetry install --extras embeddings-ollama`"
                    ) from e

                ollama_settings = settings.ollama
                self.embedding_model = OllamaEmbedding(
                    model_name=ollama_settings.embedding_model,
                    base_url=ollama_settings.api_base
                )
            case "mock":
                # Not a random number: it is the dimensionality used by
                # the default embedding model
2 changes: 1 addition & 1 deletion private_gpt/components/llm/llm_component.py
@@ -109,7 +109,7 @@ def __init__(self, settings: Settings) -> None:

                ollama_settings = settings.ollama
                self.llm = Ollama(
                    model=ollama_settings.model, base_url=ollama_settings.api_base
                    model=ollama_settings.llm_model, base_url=ollama_settings.api_base
                )
            case "mock":
                self.llm = MockLLM()
8 changes: 6 additions & 2 deletions private_gpt/settings/settings.py
@@ -127,7 +127,7 @@ class HuggingFaceSettings(BaseModel):


class EmbeddingSettings(BaseModel):
    mode: Literal["huggingface", "openai", "sagemaker", "mock"]
    mode: Literal["huggingface", "openai", "sagemaker", "ollama", "mock"]
    ingest_mode: Literal["simple", "batch", "parallel"] = Field(
        "simple",
        description=(
@@ -176,10 +176,14 @@ class OllamaSettings(BaseModel):
"http://localhost:11434",
description="Base URL of Ollama API. Example: 'https://localhost:11434'.",
)
model: str = Field(
llm_model: str = Field(
None,
description="Model to use. Example: 'llama2-uncensored'.",
)
embedding_model: str = Field(
None,
description="Model to use. Example: 'nomic-embed-text'.",
)


class UISettings(BaseModel):
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -21,6 +21,7 @@ llama-index-llms-llama-cpp = {version = "^0.1.3", optional = true}
llama-index-llms-openai = {version = "^0.1.6", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.1.4", optional = true}
llama-index-embeddings-openai = {version ="^0.1.6", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.1.3", optional = true}
@@ -38,6 +39,7 @@ llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama"]
llms-sagemaker = ["boto3"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
12 changes: 5 additions & 7 deletions settings-ollama.yaml
@@ -6,15 +6,13 @@ llm:
  max_new_tokens: 512
  context_window: 3900

ollama:
  model: llama2
  api_base: http://localhost:11434

embedding:
  mode: huggingface
  mode: ollama

huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5
ollama:
  llm_model: mistral
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434

vectorstore:
  database: qdrant
4 changes: 3 additions & 1 deletion settings.yaml
@@ -78,4 +78,6 @@ openai:
  model: gpt-3.5-turbo

ollama:
  model: llama2-uncensored
  llm_model: llama2
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
