Support Ollama embeddings
imartinez committed Mar 1, 2024
1 parent 274c386 commit f6ff280
Showing 9 changed files with 72 additions and 49 deletions.
11 changes: 6 additions & 5 deletions fern/docs/pages/installation/concepts.mdx
@@ -40,20 +40,21 @@ In order to run PrivateGPT in a fully local setup, you will need to run the LLM,
### Vector stores
The vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default.
### Embeddings
For local embeddings you need to install the 'embeddings-huggingface' extra dependencies. It will use Huggingface Embeddings.

Note: Ollama will support Embeddings in the short term for easier installation, but it does not as of today.
For local Embeddings there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'embeddings-huggingface' option in PrivateGPT, which will use local HuggingFace embeddings.

In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
In order for HuggingFace embeddings to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
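
If you go with the recommended Ollama option instead, the embeddings model is pulled through Ollama and selected via settings rather than placed in the `models` folder. A minimal sketch of the relevant keys, following the `settings-ollama.yaml` introduced in this commit:

```yaml
embedding:
  mode: ollama

ollama:
  embedding_model: nomic-embed-text   # pulled beforehand with `ollama pull nomic-embed-text`
  api_base: http://localhost:11434    # default local Ollama endpoint
```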

### LLM
For local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (leverages the Metal GPU), but it can be tricky on certain Linux and Windows distributions, depending on the GPU. In the installation document you'll find guides and troubleshooting.

In order for local LLM to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
In order for the LlamaCPP-powered LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
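
With the recommended Ollama option, the LLM is instead pulled through Ollama (e.g. `ollama pull mistral`) and selected via settings. A minimal sketch, based on the `settings-ollama.yaml` shipped in this commit:

```yaml
ollama:
  llm_model: mistral                  # any model available in your local Ollama
  api_base: http://localhost:11434    # default local Ollama endpoint
```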
50 changes: 18 additions & 32 deletions fern/docs/pages/installation/installation.mdx
@@ -44,11 +44,12 @@ poetry install --extras "<extra1> <extra2>..."
Where `<extra>` can be any of the following:

- ui: adds support for UI using Gradio
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running, requires Ollama running locally
- llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
- llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
- llms-openai: adds support for OpenAI LLM, requires OpenAI API key
- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
- embeddings-ollama: adds support for Ollama Embeddings, requires Ollama running locally
- embeddings-huggingface: adds support for local Embeddings using HuggingFace
- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
- embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
@@ -78,21 +79,29 @@ set PGPT_PROFILES=ollama
make run
```

### Local, Ollama-powered setup
### Local, Ollama-powered setup - RECOMMENDED

The easiest way to run PrivateGPT fully locally is to depend on Ollama for the LLM. Ollama provides a local LLM that is easy to install and use.
**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama provides local LLMs and Embeddings that are super easy to install and use, abstracting away the complexity of GPU support. It's the recommended setup for local development.

Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.

Once done, you can install PrivateGPT with the following command:
After the installation, make sure the Ollama desktop app is closed.

Install the models to be used: the default `settings-ollama.yaml` is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB). Therefore:

```bash
poetry install --extras "ui llms-ollama embeddings-huggingface vector-stores-qdrant"
ollama pull mistral
ollama pull nomic-embed-text
```

We are installing the "embeddings-huggingface" dependency to support local embeddings, because Ollama doesn't support embeddings just yet. But they're working on it!
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
Now, start the Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
```bash
poetry run python scripts/setup
ollama serve
```

Once done, in a different terminal, you can install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```

Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.
@@ -101,7 +110,7 @@
PGPT_PROFILES=ollama make run
```

PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM, local Embeddings, and Qdrant. Review it and adapt it to your needs (different LLM model, different Ollama port, etc.)
PrivateGPT will use the existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.).
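
For instance, switching to a different LLM or a non-default Ollama port could look roughly like this (illustrative values; the key names come from the `OllamaSettings` changes in this commit):

```yaml
ollama:
  llm_model: llama2-uncensored        # any model you have pulled with `ollama pull`
  embedding_model: nomic-embed-text
  api_base: http://localhost:11435    # illustrative non-default port
```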

The UI will be available at http://localhost:8001

@@ -128,29 +137,6 @@ PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file

The UI will be available at http://localhost:8001

### Local, Llama-CPP powered setup

If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:

```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```

In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```

Once installed, you can run PrivateGPT with the following command:

```bash
PGPT_PROFILES=local make run
```

PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.

The UI will be available at http://localhost:8001

### Non-Private, OpenAI-powered test setup

If you want to test PrivateGPT with OpenAI's LLM and Embeddings (taking into account that your data will go to OpenAI!), you can run the following command:
17 changes: 16 additions & 1 deletion poetry.lock

Some generated files are not rendered by default.

15 changes: 15 additions & 0 deletions private_gpt/components/embedding/embedding_component.py
@@ -57,6 +57,21 @@ def __init__(self, settings: Settings) -> None:

                openai_settings = settings.openai.api_key
                self.embedding_model = OpenAIEmbedding(api_key=openai_settings)
            case "ollama":
                try:
                    from llama_index.embeddings.ollama import (  # type: ignore
                        OllamaEmbedding,
                    )
                except ImportError as e:
                    raise ImportError(
                        "Local dependencies not found, install with `poetry install --extras embeddings-ollama`"
                    ) from e

                ollama_settings = settings.ollama
                self.embedding_model = OllamaEmbedding(
                    model_name=ollama_settings.embedding_model,
                    base_url=ollama_settings.api_base
                )
            case "mock":
                # Not a random number: it is the dimensionality used by
                # the default embedding model
2 changes: 1 addition & 1 deletion private_gpt/components/llm/llm_component.py
@@ -109,7 +109,7 @@ def __init__(self, settings: Settings) -> None:

                ollama_settings = settings.ollama
                self.llm = Ollama(
                    model=ollama_settings.model, base_url=ollama_settings.api_base
                    model=ollama_settings.llm_model, base_url=ollama_settings.api_base
                )
            case "mock":
                self.llm = MockLLM()
8 changes: 6 additions & 2 deletions private_gpt/settings/settings.py
@@ -127,7 +127,7 @@ class HuggingFaceSettings(BaseModel):


class EmbeddingSettings(BaseModel):
    mode: Literal["huggingface", "openai", "sagemaker", "mock"]
    mode: Literal["huggingface", "openai", "sagemaker", "ollama", "mock"]
    ingest_mode: Literal["simple", "batch", "parallel"] = Field(
        "simple",
        description=(
@@ -176,10 +176,14 @@ class OllamaSettings(BaseModel):
"http://localhost:11434",
description="Base URL of Ollama API. Example: 'https://localhost:11434'.",
)
model: str = Field(
llm_model: str = Field(
None,
description="Model to use. Example: 'llama2-uncensored'.",
)
embedding_model: str = Field(
None,
description="Model to use. Example: 'nomic-embed-text'.",
)


class UISettings(BaseModel):
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -21,6 +21,7 @@ llama-index-llms-llama-cpp = {version = "^0.1.3", optional = true}
llama-index-llms-openai = {version = "^0.1.6", optional = true}
llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
llama-index-llms-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
llama-index-embeddings-huggingface = {version ="^0.1.4", optional = true}
llama-index-embeddings-openai = {version ="^0.1.6", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.1.3", optional = true}
@@ -38,6 +39,7 @@ llms-openai = ["llama-index-llms-openai"]
llms-openai-like = ["llama-index-llms-openai-like"]
llms-ollama = ["llama-index-llms-ollama"]
llms-sagemaker = ["boto3"]
embeddings-ollama = ["llama-index-embeddings-ollama"]
embeddings-huggingface = ["llama-index-embeddings-huggingface"]
embeddings-openai = ["llama-index-embeddings-openai"]
embeddings-sagemaker = ["boto3"]
12 changes: 5 additions & 7 deletions settings-ollama.yaml
@@ -6,15 +6,13 @@ llm:
  max_new_tokens: 512
  context_window: 3900

ollama:
  model: llama2
  api_base: http://localhost:11434

embedding:
  mode: huggingface
  mode: ollama

huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5
ollama:
  llm_model: mistral
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434

vectorstore:
  database: qdrant
4 changes: 3 additions & 1 deletion settings.yaml
@@ -78,4 +78,6 @@ openai:
  model: gpt-3.5-turbo

ollama:
  model: llama2-uncensored
  llm_model: llama2
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
