Ollama embedding retrieval issues #1052
Hey, @seanw7... thanks for reporting this. In your gist, I see that you are using `orca-mini`. Can you try with an embedding model (e.g. …)?
@anakin87 Thank you very much for your response. I was originally using the orca-mini model because it was in the documentation example here: https://docs.haystack.deepset.ai/docs/ollamadocumentembedder#on-its-own. I previously tested with snowflake-arctic-embed:137m and had the same issue. However, I just ran a few tests with the embedding model you suggested, and it does seem to be working much better 👍 Thank you
Had a look at the Haystack docs and can confirm that we wrongly state "the default embedding model is …". We also mention …
@dfokina @anakin87 I suggest we replace … in the docs. @seanw7 Thank you for bringing this issue to our attention! Glad to hear it's working much better now for you with a different model.
I replaced …
Describe the bug
When using the Ollama embeddings (both OllamaDocumentEmbedder and OllamaTextEmbedder), I consistently receive incorrect embedding results. Regardless of the input query or document content, the system always retrieves the same document (usually the last document in the list), leading to an incorrect context in downstream tasks such as retrieval or response generation.
It is important to note that switching to the SentenceTransformers embedders immediately resolves the issue, and the pipeline retrieves the correct context. See lines 23 & 62 in the gist.
Expected Behavior: The embeddings should accurately represent the input query or document, and the correct context should be retrieved based on the input.
Actual Behavior: No matter what query or document content is provided, the same context (typically the final document in the list of embedded documents) is always returned.
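The "always retrieves the last document" behavior can be reproduced without Ollama at all: if a model returns (near-)identical vectors for every input, cosine-similarity ranking collapses to a tie-break, so the same document wins for every query. A minimal sketch with hypothetical vectors (not actual Ollama output), assuming a plain cosine-similarity retriever:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.2, 0.9, 0.1])

# Degenerate case: the embedder returns the same vector for every document,
# so every similarity score is identical and ranking is pure tie-breaking
# (e.g. the last document always wins), regardless of the query.
degenerate_docs = [np.array([1.0, 1.0, 1.0]) for _ in range(3)]
degenerate_scores = [cosine_sim(query, d) for d in degenerate_docs]

# Healthy case: a real embedding model produces distinct vectors, so the
# top-ranked document actually tracks the query.
good_docs = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 1.0]),
]
best = max(range(len(good_docs)), key=lambda i: cosine_sim(query, good_docs[i]))
# best == 1: the document whose vector aligns with the query's dominant axis
```

A quick sanity check like this (embed two very different texts and compare their vectors) is a cheap way to tell whether a model is actually producing usable embeddings before debugging the rest of the pipeline.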
To Reproduce
I created a public gist with the code I put together, based on the DeepLearning.AI Haystack course, lesson 2 notebook:
(https://gist.github.com/seanw7/5a480e04597a20325fe0252de98c2019)
Describe your environment (please complete the following information):