
Ollama embedding retrieval issues #1052

Closed
seanw7 opened this issue Sep 5, 2024 · 4 comments
Labels: bug (Something isn't working), integration:ollama, P1

seanw7 commented Sep 5, 2024

Describe the bug
When using the Ollama embedders (both OllamaDocumentEmbedder and OllamaTextEmbedder), I consistently get incorrect embedding results. Regardless of the input query or document content, retrieval always returns the same document (usually the last document in the list), leading to incorrect context in downstream tasks such as response generation.

Notably, switching to the SentenceTransformers embedders immediately resolves the issue and the pipeline retrieves the correct context. See lines 23 & 62 in the gist; a sketch of this swap is shown below, after the behavior summary.

Expected Behavior: The embeddings should accurately represent the input query or document, and the correct context should be retrieved based on the input.

Actual Behavior: No matter what query or document content is provided, the same context (typically the final document in the list of embedded documents) is always returned.
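
A minimal sketch of the SentenceTransformers swap mentioned above (not copied from the gist; the model name here is illustrative):

```python
# Sketch of swapping in SentenceTransformers embedders instead of the Ollama ones.
# The model name is an example; any sentence-transformers model should behave the same way.
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)

doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
text_embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)

# Both components load the model before first use.
doc_embedder.warm_up()
text_embedder.warm_up()
```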

To Reproduce
I created a public gist with the code, which I put together based on the lesson 2 notebook of the DeepLearning.AI Haystack course:
(https://gist.github.com/seanw7/5a480e04597a20325fe0252de98c2019)
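
A condensed sketch of the failing setup (not the full gist); it assumes a local Ollama server on the default http://localhost:11434 with the orca-mini model pulled:

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.ollama import (
    OllamaDocumentEmbedder,
    OllamaTextEmbedder,
)

document_store = InMemoryDocumentStore()
docs = [
    Document(content="My name is Jean and I live in Paris."),
    Document(content="My name is Mark and I live in Berlin."),
    Document(content="My name is Giorgio and I live in Rome."),
]

# Index: embed the documents with Ollama and write them to the store.
doc_embedder = OllamaDocumentEmbedder(model="orca-mini")  # model taken from the docs example
document_store.write_documents(doc_embedder.run(docs)["documents"])

# Query: embed the question and retrieve by embedding similarity.
pipe = Pipeline()
pipe.add_component("text_embedder", OllamaTextEmbedder(model="orca-mini"))
pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
pipe.connect("text_embedder.embedding", "retriever.query_embedding")

result = pipe.run({"text_embedder": {"text": "Who lives in Berlin?"}})
# Symptom: the top document is always the same one, regardless of the query.
print(result["retriever"]["documents"][0].content)
```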

Describe your environment (please complete the following information):

  • OS: iOS
  • Haystack version: haystack-ai==2.5.0
  • Integration version: ollama-haystack==0.0.7
seanw7 added the bug label on Sep 5, 2024
anakin87 (Member) commented Sep 6, 2024

Hey, @seanw7... thanks for reporting this.

In your gist, I see that you are using orca-mini (a generative model) instead of an embedding model.

Can you try with an embedding model (e.g. mxbai-embed-large) and see if the problem persists?
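
For reference, a minimal sketch of that swap (assuming the model has been pulled first with `ollama pull mxbai-embed-large`):

```python
from haystack_integrations.components.embedders.ollama import (
    OllamaDocumentEmbedder,
    OllamaTextEmbedder,
)

# Same components as before, but pointed at a dedicated embedding model.
doc_embedder = OllamaDocumentEmbedder(model="mxbai-embed-large")
text_embedder = OllamaTextEmbedder(model="mxbai-embed-large")
```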

seanw7 (Author) commented Sep 7, 2024

@anakin87 Thank you very much for your response.

I was originally using that orca-mini model because it was in the documentation example here: https://docs.haystack.deepset.ai/docs/ollamadocumentembedder#on-its-own

I previously tested with snowflake-arctic-embed:137m and had the same issue. However, I just ran a few tests with the embedding model you suggested, and it does seem to be working much better 👍

Thank you

julian-risch (Member) commented Sep 7, 2024

I had a look at the Haystack docs and can confirm that we wrongly state "the default embedding model is orca-mini" at https://docs.haystack.deepset.ai/v2.0/docs/ollamatextembedder#compatible-models. It's not: the default model for OllamaDocumentEmbedder and OllamaTextEmbedder is nomic-embed-text.
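
For illustration, a minimal standalone sketch (the example text is arbitrary; it assumes nomic-embed-text has been pulled locally) showing that the default is used when no model is passed:

```python
from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder

# With no model argument, the component falls back to its default, nomic-embed-text.
embedder = OllamaTextEmbedder()
result = embedder.run(text="What do llamas eat?")
print(len(result["embedding"]))  # length of the embedding vector
```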

We also mention orca-mini here:

@dfokina @anakin87 I suggest we replace orca-mini with nomic-embed-text on these pages, and once that is done we can close this issue. Or would you suggest we use another model in the docs?

@seanw7 Thank you for bringing this issue to our attention! Glad to hear it's working much better now for you with a different model.

julian-risch added the P1 label on Sep 7, 2024
anakin87 self-assigned this on Sep 7, 2024
anakin87 (Member) commented Sep 7, 2024

I replaced orca-mini with nomic-embed-text in the docs and fixed other stuff.
I'll close this issue.

anakin87 closed this as completed on Sep 7, 2024