Ollama embedding retrieval issues #1052
Hey, @seanw7... thanks for reporting this. In your gist, I see that you are using `orca-mini`. Can you try with an embedding model (e.g. …)?
@anakin87 Thank you very much for your response. I was originally using the orca-mini model because it was in the documentation example here: https://docs.haystack.deepset.ai/docs/ollamadocumentembedder#on-its-own. I previously tested with snowflake-arctic-embed:137m and had the same issue. However, I just ran a few tests with the embedding model you suggested, and it does seem to be working much better 👍 Thank you
Had a look at the Haystack docs and can confirm that we wrongly state "the default embedding model is …". We also mention …
@dfokina @anakin87 I suggest we replace … in the docs. @seanw7 Thank you for bringing this issue to our attention! Glad to hear it's working much better now for you with a different model.
I replaced …
Describe the bug
When using the Ollama embeddings (both OllamaDocumentEmbedder and OllamaTextEmbedder), I consistently receive incorrect embedding results. Regardless of the input query or document content, the system always retrieves the same document (usually the last document in the list), leading to an incorrect context in downstream tasks such as retrieval or response generation.
It is important to note that switching to the SentenceTransformers embedders immediately resolves the issue, and the pipeline retrieves the correct context. See lines 23 & 62 in the gist.
Expected Behavior: The embeddings should accurately represent the input query or document, and the correct context should be retrieved based on the input.
Actual Behavior: No matter what query or document content is provided, the same context (typically the final document in the list of embedded documents) is always returned.
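The "always retrieves the last document" behavior can be reproduced without Ollama at all: if a model returns (near-)identical vectors for every input, cosine-similarity ranking collapses to a tie-break, so the same document wins for every query. A minimal sketch with hypothetical vectors (not actual Ollama output), assuming a plain cosine-similarity retriever:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.2, 0.9, 0.1])

# Degenerate case: the embedder returns the same vector for every document,
# so every similarity score is identical and ranking is pure tie-breaking
# (e.g. the last document always wins), regardless of the query.
degenerate_docs = [np.array([1.0, 1.0, 1.0]) for _ in range(3)]
degenerate_scores = [cosine_sim(query, d) for d in degenerate_docs]

# Healthy case: a real embedding model produces distinct vectors, so the
# top-ranked document actually tracks the query.
good_docs = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 1.0]),
]
best = max(range(len(good_docs)), key=lambda i: cosine_sim(query, good_docs[i]))
# best == 1: the document whose vector aligns with the query's dominant axis
```

A quick sanity check like this (embed two very different texts and compare their vectors) is a cheap way to tell whether a model is actually producing usable embeddings before debugging the rest of the pipeline.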
To Reproduce
I created a public gist with the code I put together, based on the DeepLearning.AI Haystack course, lesson 2 notebook:
(https://gist.github.com/seanw7/5a480e04597a20325fe0252de98c2019)
Describe your environment (please complete the following information):