ChromaDocumentStoreFilterError: 'operator' key missing in {} #1116

Closed
ZanSara opened this issue Sep 30, 2024 · 1 comment · Fixed by #1117
Labels: bug, integration:chroma

ZanSara (Contributor) commented Sep 30, 2024

Describe the bug
It seems a filtering bug was introduced in the latest release of chroma-haystack.

ChromaDocumentStoreFilterError            Traceback (most recent call last)

<ipython-input-26-f0deb7e25b35> in <cell line: 41>()
     39 
     40 question = "How tall was Rhodos statue?"
---> 41 pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})
     42 

6 frames

/usr/local/lib/python3.10/dist-packages/haystack/core/pipeline/pipeline.py in run(self, data, include_outputs_from)
    226                         raise PipelineMaxLoops(msg)
    227 
--> 228                     res: Dict[str, Any] = self._run_component(name, components_inputs[name])
    229 
    230                     if name in include_outputs_from:

/usr/local/lib/python3.10/dist-packages/haystack/core/pipeline/pipeline.py in _run_component(self, name, inputs)
     65             span.set_content_tag("haystack.component.input", inputs)
     66             logger.info("Running component {component_name}", component_name=name)
---> 67             res: Dict[str, Any] = instance.run(**inputs)
     68             self.graph.nodes[name]["visits"] += 1
     69 

/usr/local/lib/python3.10/dist-packages/haystack_integrations/components/retrievers/chroma/retriever.py in run(self, query_embedding, filters, top_k)
    156 
    157         query_embeddings = [query_embedding]
--> 158         return {"documents": self.document_store.search_embeddings(query_embeddings, top_k, filters)[0]}

/usr/local/lib/python3.10/dist-packages/haystack_integrations/document_stores/chroma/document_store.py in search_embeddings(self, query_embeddings, top_k, filters)
    354             )
    355         else:
--> 356             chroma_filters = _convert_filters(filters=filters)
    357             results = self._collection.query(
    358                 query_embeddings=query_embeddings,

/usr/local/lib/python3.10/dist-packages/haystack_integrations/document_stores/chroma/filters.py in _convert_filters(filters)
     49     where_document: Dict[str, Any] = defaultdict(list)
     50 
---> 51     converted_filters = _convert_filter_clause(filters)
     52     for field, value in converted_filters.items():
     53         if value is None:

/usr/local/lib/python3.10/dist-packages/haystack_integrations/document_stores/chroma/filters.py in _convert_filter_clause(filters)
     93         converted_clauses.update(_parse_comparison_condition(filters))
     94     else:
---> 95         converted_clauses.update(_parse_logical_condition(filters))
     96 
     97     return converted_clauses

/usr/local/lib/python3.10/dist-packages/haystack_integrations/document_stores/chroma/filters.py in _parse_logical_condition(condition)
    124     if "operator" not in condition:
    125         msg = f"'operator' key missing in {condition}"
--> 126         raise ChromaDocumentStoreFilterError(msg)
    127     if "conditions" not in condition:
    128         msg = f"'conditions' key missing in {condition}"

ChromaDocumentStoreFilterError: 'operator' key missing in {}
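
One possible reading of the traceback: no filters were supplied, an empty dict reached the filter converter, and the logical-condition parser rejected it. The sketch below reproduces that failure mode in isolation. It is not the chroma-haystack code: `FilterError`, `parse_logical_condition`, `convert_strict` and `convert_lenient` are hypothetical stand-ins, and the `is None` guard in `convert_strict` is an assumption about how an empty dict could slip through.

```python
from typing import Any, Dict, Optional


class FilterError(Exception):
    """Hypothetical stand-in for ChromaDocumentStoreFilterError."""


def parse_logical_condition(condition: Dict[str, Any]) -> Dict[str, Any]:
    # Same two key checks that are visible in the traceback.
    if "operator" not in condition:
        raise FilterError(f"'operator' key missing in {condition}")
    if "conditions" not in condition:
        raise FilterError(f"'conditions' key missing in {condition}")
    return condition


def convert_strict(filters: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    # Assumption: only None skips parsing, so {} falls through to the parser.
    if filters is None:
        return None
    return parse_logical_condition(filters)


def convert_lenient(filters: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    # Treats both None and {} as "no filters".
    if not filters:
        return None
    return parse_logical_condition(filters)


try:
    convert_strict({})
except FilterError as err:
    print(err)  # 'operator' key missing in {}

print(convert_lenient({}))  # None
```

If this reading is right, treating an empty dict the same as `None` before conversion would avoid the error; whether that matches the actual fix is not confirmed here.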

To Reproduce

This same code was working last Friday with the latest Haystack, Chroma and chroma-haystack.

from datasets import load_dataset
from haystack import Document, Pipeline
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.builders import PromptBuilder

dataset_hs = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta={"url": doc["url"]}) for doc in dataset_hs]

document_store = ChromaDocumentStore(collection_name="demo-haystack")
doc_embedder = OpenAIDocumentEmbedder(model="text-embedding-3-small")
document_store.write_documents(doc_embedder.run(docs)["documents"])

template="""
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

Question: {{question}}

Context: {% for doc in docs %} {{ doc.content }} {% endfor %}

Answer:
"""

pipeline = Pipeline()
pipeline.add_component("text_embedder", OpenAITextEmbedder(model="text-embedding-3-small"))
pipeline.add_component("retriever", ChromaEmbeddingRetriever(document_store, top_k=2))
pipeline.add_component("prompt_builder", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever", "prompt_builder.docs")
pipeline.connect("prompt_builder", "llm")

question = "How tall was Rhodos statue?"
pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

Describe your environment (please complete the following information):

  • OS: Colab
  • Haystack version: 2.5.1
  • Integration version: 0.22.0
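
For anyone trying to confirm they are on the same versions, a small sketch that prints the installed versions from within the notebook. The distribution names `haystack-ai`, `chroma-haystack` and `chromadb` are assumed to be the PyPI names in use; `installed_versions` is a hypothetical helper, not part of Haystack.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def installed_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    found: Dict[str, str] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


# Assumed distribution names; adjust if your environment differs.
for name, ver in installed_versions(["haystack-ai", "chroma-haystack", "chromadb"]).items():
    print(f"{name}: {ver}")
```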

ZanSara added the bug label on Sep 30, 2024

anakin87 (Member) commented

👋
Thank you for reporting the issue...

I've just released this new version.

Looking at the changelog, I suspect this may be related to #1072.
