-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pgvector - filters #257
Pgvector - filters #257
Conversation
@pytest.mark.skip(reason="NOT operator is not supported in PgVectorDocumentStore") | ||
def test_not_operator(self, document_store, filterable_docs): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bothers me a bit.
Since we magically handle NULL values, as required by most of our base tests, it is difficult to properly support the NOT operator.
(For example, a != 3
becomes a IS DISTINCT FROM 3
to also include NULL values in the result.)
I also tried to support the NOT operator, but this involved inverting all nested conditions (which can also be logical) and started to get messy.
So the simplest solutions are:
- do not support the NOT operator
- drop magic handling of NULL values, consistently with Postgres behavior (but requires overriding many of our base tests).
Happy to discuss this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's start by not supporting NOT, if there's demand we can proceed with option number 2, knowing that the investment in patching the base tests is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a first pass and left some minor comments. Can we add unit tests for the private methods in filters.py
? Base tests mostly rely on happy paths, and I suspect this module might benefit from direct coverage, specially for the faulty corner cases.
sql_query_str = sql_query.as_string(cursor) if not isinstance(sql_query, str) else sql_query | ||
detailed_error_msg = f"{error_msg}.\nSQL query: {sql_query_str} \nParameters: {params}" | ||
raise DocumentStoreError(detailed_error_msg) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beware printing raw SQL queries, they can leak secrets and sensitive data.
f"Could not write documents to PgvectorDocumentStore. \n" | ||
f"SQL query: {sql_query_str} \nParameters: {db_documents}" | ||
) | ||
raise DocumentStoreError(error_msg) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same security considerations
integrations/pgvector/src/haystack_integrations/document_stores/pgvector/filters.py
Outdated
Show resolved
Hide resolved
integrations/pgvector/src/haystack_integrations/document_stores/pgvector/filters.py
Outdated
Show resolved
Hide resolved
integrations/pgvector/src/haystack_integrations/document_stores/pgvector/filters.py
Outdated
Show resolved
Hide resolved
integrations/pgvector/src/haystack_integrations/document_stores/pgvector/filters.py
Show resolved
Hide resolved
integrations/pgvector/src/haystack_integrations/document_stores/pgvector/filters.py
Outdated
Show resolved
Hide resolved
I have tried to address most of the comments and added some unit tests (some of these overlap with existing ones, but run very fast). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove the SQL query from the error and we're good to go
sql_query_str = sql_query.as_string(cursor) if not isinstance(sql_query, str) else sql_query | ||
detailed_error_msg = f"{error_msg}.\nSQL query: {sql_query_str} \nParameters: {params}" | ||
raise DocumentStoreError(detailed_error_msg) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could print the SQL query in the debug logs and do something like:
detailed_error_msg = f"{error_msg}.\nYou can find the SQL query and the parameters in the debug logs"
It's not perfect but it should be ok for most of the cases
@pytest.mark.skip(reason="NOT operator is not supported in PgVectorDocumentStore") | ||
def test_not_operator(self, document_store, filterable_docs): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's start by not supporting NOT, if there's demand we can proceed with option number 2, knowing that the investment in patching the base tests is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Support for the Haystack filtering logic in PgvectorDocumentStore.