Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AstraDB delete_documents performs an unnecessary count #1047

Closed
brendancicchi opened this issue Sep 3, 2024 · 2 comments · Fixed by #1049
Closed

AstraDB delete_documents performs an unnecessary count #1047

brendancicchi opened this issue Sep 3, 2024 · 2 comments · Fixed by #1049
Assignees
Labels
bug Something isn't working integration:astra P3

Comments

@brendancicchi
Copy link
Contributor

https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/astra/src/haystack_integrations/document_stores/astra/document_store.py#L406-L427

In delete_documents, an unnecessary check is being done, which is performing a full count of the collection as a naive check to ensure the collection is not empty. This is an extremely expensive operation and not one that is necessary, as performing a delete on nonexisting documents is not harmful. In reality, the existing check does not actually check if the document exists or not anyway.

The count_documents check should be removed.

@brendancicchi brendancicchi added the bug Something isn't working label Sep 3, 2024
brendancicchi added a commit to brendancicchi/haystack-core-integrations that referenced this issue Sep 3, 2024
Removed the expensive check to see if the collection is non-empty by performing a full count.

This is to fix issue deepset-ai#1047
@julian-risch
Copy link
Member

Thank you @brendancicchi for opening this issue. I can see that you already worked on a fix. Would you like to open a pull request and contribute the fix to Haystack? 🙂 You can find our contributing guidelines here: https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md#contribute-code

@julian-risch julian-risch added the P3 label Sep 4, 2024
@brendancicchi
Copy link
Contributor Author

@julian-risch Yes, I'll open a PR. Due to this test https://github.com/deepset-ai/haystack/blob/main/haystack/testing/document_store.py#L161C5-L163C61, it looks like I can't just change the logic to do the same on an empty collection as when no document_ids are deleted. So, instead, I'll grab any single document as a check for a non-empty collection, which will be much cheaper than the full count.

brendancicchi added a commit to brendancicchi/haystack-core-integrations that referenced this issue Sep 4, 2024
Removed the expensive check to see if the collection is non-empty by performing a full count.

This is to fix issue deepset-ai#1047
vblagoje added a commit that referenced this issue Sep 26, 2024
Removed the expensive check to see if the collection is non-empty by performing a full count.

This is to fix issue #1047

Co-authored-by: Vladimir Blagojevic <[email protected]>
Amnah199 pushed a commit that referenced this issue Oct 2, 2024
Removed the expensive check to see if the collection is non-empty by performing a full count.

This is to fix issue #1047

Co-authored-by: Vladimir Blagojevic <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working integration:astra P3
Projects
Development

Successfully merging a pull request may close this issue.

3 participants