Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: converting Pinecone metadata fields from float back to int #1034

Merged
merged 19 commits into from
Aug 29, 2024

Conversation

davidsbatista
Copy link
Contributor

@davidsbatista davidsbatista commented Aug 28, 2024

Proposed Changes:

Pinecone converts all meta numbers in the meta field to float (https://docs.pinecone.io/guides/data/filter-with-metadata). This causes the SentenceWindowRetriever to crash completely.

This PR checks if the metadata values are floats and convert them back to integers making Pinecone supported by the SentenceWindowRetriever

How did you test it?

  • manual verification, unit tests and integration tests

Checklist

@github-actions github-actions bot added integration:pinecone type:documentation Improvements or additions to documentation labels Aug 28, 2024
@davidsbatista davidsbatista changed the title Pinecone converting floats back to int fix: converting Pinecone metadata fields from float back to int Aug 28, 2024
@davidsbatista davidsbatista changed the title fix: converting Pinecone metadata fields from float back to int fix: converting Pinecone metadata fields from float back to int Aug 28, 2024
@davidsbatista davidsbatista marked this pull request as ready for review August 28, 2024 16:17
@davidsbatista davidsbatista requested a review from a team as a code owner August 28, 2024 16:17
@davidsbatista davidsbatista requested review from shadeMe and anakin87 and removed request for a team and shadeMe August 28, 2024 16:17
Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic looks good.

I found some small opportunities for improvement.
Let's also add a small unit test for the new class method.

integrations/pinecone/tests/test_document_store.py Outdated Show resolved Hide resolved
@@ -257,3 +260,59 @@ def test_embedding_retrieval(self, document_store: PineconeDocumentStore):
assert len(results) == 2
assert results[0].content == "Most similar document"
assert results[1].content == "2nd best document"

@pytest.mark.skipif("PINECONE_API_KEY" not in os.environ, reason="PINECONE_API_KEY not set")
def test_sentence_window_retriever(self, sleep_time):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this test could be made much shorter if we use the document_store fixture, as done for example in test_write_documents_duplicate_overwrite.

The fixture should take care of creating and deleting the document store, automatically sleep after writing docs, etc...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point 👍🏽 thanks

@davidsbatista
Copy link
Contributor Author

Do you think an extra unit test is necessary? It's a private function, and testing its functionality is already covered by other test

@anakin87
Copy link
Member

Do you think an extra unit test is necessary? It's a private function, and testing its functionality is already covered by other test

I'll leave it up to you; I'd like to have it.

@davidsbatista
Copy link
Contributor Author

Do you think an extra unit test is necessary? It's a private function, and testing its functionality is already covered by other test

I'll leave it up to you; I'd like to have it.

Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some small changes.
I'll merge if the tests pass.

@anakin87 anakin87 merged commit 52da6c7 into main Aug 29, 2024
6 checks passed
@anakin87 anakin87 deleted the fix-make-sentence-window-retriever-work-picone branch August 29, 2024 11:44
Amnah199 pushed a commit that referenced this pull request Oct 2, 2024
)

* Pinecone converting floats back to int

* linting

* simplifying tests

* fixing tests

* fixing test

* linting issues

* adding and sleep between inserting and querying

* cleaning the indexes after tests ran

* ruff fix

* fixing tests

* Update integrations/pinecone/tests/test_document_store.py

Co-authored-by: Stefano Fiorucci <[email protected]>

* changing docstring for private utility function

* adding unit tests to private function

* fixing linting

* fixing linting according to black rules

* fixing linting

* removing time sleep, it's contained in the fixture

* fixes

---------

Co-authored-by: Stefano Fiorucci <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration:pinecone type:documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants