AzureSearch vectorstore does not work asynchronously #24064

Closed
5 tasks done
thedavgar opened this issue Jul 10, 2024 · 0 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: vector store Related to vector store module


Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

I am trying to use the Azure AI Search vectorstore and the retriever obtained from it. Both work perfectly when retrieving documents with the synchronous methods, but raise an error when running the async methods.

Creating the instances of embeddings and Azure Search

import os

from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    SimpleField,
)
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import AzureSearch

fields = [
  SimpleField(
      name="content",
      type=SearchFieldDataType.String,
      key=True,
      filterable=True,
  ),
  SearchField(
      name="content_vector",
      type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
      searchable=True,
      vector_search_dimensions=1536,
      vector_search_profile_name="myHnswProfile",
  ),
  SimpleField(
      name="document_name",
      type=SearchFieldDataType.String,
      key=True,
      filterable=True,
  )
]

encoder = AzureOpenAIEmbeddings(
    azure_endpoint=os.getenv("EMBEDDINGS_OPENAI_ENDPOINT"),
    deployment=os.getenv("EMBEDDINGS_DEPLOYMENT_NAME"),
    openai_api_version=os.getenv("OPENAI_API_VERSION"),
    openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)

vectorstore = AzureSearch(
    azure_search_endpoint=os.getenv("AI_SEARCH_ENDPOINT_SECRET"),
    azure_search_key=os.getenv("AI_SEARCH_API_KEY"),
    index_name=os.getenv("AI_SEARCH_INDEX_NAME_SECRET"),
    fields=fields,
    embedding_function=encoder,
)

retriever = vectorstore.as_retriever(search_type="hybrid", k=2)

Synchronous methods working and returning documents

vectorstore.vector_search("what is the capital of France")
retriever.invoke("what is the capital of France")

Asynchronous methods raising an error

await vectorstore.avector_search("what is the capital of France")
await retriever.ainvoke("what is the capital of France")

Error Message and Stack Trace (if applicable)


KeyError Traceback (most recent call last)
Cell In[15], line 1
----> 1 await vectorstore.avector_search("what is the capital of France")

File ~/.local/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py:695, in AzureSearch.avector_search(self, query, k, filters, **kwargs)
682 async def avector_search(
683 self, query: str, k: int = 4, *, filters: Optional[str] = None, **kwargs: Any
684 ) -> List[Document]:
685 """
686 Returns the most similar indexed documents to the query text.
687
(...)
693 List[Document]: A list of documents that are most similar to the query text.
694 """
--> 695 docs_and_scores = await self.avector_search_with_score(
696 query, k=k, filters=filters
697 )
698 return [doc for doc, _ in docs_and_scores]

File ~/.local/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py:742, in AzureSearch.avector_search_with_score(self, query, k, filters, **kwargs)
730 """Return docs most similar to query.
731
732 Args:
(...)
739 to the query and score for each
740 """
741 embedding = await self._aembed_query(query)
--> 742 docs, scores, _ = await self._asimple_search(
743 embedding, "", k, filters=filters, **kwargs
744 )
746 return list(zip(docs, scores))

File ~/.local/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py:1080, in AzureSearch._asimple_search(self, embedding, text_query, k, filters, **kwargs)
1066 async with self._async_client() as async_client:
1067 results = await async_client.search(
1068 search_text=text_query,
1069 vector_queries=[
(...)
1078 **kwargs,
1079 )
-> 1080 docs = [
1081 (
1082 _result_to_document(result),
1083 float(result["@search.score"]),
1084 result[FIELDS_CONTENT_VECTOR],
1085 )
1086 async for result in results
1087 ]
1088 if not docs:
1089 raise ValueError(f"No {docs=}")

File ~/.local/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py:1084, in (.0)
1066 async with self._async_client() as async_client:
1067 results = await async_client.search(
1068 search_text=text_query,
1069 vector_queries=[
(...)
1078 **kwargs,
1079 )
1080 docs = [
1081 (
1082 _result_to_document(result),
1083 float(result["@search.score"]),
-> 1084 result[FIELDS_CONTENT_VECTOR],
1085 )
1086 async for result in results
1087 ]
1088 if not docs:
1089 raise ValueError(f"No {docs=}")

KeyError: 'content_vector'

Description

The async methods for searching documents (at least) do not work and raise the KeyError shown above. Possibly the async client is not being used by the async retrieval methods.
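
The failure in the traceback can be reproduced without an Azure service. The following is a minimal sketch, assuming the search results behave like dictionaries and that the vector field is simply absent from them. `fake_results`, `collect`, and `run_and_capture` are hypothetical stand-ins, not LangChain or Azure SDK APIs.

```python
import asyncio

FIELDS_CONTENT_VECTOR = "content_vector"


async def fake_results():
    """Hypothetical stand-in for async search results without a vector field."""
    yield {"content": "Paris is the capital of France", "@search.score": 0.87}


async def collect():
    # Same shape as the comprehension in the traceback: indexing a key that
    # the result does not carry raises KeyError on the first item.
    return [
        (
            result["content"],
            float(result["@search.score"]),
            result[FIELDS_CONTENT_VECTOR],
        )
        async for result in fake_results()
    ]


def run_and_capture():
    """Run the coroutine and return the missing key name, if any."""
    try:
        asyncio.run(collect())
    except KeyError as exc:
        return exc.args[0]
    return None


error_key = run_and_capture()
print(error_key)
```

The sketch only shows that the error comes from unconditional indexing; it does not explain why the real service omits the field.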

System Info

langchain==0.2.6
langchain-community==0.2.4
langchain-core==0.2.11
langchain-openai==0.1.8
langchain-text-splitters==0.2.1

@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Jul 10, 2024
isahers1 added a commit that referenced this issue Jul 12, 2024
…24081)

Thank you for contributing to LangChain!

**Description**:
This PR fixes the bug described in issue #24064, which occurs when using
the AzureSearch vectorstore with the asynchronous search methods, which
are also the methods used by the retriever. The proposed change simply
makes access to the embedding optional, because it is not used anywhere
to retrieve documents. In fact, the synchronous retrieval methods do not
use the embedding either.
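
One way "optional access" could look is sketched here, assuming dictionary-like results: `dict.get` returns `None` for a missing vector instead of raising. The helper name `to_triplet` is hypothetical, and the actual change in the PR may differ.

```python
from typing import Any, Dict, List, Optional, Tuple

FIELDS_CONTENT_VECTOR = "content_vector"


def to_triplet(result: Dict[str, Any]) -> Tuple[str, float, Optional[List[float]]]:
    """Hypothetical helper: read the vector only if the result carries it."""
    return (
        result["content"],
        float(result["@search.score"]),
        result.get(FIELDS_CONTENT_VECTOR),  # None when the field is absent
    )


without_vector = to_triplet({"content": "Paris", "@search.score": 0.87})
with_vector = to_triplet(
    {"content": "Paris", "@search.score": 0.87, "content_vector": [0.1, 0.2]}
)
print(without_vector)
print(with_vector)
```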

With this PR the code given by the user in the issue works.

```python
vectorstore = AzureSearch(
    azure_search_endpoint=os.getenv("AI_SEARCH_ENDPOINT_SECRET"),
    azure_search_key=os.getenv("AI_SEARCH_API_KEY"),
    index_name=os.getenv("AI_SEARCH_INDEX_NAME_SECRET"),
    fields=fields,
    embedding_function=encoder,
)

retriever = vectorstore.as_retriever(search_type="hybrid", k=2)

await vectorstore.avector_search("what is the capital of France")
await retriever.ainvoke("what is the capital of France")
```

**Issue**:
The Azure Search vectorstore does not work when searching for documents
with the asynchronous methods, as described in issue #24064.

**Dependencies**:
There are no extra dependencies required for this change.

---------

Co-authored-by: isaac hershenson <[email protected]>
isahers1 pushed a commit that referenced this issue Aug 13, 2024
**Description**
Fix the asynchronous methods to retrieve documents from the AzureSearch
VectorStore. The previous changes from [this
commit](ffe6ca9)
created similar code for the synchronous and asynchronous methods,
but the asynchronous client returns an asynchronous iterator,
"AsyncSearchItemPaged", as noted in issue #24740.
To solve this, the synchronous iterators in the asynchronous methods
were changed to asynchronous iterators.
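
The difference between the two kinds of iteration can be sketched as follows. `FakeAsyncPaged` is a hypothetical stand-in for an async-only result iterator and is not the Azure SDK class; the sketch assumes only that the real object supports `async for` and not plain `for`.

```python
import asyncio


class FakeAsyncPaged:
    """Hypothetical stand-in for an async-only result iterator."""

    def __init__(self, items):
        self._items = list(items)

    def __aiter__(self):
        return self._gen()

    async def _gen(self):
        for item in self._items:
            yield item


async def demo():
    results = FakeAsyncPaged([{"content": "a"}, {"content": "b"}])

    # A plain `for` needs __iter__, which an async-only iterator lacks.
    try:
        for _ in results:
            pass
        sync_error = None
    except TypeError as exc:
        sync_error = type(exc).__name__

    # `async for` (also inside a comprehension) uses __aiter__/__anext__.
    contents = [r["content"] async for r in results]
    return sync_error, contents


sync_error, contents = asyncio.run(demo())
print(sync_error, contents)
```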

@chrislrobert said in [this
comment](#24740 (comment))
that there was still a flaw due to `with` blocks that close the client
after each call. I removed these `with` blocks around the `async_client`,
following the same pattern as the sync `client`.

To close the connections, a `__del__` method is included that
gracefully closes the clients once the vectorstore object is destroyed.
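
A rough sketch of both points, under stated assumptions: `FakeAsyncClient` and `Store` are hypothetical classes, not the code from the PR. The sketch shows why a per-call `async with` breaks a shared client, and one possible shape for best-effort cleanup in `__del__`. It relies on CPython running `__del__` as soon as the last reference is dropped.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical async client that refuses calls once closed."""

    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        await self.close()

    async def close(self):
        self.closed = True

    async def search(self, text):
        if self.closed:
            raise RuntimeError("client is closed")
        return [text.upper()]


class Store:
    """Hypothetical store that keeps one long-lived async client."""

    def __init__(self, client):
        self.async_client = client

    async def search_with_block(self, text):
        # Flawed pattern: the `async with` closes the shared client on exit.
        async with self.async_client as client:
            return await client.search(text)

    async def search_persistent(self, text):
        # Reuse the client across calls; it is closed elsewhere.
        return await self.async_client.search(text)

    def __del__(self):
        # Best-effort cleanup; a fresh loop is used only if none is running.
        client = getattr(self, "async_client", None)
        if client is None or client.closed:
            return
        try:
            asyncio.get_running_loop().create_task(client.close())
        except RuntimeError:
            asyncio.run(client.close())


async def second_call_fails():
    store = Store(FakeAsyncClient())
    await store.search_with_block("a")
    try:
        await store.search_with_block("b")
    except RuntimeError as exc:
        return str(exc)
    return None


async def persistent_calls():
    client = FakeAsyncClient()
    store = Store(client)
    out = [await store.search_persistent("a"), await store.search_persistent("b")]
    return out, client, store


block_error = asyncio.run(second_call_fails())
results, client, store = asyncio.run(persistent_calls())
del store  # drops the last reference, so CPython runs __del__ here
print(block_error, results, client.closed)
```

Cleanup in `__del__` is inherently best-effort, since finalizer timing depends on the interpreter; an explicit close method is a common alternative.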

**Issue:** #24740 and #24064
**Dependencies:** No new dependencies for this change

**Example notebook:** I created a notebook to test that the changes work
and give the same results as the synchronous methods for vector and
hybrid search. With these changes, the asynchronous methods in the
retriever work as well.

![image](https://github.com/user-attachments/assets/697e431b-9d7f-4d0d-b205-59d051ac2b67)


**Lint and test**: Passes the tests and the linter
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
…-ai#24921)

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 9, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 16, 2024