AzureSearch vectorstore does not work asynchronously #24064

thedavgar · 2024-07-10T11:28:29Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

I am trying to use the Azure AI Search vectorstore and the retriever derived from it. Both work perfectly when retrieving documents with the synchronous methods, but raise an error when I run the async methods.

Creating the instances of embeddings and Azure Search

import os

from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    SimpleField,
)
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import AzureSearch

fields = [
  SimpleField(
      name="content",
      type=SearchFieldDataType.String,
      key=True,
      filterable=True,
  ),
  SearchField(
      name="content_vector",
      type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
      searchable=True,
      vector_search_dimensions=1536,
      vector_search_profile_name="myHnswProfile",
  ),
  SimpleField(
      name="document_name",
      type=SearchFieldDataType.String,
      key=True,
      filterable=True,
  )
]

encoder = AzureOpenAIEmbeddings(
    azure_endpoint=os.getenv("EMBEDDINGS_OPENAI_ENDPOINT"),
    deployment=os.getenv("EMBEDDINGS_DEPLOYMENT_NAME"),
    openai_api_version=os.getenv("OPENAI_API_VERSION"),
    openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)

vectorstore = AzureSearch(
    azure_search_endpoint=os.getenv("AI_SEARCH_ENDPOINT_SECRET"),
    azure_search_key=os.getenv("AI_SEARCH_API_KEY"),
    index_name=os.getenv("AI_SEARCH_INDEX_NAME_SECRET"),
    fields=fields,
    embedding_function=encoder,
)

retriever = vectorstore.as_retriever(search_type="hybrid", k=2)

Synchronous methods working and returning documents

vectorstore.vector_search("what is the capital of France")
retriever.invoke("what is the capital of France")

Asynchronous methods not working and raising an error

await vectorstore.avector_search("what is the capital of France")
await retriever.ainvoke("what is the capital of France")

Error Message and Stack Trace (if applicable)

KeyError Traceback (most recent call last)
Cell In[15], line 1
----> 1 await vectorstore.avector_search("what is the capital of France")

File ~/.local/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py:695, in AzureSearch.avector_search(self, query, k, filters, **kwargs)
682 async def avector_search(
683 self, query: str, k: int = 4, *, filters: Optional[str] = None, **kwargs: Any
684 ) -> List[Document]:
685 """
686 Returns the most similar indexed documents to the query text.
687
(...)
693 List[Document]: A list of documents that are most similar to the query text.
694 """
--> 695 docs_and_scores = await self.avector_search_with_score(
696 query, k=k, filters=filters
697 )
698 return [doc for doc, _ in docs_and_scores]

File ~/.local/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py:742, in AzureSearch.avector_search_with_score(self, query, k, filters, **kwargs)
730 """Return docs most similar to query.
731
732 Args:
(...)
739 to the query and score for each
740 """
741 embedding = await self._aembed_query(query)
--> 742 docs, scores, _ = await self._asimple_search(
743 embedding, "", k, filters=filters, **kwargs
744 )
746 return list(zip(docs, scores))

File ~/.local/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py:1080, in AzureSearch._asimple_search(self, embedding, text_query, k, filters, **kwargs)
1066 async with self._async_client() as async_client:
1067 results = await async_client.search(
1068 search_text=text_query,
1069 vector_queries=[
(...)
1078 **kwargs,
1079 )
-> 1080 docs = [
1081 (
1082 _result_to_document(result),
1083 float(result["@search.score"]),
1084 result[FIELDS_CONTENT_VECTOR],
1085 )
1086 async for result in results
1087 ]
1088 if not docs:
1089 raise ValueError(f"No {docs=}")

File ~/.local/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py:1084, in (.0)
1066 async with self._async_client() as async_client:
1067 results = await async_client.search(
1068 search_text=text_query,
1069 vector_queries=[
(...)
1078 **kwargs,
1079 )
1080 docs = [
1081 (
1082 _result_to_document(result),
1083 float(result["@search.score"]),
-> 1084 result[FIELDS_CONTENT_VECTOR],
1085 )
1086 async for result in results
1087 ]
1088 if not docs:
1089 raise ValueError(f"No {docs=}")

KeyError: 'content_vector'

Description

At least the async methods for searching documents do not work: they raise the KeyError shown above. Possibly the async client is not being used by the async retrieval methods.
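The failing line in the traceback indexes each result with `result[FIELDS_CONTENT_VECTOR]`, which raises a `KeyError` whenever a hit does not carry that field. The snippet below is only a minimal sketch of that failure mode and of a tolerant alternative using `dict.get`. The helper names `parse_strict` and `parse_tolerant` and the sample hit are hypothetical and are not part of `langchain_community`; only the field name `content_vector` and the `@search.score` key come from the traceback.

```python
from typing import Any, Dict, List, Optional, Tuple

# Field name taken from the traceback above.
FIELDS_CONTENT_VECTOR = "content_vector"

Hit = Dict[str, Any]
Parsed = Tuple[str, float, Optional[List[float]]]


def parse_strict(results: List[Hit]) -> List[Parsed]:
    """Hypothetical helper: mirrors the failing pattern (direct indexing)."""
    return [
        (r["content"], float(r["@search.score"]), r[FIELDS_CONTENT_VECTOR])
        for r in results
    ]


def parse_tolerant(results: List[Hit]) -> List[Parsed]:
    """Hypothetical helper: treats the vector as optional via dict.get."""
    return [
        (r["content"], float(r["@search.score"]), r.get(FIELDS_CONTENT_VECTOR))
        for r in results
    ]


# Made-up search hit that does not include the vector field.
results = [{"content": "Paris is the capital of France", "@search.score": 0.87}]

try:
    parse_strict(results)
except KeyError as exc:
    print(f"strict parsing failed with KeyError: {exc}")

print(parse_tolerant(results))
```

Treating the vector as optional only makes sense if callers do not need it downstream, which is an assumption to check against the actual search code.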

System Info

langchain==0.2.6
langchain-community==0.2.4
langchain-core==0.2.11
langchain-openai==0.1.8
langchain-text-splitters==0.2.1

…24081) Thank you for contributing to LangChain!

**Description**: This PR fixes the bug described in issue #24064, which occurs when using the AzureSearch vectorstore with the asynchronous search methods (the same methods the retriever uses). The proposed change simply makes access to the embedding optional, because it is not used anywhere to retrieve documents. In fact, the synchronous retrieval methods do not use the embedding either. With this PR, the code given by the user in the issue works.

```python
vectorstore = AzureSearch(
    azure_search_endpoint=os.getenv("AI_SEARCH_ENDPOINT_SECRET"),
    azure_search_key=os.getenv("AI_SEARCH_API_KEY"),
    index_name=os.getenv("AI_SEARCH_INDEX_NAME_SECRET"),
    fields=fields,
    embedding_function=encoder,
)

retriever = vectorstore.as_retriever(search_type="hybrid", k=2)

await vectorstore.avector_search("what is the capital of France")
await retriever.ainvoke("what is the capital of France")
```

**Issue**: The Azure Search vectorstore does not work when searching for documents with the asynchronous methods, as described in issue #24064.

**Dependencies**: There are no extra dependencies required for this change.

---------

Co-authored-by: isaac hershenson <[email protected]>

@chrislrobert

**Description** Fix the asynchronous methods that retrieve documents from the AzureSearch VectorStore. The previous changes from [this commit](ffe6ca9) created similar code for the synchronous and asynchronous methods, but the asynchronous client returns an asynchronous iterator, "AsyncSearchItemPaged", as noted in issue #24740. To solve this, the synchronous iterators in the asynchronous methods were changed to asynchronous iterators.

@chrislrobert said in [this comment](#24740 (comment)) that there was still a flaw due to `with` blocks that close the client after each call. I removed these `with` blocks around the `async_client`, following the same pattern as the sync `client`. To close the connections, a `__del__` method is included that gracefully closes the clients once the vectorstore object is destroyed.

**Issue:** #24740 and #24064

**Dependencies:** No new dependencies for this change

**Example notebook:** I created a notebook to test that the changes work and give the same results as the synchronous methods for vector and hybrid search. With these changes, the asynchronous methods in the retriever work as well.

![image](https://github.com/user-attachments/assets/697e431b-9d7f-4d0d-b205-59d051ac2b67)

**Lint and test**: Passes the tests and the linter
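The iterator mismatch described above can be illustrated without Azure at all. The following is a rough sketch, assuming a stand-in class `FakeAsyncPaged` (hypothetical, not the real `AsyncSearchItemPaged`) that supports only `async for`. It shows that a plain `for` comprehension over such an object raises `TypeError`, while an `async for` comprehension collects the items.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


class FakeAsyncPaged:
    """Hypothetical stand-in for a pager that only supports async iteration."""

    def __init__(self, items: List[Dict[str, Any]]) -> None:
        self._items = items

    def __aiter__(self) -> AsyncIterator[Dict[str, Any]]:
        return self._generate()

    async def _generate(self) -> AsyncIterator[Dict[str, Any]]:
        for item in self._items:
            await asyncio.sleep(0)  # yield control, as a network pager would
            yield item


async def collect_sync_style(results: FakeAsyncPaged) -> List[str]:
    # Plain `for` needs __iter__, which the pager does not define.
    return [r["content"] for r in results]  # raises TypeError


async def collect_async_style(results: FakeAsyncPaged) -> List[str]:
    # `async for` uses __aiter__/__anext__, which the pager does define.
    return [r["content"] async for r in results]


async def main() -> None:
    items = [{"content": "a"}, {"content": "b"}]
    try:
        await collect_sync_style(FakeAsyncPaged(items))
    except TypeError as exc:
        print(f"sync-style iteration failed: {exc}")
    print(await collect_async_style(FakeAsyncPaged(items)))


if __name__ == "__main__":
    asyncio.run(main())
```

In a notebook, where an event loop is already running, `await main()` would be used instead of `asyncio.run(main())`.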


dosubot bot added Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Jul 10, 2024

thedavgar mentioned this issue Jul 10, 2024

community: Fix Bug in Azure Search Vectorstore search asyncronously #24081

Merged

chrislrobert mentioned this issue Jul 27, 2024

AzureSearch.avector_search_with_score() triggers "TypeError: 'AsyncSearchItemPaged' object is not iterable" when calling _results_to_documents() #24740

Open

5 tasks

thedavgar mentioned this issue Aug 1, 2024

community: fix AzureSearch vectorstore asyncronous methods #24921

Merged

dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 9, 2024

dosubot bot closed this as not planned Oct 16, 2024

dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 16, 2024
