[Poor results for RAG with Vector/Hybrid/Semantic search] #40983
Comments
Thank you for your feedback. Tagging and routing to the team member best able to assist.
Hi @securigy, sorry to hear the results aren't what you were expecting. Here are some answers to the questions in your comments.
There have been backwards-incompatible changes to the NuGet package, but the same code that works on 11.5.0 should work with 11.5.0-beta5. For the list of specific changes to the 2023-11-01 API, please review https://learn.microsoft.com/en-us/azure/search/search-api-migration#upgrade-to-2023-11-01
A semantic configuration is not a file name. It's a JSON object that describes how semantic ranking will work. The name "my-semantic-config" means you can reference this config in the query by using the name "my-semantic-config". Here are a couple of thoughts on how you can improve search quality:
I hope this helps,
searchOptions = CreateSearchOptions(searchTypeInt, k, embeddings);
When I do the above and my filter word is "Amy", I get an error:
Parameter name: $filter Content:
The NamedEntities string field is very useful: it contains comma-separated words and phrases, such as names of people, addresses, location names, dates, etc. So maybe I don't understand how the filter is supposed to work. I tried it without the single quotes and got the error:
Assigning to @mattmsft and tagging as Service Attention
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @arv100kri @bleroy @tjacobhi.
#2. I will take a look, but based on what I experienced with chunking Word, PDF, Excel, TEXT and CSV, chunking is different for every file type...
#3. I use Analytics and it does it for me. It really extracts names, addresses, titles, etc. and does a great job; I verified the results.
#4. I have a problem with filtering. I tried the suggestion of
searchOptions.Filter = "NamedEntities eq 'Amy'" or
searchOptions.Filter = "NamedEntities eq '{filterText}'"
searchOptions.Filter = "NamedEntities eq {Amy}"
searchOptions.Filter = "NamedEntities eq {filterText}"
and it either throws an exception or eliminates all the search results. Basically, my impression is that it will only produce search results where the field NamedEntities has the word Amy.
Am I wrong?
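Editor's sketch of the filter strings attempted above, in plain C# with no SDK calls. It assumes the brace forms were written without the leading `$`, in which case the braces reach the service literally instead of being replaced by the variable. The helper name `EqFilter` is hypothetical; doubling single quotes is the OData rule for string literals. As noted later in the thread, `eq` is an exact string match, so even a well-formed `eq` filter only matches documents whose entire NamedEntities value equals the filter text.

```csharp
using System;

public static class EqFilterSketch
{
    // Hypothetical helper: builds "<field> eq '<value>'" and doubles any
    // single quote in the value, which is how OData escapes string literals.
    public static string EqFilter(string field, string value)
    {
        string escaped = value.Replace("'", "''");
        return $"{field} eq '{escaped}'";
    }

    public static void Main()
    {
        string filterText = "Amy";

        // No '$' prefix: this is an ordinary string, so the text
        // {filterText} is sent to the service as-is.
        string literal = "NamedEntities eq '{filterText}'";

        // Interpolated and escaped: the variable's value is substituted.
        string interpolated = EqFilter("NamedEntities", filterText);

        Console.WriteLine(literal);       // NamedEntities eq '{filterText}'
        Console.WriteLine(interpolated);  // NamedEntities eq 'Amy'
    }
}
```

The resulting string would be assigned to `searchOptions.Filter` as in the attempts above.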
Another thing that I observed is that when the (8) results produced are OK, with some of them containing really relevant info, and I feed them into GPT-3.5-turbo-16k along with the prompt "Who is Amy", it produces 0 responses...
On Thursday, January 4, 2024 at 11:06:29 AM PST, Matt ***@***.***> wrote:
- I’m glad you have a specific chunking strategy in mind.
- The chunking strategy for integrated vectorization is documented, and it’s customizable https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-textsplit
- How are you extracting named entities from your chunks? As long as you’re confident your named entity extraction strategy works then you can leave the filter in, but if it doesn’t find the most relevant named entities it might not improve search quality.
- What’s the type of the NamedEntities field? The syntax for filtering strings and collections of strings is different.
I hope this helps,
Matt
NamedEntities is a string field. The Azure Analytics query produces strings that I concatenate into one string, where the words and phrases produced by Analytics are separated by commas. So how would you find the documents where the filter text is included in NamedEntities, either as an exact match or as part of a phrase?
Since NamedEntities is a string field, you might want to use the search.ismatchscoring function to issue a sub-query targeted at just that field. Regarding why the filter "Source eq '{fileName}'" returns no results: it's possible the file name you have specified isn't present in the index with the exact same casing or spacing. eq is an exact string match. I hope this helps,
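Editor's sketch of how the suggestion above could look from C#. `search.ismatch` and `search.ismatchscoring` are OData functions written inside the filter string, not methods on the SDK types, so the code only has to build the right string. The helper name `IsMatchFilter` is hypothetical, and the sketch assumes the NamedEntities field is marked searchable in the index, which these functions require.

```csharp
using System;

public static class IsMatchFilterSketch
{
    // Hypothetical helper: builds search.ismatch('<term>', '<field>').
    // Single quotes in the term are doubled, per OData string-literal rules.
    public static string IsMatchFilter(string term, string field)
    {
        string escaped = term.Replace("'", "''");
        return $"search.ismatch('{escaped}', '{field}')";
    }

    public static void Main()
    {
        // Full-text match of "Amy" against the NamedEntities field only,
        // rather than an exact comparison of the whole field value.
        string filter = IsMatchFilter("Amy", "NamedEntities");
        Console.WriteLine(filter);  // search.ismatch('Amy', 'NamedEntities')

        // The string would then be assigned to the Filter property, e.g.
        // searchOptions.Filter = filter;
    }
}
```

Swapping the function name for `search.ismatchscoring` keeps the same arguments; the difference is that matches from the sub-query also contribute to the relevance score.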
Thank you @mattgotteiner for the assistance here.
Hi @securigy. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.
search.ismatchscoring does not exist in C#/AzureSDK.
Hi @securigy, since you haven’t asked that we |
Library name and version
Azure.Search.Documents and Azure.AI.OpenAI
Query/Question
I put together a RAG system. I segmented 2 Word documents on paragraph boundaries using only full sentences and successfully ingested them with the Azure API. My search has a choice of Vector, Hybrid (vector + text), and Semantic (vector + text + semantic reordering). Those 2 documents are my resume and my friend's resume. When I ask "Who is [my name]?" I get a decent answer. However, when I ask "Who is [my friend's name]?" I get nothing, in the form of "Based on provided information there is no data on...".
I tried that with each of the 3 search modes I mentioned above. The code that defines my search is:
And here is how the index is constructed:
As you can see, in addition to all of the above I use Analytics (not shown in the displayed code) to extract Named Entities from every text segment and populate the NamedEntities field for every segment I ingest, and KeywordsFields accordingly. When I use
Calling the code is as follows:
Now, I get responses that are relevant, that is, valid responses, where some of them have a valid description of who the person is, professionally. But when I feed the prompt along with the 8 results that I get above from vector search into the Completion API, I get nothing.
I do use GPT-3.5-turbo-16k, because for whatever reason GPT-4 is not available to me at this time in Azure. So, my first suspicion is: is it because I do not use GPT-4? Is there any other reason you could think of? If you'd like, I can provide you with the prompt and the context (8 results from vector search).
Environment
Windows 11, VS2022, Azure.Search.Documents 11.5.1, Azure.AI.OpenAI 1.0.0-beta.11, Embeddings 1.0.0-beta.9,
Microsoft.AspNetCore.OpenAI 7.0.13