Internationalization for RAG apps: Session Resources #65
-
The code used for the query prep was:

completion = client.chat.completions.create(
    model=completions_deployment,
    messages=[
        {
            "role": "system",
            "content": """
Generate a full-text search query for a SQL database based on a user query.
Do not generate the whole SQL query; only generate string to go inside the MATCH parameter for FTS5 indexes.
Use SQL boolean operators if the user has been specific about what they want to exclude in the search.
If the query is not in English, always translate the query to English.
If you cannot generate a search query, return just the number 0.
""",
        },
        # Few-shot examples: one English query, one non-English query
        {
            "role": "user",
            "content": "Generate a search query for: A really nice winter jacket",
        },
        {
            "role": "assistant",
            "content": "winter jacket",
        },
        {
            "role": "user",
            "content": "Generate a search query for: 夏のドレス",
        },
        {
            "role": "assistant",
            "content": "summer dress",
        },
        # The actual user query
        {
            "role": "user",
            "content": f"Generate a search query for: {query}",
        },
    ],
    max_tokens=100,  # maximum number of tokens to generate
    n=1,  # return only one completion
    stop=None,  # no custom stop sequences
    temperature=0.3,  # low temperature for more predictable output
    stream=False,  # return the full response at once instead of streaming
    seed=1,  # fixed seed for best-effort reproducibility
)
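
For anyone wondering what happens to the generated string afterwards, here is a minimal sketch of how it could be passed to an FTS5 `MATCH` clause. This is not the code from the session: the `products` table, its columns, the sample rows, and the `search_products` helper are all made up for illustration, and it assumes your Python's SQLite build has FTS5 enabled. It also handles the `0` sentinel that the system prompt asks the model to return when it cannot produce a query.

```python
import sqlite3

# Sentinel the system prompt asks the model to return when no query can be generated.
NO_QUERY = "0"


def search_products(conn, fts_query, limit=5):
    """Run a model-generated FTS5 query string against a hypothetical products table."""
    fts_query = fts_query.strip()
    if not fts_query or fts_query == NO_QUERY:
        return []
    try:
        rows = conn.execute(
            # The query string is bound as a parameter, never concatenated into the SQL.
            "SELECT name FROM products WHERE products MATCH ? ORDER BY rank LIMIT ?",
            (fts_query, limit),
        ).fetchall()
    except sqlite3.OperationalError:
        # The model may occasionally emit invalid FTS5 syntax; treat that as no results.
        return []
    return [row[0] for row in rows]


# Hypothetical in-memory data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE products USING fts5(name, description)")
conn.executemany(
    "INSERT INTO products (name, description) VALUES (?, ?)",
    [
        ("Alpine Parka", "A warm winter jacket with a hood"),
        ("Sundress", "A light summer dress"),
        ("Rain Shell", "Waterproof jacket for spring"),
    ],
)

print(search_products(conn, "winter jacket"))
print(search_products(conn, "jacket NOT winter"))
print(search_products(conn, "0"))
```

Because the prompt always translates to English, a query like 夏のドレス ends up as `summer dress` and is matched against English product text. That only works if the indexed text is in English too.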
-
Hi @tonybaloney, one question. When you need to index documents in multiple languages (pt-BR and en-US in my case), and the user is most probably a pt-BR speaker, what's the best approach? Using text-embedding-3-large is a good start, as pointed out in your presentation, but what else can we do? Should we instruct the LLM to "think" in English (for example, when generating the search query)? Translate the documents before vectorizing? Store the language of each document in the database for full-text search? Keep in mind that the output to the user has to be in the same language the question was asked in (which I know might be another issue, since some RAG docs will be in English and this might confuse the LLM). Thanks for the great session! Really got me thinking :)
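
To make the "store the language of the document" option concrete, this is roughly what I have in mind. It is only a sketch under my own assumptions: the `docs` table, the `lang` column, the sample rows, and the `search_docs` helper are hypothetical, and it needs an SQLite build with FTS5. The language tag is stored as an `UNINDEXED` column so it can be filtered on without being part of the full-text index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `lang` is stored with each row but excluded from the full-text index.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body, lang UNINDEXED)")
conn.executemany(
    "INSERT INTO docs (body, lang) VALUES (?, ?)",
    [
        ("Guia de instalação do produto", "pt-BR"),
        ("Product installation guide", "en-US"),
    ],
)


def search_docs(conn, fts_query, lang=None):
    """Full-text search, optionally restricted to documents in one language."""
    sql = "SELECT body, lang FROM docs WHERE docs MATCH ?"
    params = [fts_query]
    if lang is not None:
        sql += " AND lang = ?"
        params.append(lang)
    return conn.execute(sql + " ORDER BY rank", params).fetchall()


print(search_docs(conn, "guia", lang="pt-BR"))
print(search_docs(conn, "guide"))
```

With this layout, the search query could be generated once per language and run against the matching subset, instead of always translating to English.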
-
This is the resources and discussion thread for this live stream: https://aka.ms/raghack/language
It will be updated with links after the session.
Recording: https://www.youtube.com/watch?v=GWb6fICZWZY
Please ask any follow-up questions here, and the speakers will see them.