-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Removing new line cleanup from OpenAI EF (python) #2125
base: main
Are you sure you want to change the base?
[ENH] Removing new line cleanup from OpenAI EF (python) #2125
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
I agree that this is correct. However, this may impact people who already have collections using the old EF. They would have to migrate, and we should let them know when we release this. Assigning myself here so that I remember to do this when this releases. |
I had similar realization that users' new queries will have slightly worse outcome for exact matches, but perhaps that would be marginal. I'll do a small test to prove myself wrong |
Another minutiae from my investigation was that non-code first gen models (-001) need new line removal, but it feels that while available from API perspective are being silently phased out by OpenAI (couldn't find refs on the latest embeddings OAI docs). |
Whether or not this makes retrieval accuracy worse, users may be relying on the embeddings themselves, e.g. for deduplication. We must notify them after this change. |
Will it make sense to add a flag (constructor param) so that whoever needs/uses new line removals can do so going forward? But perhaps making an opt-in thing by default + the warning for the change. |
I think we should just pull the trigger and not complicate the interface with cruft, while ensuring we communicate to users. |
Closes #2129
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
N/A