Removed redundant and potentially error-causing validation for single doc OpenAI embedding #3819
fix batch process of openai embedding to avoid errors in token
restore chunk_size to original value
Fetch fork master
Fix no attr
Upstream merge
Also, as pointed out here, embeddings/openai.py imports tiktoken in a different way than elsewhere in langchain, so I adjusted it accordingly.
I reached the same conclusion on my fork. #3811 has more documentation on why this is the right fix.
Was your
https://github.com/hwchase17/langchain/blob/master/langchain/embeddings/openai.py#L107
@Hase-U Hi, could you please resolve the merge conflicts? After that, ping me and I will push this PR for review. Thanks!
Closing because the PR wouldn't line up with the current directory structure of the library (would need to be in /libs/langchain/langchain instead of /langchain). Feel free to reopen against the current head if it's still relevant!
The line here should actually compare the length of the text in tokens:
https://github.com/hwchase17/langchain/blob/18ec22fe56049aaea446406daab6d66d172dd48f/langchain/embeddings/openai.py#L210
But realistically, there is no need for a check like
len(encode(text))
here; we can use self._get_len_safe_embeddings by default. All langchain users will need to install tiktoken, but it is natural to expect tiktoken to be required when using OpenAI's embeddings.
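To illustrate why sending every input through a length-safe path should be harmless for short texts, here is a minimal sketch of the chunk-and-average idea. It is not the actual body of self._get_len_safe_embeddings; toy_encode and toy_embed are hypothetical stand-ins for tiktoken and the OpenAI embedding call, and the weighting and normalization steps are my assumption about how such a path would typically work.

```python
import math
from typing import Callable, List


def chunk_tokens(tokens: List[int], ctx_length: int) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most ctx_length tokens."""
    return [tokens[i:i + ctx_length] for i in range(0, len(tokens), ctx_length)]


def len_safe_embedding(
    text: str,
    encode: Callable[[str], List[int]],
    embed_tokens: Callable[[List[int]], List[float]],
    ctx_length: int,
) -> List[float]:
    """Embed text of any length: chunk by tokens, embed each chunk, average, normalize."""
    chunks = chunk_tokens(encode(text), ctx_length)
    if not chunks:
        raise ValueError("cannot embed empty text")
    vectors = [embed_tokens(chunk) for chunk in chunks]
    weights = [len(chunk) for chunk in chunks]  # weight each chunk by its token count
    total = sum(weights)
    dim = len(vectors[0])
    avg = [sum(v[d] * w for v, w in zip(vectors, weights)) / total for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in avg))
    return [x / norm for x in avg] if norm else avg


# Hypothetical stand-ins, for illustration only (not tiktoken or the OpenAI API).
def toy_encode(text: str) -> List[int]:
    return [ord(c) for c in text]


def toy_embed(tokens: List[int]) -> List[float]:
    return [float(len(tokens)), float(sum(tokens) % 7 + 1)]


short_vec = len_safe_embedding("ab", toy_encode, toy_embed, ctx_length=8)
long_vec = len_safe_embedding("abcde", toy_encode, toy_embed, ctx_length=2)
```

A text that fits in one chunk is embedded once and only normalized, so in this sketch the separate length check in front of the call adds nothing for short inputs.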
So the difference from the current situation is