[ENH] VoyageAI embedding function #1871
VoyageAI Embedding function
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
self._client.embed(
    input[i : i + self._batch_size],
    model=self._model_name,
    input_type="document",
Is there a separate input_type for queries? https://docs.voyageai.com/docs/embeddings#python-api
Can you expand on how this needs to work?
@jeffchuber, reading the API docs, I can't help but notice that a special prompt gets prepended to each document. We don't have semantics to represent what the embedding will be used for, so I think we should not default to anything other than None, to avoid skewing results.
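To make the suggestion concrete, here is a minimal sketch of batched embedding where input_type defaults to None and is only forwarded as given. StubClient and embed_in_batches are hypothetical names used for illustration; the stub stands in for a real client so the example runs offline, and it is not the code from this PR.

```python
from typing import List, Optional, Sequence


class StubClient:
    """Hypothetical stand-in for an embedding client; records calls instead of calling an API."""

    def __init__(self) -> None:
        self.calls: List[dict] = []

    def embed(
        self,
        texts: Sequence[str],
        model: str,
        input_type: Optional[str] = None,
    ) -> List[List[float]]:
        self.calls.append({"n": len(texts), "model": model, "input_type": input_type})
        # Fake one-dimensional "embeddings": the length of each text.
        return [[float(len(t))] for t in texts]


def embed_in_batches(
    client: StubClient,
    texts: Sequence[str],
    model: str,
    batch_size: int = 2,
    input_type: Optional[str] = None,  # assumption: None means "no special prompt"
) -> List[List[float]]:
    """Embed texts in slices of batch_size, passing input_type through unchanged."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[List[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        embeddings.extend(client.embed(batch, model=model, input_type=input_type))
    return embeddings


client = StubClient()
vectors = embed_in_batches(client, ["a", "bb", "ccc"], model="example-model")
```

With this shape, a caller who knows the texts are documents or queries can pass input_type explicitly, while the default leaves the input untouched.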
body: JSON.stringify({
  input: texts.slice(index, index + this.batch_size),
  model: this.model_name,
  truncation: this.truncation,
Does this take a document/query param like Python?
Voyageai embedding function
A few nits on the JS implementation, and let's drop the Python functionality in favor of #1327.
[ENH] Corrections due to the comments
Corrections due to the comments: removing the loop, raising exception
@tazarov Can you please recheck this PR?
Looks good. Thank you so much for sticking with this, @fzowl.
Small nit: I think we recently added Prettier support. Did you try running it? (I see some formatting changes in index.ts.)
@tazarov I just tried, but Prettier modified lots of files for me.
LGTM
I know. We recently added prettier to
Our underlying implementation has changed, so this PR is not landable as is. That being said, we'd still like to add this functionality, and that is now tracked in this issue.
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?