Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python] New embeddings API #1023
[python] New embeddings API #1023
Changes from 6 commits
d126590
c221799
43534b0
b0472e3
ddafdf0
337b571
77d28a5
c91b1e8
1f53a94
49ced87
299aeb7
a867985
4656d89
1cbc2bc
520941e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this will cause the census object to be re-opened. While this shouldn't be an issue, it will result into an extra call. With some effort I can refactor
get_embedding
to also accept an existing Census object, but I'm not sure if it's worth it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, you should refactor the code to have a (common, shared) function that accepts an already open Census handle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a live corpus unit test for this method. This should ensure that this parsing method remains consistent across releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code has some lint - please run (an up to date) pre-commit across it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Options:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strongly prefer 1) and having a
verbose
argumentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whole blob could be a simple comprehension which might make it more pythonic (this is a nit, up to you). E.g,
And unless there are duplicates expected, I'm not sure what the set adds? If there are duplicates, doesn't that imply you need more filter criteria?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The set is because multiple embeddings can exist for a single alias, and in this case we're only interested in the census version string, so it needs to be deduplicated. I'll rewrite using the comprehension.