feat: Adding embedding cache for gdc case #76
Conversation
bdikit/mapping_algorithms/scope_reducing/_algorithms/contrastive_learning/cl_api.py
@EduardoPena Just tested and it seems to be working fine here. I think my only remaining concern is what happens when we update the model. We should probably include the model name as a subdirectory in the cache file path so that we don't reuse cached embeddings from a previous model.
@aecio, the last commit integrates the model name as a parameter for the cache. Let me know if we are ready to merge into devel.
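A minimal sketch of the caching scheme discussed above: embeddings are stored on disk under a subdirectory named after the model, so updating the model never reuses stale cached vectors. The names here (`get_cached_embeddings`, `CACHE_ROOT`, `_cache_path`) are illustrative assumptions, not the actual bdikit API.

```python
import hashlib
import os
import pickle
import tempfile

# Hypothetical cache root; the real implementation would use a
# configurable bdikit cache directory instead of the temp dir.
CACHE_ROOT = os.path.join(tempfile.gettempdir(), "bdikit_embedding_cache")


def _cache_path(model_name: str, table_key: str) -> str:
    # One subdirectory per model name, as suggested in the review,
    # plus a hash of the table key as the file name.
    digest = hashlib.sha1(table_key.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_ROOT, model_name, digest + ".pkl")


def get_cached_embeddings(model_name, table_key, compute_fn):
    """Return embeddings from the cache, computing and storing them on a miss."""
    path = _cache_path(model_name, table_key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    embeddings = compute_fn()  # expensive call, e.g. the CL model forward pass
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)
    return embeddings
```

Because the model name is part of the path, `model-a` and `model-b` caches are fully isolated, which is what the comment above asks for.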
Looks good to me.
When merging, I would use the squash option so everything lands as a single clean commit.
This is related to issue #51. This feature allows fast interactions with the user: for example, matching the 740 columns of GDC using the CL approach drops from ~30s to ~1.5s. We might want to add support for arbitrary datasets as well, but I am not sure.