feat: Adding embedding cache for gdc case #76
Conversation
bdikit/mapping_algorithms/scope_reducing/_algorithms/contrastive_learning/cl_api.py
@EduardoPena Just tested and it seems to be working fine here. I think my only remaining concern is what happens when we update the model. We should probably include the model name as a subdirectory in the cache file path so that we don't reuse cached embeddings from a previous model.
@aecio, the last commit integrates the model name as a parameter for the cache. Let me know if we are ready to merge into devel.
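A minimal sketch of the caching scheme discussed above: embeddings are stored on disk under a subdirectory named after the model, so updating the model never reuses stale cached vectors. The names here (`get_cached_embeddings`, `CACHE_ROOT`, `_cache_path`) are illustrative assumptions, not the actual bdikit API.

```python
import hashlib
import os
import pickle
import tempfile

# Hypothetical cache root; the real implementation would use a
# configurable bdikit cache directory instead of the temp dir.
CACHE_ROOT = os.path.join(tempfile.gettempdir(), "bdikit_embedding_cache")


def _cache_path(model_name: str, table_key: str) -> str:
    # One subdirectory per model name, as suggested in the review,
    # plus a hash of the table key as the file name.
    digest = hashlib.sha1(table_key.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_ROOT, model_name, digest + ".pkl")


def get_cached_embeddings(model_name, table_key, compute_fn):
    """Return embeddings from the cache, computing and storing them on a miss."""
    path = _cache_path(model_name, table_key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    embeddings = compute_fn()  # expensive call, e.g. the CL model forward pass
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)
    return embeddings
```

Because the model name is part of the path, `model-a` and `model-b` caches are fully isolated, which is what the comment above asks for.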
Looks good to me.
When merging, I would use the squash option so everything lands as a single clean commit.
This is related to issue #51. This feature allows fast interactions with the user: for example, matching the 740 columns of GDC using the CL approach drops from ~30s to ~1.5s. We might want to add support for arbitrary datasets as well, but I am not sure.