Retrieve MinHash from LSHForest #234

Merged on Mar 11, 2024 (6 commits)

123epsilon (Contributor) commented Mar 2, 2024

Adds the ability to retrieve a MinHash from the MinHashLSHForest object, which is useful in workflows where we wish to manually threshold the results of a top-k query from the LSHForest.

lsh = MinHashLSHForest(...)
mh = lsh.get_minhash_from_key(mykey)

# now use mh.jaccard() ...
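
A minimal sketch of the manual-thresholding workflow described above, written in plain Python so that it runs without datasketch. Everything named here is a hypothetical stand-in rather than the library's API: `signature` is a toy MinHash, `stored.__getitem__` plays the role of `lsh.get_minhash_from_key`, `estimate_jaccard` plays the role of `mh.jaccard()`, and `candidates` plays the role of the keys returned by a top-k query.

```python
import hashlib

NUM_PERM = 64  # signature length; an arbitrary choice for this sketch


def signature(tokens, num_perm=NUM_PERM):
    """Toy MinHash signature: per seed, the minimum 64-bit hash over all tokens."""
    sig = []
    for seed in range(num_perm):
        sig.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{tok}".encode("utf-8")).digest()[:8], "big"
            )
            for tok in tokens
        ))
    return sig


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def threshold_top_k(query_sig, candidate_keys, lookup, threshold):
    """Keep only candidates whose estimated similarity meets the threshold."""
    scored = [(key, estimate_jaccard(query_sig, lookup(key))) for key in candidate_keys]
    kept = [(key, score) for key, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for the forest's key -> MinHash storage.
stored = {
    "doc_same": signature({"minhash", "lsh", "forest", "query"}),
    "doc_near": signature({"minhash", "lsh", "forest", "index"}),
    "doc_far": signature({"apple", "banana", "cherry", "grape"}),
}

query_sig = signature({"minhash", "lsh", "forest", "query"})
candidates = ["doc_far", "doc_near", "doc_same"]  # pretend these came from a top-k query

# Look up each candidate's signature by key, then drop anything below the threshold.
results = threshold_top_k(query_sig, candidates, stored.__getitem__, threshold=0.9)
print(results)
```

The point of the lookup step is that a top-k query returns keys only, so the similarity to each hit has to be recomputed from the stored signature before a cutoff can be applied.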

Please let me know if I should change any naming or formatting; I tried my best to keep in step with what was already present.

@ekzhu

closes #233

datasketch/lshforest.py (outdated, resolved)

ekzhu (Owner) left a comment

Sorry for the delay. Some small things. Could you take a look?

test/test_lshforest.py (outdated, resolved)
datasketch/lshforest.py (outdated, resolved)

123epsilon (Contributor, Author) commented

@ekzhu Just pushed changes to preallocate the buffer

@ekzhu ekzhu merged commit f0ae48b into ekzhu:master Mar 11, 2024
6 checks passed
ekzhu (Owner) commented Mar 11, 2024

Thanks @123epsilon for the hard work. This has been merged.

@ekzhu ekzhu mentioned this pull request Mar 11, 2024
123epsilon (Contributor, Author) commented

@ekzhu Thanks for the support!

Linked issue: Implementing MinHash retrieval from keys for MinHashLSHForest
2 participants