[MRG] MinHash class refactoring #1128
Codecov Report
@@            Coverage Diff             @@
##           latest    #1128      +/-   ##
==========================================
+ Coverage   92.33%   92.55%   +0.21%
==========================================
  Files          74       73       -1
  Lines        5809     5764      -45
==========================================
- Hits         5364     5335      -29
+ Misses        445      429      -16
Continue to review full report at Codecov.
I think this might become close to mergeable in 3.x if we deprecate rather than remove various methods, such as
…ng compare to similarity
That would be awesome! Deprecating in 3.5 and removing in 4 is kind of short notice, but way better than just erroring. Still think we need a transition doc, but it can be way shorter ("check for DeprecationWarning!")
Yay experimental PRs! Even if it is not all merged, we can pick most bits and pieces and add in other PRs =]
Yeah, I think the hash function switch on
I really like the view style!
I think the goal was to have something more
We can have a submodule for working with these translations, I think it is useful for prototyping new ideas.
I kinda like
👍
💯
I really like this style.
I think we can also let many things persist (
Misc thoughts for putting this up for merge --
maaaybe think about which other parts of the top level API to deprecate. and/or put together some doctests for the top level API.
hmm also make issue to refactor Rust API accordingly - e.g. (now #1134)
heh, the last commit was necessary because of a fun chain of events:
Is this PR also covering #1145 now? In any case, we might need a PR derived from this one for the
actually, this one should probably be merged into stable (and released), then
merged into latest, and then I should construct new PRs against latest to
remove the features to be removed in 4.0.
whee. :)
Co-authored-by: Luiz Irber <[email protected]>
Many minor refactors and associated deprecations of the `MinHash` class.
Addresses some of #338
Fixes #611 by adding `flatten`
Fixes #82, fixes #284, and fixes #896 by providing a `.hashes` attribute that implements a read-only dictionary interface to hashes
Fixes #951 by deprecating `is_molecule_type`.
Fixes #618 by settling on `downsample(...)` as the downsampling API.
Relevant to general MinHash API refactoring #720, #885 #999 but doesn't address core issues there.
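To make the shape of these changes concrete, here is a minimal, self-contained sketch of a read-only dictionary view plus `flatten` and `downsample(num=...)`. It is a toy illustration only, not the sourmash implementation: `ToySketch`, `HashesView`, and `add_hash` as written here are hypothetical names, and the real class may behave differently in detail.

```python
from collections.abc import Mapping


class HashesView(Mapping):
    """Read-only, dict-like view over a hash -> abundance table (hypothetical)."""

    def __init__(self, table):
        self._table = table  # shared, not copied, so the view stays current

    def __getitem__(self, hashval):
        return self._table[hashval]

    def __iter__(self):
        return iter(self._table)

    def __len__(self):
        return len(self._table)


class ToySketch:
    """Hypothetical stand-in for a MinHash-like object; not sourmash code."""

    def __init__(self, track_abundance=False):
        self.track_abundance = track_abundance
        self._table = {}

    def add_hash(self, hashval):
        # Count occurrences only when abundance tracking is on.
        if self.track_abundance:
            self._table[hashval] = self._table.get(hashval, 0) + 1
        else:
            self._table[hashval] = 1

    @property
    def hashes(self):
        # Mapping defines no __setitem__, so callers cannot mutate through it.
        return HashesView(self._table)

    def flatten(self):
        """Return a copy with abundances dropped."""
        flat = ToySketch(track_abundance=False)
        for hashval in self._table:
            flat.add_hash(hashval)
        return flat

    def downsample(self, *, num):
        """Return a copy keeping only the `num` smallest hashes."""
        small = ToySketch(track_abundance=self.track_abundance)
        for hashval in sorted(self._table)[:num]:
            small._table[hashval] = self._table[hashval]
        return small


sketch = ToySketch(track_abundance=True)
for h in (42, 7, 42, 19):
    sketch.add_hash(h)

view = sketch.hashes
```

Because `view` only implements the `Mapping` protocol, lookups, iteration, `len()`, and `in` all work, while `view[1] = 1` raises `TypeError`. Keyword-only `num` in `downsample` is a design choice of this sketch, meant to keep call sites explicit about which kind of downsampling they request.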
TODO before merge:
`MinHash` since it's darn useful
`translate_codon` or remove from MinHash class at any rate; @luizirber said "We can have a submodule for working with these translations, I think it is useful for prototyping new ideas."
`__init__` refactoring in a new issue
`make test` Did it pass the tests?
`make coverage` Is the new code covered?
without a major version increment. Changing file formats also requires a major version number increment.
changes were made?
original comment - Pedestrian MinHash class refactoring for 4.0?
This is a trial cleanup & regularization of the MinHash class, to see how it all ...looks and feels. Comments welcome!
related issues:
#720
#82
#338
#611
#896
#999
Constructor refactoring
see specific @luizirber proposal in #999.
I'm kind of in favor of `n` -> `num`.
changes to methods
(deprecate in 3.x, remove in 4.0.)
`get_hashes` (?) or `get_mins`, maybe in favor of a view object/property (see below) like `.hashes`?
`subtract_mins`
-> `add_kmer`?
`translate_codon` or remove from MinHash class at any rate
`downsample_n` to `downsample_num`, or `downsample(num=...)`
`max_hash` stuff completely from public API, in favor of `scaled`
`mins` completely from public API, in favor of `hash` and `hashes`
`is_molecule_type`
Add:
`flatten` to remove abundances.
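For the "deprecate in 3.x, remove in 4.0" approach above, one common pattern is a small decorator that keeps the old method working but emits a `DeprecationWarning`. This is a rough sketch under assumptions, not the code in this PR: the `deprecated` helper, the `ToySketch` class, and the `new_num` parameter name are hypothetical, and the version strings just echo the 3.5 / 4.0 plan from the discussion.

```python
import functools
import warnings


def deprecated(since, removal, instead):
    """Mark a method as deprecated: it still works but warns on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since} and will be "
                f"removed in {removal}; use {instead} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


class ToySketch:
    """Hypothetical stand-in for a MinHash-like object; not sourmash code."""

    def __init__(self, hashes):
        self._hashes = sorted(hashes)

    def downsample(self, *, num):
        # The single supported downsampling entry point in this sketch.
        return ToySketch(self._hashes[:num])

    @deprecated(since="3.5", removal="4.0", instead="downsample(num=...)")
    def downsample_n(self, new_num):
        # Old spelling kept as a thin shim over the new API.
        return self.downsample(num=new_num)


# Record warnings so old call sites can be found (e.g. in a test suite).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    small = ToySketch([5, 1, 9]).downsample_n(2)
```

On the user side, "check for DeprecationWarning!" can be as simple as running the test suite with `python -W error::DeprecationWarning`, which turns each deprecated call into an exception.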