Compute oxidation #845
Codecov Report
@@            Coverage Diff             @@
##           master     #845      +/-   ##
==========================================
- Coverage   91.19%   78.89%   -12.3%
==========================================
  Files          69       82      +13
  Lines        4916     6993    +2077
  Branches        0      479     +479
==========================================
+ Hits         4483     5517    +1034
- Misses        433     1174     +741
- Partials        0      302     +302
Continue to review full report at Codecov.
I held off on fixing |
nice work!
sourmash/command_compute.py
Outdated
    return [sig]


def save_siglist(siglist, output_fp, filename=None):
maybe change `output_fp` to something like `output_filename` here?
add benchmarks for add_sequence
add_protein in Rust
add a test for params, fix scaled default
allow building ComputeParameters from compute argparse args
This PR exposes functionality in a way that doesn't require moving all the Python compute code into Rust.

- [...] (`make_minhashes`, `add_seq`, `build_siglist`, `save_siglist`) to the outside scope, and made all the arguments explicit. It would be hard to refactor otherwise...
- `add_sequence` and `add_protein` methods for `SourmashSignature`. The idea here is to cross the FFI layer only once for each sequence, and to have better control over what can be improved on the Rust side (like using rayon for parallelization?).
- `add_protein` in `MinHash` was also moved to Rust, and it is 100x faster 🤣 /cc @olgabot
- [...] `add_protein`) and in Rust (for `add_sequence`)
- `add_sequence` in Rust is way simpler, and way faster. Turns out doing simpler things and letting the compiler optimize was better than my micro-optimizations =P

Funny bits:

- `ComputeParameters` setters and getters are... a lot. But it's also very repetitive code (and maybe could become a macro, but then it becomes harder to reason about/fix bugs later).
- [...] `args` in the `compute` function signature, but since Rust doesn't have default arguments for functions I went in the other direction there (collapsing all the args into a `ComputeParameters` struct). Python's default arguments are so nice.

Previous blurb:
I'm using this branch for implementing `decoct compute` and figuring out how to: [...]

Checklist

- `make test` Did it pass the tests?
- `make coverage` Is the new code covered?
- [...] without a major version increment. Changing file formats also requires a major version number increment.
- [...] changes were made?
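
For reviewers who want a picture of the "collapse all the args into a struct" idea from the description, here is a rough Python sketch of the same pattern. It is not the code in this PR: `ComputeParams`, its field names, its default values, and `describe_sketches` are all hypothetical and only illustrate how a parameter object with defaults can stand in for a long keyword-argument list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeParams:
    """Hypothetical parameter object; field names and defaults are illustrative only."""

    ksizes: List[int] = field(default_factory=lambda: [21, 31, 51])
    num_hashes: int = 500
    scaled: int = 0
    dna: bool = True
    protein: bool = False
    track_abundance: bool = False


def describe_sketches(params: ComputeParams) -> List[str]:
    """Return one label per sketch that the given parameters would request."""
    labels = []
    for k in params.ksizes:
        if params.dna:
            labels.append(f"dna-k{k}")
        if params.protein:
            labels.append(f"protein-k{k}")
    return labels


# Override only what differs from the defaults, much like keyword arguments.
params = ComputeParams(ksizes=[31], protein=True)
print(describe_sketches(params))
```

The function signature stays at a single argument however many options get added, which is roughly the trade the description makes on the Rust side, where functions have no default arguments.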