improve sketching performance #860
Comments
thank you!! stuff like this is red meat to @luizirber I think :)
This is what I suggest if you want to try it:
There are 4 benchmarks for add_sequence, and (I think) they cover most cases in the
Nice, I'll try it out! I'm also looking into the faster revcomp discussion, which can also help us.
@camillescott has suggestions too, for keeping track of valid/invalid positions (using a There was also a PR for the ntHash crate with similar suggestions (lookup vs match), and it was way faster. Thanks for the great suggestions!
I used the current master (aka a601b4a).
I am trying to assemble a number of these super fast DNA processing routines into a project I call libdna. It will take some time and/or help until it's finished, though.
I am glad that I could help.
Mash has recently integrated some changes making it faster. I think sourmash could also benefit from them. However, my Rust isn't good enough for an actual pull request. So instead I will just point out the issues and suggest solutions.
In minhash.rs#L684, `_checkdna` is repeatedly called on the same characters. Mash achieved a 30% performance boost by adding just one more counter.
The function `_checkdna` uses a `match` statement, which compiles to a chain of cmp and jumps. Using a lookup table will give you much better performance.
Best,
Fabian
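To make the two suggestions concrete, here is a minimal Rust sketch. It is not the sourmash `_checkdna` code and not the actual Mash change. The names `VALID_DNA`, `is_valid_base` and `valid_kmer_starts` are hypothetical, and the sketch assumes that only upper-case `A`, `C`, `G` and `T` count as valid bases. The table replaces the `match` with a single indexed load, and the `run` counter means each byte is validated once instead of once per overlapping k-mer.

```rust
/// Lookup table with one entry per possible byte value.
/// Assumption for this sketch: only upper-case A, C, G, T are valid.
const VALID_DNA: [bool; 256] = {
    let mut table = [false; 256];
    table[b'A' as usize] = true;
    table[b'C' as usize] = true;
    table[b'G' as usize] = true;
    table[b'T' as usize] = true;
    table
};

/// Validity check as a single table load instead of a chain of compares.
#[inline]
fn is_valid_base(b: u8) -> bool {
    VALID_DNA[b as usize]
}

/// Returns the start offset of every k-mer made up only of valid bases.
/// `run` counts the consecutive valid bases ending at the current position,
/// so every byte of `seq` is checked exactly once.
fn valid_kmer_starts(seq: &[u8], k: usize) -> Vec<usize> {
    let mut starts = Vec::new();
    if k == 0 || seq.len() < k {
        return starts;
    }
    let mut run = 0usize;
    for (i, &b) in seq.iter().enumerate() {
        if is_valid_base(b) {
            run += 1;
        } else {
            // An invalid base breaks the run; the next k-mer must start after it.
            run = 0;
        }
        if run >= k {
            // The k-mer ending at position i is fully valid.
            starts.push(i + 1 - k);
        }
    }
    starts
}
```

Calling `valid_kmer_starts(b"ACGTNACGT", 3)` skips every 3-mer that overlaps the `N` and reports only the windows on either side of it. A real implementation would hash each reported k-mer instead of collecting offsets.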