How do I use the API to create a scaled signature? #289
Comments
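Went into the code base and found some questions :) I assume I can initialize a scaled signature like so:
Regarding what "scaled" does conceptually: it seems to me that it places an upper bound on the hash (space). When I then `add_sequence`, what happens to kmers that hash to above that upper bound? Are they discarded? I.e., having initialized the signature as scaled, can I treat it from then on (programmatically) as I would an unscaled signature, trusting that it takes care of the scaling thing? You could tell me to look at
but I am not very proficient in C++ :( |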
On Fri, Jun 30, 2017 at 09:14:51AM +0000, Adrian Viehweger wrote:
Went into the code base and found some questions :)
I assume I can initialize a scaled signature like so:
```
import sourmash_lib as sm
scaled=10000
sig = sm.MinHash(ksize=16, n=1000, max_hash=sm.MAX_HASH/scaled)
```
yes!
Regarding what "scaled" does conceptually: it seems to me that it places an upper bound on the hash (space). When I then `add_sequence`, what happens to kmers that hash to above that upper bound? Are they discarded? I.e., having initialized the signature as scaled, can I treat it from then on (programmatically) as I would an unscaled signature, trusting that it takes care of the scaling thing?
yes, it places an upper bound on hash space, and they are discarded! you
can treat signatures programmatically the same way, yes.
but you cannot compare scaled to non-scaled without doing some extra work,
and if you compute signatures with two different scaled values you must
downsample them to the common scaled value. We have functions to do
that, will go find them and get back to you :)
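
To make the two points above concrete (hashes above the bound are discarded, and sketches built with different scaled values must be downsampled to a common value before comparison), here is a minimal pure-Python sketch of the idea. It is not sourmash's implementation: the hash function (BLAKE2b truncated to 64 bits) is a stand-in for whatever sourmash uses internally, the bound is assumed to be `MAX_HASH // scaled` on a 64-bit hash space, and the helper names `scaled_sketch`, `downsample` and `jaccard` are hypothetical.

```python
import hashlib

# Assumption: a 64-bit hash space, so the largest possible hash is 2**64 - 1.
MAX_HASH = 2**64 - 1


def kmers(seq, ksize):
    """Yield every k-mer of length ksize in seq."""
    for i in range(len(seq) - ksize + 1):
        yield seq[i:i + ksize]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, not sourmash's)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def scaled_sketch(seq, ksize, scaled):
    """Keep only hashes at or below the bound; everything above is discarded."""
    max_hash = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, ksize)) if h <= max_hash}


def downsample(hashes, scaled):
    """Apply a stricter (larger scaled) bound to an existing sketch."""
    max_hash = MAX_HASH // scaled
    return {h for h in hashes if h <= max_hash}


def jaccard(a, scaled_a, b, scaled_b):
    """Compare two sketches after downsampling both to the larger scaled value."""
    common = max(scaled_a, scaled_b)
    a, b = downsample(a, common), downsample(b, common)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Because the bound only ever shrinks as scaled grows, downsampling a sketch built with a small scaled value gives the same set as building it directly with the larger value. It cannot go the other way: hashes that were discarded are gone, which is why the comparison uses the larger of the two scaled values.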
|
thank you, that would be great. I'm glad I finally got my head around minhash scaling. btw: are there any references to this scaling technique? |
no, not yet. it's mildly novel and still under exploration.
|
As a side note, I have a [WIP PR][0] to try to make CLI commands available
as Python functions.
[0]: #245
|
that would be convenient 👍 although I would miss that "in the code trenches" feeling ;) |
Did you have a chance to look? |
Sorry for the long delay... checkout the
|
Please re-open if you have any questions! |
#436 adds an example into the docs! |
Like, from within Python, how can I do the equivalent of
sourmash compute ... --scaled 100 ...
Thanks a lot!