How do I use the API to create a scaled signature? #289

Closed
phiweger opened this issue Jun 30, 2017 · 10 comments

Comments

@phiweger

Like, from within Python, how can I do the equivalent of

sourmash compute ... --scaled 100 ...

import sourmash_lib as sm
sig = sm.MinHash(ksize=16, n=1000)
# now scale

Thanks a lot!

@phiweger
Author

Went into the code base and found some questions :)

I assume I can initialize a scaled signature like so:

import sourmash_lib as sm
scaled=10000
sig = sm.MinHash(ksize=16, n=1000, max_hash=sm.MAX_HASH // scaled)  # integer division keeps max_hash an int

Regarding what "scaled" does conceptually: it seems to me that it places an upper bound on the hash space. When I then call add_sequence, what happens to k-mers that hash above that upper bound? Are they discarded? That is, having initialized the signature as scaled, can I treat it from then on (programmatically) as I would an unscaled signature, trusting that it takes care of the scaling?
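
For anyone reading along, here is a minimal pure-Python sketch of the idea as described in the question. It is not sourmash's actual implementation. It assumes a 64-bit hash space and uses a truncated MD5 digest as a stand-in hash function; `ScaledSketch`, `hash_kmer`, and this `MAX_HASH` constant are hypothetical names chosen for illustration. In this toy version, k-mers whose hash falls above the bound are dropped at insertion time, so after construction the object is used exactly like an unscaled one.

```python
import hashlib

MAX_HASH = 2**64 - 1  # assumed size of the hash space (64-bit)


def hash_kmer(kmer):
    """Stand-in 64-bit hash: first 8 bytes of MD5 (an assumption for this sketch)."""
    digest = hashlib.md5(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


class ScaledSketch:
    """Toy scaled sketch: keeps every k-mer hash at or below max_hash."""

    def __init__(self, ksize, scaled):
        self.ksize = ksize
        self.scaled = scaled
        self.max_hash = MAX_HASH // scaled  # integer upper bound on kept hashes
        self.hashes = set()

    def add_sequence(self, seq):
        # Slide a window of length ksize over the sequence.
        for i in range(len(seq) - self.ksize + 1):
            h = hash_kmer(seq[i:i + self.ksize])
            if h <= self.max_hash:  # hashes above the bound are discarded
                self.hashes.add(h)


# Usage: the caller never has to think about the bound after construction.
sketch = ScaledSketch(ksize=4, scaled=2)
sketch.add_sequence("ACGTACGTTTGACCA")
```

With a uniform hash, roughly one in `scaled` distinct k-mers ends up in the sketch, so the sketch grows with the size of the input rather than being capped at a fixed number of hashes.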

You could tell me to look at

mh = new KmerMinHash(n, ksize, is_protein, seed, max_hash)

but I am not very proficient in C++ :(

@ctb
Contributor

ctb commented Jun 30, 2017 via email

@phiweger
Author

thank you, that would be great.

I'm glad I finally got my head around minhash scaling.

btw: are there any references to this scaling technique?

@ctb
Contributor

ctb commented Jun 30, 2017 via email

@luizirber
Member

luizirber commented Jun 30, 2017 via email

@phiweger
Author

that would be convenient 👍 although I would miss that "in the code trenches" feeling ;)

@phiweger
Author

@ctb

We have functions to do that, will go find them and get back to you :)

Did you have a chance to look?

@ctb
Contributor

ctb commented Sep 20, 2017

Sorry for the long delay...

Check out the MinHash.downsample_scaled function. Usage:

max_scaled = max(node.data.minhash.scaled, sig.minhash.scaled)
mh1 = node.data.minhash.downsample_scaled(max_scaled)
mh2 = sig.minhash.downsample_scaled(max_scaled)
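
To illustrate what downsampling two sketches to a common scaled value amounts to, here is a rough sketch on plain sets of integer hashes. It is not the sourmash implementation; `downsample`, `jaccard`, and the 64-bit `MAX_HASH` constant are assumptions made for illustration. The idea is that a sketch built with a smaller scaled value can be reduced to a larger (coarser) one by dropping every hash above the new bound, after which the two sketches cover the same slice of hash space and can be compared.

```python
MAX_HASH = 2**64 - 1  # assumed 64-bit hash space


def downsample(hashes, new_scaled):
    """Keep only hashes at or below the bound implied by new_scaled."""
    bound = MAX_HASH // new_scaled
    return {h for h in hashes if h <= bound}


def jaccard(a, b):
    """Jaccard similarity of two hash sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two toy sketches built at different resolutions (scaled=2 and scaled=4).
bound2, bound4 = MAX_HASH // 2, MAX_HASH // 4
sketch_a = {10, bound4 - 1, bound4 + 5, bound2 - 3}  # built with scaled=2
sketch_b = {10, bound4 - 1, 99}                      # built with scaled=4

common = max(2, 4)  # the coarser of the two scaled values
a = downsample(sketch_a, common)
b = downsample(sketch_b, common)
similarity = jaccard(a, b)
```

Downsampling only goes one way: hashes that were never kept cannot be recovered, which is why the larger of the two scaled values is the one to pick.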

@ctb
Contributor

ctb commented Sep 21, 2017

Please re-open if you have any questions!

ctb closed this as completed Sep 21, 2017
ctb mentioned this issue Sep 21, 2017
@ctb
Contributor

ctb commented Dec 27, 2018

#436 adds an example into the docs!
