write up downsampling details #407
Partially addressed in #436. TODO:
Just to explain the above notes a bit more, a few points need to be made. First, signatures can only be compared if they have the same num value or the same scaled value. (That's intrinsic to the math.) Downsampling to a common num or scaled value is always possible with signatures, but you lose resolution as you downsample, because downsampling essentially always removes hashes. SBTs use Bloom filters for all internal nodes, and there is no way to remove hashes from a Bloom filter. Therefore internal nodes cannot be downsampled without recalculating them from scratch, which we don't currently support and see no reason to support. This means that you can't search a signature with scaled=1e4 against an SBT built from signatures with scaled=1e3: the signature can't regain the hashes needed for scaled=1e3, and the SBT can't be downsampled to scaled=1e4. Signatures can be downsampled to match an SBT, however. So you can search a signature with scaled=1e2 against an SBT built from signatures with scaled=1e3, because downsampling raises the signature's scaled value to 1e3. Note that LCA databases can be downsampled.
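A minimal sketch of the scaled case, assuming a FracMinHash-style sketch that keeps only hash values below roughly 2**64 / scaled (the exact cutoff sourmash computes may differ slightly). The names `max_hash_for_scaled`, `downsample_scaled`, and `can_search_sbt` are hypothetical illustrations, not sourmash's API. It shows that downsampling only ever removes hashes, and why the query, not the SBT, is the side that gets downsampled.

```python
from typing import Set

# Assumption: a scaled sketch keeps hashes below ~2**64 / scaled.
# The exact cutoff sourmash computes may differ slightly.
MAX_HASH_SPACE = 2**64


def max_hash_for_scaled(scaled: int) -> int:
    """Hypothetical helper: hash cutoff for a given scaled value."""
    return MAX_HASH_SPACE // scaled


def downsample_scaled(hashes: Set[int], old_scaled: int, new_scaled: int) -> Set[int]:
    """Raise scaled from old_scaled to new_scaled by dropping hashes above the new cutoff."""
    if new_scaled < old_scaled:
        # Lowering scaled would need hashes that were never kept.
        raise ValueError(f"cannot downsample from scaled={old_scaled} to scaled={new_scaled}")
    cutoff = max_hash_for_scaled(new_scaled)
    return {h for h in hashes if h < cutoff}


def can_search_sbt(query_scaled: int, sbt_scaled: int) -> bool:
    """Hypothetical check: SBT internal nodes are Bloom filters and cannot
    drop hashes, so only the query can be downsampled to match the tree."""
    return query_scaled <= sbt_scaled


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    raw = [rng.getrandbits(64) for _ in range(100_000)]
    # Query sketched at scaled=100 (keeps roughly 1 in 100 hashes).
    query = {h for h in raw if h < max_hash_for_scaled(100)}
    # Downsample to scaled=1000 to match an SBT built at that resolution.
    query_1k = downsample_scaled(query, 100, 1000)
    print(len(query), len(query_1k))
    print(can_search_sbt(100, 1000), can_search_sbt(10_000, 1000))
```

The downsampled query is always a subset of the original, which is the resolution loss described above; the reverse direction raises an error because the dropped hashes cannot be recovered.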
See #928 for a discussion of why supporting multi-scaled queries in gather would be hard to implement: it requires subtracting low-resolution signatures from high-resolution signatures.
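To illustrate the subtraction problem, here is a hypothetical sketch using the same assumed ~2**64 / scaled cutoff; `downsample` and `subtract_match` are illustrative names, not sourmash functions. Removing a scaled=1000 match from a scaled=100 query only makes sense after both are brought to scaled=1000. Otherwise, hashes that the coarser match never recorded stay in the remainder and look like unexplained content.

```python
from typing import Set, Tuple

# Assumption: scaled sketches keep hashes below ~2**64 / scaled.
MAX_HASH_SPACE = 2**64


def downsample(hashes: Set[int], scaled: int) -> Set[int]:
    """Keep only the hashes a sketch at this scaled value would retain."""
    cutoff = MAX_HASH_SPACE // scaled
    return {h for h in hashes if h < cutoff}


def subtract_match(
    query: Set[int], query_scaled: int, match: Set[int], match_scaled: int
) -> Tuple[Set[int], int]:
    """Hypothetical: subtract match from query at the coarser (larger) scaled value."""
    common = max(query_scaled, match_scaled)
    remainder = downsample(query, common) - downsample(match, common)
    return remainder, common


if __name__ == "__main__":
    cut_1k = MAX_HASH_SPACE // 1000
    # One hash visible at both resolutions, one visible only at scaled=100.
    query = {7, cut_1k + 7}
    match = {7}  # sketched at scaled=1000, so it cannot record cut_1k + 7
    # Naive subtraction leaves a hash the scaled=1000 match could not have recorded.
    print(query - match)
    # Subtracting at the common resolution leaves nothing, at scaled=1000.
    print(subtract_match(query, 100, match, 1000))
```

The price of the correct version is that the remainder is now at the coarser resolution, which is exactly the loss of resolution that makes multi-scaled gather awkward.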
Note that #1420 put this information into the
There's an increasing amount of logic in sourmash for making sure that signatures get downsampled properly, so that search and gather return accurate results.
This logic should be documented, and then enforced and made simple through the API.
For example, `sourmash gather` (as currently implemented) potentially downsamples with each match.
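A hypothetical greedy loop makes the "downsamples with each match" behaviour concrete. This is not sourmash's gather implementation: it assumes each reference carries its own scaled value, uses the same assumed ~2**64 / scaled cutoff, and ranks candidates by shared hashes times scaled as a rough overlap estimate. The names `downsample` and `greedy_gather` are illustrative only.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumption: scaled sketches keep hashes below ~2**64 / scaled.
MAX_HASH_SPACE = 2**64


def downsample(hashes: Set[int], scaled: int) -> Set[int]:
    cutoff = MAX_HASH_SPACE // scaled
    return {h for h in hashes if h < cutoff}


def greedy_gather(
    query: Set[int], query_scaled: int, refs: Dict[str, Tuple[Set[int], int]]
) -> List[Tuple[str, int, int]]:
    """Hypothetical gather: returns (name, shared hashes, scaled used) per match."""
    remaining = set(query)
    scaled = query_scaled
    candidates = dict(refs)
    results: List[Tuple[str, int, int]] = []
    while remaining and candidates:
        best: Optional[Tuple[str, int, int]] = None
        best_estimate = 0
        for name, (hashes, ref_scaled) in candidates.items():
            # Compare at the coarser of the current and reference resolutions.
            common = max(scaled, ref_scaled)
            shared = len(downsample(remaining, common) & downsample(hashes, common))
            estimate = shared * common  # rough number of shared k-mers
            if estimate > best_estimate:
                best, best_estimate = (name, shared, common), estimate
        if best is None:
            break
        name, shared, scaled = best
        match_hashes, _ = candidates.pop(name)
        # The remaining query is downsampled to the comparison resolution;
        # scaled never decreases, so resolution lost here is never regained.
        remaining = downsample(remaining, scaled) - downsample(match_hashes, scaled)
        results.append((name, shared, scaled))
    return results
```

In this sketch, once a scaled=1000 reference is chosen, every later reference (even one sketched at scaled=100) is compared at scaled=1000, so the resolution of later matches depends on which matches came first.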