
Mash screen winner-take-all and multiple best number of hash hits #159

Open
a-damC opened this issue Jun 3, 2021 · 1 comment

Comments

@a-damC

a-damC commented Jun 3, 2021

Hello,
I have noticed a discrepancy between the results of mash screen with and without the winner-take-all (WTA) option.
For context, I am screening a number of genomes (more specifically, their constituent contigs) for plasmids, which I sketched with -s 1000 -k 21.

I have found that if a contig has multiple plasmid hits, and the best hits are tied on the number of shared hashes in a non-WTA screen, none of those hits appear in the WTA screen. It appears that WTA cannot choose between the tied best hits and therefore picks none of them.

I have yet to try it, but I think larger sketch sizes and k-mer values could circumvent this problem. Having said this, the plasmids I am using are from PLSDB, a very extensive catalogue that does contain distinct but very similar plasmids, so a bottom-sketch method may still produce this problem.
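
To illustrate why very similar plasmids can tie, here is a minimal Python sketch of a bottom-s MinHash sketch (keep the s smallest k-mer hashes). It assumes canonical k-mers hashed with BLAKE2b as a stand-in for the MurmurHash3 hash that Mash uses, and it uses synthetic random sequences, not PLSDB data; the sequence names are hypothetical.

```python
import hashlib
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hashes(seq, k):
    """Return the set of 64-bit hashes of all canonical k-mers in seq.

    BLAKE2b is only a stand-in for Mash's MurmurHash3.
    """
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        canonical = min(kmer, rc)  # strand-independent representative
        digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def bottom_sketch(seq, k=21, s=1000):
    """Keep the s smallest k-mer hashes (a bottom-s sketch)."""
    return set(sorted(kmer_hashes(seq, k))[:s])


rng = random.Random(42)
plasmid_a = "".join(rng.choice("ACGT") for _ in range(50_000))

# Hypothetical plasmid B: identical to A except for one substitution.
mid = len(plasmid_a) // 2
alt = next(b for b in "ACGT" if b != plasmid_a[mid])
plasmid_b = plasmid_a[:mid] + alt + plasmid_a[mid + 1:]

# Hypothetical plasmid C: an unrelated random sequence.
plasmid_c = "".join(rng.choice("ACGT") for _ in range(50_000))

sketch_a = bottom_sketch(plasmid_a)
sketch_b = bottom_sketch(plasmid_b)
sketch_c = bottom_sketch(plasmid_c)

shared_ab = len(sketch_a & sketch_b)
shared_ac = len(sketch_a & sketch_c)
print(f"A vs B (one SNP): {shared_ab}/1000 shared hashes")
print(f"A vs C (unrelated): {shared_ac}/1000 shared hashes")
```

A single SNP changes at most k = 21 k-mers, so in this toy setup the sketches of A and B are nearly identical while C shares essentially nothing. Near-duplicate catalogue entries would behave like A and B, which is why a larger sketch or k may not fully separate them.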

I hope I have worded this clearly, please let me know if you need more detail.

Kind regards,
Adam
P.S. During my master's in bioinformatics, which I finished a few months ago, I assumed BLAST was the king of heuristics. It has been really interesting to work with Mash. Thanks!

@ondovb
Member

ondovb commented Jun 18, 2021

Hi Adam,
I think I understand the problem but I am not able to reproduce it. Is it possible that these plasmids are disappearing with WTA because all their hashes are assigned to another, better-scoring plasmid? A way to test that would be to isolate the ties in their own sketch, then start adding other ones. I can look into this more as well if you are able to share your data (my email is here).
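
A simplified Python model of this hypothesis, not Mash's implementation. It assumes each reference's score is the fraction of its sketch hashes found in the contig (a stand-in for Mash's identity estimate), that each shared hash is kept only by the highest-scoring reference, and that ties go to the alphabetically first name. The references P1, P2, P3 and their hash values are hypothetical: P1 and P2 tie, and P3 contains all of their shared hashes plus more.

```python
def screen(pool, references):
    """Non-WTA screen: count each reference's sketch hashes present in the pool."""
    return {name: len(sketch & pool) for name, sketch in references.items()}


def winner_take_all(pool, references):
    """Assign each pool hash to a single reference (simplified WTA model)."""
    # Initial score: fraction of the reference's sketch found in the pool.
    score = {name: len(sketch & pool) / len(sketch)
             for name, sketch in references.items()}
    kept = {name: 0 for name in references}
    for h in pool:
        owners = [name for name, sketch in references.items() if h in sketch]
        if owners:
            # Highest score wins; ties go to the alphabetically first name.
            winner = min(owners, key=lambda n: (-score[n], n))
            kept[winner] += 1
    return kept


pool = set(range(200))  # hashes sketched from one contig
references = {
    "P1": set(range(90)) | set(range(1000, 1010)),  # score 90/100
    "P2": set(range(90)) | set(range(2000, 2010)),  # score 90/100, tied with P1
    "P3": set(range(190)) | set(range(3000, 3010)),  # score 190/200
}
ties_only = {name: references[name] for name in ("P1", "P2")}

print(screen(pool, references))             # {'P1': 90, 'P2': 90, 'P3': 190}
print(winner_take_all(pool, references))    # {'P1': 0, 'P2': 0, 'P3': 190}
print(winner_take_all(pool, ties_only))     # {'P1': 90, 'P2': 0}
```

In this model a tie on its own still leaves one winner, whereas adding the better-scoring P3 leaves both tied plasmids with zero hashes. Running the ties in their own sketch and then adding other plasmids, as suggested above, would distinguish these two cases.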
