Mash screen winner-take-all and multiple best number of hash hits #159

a-damC · 2021-06-03T13:21:47Z

Hello,
I have noticed a discrepancy between the results of mash screen with and without the winner-take-all (wta) option.
For context, I am screening a number of genomes (more specifically, their constituent contigs) to see if they contain plasmids (which have been sketched with -s 1000 -k 21).

I have found that if a contig has multiple plasmid hits, and the best plasmid hits have the same number of hash hits in a non-wta screen, this result will not appear in the wta screen. It appears that wta cannot choose between the tied best hits and therefore picks none of them.

I have yet to try it, but I think larger sketches and k-mer values could circumvent this problem. Having said this, the plasmids I am using are from PLSDB, a very extensive catalogue of plasmids which does contain distinct but very similar plasmids, so a bottom sketch method may still produce this problem.

I hope I have worded this clearly, please let me know if you need more detail.

Kind regards, Adam
P.S. I graduated with a master's in bioinformatics a few months ago, where I assumed BLAST was king of the heuristics. It has been really interesting to work with mash. Thanks

ondovb · 2021-06-18T20:50:53Z

Hi Adam,
I think I understand the problem but I am not able to reproduce it. Is it possible that these plasmids are disappearing with WTA because all their hashes are assigned to another, better-scoring plasmid? A way to test that would be to isolate the ties in their own sketch, then start adding other ones. I can look into this more as well if you are able to share your data (my email is here).
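
To make the hypothesis above concrete, here is a toy Python sketch. It is not Mash's actual code: the function `screen`, the plasmid names, and the integer "hashes" are all made up for illustration, and the tie-break by name is an arbitrary assumption. It only models the idea that, under winner-take-all, each shared hash is credited to the single best-scoring reference that contains it.

```python
def screen(refs, sample, winner_take_all=False):
    """Toy model of a containment screen (hypothetical, not Mash's implementation).

    refs:   dict mapping reference name -> set of sketch hashes
    sample: set of hashes observed in the screened sequence
    Returns a dict mapping reference name -> number of hashes credited to it.
    """
    # Hashes of each reference sketch that also occur in the sample.
    shared = {name: hashes & sample for name, hashes in refs.items()}
    if not winner_take_all:
        return {name: len(h) for name, h in shared.items()}

    counts = {name: 0 for name in refs}
    for h in set().union(*shared.values()):
        owners = [name for name in refs if h in shared[name]]
        # Credit the hash to the owner with the most raw hits.
        # ASSUMPTION: remaining ties are broken by name, purely for this demo.
        winner = min(owners, key=lambda name: (-len(shared[name]), name))
        counts[winner] += 1
    return counts


# Two tied plasmids whose hashes are all contained in a better-scoring third one.
refs = {
    "plasmid_A": {1, 2, 3, 4, 5},
    "plasmid_B": {1, 2, 3, 4, 5},
    "plasmid_C": {1, 2, 3, 4, 5, 6, 7, 8},
}
sample = set(range(1, 9))

print(screen(refs, sample))                        # A and B tie at 5, C has 8
print(screen(refs, sample, winner_take_all=True))  # C takes every hash, A and B drop to 0

# Isolating the ties in their own "sketch": one of them now keeps its hashes.
ties_only = {name: refs[name] for name in ("plasmid_A", "plasmid_B")}
print(screen(ties_only, sample, winner_take_all=True))
```

In this toy model the tied plasmids vanish only when a third, better-scoring reference is present, which is the behaviour the isolate-then-add test is meant to detect in the real data.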

ctb mentioned this issue Jun 7, 2021

gather does not break ties in any consistent manner sourmash-bio/sourmash#1366

Closed
