Hello,
I have noticed a discrepancy between the results of mash screen with and without the winner-take-all (wta) option.
For context, I am screening a number of genomes (more specifically, their constituent contigs) to see if they contain plasmids (which have been sketched with -s 1000 -k 21).
I have found that if a contig has multiple plasmid hits and the best hits share the same number of matching hashes in a non-wta screen, none of them appear in the wta screen. It appears that wta cannot choose between the tied best hits and therefore picks none of them.
I have yet to try it, but I think larger sketches and k-mer values could circumvent this problem. Having said this, the plasmids I am using are from the PLSDB, a very extensive catalogue that contains distinct but very similar plasmids, so a bottom-sketch method may still produce this problem.
I hope I have worded this clearly, please let me know if you need more detail.
Kind regards, Adam
P.S. I graduated with a master's in bioinformatics a few months ago, where I assumed BLAST was king of the heuristics. It has been really interesting to work with mash. Thanks
Hi Adam,
I think I understand the problem but I am not able to reproduce it. Is it possible that these plasmids are disappearing with WTA because all their hashes are assigned to another, better-scoring plasmid? A way to test that would be to isolate the ties in their own sketch, then start adding other ones. I can look into this more as well if you are able to share your data (my email is here).
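To make the hypothesis above concrete, here is a toy sketch in Python of how a winner-take-all pass could hand every shared hash to a single plasmid. It is not Mash's implementation: the sets of integers stand in for sketch hashes, the names `screen`, `screen_wta`, `plasmid_A` and so on are made up for illustration, and the tie-break (larger sketch, then name) is an assumption chosen only to keep the example deterministic.

```python
def screen(refs, mixture):
    """Count, per reference, how many of its hashes occur in the mixture."""
    return {name: len(hashes & mixture) for name, hashes in refs.items()}


def screen_wta(refs, mixture):
    """Toy winner-take-all: each mixture hash is credited to one reference only.

    The winner is the reference with the most shared hashes in the plain
    screen. Ties are broken by sketch size and then by name, which is an
    assumption of this sketch, not a statement about Mash itself.
    """
    shared = screen(refs, mixture)

    def rank(name):
        return (shared[name], len(refs[name]), name)

    credited = {name: 0 for name in refs}
    for h in mixture:
        owners = [name for name, hashes in refs.items() if h in hashes]
        if owners:
            credited[max(owners, key=rank)] += 1
    return credited


# Hypothetical data: A and B tie in the plain screen, C scores lower.
refs = {
    "plasmid_A": {1, 2, 3, 4, 5},
    "plasmid_B": {1, 2, 3, 4, 9},
    "plasmid_C": {1, 2, 3, 7, 8},
}
contig = {1, 2, 3, 4, 6}

plain = screen(refs, contig)
wta = screen_wta(refs, contig)
print("plain:", plain)
print("wta:  ", wta)
```

In this toy model one of the two tied plasmids keeps all four hashes, while the other tied plasmid and the lower-scoring one drop to zero because every hash they share with the contig was credited elsewhere. If the real data behaves the same way, sketching the tied plasmids on their own should bring at least one of them back in the wta output.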