"Pre-filtering" use case #1697
huh, that is a confusing error message. I can sort of dimly intuit what might be going on, but I'll have to dig! one immediate problem I see is that you'll want both
in re the beginning question, sourmash is unlikely to be faster, since mappers have been optimized quite a bit :). But k-mers were used for many years for filtering, before the Burrows-Wheeler Transform methods became dominant; and software like mashmap uses k-mer sketching to do similar things, but only for longer sequences where the downsampling that happens with sketching still provides strong guarantees.
You're heading in the right direction with scaled=10, but we have developed the spacegraphcats software to do graph-based k-mer matching, and that is used for situations where you're trying to retrieve graph-adjacent sequences where mappers simply won't work.
thanks for posting the problem! I'll see what I can do to figure it out :)
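To make the point about downsampling concrete, here is a minimal pure-Python sketch of a scaled k-mer filter. It is an illustration only, not sourmash's implementation: the helper names (`canonical_kmers`, `kmer_hash`, `sketch`, `containment`) are hypothetical, and `hashlib.blake2b` stands in for whatever hash function a real tool uses. As a rough guide, assuming 150 bp reads, each read has 150 - 31 + 1 = 120 k-mers at k=31, so a scaled value of 1000 would keep about 0.12 hashes per read on average, while scaled=10 keeps about 12.

```python
import hashlib
import random


def canonical_kmers(seq, k):
    """Yield each k-mer as the smaller of itself and its reverse complement."""
    comp = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            yield min(kmer, kmer.translate(comp)[::-1])


def kmer_hash(kmer):
    """64-bit hash of a k-mer (blake2b is an assumption, used as a stand-in)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq, k, scaled):
    """Keep only hashes below 2**64 / scaled, i.e. roughly 1/scaled of the distinct k-mers."""
    max_hash = 2**64 // scaled
    return {h for h in map(kmer_hash, canonical_kmers(seq, k)) if h < max_hash}


def containment(read_sketch, ref_sketch):
    """Fraction of the read's retained hashes that are also in the reference sketch."""
    if not read_sketch:
        return None  # nothing survived downsampling, so the read cannot be classified
    return len(read_sketch & ref_sketch) / len(read_sketch)


# Toy data: a random "reference", one read taken from it, one unrelated read.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(2000))
on_target = reference[500:650]
off_target = "".join(rng.choice("ACGT") for _ in range(150))

ref_sk = sketch(reference, k=21, scaled=1)
print("on-target :", containment(sketch(on_target, 21, 1), ref_sk))
print("off-target:", containment(sketch(off_target, 21, 1), ref_sk))
```

With `scaled=1` every k-mer is kept, so the on-target read has containment 1.0. Raising `scaled` shrinks both sketches, and short reads increasingly end up with an empty sketch (`None` here), which is the loss of guarantees described above.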
Hi, sorry for the big delay. I just wanted to mention that the issue persisted after using the Thanks a lot for your thoughts. I finally had the time to try
closed in favor of genome-grist - dib-lab/genome-grist#193
Hi there sourmash team, I am having this issue with sourmash gather. I just updated from 4.4 to 4.6 and the issue persists. I see there was work done with genome-grist #193 and this issue was recently closed, but am unsure how to solve the issue myself. To generate signatures I'm running:
No output is being generated but I also do not have an error or warning message. @ctb Let me know if I should create a new issue for this.
hi @sivico26 try adding If that works, what is happening is this: two different sketches are being generated by the The More generally, you can diagnose this kind of thing with Let me know if that works! Or if it doesn't 😆
Hi @ctb , thank you so much for the rapid response. :) I'm regenerating the signatures now. |
Works great! Thanks @ctb |
welcome! glad it was an easy fix 🎉 |
Hi there!
I wanted to try Sourmash for a specific use case, but after going through the docs I am still not sure what the best way to use it is.
I have a set of raw reads for which I want to know if they match to a genome or not, and filter them on that basis. Usually, you would do this by mapping those reads to the genome, but I wonder if it could be done faster with sketching tools such as Sourmash.
Specifically, I have a genome skimming dataset and wanted to extract the chloroplast reads. So my database is actually very small, but the number of queries is quite large.
This is what I was trying to do:
And I am getting this:
I also tried to specify `--dna`, not putting `--ksize`, or any sensible combination, but they generated the same error. I found that the same kind of error was reported in #1028 and #1089, but those seem triggered by `lca` and `multigather`, while this one emerged calling `search`.
By the way, I am using `sourmash 4.2.1` installed through conda.
Any thoughts on this use case, approach, and errors would be greatly appreciated.
Regards