Depleting kmers shared with another reference #3180
Comments
this is what gather does automatically - if you run If you want to do this manually, you can use You can also use The venn diagram plugin will also happily plot the overlap for you. It will return the same numbers as Ask as you have questions! |
Great, I will use I can see from Does Many thanks again for your very clear explanations! |
Great! You could also use
Yes! It may seem a little redundant, but it's just a (slightly inefficient) way of making sure you know exactly where the abundances in the output sketch are coming from. And... the reason I didn't mention abundance before is that (at least for bacterial and archaeal genomes) I tend to recommend not using abundance on the genome side, as the genomes are mostly single copy. Instead, in my bac/arc-focused metagenomics work, abundances are mostly used with the metagenomes, where abundance information can be really valuable. Note that for sourmash gather, the metagenome abundance is the only abundance that's used. Gather assumes that the genome sketches are flat (and ignores their abundances) because the gather algorithm doesn't apply to non-flat references.
Welcome! |
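To make the "flat" versus "abundance" distinction above concrete, here is a minimal toy sketch in plain Python. It does not use the sourmash API: real sourmash sketches store hashed, downsampled k-mers, whereas this example keeps raw k-mer strings, and the sequence, the k value, and the helper name `kmers` are all made up for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return every k-mer in seq, in order, including repeats (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Toy "genome" with a repeated region, so some k-mers occur more than once.
genome = "ACGTACGTAC"

# Abundance-tracking view: each distinct k-mer mapped to how often it occurs.
abund = Counter(kmers(genome, 4))

# Flat view: only presence/absence of each distinct k-mer is kept.
flat = set(abund)

print(dict(abund))
print(sorted(flat))
```

The flat set holds the same distinct k-mers as the abundance-tracking view; only the counts are dropped, which is the sense in which reference abundances are ignored.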
And this produces the same distinct k-mers as
Ah, so for metagenomic analysis, is it only the query that needs to be sketched with My end goal is to use this 'subtracted' genome as part of the reference database for metagenomic analysis. |
:) but it becomes a problem to find the darn commands... the issue tracker is a great place to search as well. Hmm, it might make the most sense to think about it as sets of k-mers - if you intersect A and B to get C, then
Correct!
No, it isn't! The abundances are just ignored. (This is an undocumented feature, so I've created an issue to document it! #3181)
Yep, that will work fine! I'll be curious if you find that this approach works better than gather on the unmodified sketches (but no worries if you don't look into that comparison ;). |
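The "sets of k-mers" way of thinking suggested in this comment can be sketched with ordinary Python sets. This is only an illustration under stated assumptions: the two strain sequences, the k value, and the helper name `kmer_set` are hypothetical, and raw k-mer strings stand in for the hashes that sourmash actually stores.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (hypothetical helper)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Made-up sequences: the two "strains" share a prefix and then diverge.
strain_a = "ACGTACGGTT"
strain_b = "ACGTACCCAA"

a = kmer_set(strain_a, 4)
b = kmer_set(strain_b, 4)

# Intersect A and B to get the shared k-mers.
shared = a & b

# Removing the shared k-mers from A leaves the k-mers unique to A.
unique_a = a - shared

# Subtracting the intersection gives the same result as subtracting B directly.
assert unique_a == a - b

print(sorted(shared))
print(sorted(unique_a))
```

Whether the subtraction goes through the intersection or uses B directly, the resulting set contains no k-mer that is present in B.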
Hi @ctb, I've created the custom signature using
Many thanks, |
They're being added in #3162, but are not yet in a release, I'm afraid! For now you'll need to use |
hi @Amanda-Biocortex |
Hi,
I am looking to remove kmers from a reference genome which are shared with another genome.
For example, bacterial strains A and B share many regions of DNA but also have unique regions. I would like to create a signature for the strain A reference that is depleted of k-mers shared with strain B (i.e. the signature holds only k-mers unique to strain A).
Can Sourmash search be used for this?
Many thanks,
Amanda