https://github.com/sourmash-bio/sourmash_plugin_containment_search/
From the README:
sourmash_plugin_containment_search: improved containment search for genomes in metagenomes
This plugin provides a command, sourmash scripts mgsearch, that
provides new & nicer output for searching genomes against metagenomes.
Installation
Usage
This command:
will search for the query genome query.sig in one or more
metagenome.sig files, producing decent human-readable output and
(optionally) useful CSV outputs.
For example,
produces:
This plugin will work with all the standard sourmash database types, too.
Note that the metagenomes must have been sketched with -p abund.
Backstory: Why this command?
sourmash search supports sample x sample search broadly, perhaps too
broadly, and its output formats aren't that helpful.
sourmash prefetch supports metagenome overlap search against many
genomes, which is the reverse of this use case. Moreover, prefetch
doesn't provide weighted results and its output isn't friendly.
sourmash gather has friendly and useful output, but can't be used to
calculate the overlap between a single query genome and many subject
metagenomes.
There is also some interest in reverse containment search.
The manysearch command of the sourmash branchwater plugin also does a
nice containment search like this plugin, but it doesn't provide nice
human-readable output and it also doesn't provide weighted results.
(manysearch is, however, much lower memory & probably a fair bit
faster because it's mostly in Rust.)
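
The "weighted results" mentioned above rely on abundance information in the metagenome sketch, which is why the metagenomes need -p abund. The following is only a rough sketch of the idea in plain Python, not the plugin's code: hashes are modeled as integers, the metagenome as a dict of hash to abundance, and the function names and the exact definition of the weighted statistic are assumptions made for illustration.

```python
def containment(query_hashes, metagenome_abunds):
    """Unweighted: fraction of the query's hashes found in the metagenome."""
    if not query_hashes:
        return 0.0
    shared = query_hashes.intersection(metagenome_abunds)
    return len(shared) / len(query_hashes)


def weighted_fraction(query_hashes, metagenome_abunds):
    """Weighted (assumed definition): share of the metagenome's total
    abundance that is carried by hashes also present in the query."""
    total = sum(metagenome_abunds.values())
    if total == 0:
        return 0.0
    shared = query_hashes.intersection(metagenome_abunds)
    return sum(metagenome_abunds[h] for h in shared) / total


# Toy data: hypothetical hash values and abundances.
query = {1, 2, 3, 4}
metagenome = {1: 10, 2: 30, 9: 5, 10: 5}

print(containment(query, metagenome))        # 0.5
print(weighted_fraction(query, metagenome))  # 0.8
```

Here half of the genome's hashes are found in the metagenome, but those hashes account for 80% of the metagenome's abundance. Without abundances, only the first number can be computed.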
Advanced info: implementation details
This command is streaming, in the sense that it will load each
metagenome, calculate the match, and then discard the metagenome.
Hence its memory usage peaks while the largest metagenome is loaded,
and that peak should be driven by the size of the query plus the size
of the largest metagenome.
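
The streaming pattern described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the plugin's implementation: sketches are modeled as sets of integer hashes, and stream_containment and the loader callables are hypothetical names standing in for whatever actually reads each metagenome from disk.

```python
def stream_containment(query_hashes, loaders):
    """Yield (name, containment) for one metagenome at a time.

    `loaders` is an iterable of (name, callable) pairs; each callable
    returns that metagenome's hashes as a set when invoked.
    """
    if not query_hashes:
        raise ValueError("query must contain at least one hash")
    for name, load in loaders:
        hashes = load()  # only this metagenome is held in memory
        result = len(query_hashes & hashes) / len(query_hashes)
        del hashes       # discard it before loading the next one
        yield name, result


# Toy data: hypothetical metagenomes, loaded lazily.
query = {1, 2, 3, 4}
loaders = [
    ("mg_a", lambda: {1, 2, 9}),
    ("mg_b", lambda: {1, 2, 3, 4, 5, 6}),
]

results = list(stream_containment(query, loaders))
print(results)  # [('mg_a', 0.5), ('mg_b', 1.0)]
```

Because each metagenome is loaded inside the loop and dropped before the next iteration, at most the query and one metagenome are resident at any time, which matches the memory behavior described above.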