[MRG] fix doc titles in command-line.md and update description a bit (#1874)

* fix doc titles for prefetch section

* fix a few things, and add more details on sourmash

* Update doc/command-line.md

Co-authored-by: Tessa Pierce Ward <[email protected]>
ctb and bluegenes authored Mar 10, 2022
1 parent ba38d14 commit dff5309
Showing 1 changed file with 12 additions and 9 deletions.
21 changes: 12 additions & 9 deletions doc/command-line.md
```diff
@@ -8,13 +8,19 @@ From the command line, sourmash can be used to create
 [MinHash sketches][0] from DNA and protein sequences, compare them to
 each other, and plot the results; these sketches are saved into
 "signature files". These signatures allow you to estimate sequence
-similarity quickly and accurately in large collections, among other
-capabilities.
+similarity and containment quickly and accurately in large
+collections, among other capabilities.

+sourmash also provides a suite of metagenome functionality. This
+includes genome search in metagenomes, metagenome decomposition into a
+list of genomes from a database, and taxonomic classification
+functionality.
+
 Please see the [mash software][1] and the
 [mash paper (Ondov et al., 2016)][2] for background information on
-how and why MinHash sketches work.
-
+how and why MinHash sketches work. The [FracMinHash preprint (Irber et al,
+2022)](https://www.biorxiv.org/content/10.1101/2022.01.11.475838) describes
+FracMinHash sketches as well as the metagenome-focused features of sourmash.

 sourmash uses a subcommand syntax, so all commands start with
 `sourmash` followed by a subcommand specifying the action to be
```
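The updated description says sourmash sketches can estimate both similarity and containment. As a rough illustration only, here is a minimal Python sketch of FracMinHash-style sketching under stated assumptions: BLAKE2b stands in for the hash function, `ksize=21` and `scaled=10` are illustrative values, and every function name is hypothetical rather than part of sourmash's API. A sketch keeps every k-mer hash below `2**64 // scaled`, and both estimates come from overlaps between hash sets.

```python
import hashlib
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hash(kmer: str) -> int:
    """Hash a k-mer to a 64-bit integer.

    Assumption: BLAKE2b truncated to 8 bytes is a stand-in hash; it is not
    the hash function sourmash uses."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so both DNA strands give the same hash (assumes A/C/G/T only)."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def frac_minhash(seq: str, ksize: int = 21, scaled: int = 1000) -> set:
    """Keep every k-mer hash below 2**64 // scaled, i.e. roughly 1/scaled of the k-mers."""
    max_hash = 2**64 // scaled
    seq = seq.upper()
    sketch = set()
    for i in range(len(seq) - ksize + 1):
        h = kmer_hash(canonical(seq[i:i + ksize]))
        if h < max_hash:
            sketch.add(h)
    return sketch


def jaccard(a: set, b: set) -> float:
    """Estimated similarity: shared hashes divided by all distinct hashes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: set, b: set) -> float:
    """Estimated fraction of A's k-mers that also occur in B (asymmetric)."""
    return len(a & b) / len(a) if a else 0.0


random.seed(1)
metagenome = "".join(random.choice("ACGT") for _ in range(20_000))
genome = metagenome[5_000:10_000]          # a "genome" wholly present in the "metagenome"

g = frac_minhash(genome, scaled=10)
m = frac_minhash(metagenome, scaled=10)
print(containment(g, m))   # 1.0: every k-mer of the genome is in the metagenome
print(jaccard(g, m))       # close to 0.25, since the genome is about 1/4 of the metagenome
```

Because containment divides by the size of only one sketch, `containment(m, g)` would come out near 0.25 here while `containment(g, m)` is 1.0. That asymmetry is what makes containment a natural fit for asking whether a genome is present in a metagenome.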
```diff
@@ -102,9 +108,6 @@ Finally, there are a number of utility and information commands:
 Please use the command line option `--help` to get more detailed usage
 information for each command.

-Note that as of sourmash v3.4, all commands should load signatures from
-indexed databases (the SBT and LCA formats) as well as from signature files.
-
 ### `sourmash sketch` - make sourmash signatures from sequence data

 Most of the commands in sourmash work with **signatures**, which contain information about genomic or proteomic sequences. Each signature contains one or more **sketches**, which are compressed versions of these sequences. Using sourmash, you can search, compare, and analyze these sequences in various ways.
```
```diff
@@ -404,15 +407,15 @@ Other options include:
 * `--force` to continue past survivable errors;
 * `--picklist` will select a subset of signatures to search, using [a picklist](#using-picklists-to-subset-large-collections-of-signatures)

-### Alternative search mode for low-memory (but slow) search: `--linear`
+#### Alternative search mode for low-memory (but slow) search: `--linear`

 By default, `sourmash prefetch` uses all information available for
 faster search. In particular, for SBTs, `prefetch` will prune the search
 tree. This can be slow and/or memory intensive for very large databases,
 and `--linear` asks `sourmash prefetch` to instead use a linear search
 across all leaf nodes in the tree.

-### Caveats and comments
+#### Caveats and comments

 `sourmash prefetch` provides no guarantees on output order. It runs in
 "streaming mode" on its inputs, in that each input file is loaded,
```

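The `--linear` paragraph in the diff contrasts pruning an SBT search tree with a linear scan over all leaf nodes. The toy model below illustrates that difference in Python; it is not sourmash's SBT code. Assumptions: internal nodes store plain sets holding the union of their descendants' hashes (standing in for the Bloom filters a Sequence Bloom Tree keeps at internal nodes), a match means sharing at least `threshold` hashes with the query, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set, Tuple


@dataclass
class Node:
    """Toy tree node (hypothetical). A leaf holds one sketch; an internal node
    holds the union of every hash beneath it, standing in for an SBT Bloom filter."""
    hashes: Set[int]
    name: Optional[str] = None                 # only leaves are named
    children: List["Node"] = field(default_factory=list)


def build(leaves: List[Node]) -> Node:
    """Pair nodes bottom-up into a binary tree whose internal nodes store unions."""
    level = leaves
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            parents.append(Node(set().union(*(n.hashes for n in pair)), children=pair))
        level = parents
    return level[0]


def pruned_search(root: Node, query: Set[int], threshold: int) -> Tuple[List[str], int]:
    """Descend only into subtrees whose union shares >= threshold hashes with the query.
    Safe because a leaf never shares more hashes with the query than its ancestors."""
    matches, visited, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        visited += 1
        if len(query & node.hashes) < threshold:
            continue                           # prune the whole subtree
        if node.children:
            stack.extend(node.children)
        else:
            matches.append(node.name)
    return sorted(matches), visited


def iter_leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf below node, left to right."""
    if not node.children:
        yield node
    for child in node.children:
        yield from iter_leaves(child)


def linear_search(root: Node, query: Set[int], threshold: int) -> Tuple[List[str], int]:
    """Check every leaf directly and ignore the internal-node summaries."""
    matches, visited = [], 0
    for leaf in iter_leaves(root):
        visited += 1
        if len(query & leaf.hashes) >= threshold:
            matches.append(leaf.name)
    return sorted(matches), visited


# 64 toy "genomes", each a disjoint block of 100 hash values.
leaves = [Node(set(range(i * 100, (i + 1) * 100)), name=f"genome{i}") for i in range(64)]
root = build(leaves)
query = set(range(150, 260))                   # shares 50 hashes with genome1, 60 with genome2

print(pruned_search(root, query, threshold=20))   # (['genome1', 'genome2'], 15)
print(linear_search(root, query, threshold=20))   # (['genome1', 'genome2'], 64)
```

Both strategies return the same matches. Pruning is safe because a leaf can never share more hashes with the query than its ancestors' unions do. In this toy, pruning visits 15 of the 127 nodes while the linear scan checks all 64 leaves; it says nothing about sourmash's actual memory or runtime trade-offs.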