adjust sourmash index argparse to accommodate --from-file #1066

Closed
ctb opened this issue Jun 29, 2020 · 8 comments · Fixed by #1186
Labels
4.0 issues to address for a 4.0 release

Comments

Contributor

ctb commented Jun 29, 2020

in #1059, we introduce sourmash index <sbt> --from-file <list-of-sigs>, but because of argparse behavior, the only way to keep backwards compatibility with nargs='+' for args.signatures is to require at least one signature file on the command line. we can fix this in 4.0.
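
A minimal sketch of the argparse behavior described above, using a toy parser rather than sourmash's actual code. The names `build_parser`, `parse_or_none`, `parse_relaxed`, and `sbt_name` are hypothetical and chosen for illustration; only `signatures`, `--from-file`, and `nargs='+'` come from the issue. It shows that `nargs='+'` rejects a command line that supplies only `--from-file`, while `nargs='*'` with a manual check accepts it. It does not attempt to show which 3.x usages a switch to `'*'` might affect.

```python
import argparse


def build_parser(sig_nargs):
    """Build a toy 'index' parser; sig_nargs is '+' (as in 3.x) or '*'."""
    # Hypothetical stand-in for the real subcommand parser.
    parser = argparse.ArgumentParser(prog="index")
    parser.add_argument("sbt_name")
    parser.add_argument("signatures", nargs=sig_nargs)
    parser.add_argument("--from-file", help="file listing signature paths")
    return parser


def parse_or_none(parser, argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return parser.parse_args(argv)
    except SystemExit:  # argparse reports usage errors by exiting
        return None


def parse_relaxed(argv):
    """Use nargs='*' and check by hand that some signature source was given."""
    args = parse_or_none(build_parser("*"), argv)
    if args is not None and not args.signatures and not args.from_file:
        return None  # neither positional signatures nor --from-file
    return args


argv = ["db.sbt.zip", "--from-file", "sigs.txt"]
strict = parse_or_none(build_parser("+"), argv)  # rejected: 'signatures' required
relaxed = parse_relaxed(argv)                    # accepted with no positional sigs
```

With `nargs='+'`, argparse itself enforces "at least one signature", so the only way to accept a bare `--from-file` is to loosen the positional and move that check into the program's own code, as `parse_relaxed` does.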

ctb added the 4.0 (issues to address for a 4.0 release) label Jun 29, 2020
Contributor Author

ctb commented Jun 30, 2020

(the same is true of sourmash lca index)


nmb85 commented Jul 29, 2020

While trying to build my own updated version of a genbank sbt db, I ran this command with v3.4.1rc1:
sourmash index -k 31 /data/ncbi/genomes/all/GCA/gca_genbank.sbt.zip --from-file /data/ncbi/genomes/all/GCA/gca_latest_genomic_sig.txt

I get this error msg:
Indexing the sourmash signatures in an sbt json database
usage:

sourmash index -k 31 dbname *.sig

Create an on-disk database of signatures that can be searched in low
memory with 'search' and 'gather'. All signatures must be the same
k-mer size, molecule type, and num/scaled; the standard signature
selectors (-k/--ksize, --scaled, --dna/--protein) choose which
signatures to be added.

The key options for index are:

  • -k/--ksize <int>: k-mer size to select
  • --dna or --protein: nucleotide or protein signatures (default --dna`)
  • --traverse-directory: load all signatures below this directory

If dbname ends with .sbt.json, index will create the database as a
collection of multiple files, with an index dbname.sbt.json and a
subdirectory .sbt.dbname. If dbname ends with .sbt.zip, index
will create a zip archive containing the multiple files. For sourmash
v2 and v3, sbt.json will be added automatically; this behavior will
change in sourmash v4 to default to .sbt.zip.


index: error: the following arguments are required: signatures
Getting accession numbers from the sbt db
... at leaf 0

Is this error related to this issue?

Contributor Author

ctb commented Jul 29, 2020

yes, alas! you just need to provide one of the signatures on the command line. Unfortunately the command line parsing utility we use doesn't permit me to fix this for --from-file without breaking other 3.x uses!
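
One way to apply that workaround from a script, sketched under assumptions: the helper name `index_argv_with_workaround` is hypothetical, and it assumes the `--from-file` list has one signature path per line. It only builds the argument list; it does not run sourmash. Whether naming a signature both positionally and in the list file causes it to be added twice is not settled in this thread, so treat that as an open question.

```python
def index_argv_with_workaround(dbname, sig_list_path, ksize=31):
    """Build a `sourmash index` argv that names one signature positionally.

    Hypothetical helper: takes the first path in the list file and repeats
    it on the command line so the required 'signatures' argument is present.
    """
    with open(sig_list_path) as fp:
        # Assumption: one signature path per line; blank lines are skipped.
        paths = [line.strip() for line in fp if line.strip()]
    if not paths:
        raise ValueError(f"no signature paths listed in {sig_list_path}")
    return [
        "sourmash", "index", "-k", str(ksize), dbname,
        paths[0],                      # satisfies the required positional
        "--from-file", sig_list_path,  # supplies the full list of signatures
    ]
```

The returned list can be handed to `subprocess.run` or joined for display.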


nmb85 commented Jul 29, 2020

I wish I could contribute some effort to the club good, but although I can read python and write data analysis scripts in jupyter, etc., I'm not savvy to conventions of software development. Is there a way that I can bang my head against some of the cli command peculiarities to improve them? Maybe write up a complete tutorial on how to prepare an sbt and lca db from a fresh dl of all genbank *_genomic.fna.gz files (building on 2018-ncbi-lineages instructions and existing tutorials, but filling in some gaps)? Seems like a broad-enough use case to justify writing a tutorial...

Contributor Author

ctb commented Jul 29, 2020

well, we're automating that - https://github.com/dib-lab/sourmash_databases/ - it's too big a lift for any one person I think!

but anything you can do to make suggestions, improve documentation by engaging in issues, etc. is REALLY helpful. most users of sourmash are silent :). I/we can do the code changes... it's the brainstorming/broad thinking where we need help!

Contributor Author

ctb commented Jul 29, 2020

like, excellent feedback that I want to start providing myself in issues is,

I tried merging signatures like so,

sourmash sig cat <database> | sourmash sig flatten - | sourmash sig merge -

and it didn't work, why not? can that be made to work?

and then we'll work on expanding the next set of adjacent possibilities ™️ !


nmb85 commented Jul 29, 2020

Okay, I'll start complaining - in a kind way - on these issues as I trip across them :)

Contributor Author

ctb commented Jul 29, 2020 via email
