Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify/refactor load_* sig/db functions in sourmash_args.py #1877

Open
ctb opened this issue Mar 10, 2022 · 8 comments
Open

simplify/refactor load_* sig/db functions in sourmash_args.py #1877

ctb opened this issue Mar 10, 2022 · 8 comments

Comments

@ctb
Copy link
Contributor

ctb commented Mar 10, 2022

There's a confusing mess of loading functions in sourmash_args.py that's slowly converging as we simplify and refactor our Index handling code.

In no particular order,

there's query loading code, load_query_signature. This is responsible for loading a single signature. I think it can probably stay.

there's generic file loading code, load_file_as_signatures and load_file_as_index. I think these can be combined pretty easily now, since they are both simple wrappers around another function.

there's subject loading code, load_many_signatures and load_dbs_and_sigs, which I bet could be combined. The main difference seems to be in diagnostic output.

there's utility functions, traverse_find_sigs and load_pathlist_from_file, which can probably be moved under sourmash.index now, or otherwise refactored out of sourmash_args.

and finally the SignatureLoadingProgress class can probably go away, since CLI functions are mostly moving away from loading long lists of signatures.

@ctb
Copy link
Contributor Author

ctb commented Mar 10, 2022

more generally, per throwaway comment in #1871, would be good to take a holistic look at all the crud in the argument loading/parsing code for commands.py and sig/__main__.py and figure out how much of it can be put in one place. I'm guessing a lot of the signature loading compatibility code can be refactored into a single function.

@ctb
Copy link
Contributor Author

ctb commented Mar 12, 2022

also relevant: picking query ksize automatically to match provided databases #809

@ctb
Copy link
Contributor Author

ctb commented Mar 25, 2022

ref #1894 - remove is_database?

@ctb
Copy link
Contributor Author

ctb commented May 4, 2022

ref #1312

@ctb
Copy link
Contributor Author

ctb commented May 4, 2022

ref #1062 and #1060

@ctb
Copy link
Contributor Author

ctb commented Aug 3, 2022

wow, this has become a tangled mess of interconnected issues with no clear path forward. yay!

the main comment from the issues that resonates most is from #1426:

I'm wondering if the right answer is to track the total number of signatures in a collection (using e.g. manifests) and when doing a search of some kind, provide a generic indicator of what fraction of the collection is actually being searched? This should be straightforward.

@ctb
Copy link
Contributor Author

ctb commented Aug 14, 2022

PR #2204 cleans up signature load/selection reporting for load_dbs_and_sigs.

@ctb
Copy link
Contributor Author

ctb commented May 14, 2024

moving parts of this comment here -

  • load_query_signature is used in only a few places, and is only used for CLI functionality around loading a query signature from a file. It is not very well written.

My current hot take is that

ctb added a commit that referenced this issue Jun 4, 2024
…signatures (#3153)

Fix `sig overlap` and `sig subtract` to take more than just JSON
signatures.

Also, adds a function `sourmash_args.load_one_signature` that I think
should (eventually) replace the now-deprecated
`sourmash.signature.load_one_signature`. This will be the topic of a new
PR - for now, I think it's a nice quick fix!

Fixes #3136

Related issues:
* #1062 - will do another
PR to close this issue
* #1877
* #1312
* #1060

TODO:
- [x] test uncovered code
- [x] do a bit more of a search and digest of related issues to see if
there's other low hanging fruit

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant