[WIP] explore reverse indexing in LCA v2 databases #604
Ref #306 #581 and #533.
In #533, we ensured that LCA databases contained all input hashes, whether or not the owning signatures had a lineage assignment. This makes them good candidates for general reverse indexing, but modifications are needed to make this functionality usable and friendly. Here is where we are experimenting with better functionality.
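To make the idea concrete, here is a minimal sketch of what "reverse indexing" means here: a map from each hash back to the signatures that contain it, with lineage as optional side information. This is a toy illustration only, not sourmash's actual LCA database code; `ToyRevIndex` and its `insert`, `find`, and `lineage_of` methods are hypothetical names made up for this example.

```python
from collections import defaultdict


class ToyRevIndex:
    """Hypothetical reverse index: hash value -> names of signatures containing it."""

    def __init__(self):
        self.hash_to_names = defaultdict(set)
        # name -> lineage tuple; a name is simply absent when it has no assignment
        self.lineages = {}

    def insert(self, name, hashes, lineage=None):
        # Every hash is indexed, whether or not the signature has a lineage.
        for h in hashes:
            self.hash_to_names[h].add(name)
        if lineage is not None:
            self.lineages[name] = tuple(lineage)

    def find(self, query_hashes):
        """Return {signature name: number of query hashes it shares}."""
        counts = defaultdict(int)
        for h in query_hashes:
            # .get() avoids creating empty entries for unknown hashes
            for name in self.hash_to_names.get(h, ()):
                counts[name] += 1
        return dict(counts)

    def lineage_of(self, name):
        """Return the lineage tuple, or None when there is no assignment."""
        return self.lineages.get(name)


idx = ToyRevIndex()
idx.insert("sigA", {1, 2, 3}, lineage=("Bacteria", "Proteobacteria"))
idx.insert("sigB", {3, 4})  # no lineage assignment, but still indexed
hits = idx.find({2, 3, 4})
```

In this sketch, a signature without a lineage is still findable by hash, and callers get `None` from `lineage_of` rather than an error, which is roughly the behaviour this PR is trying to make friendlier.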
One question to think about in this PR: is `sourmash lca` about taxonomic stuff, or about all LCA databases? If the latter, should we rename them from LCA databases to something else, and/or should we support taxonomic information in regular signatures (which would support them in SBTs)? Deep thunks to be plumbed.
Specifically, this PR:
- adds a `sourmash lca revindex` command that mimics `sourmash index` but does not require a taxonomy spreadsheet;
- [...] `sourmash lca gather` to be clearer when there is no available lineage assignment;
- [...] `sourmash lca gather` when there isn't a lineage assignment!
- [...] `lca gather` as per "Layer phylogenetic options on to sourmash lca gather?" (#583), since `lca gather` is now only a mild increment on top of `gather` (since "[MRG] add lca DBs as inputs to 'sourmash search' and 'gather'", #533);
Review checklist:
- `make test` Did it pass the tests?
- `make coverage` Is the new code covered?
- [...] without a major version increment. Changing file formats also requires a major version number increment.
- [...] changes were made?