allow gather to mixed databases, letting manifests/selectors sort it out? #2200

bluegenes · 2022-08-12T16:49:06Z

While running gather, we search with a single ksize and now use manifests to figure out what signatures we can search against. That means:

if we have multiple ksizes in a zip database, we only search against the right one and everything is fine.
BUT,
if we provide multiple ksizes over different databases, we error out with: no compatible signatures found in '{database}', rather than searching against the compatible sigs.

Could we just allow search databases to have no compatible signatures, as long as there are some compatible signatures to search?

use case: makes passing in search databases easier for cases where you may want to run more than one ksize in a workflow.

The text was updated successfully, but these errors were encountered:

ctb · 2022-08-12T17:05:58Z

hmm, maybe. see #1426, especially my comment:

I really like the idea that with manifests, we just output something like this:
loaded/found a total of X sketches
after sketch selection, Y sketches remaining

IIRC, the history is:

before zipfiles, we (mostly) had databases that were locked to a single ksize. so if an incompatible database was specified, it seemed good to error out.

and, before manifests, we had no fast/easy way of counting the number of matching signatures, so we had problems doing accurate and informative display of number of signatures early on in the process.

but this has all changed. while I think we do want to support legacy formats, the existence of select and manifests and the default inclusion of manifests in zipfiles means we can do something nicer here.

as another sign that we should be doing something different, I note this ugly bit of code in commands.py::prefetch --

        for db in databases:
            counter = None
            try:
                counter = db.counter_gather(prefetch_query, args.threshold_bp)
            except ValueError:
                if picklist or pattern_search:
                    # catch "no signatures to search" ValueError from filtering
                    continue
                else:
                    raise       # re-raise other errors, if no picklist.

which is needed to maintain the error exit when no compatible signatures exist... I keep on stumbling over this and would very much like to refactor it!

codewise, it should be as simple as calling len() on each database after a select, although I'd want to check that the various databases (LCA, SqliteIndex, and SBTs) handle picklists/pattern search properly with len first.

ctb · 2022-08-14T16:34:13Z

#2204 fixes this issue, and cleans out some unreachable code, too.

note that prefetch mostly does the right thing already and continues past empty databases, although the output message could be clearer. The output message is also fixed in #2204.

ctb · 2022-08-14T16:34:21Z

Duplicate of #1637 :)

ctb mentioned this issue Aug 14, 2022

[MRG] update database load UX for gather etc. #2204

Merged

4 tasks

ctb closed this as completed in #2204 Aug 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow gather to mixed databases, letting manifests/selectors sort it out? #2200

allow gather to mixed databases, letting manifests/selectors sort it out? #2200

bluegenes commented Aug 12, 2022

ctb commented Aug 12, 2022

ctb commented Aug 14, 2022

ctb commented Aug 14, 2022

allow gather to mixed databases, letting manifests/selectors sort it out? #2200

allow gather to mixed databases, letting manifests/selectors sort it out? #2200

Comments

bluegenes commented Aug 12, 2022

ctb commented Aug 12, 2022

ctb commented Aug 14, 2022

ctb commented Aug 14, 2022