Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow gather to mixed databases, letting manifests/selectors sort it out? #2200

Closed
bluegenes opened this issue Aug 12, 2022 · 3 comments · Fixed by #2204
Closed

allow gather to mixed databases, letting manifests/selectors sort it out? #2200

bluegenes opened this issue Aug 12, 2022 · 3 comments · Fixed by #2204

Comments

@bluegenes
Copy link
Contributor

While running gather, we search with a single ksize and now use manifests to figure out what signatures we can search against. That means:

  • if we have multiple ksizes in a zip database, we only search against the right one and everything is fine.
    BUT,
  • if we provide multiple ksizes over different databases, we error out with: no compatible signatures found in '{database}', rather than searching against the compatible sigs.

Could we just allow search databases to have no compatible signatures, as long as there are some compatible signatures to search?

use case: makes passing in search databases easier for cases where you may want to run more than one ksize in a workflow.

@ctb
Copy link
Contributor

ctb commented Aug 12, 2022

hmm, maybe. see #1426, especially my comment:

I really like the idea that with manifests, we just output something like this:

loaded/found a total of X sketches
after sketch selection, Y sketches remaining

IIRC, the history is:

before zipfiles, we (mostly) had databases that were locked to a single ksize. so if an incompatible database was specified, it seemed good to error out.

and, before manifests, we had no fast/easy way of counting the number of matching signatures, so we had problems doing accurate and informative display of number of signatures early on in the process.

but this has all changed. while I think we do want to support legacy formats, the existence of select and manifests and the default inclusion of manifests in zipfiles means we can do something nicer here.

as another sign that we should be doing something different, I note this ugly bit of code in commands.py::prefetch --

        for db in databases:
            counter = None
            try:
                counter = db.counter_gather(prefetch_query, args.threshold_bp)
            except ValueError:
                if picklist or pattern_search:
                    # catch "no signatures to search" ValueError from filtering
                    continue
                else:
                    raise       # re-raise other errors, if no picklist.

which is needed to maintain the error exit when no compatible signatures exist... I keep on stumbling over this and would very much like to refactor it!

codewise, it should be as simple as calling len() on each database after a select, although I'd want to check that the various databases (LCA, SqliteIndex, and SBTs) handle picklists/pattern search properly with len first.

@ctb
Copy link
Contributor

ctb commented Aug 14, 2022

#2204 fixes this issue, and cleans out some unreachable code, too.

note that prefetch mostly does the right thing already and continues past empty databases, although the output message could be clearer. The output message is also fixed in #2204.

@ctb
Copy link
Contributor

ctb commented Aug 14, 2022

Duplicate of #1637 :)

@ctb ctb closed this as completed in #2204 Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants