-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] add sourmash sig grep
#1864
Conversation
Codecov Report
@@ Coverage Diff @@
## latest #1864 +/- ##
==========================================
+ Coverage 82.38% 90.46% +8.07%
==========================================
Files 119 91 -28
Lines 12944 8859 -4085
Branches 1729 1751 +22
==========================================
- Hits 10664 8014 -2650
+ Misses 2016 579 -1437
- Partials 264 266 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@taylorreiter @bluegenes @luizirber any thoughts on additional functionality beyond regexps, -v, and -i? this is already an immediately useful command 😆 |
Maybe also |
yeah, I like this! what about other outputs, like and another brainstorming thought - I'm guessing it might be handy to support switching the output to a manifest-style CSV so you can use this to create picklists. |
Added I think (modulo tests and some refactoring) this PR might be feature complete, at least until we get some experience in using the functionality and have more ideas on what to add. I'll push forward on the tests as I have time, but it's pretty usable already, so let me know if you have any thoughts or comments! |
Welp, got the tests and docs done sooner than expected. Ready for review & merge @sourmash-bio/devs! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like sig grep
works with picklist input as well - picklist is applied first, then pattern match. If yes, I think we need a test for this case (pattern + picklist input). Happy to review again with add'l test, or leave to your discretion.
if picklist: | ||
sourmash_args.report_picklist(args, picklist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need a test for grep + picklist? I see one for when this will fail (LCA)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, nice catch! test_sig_grep_7_picklist_md5_lca_fail
tests the failure, but there's no test for the successful use of --picklist
. Will add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in 76b2b02
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
Co-authored-by: Tessa Pierce Ward <[email protected]>
🎉 |
Implements
sourmash sig grep
as in #1075.sig grep
does substring matching on name, filename, and md5. It supports regular expressions, as well as the-v
(exclude instead of include) and-i
(case insensitive) options. It also supports CSV output in manifest format (for use as picklists), and the-c/--count
option to just count the number of signatures in each file.A key feature of
grep
is that it (by default) requires manifests. That way users won't get caught up accidentally in searches of very large databases w/o manifests, which has been happening to me a lot recently.Fixes #1075.
Examples
Extract by acccession:
Fail when no manifest present:
Count some things and output a manifest:
Notes
eventually we should probably add the default manifest requirement to several more sig subcommands, like extract, but this will require a major version bump.
TODO
sig grep
command? #1075 and extract non-grep stuff to new issue