-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Enhance phys/root/filter functionality #3250
feat: Enhance phys/root/filter functionality #3250
Conversation
WalkthroughThe changes introduce three new filtering rules ( Changes
Recent review detailsConfiguration used: CodeRabbit UI Files selected for processing (1)
Files skipped from review due to trivial changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
0fcbce1
to
2fa7501
Compare
phys/root/filter/wrapper.py
Outdated
ROOT.DisableImplicitMT() | ||
|
||
# Parse criteria | ||
_smk_criteria = snakemake.params.get("criteria", "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't "true"
be interpreted as a string filtering criteria with 'criterion = "true"; label = "true"
? Is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes it will be! This is the behaviour in master currently—the only use case I can think of would be to copy a tree in one file to a differently named tree in another file, but there's rootcp
for this (may be the subject of a PR in the near future!). So I think here it would be safe to change this and require that criteria
is passed:
_smk_criteria = snakemake.params.get("criteria", "true") | |
_smk_criteria = snakemake.params.criteria |
where a missing params.criteria
should raise a TypeError
when being converted to criteria
and labels
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But isn't criteria
optional? At least that is what it says on the meta.yaml
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
criteria
is optional currently, but there isn't a use case I can think of for not passing criteria
which isn't already covered by rootcp
(for which I've opened #3251). I'll update this in meta.yaml
to reflect this
@coderabbitai review |
Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range and nitpick comments (1)
phys/root/filter/test/Snakefile (1)
34-52
: Consider increasing the number of threads.The
filter_dict
rule is implemented correctly:
- It specifies the input and output files correctly.
- The criteria dictionary contains valid ROOT expressions with named keys that filter events based on the specified thresholds for
pt
andpz
.- It saves the specified branches.
- Enabling verbose mode can provide additional logging information.
However, the rule uses only 1 thread for processing. Consider increasing the number of threads to improve the processing performance, similar to the other filtering rules.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- phys/root/filter/test/Snakefile (1 hunks)
- phys/root/filter/wrapper.py (2 hunks)
- test.py (1 hunks)
Additional context used
Path-based instructions (2)
phys/root/filter/wrapper.py (2)
Pattern
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
Pattern
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.test.py (1)
Pattern
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
Ruff
phys/root/filter/wrapper.py
9-9: Undefined name
snakemake
(F821)
10-10: Undefined name
snakemake
(F821)
15-15: Undefined name
snakemake
(F821)
28-28: Undefined name
snakemake
(F821)
30-30: Undefined name
snakemake
(F821)
30-30: Undefined name
snakemake
(F821)
34-34: Undefined name
snakemake
(F821)
46-46: Undefined name
snakemake
(F821)
Additional comments not posted (12)
phys/root/filter/test/Snakefile (2)
1-15
: LGTM!The
filter_str
rule is implemented correctly:
- It specifies the input and output files correctly.
- The criteria string is a valid ROOT expression that filters events based on the specified thresholds for
pt
andpz
.- It saves the specified branches.
- Using 2 threads can improve the processing performance.
17-32
: LGTM!The
filter_list
rule is implemented correctly:
- It specifies the input and output files correctly.
- The criteria list contains valid ROOT expressions that filter events based on the specified thresholds for
pt
andpz
.- It saves the specified branches.
- Using 2 threads can improve the processing performance.
- Enabling verbose mode can provide additional logging information.
phys/root/filter/wrapper.py (7)
1-3
: LGTM!The addition of the new contributor to the
__author__
and__email__
metadata is appreciated. Keeping the metadata up to date is a good practice.
6-6
: LGTM!The import statement for
Dict
andList
types from thetyping
module is necessary for using them in the type hints. Using type hints is a good practice for better code readability and maintainability.
9-12
: LGTM!The refined logic for enabling or disabling implicit multi-threading based on the number of threads specified by
snakemake.threads
is a good optimization. It ensures that multi-threading is used only when the available resources allow for it.Tools
Ruff
9-9: Undefined name
snakemake
(F821)
10-10: Undefined name
snakemake
(F821)
14-26
: LGTM!The modified criteria parsing logic is a great enhancement. Supporting multiple formats (string, list, dictionary) for specifying the filtering criteria makes the code more flexible and user-friendly. The code correctly handles the different types of criteria and constructs the
criteria
andlabels
lists accordingly. Raising aTypeError
with a descriptive message for unsupported criteria types is a good practice for error handling.Tools
Ruff
15-15: Undefined name
snakemake
(F821)
31-32
: LGTM!The updated filtering process that iterates over the
criteria
andlabels
lists and applies each filter with its corresponding label is a great improvement. It enables more complex filtering scenarios where multiple criteria can be specified. The code correctly uses theFilter
method of theRDataFrame
to apply the filters.
34-35
: LGTM!The addition of the verbose reporting feature is a useful enhancement. Generating a report of the data frame operations when the
verbose
parameter is set to true provides users with valuable insights into the data processing pipeline. This feature is particularly helpful for debugging and understanding the transformations applied to the data. The code correctly uses theReport
method of theRDataFrame
to generate the report.Tools
Ruff
34-34: Undefined name
snakemake
(F821)
46-47
: LGTM!Printing the generated report when the
verbose
parameter is set to true is the final step in the verbose reporting feature. It ensures that the user can see the report and gain insights into the data frame operations. The code correctly uses thereport
object to display the report.Tools
Ruff
46-46: Undefined name
snakemake
(F821)
test.py (3)
Line range hint
832-836
: Looks good!The new
test_root_filter_str
function properly tests running thephys/root/filter
wrapper with the expected arguments and output file.
Line range hint
839-844
: Looks good!The new
test_root_filter_list
function properly tests running thephys/root/filter
wrapper with the expected arguments and output file.
Line range hint
847-851
: Looks good!The new
test_root_filter_dict
function properly tests running thephys/root/filter
wrapper with the expected arguments and output file.
Thanks a lot @fgvieira! 😄 |
🤖 I have created a release \*beep\* \*boop\* --- ## [4.5.0](https://www.github.com/snakemake/snakemake-wrappers/compare/v4.4.0...v4.5.0) (2024-09-20) ### Features * Add wrappers for ROOT rootcp CLI tool ([#3251](https://www.github.com/snakemake/snakemake-wrappers/issues/3251)) ([0be5d56](https://www.github.com/snakemake/snakemake-wrappers/commit/0be5d566f4767b7cd2ea9ba78b0d83a6f79a4803)) * Bump meryl version ([#3266](https://www.github.com/snakemake/snakemake-wrappers/issues/3266)) ([448a1cb](https://www.github.com/snakemake/snakemake-wrappers/commit/448a1cb793d04f7bd280c36bc4dd37d2d06aa104)) * Enhance phys/root/filter functionality ([#3250](https://www.github.com/snakemake/snakemake-wrappers/issues/3250)) ([4797d76](https://www.github.com/snakemake/snakemake-wrappers/commit/4797d76630b0cc6ea05778a49727f7917b7874dc)) * Parse threads ([#3249](https://www.github.com/snakemake/snakemake-wrappers/issues/3249)) ([9e63554](https://www.github.com/snakemake/snakemake-wrappers/commit/9e63554b0cf19b2a22513566a576105c39f47e3b)) ### Bug Fixes * name of bamqc ([#1464](https://www.github.com/snakemake/snakemake-wrappers/issues/1464)) ([ee04ec2](https://www.github.com/snakemake/snakemake-wrappers/commit/ee04ec22b24c8d380ef98f5cee677f4ff4730ad3)) ### Performance Improvements * autobump bio/cnv_facets ([#3253](https://www.github.com/snakemake/snakemake-wrappers/issues/3253)) ([c5c8ddd](https://www.github.com/snakemake/snakemake-wrappers/commit/c5c8ddded41ba96fd8bbc69790e1e17998551734)) * autobump bio/emu/abundance ([#3256](https://www.github.com/snakemake/snakemake-wrappers/issues/3256)) ([6e42aef](https://www.github.com/snakemake/snakemake-wrappers/commit/6e42aef12570e7708dedd4ed24a7406a69356d81)) * autobump bio/emu/collapse-taxonomy ([#3255](https://www.github.com/snakemake/snakemake-wrappers/issues/3255)) ([969067e](https://www.github.com/snakemake/snakemake-wrappers/commit/969067e8a94210d99bb67dfb3525c076f7731d02)) * autobump bio/emu/combine-outputs ([#3254](https://www.github.com/snakemake/snakemake-wrappers/issues/3254)) ([de2a1be](https://www.github.com/snakemake/snakemake-wrappers/commit/de2a1bef7e9d330c4d6484bf0f1f250d7ad6c0c9)) * autobump bio/freebayes ([#3257](https://www.github.com/snakemake/snakemake-wrappers/issues/3257)) ([80630dd](https://www.github.com/snakemake/snakemake-wrappers/commit/80630dd19aa113ea94dd55f89f596b83e81ebc34)) * autobump bio/galah ([#3258](https://www.github.com/snakemake/snakemake-wrappers/issues/3258)) ([285d57a](https://www.github.com/snakemake/snakemake-wrappers/commit/285d57a8dd082fb515250fdc370cca11142fff44)) * autobump bio/gdc-api/bam-slicing ([#3259](https://www.github.com/snakemake/snakemake-wrappers/issues/3259)) ([27b6958](https://www.github.com/snakemake/snakemake-wrappers/commit/27b695863bc123ba93fff53a130a0d7a06b4b2c1)) * autobump bio/igv-reports ([#3260](https://www.github.com/snakemake/snakemake-wrappers/issues/3260)) ([a7d57ba](https://www.github.com/snakemake/snakemake-wrappers/commit/a7d57ba191bb59060dc82b9009a11c78dbaba86e)) * autobump bio/lofreq/call ([#3262](https://www.github.com/snakemake/snakemake-wrappers/issues/3262)) ([13626f0](https://www.github.com/snakemake/snakemake-wrappers/commit/13626f0b9d3d25bafd04a3253f37b6bfd91414bc)) * autobump bio/lofreq/indelqual ([#3261](https://www.github.com/snakemake/snakemake-wrappers/issues/3261)) ([76c854e](https://www.github.com/snakemake/snakemake-wrappers/commit/76c854e127cd792b5f74f8dc357f09fddb07998c)) * autobump bio/multiqc ([#3263](https://www.github.com/snakemake/snakemake-wrappers/issues/3263)) ([d4d1475](https://www.github.com/snakemake/snakemake-wrappers/commit/d4d14750f10aa5f10fd5b20f560e13985a0f758f)) * autobump bio/tabix/index ([#3264](https://www.github.com/snakemake/snakemake-wrappers/issues/3264)) ([e39e97e](https://www.github.com/snakemake/snakemake-wrappers/commit/e39e97e96fa26ab40e34a207ed62410453d28bae)) * autobump bio/vep/annotate ([#3265](https://www.github.com/snakemake/snakemake-wrappers/issues/3265)) ([7f0b02a](https://www.github.com/snakemake/snakemake-wrappers/commit/7f0b02ac64b40a5aca8bd08c90f8b7df80ea4bed)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR expands on the functionality of the wrapper written by @laf070810 for
ROOT.RDataFrame.Filter()
. The core of the wrapper is still the same; however, this PR adds the option to provide the selection criteria as a list of criterion strings or a dictionary wherein the keys define names for each criterion. Averbose
mode is added for users to receive a report of their selections, as perROOT.RDataFrame.Report()
.QC
snakemake-wrappers
.While the contributions guidelines are more extensive, please particularly ensure that:
test.py
was updated to call any added or updated example rules in aSnakefile
input:
andoutput:
file paths in the rules can be chosen arbitrarilyinput:
oroutput:
)tempfile.gettempdir()
points tometa.yaml
contains a link to the documentation of the respective tool or command underurl:
Summary by CodeRabbit
New Features
filter_str
,filter_list
, andfilter_dict
, enhancing data processing flexibility.Bug Fixes
Tests