feat: Enhance phys/root/filter functionality #3250

Merged on Sep 17, 2024 (7 commits)

Contributor

@GoodingJamie GoodingJamie commented Sep 17, 2024

This PR expands on the functionality of the wrapper written by @laf070810 for ROOT.RDataFrame.Filter(). The core of the wrapper is still the same; however, this PR adds the option to provide the selection criteria as a list of criterion strings or a dictionary wherein the keys define names for each criterion. A verbose mode is added for users to receive a report of their selections, as per ROOT.RDataFrame.Report().

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test.py was updated to call any added or updated example rules in a Snakefile
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

  • New Features

    • Introduced three new filtering rules: filter_str, filter_list, and filter_dict, enhancing data processing flexibility.
    • Added support for multi-threading in data filtering based on specified thread count.
    • Included a verbose reporting feature for better insight into data processing operations.
  • Bug Fixes

    • Improved criteria parsing to accept string, list, or dictionary formats for filtering.
  • Tests

    • Expanded the testing suite with new test functions for each filtering method, ensuring robust functionality.

Contributor

coderabbitai bot commented Sep 17, 2024

Walkthrough

The changes introduce three new filtering rules (filter_str, filter_list, and filter_dict) in the Snakefile, enhancing the data filtering process from ROOT files. The filtering criteria can now be specified in string, list, or dictionary formats. Additionally, updates to wrapper.py improve multi-threading handling and criteria parsing, while test.py adds new test functions to validate the filtering functionality.

Changes

| File | Change Summary |
| --- | --- |
| phys/root/filter/test/Snakefile | Added three new rules: filter_str, filter_list, and filter_dict for enhanced data filtering with different criteria formats. |
| phys/root/filter/wrapper.py | Updated author information, refined multi-threading logic, modified criteria handling to support string, list, and dictionary formats, and added verbose reporting. |
| phys/root/filter/meta.yaml | Modified the description of the criteria parameter in the params section for clarity. |
| test.py | Renamed test_root_filter to test_root_filter_str and added two new test functions: test_root_filter_list and test_root_filter_dict for comprehensive testing. |

Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 37233f4 and 42f09d7.

Files selected for processing (1)
  • phys/root/filter/meta.yaml (1 hunks)
Files skipped from review due to trivial changes (1)
  • phys/root/filter/meta.yaml

@GoodingJamie GoodingJamie changed the title from "Draft: Enhance phys root filter" to "feat: DRAFT Enhance phys root filter" on Sep 17, 2024
ROOT.DisableImplicitMT()

# Parse criteria
_smk_criteria = snakemake.params.get("criteria", "true")
Collaborator

Won't "true" be interpreted as a string filtering criterion, with criterion = "true" and label = "true"? Is this intended?

Contributor Author

@GoodingJamie GoodingJamie Sep 17, 2024

Ah, yes it will be! This is the current behaviour in master. The only use case I can think of would be to copy a tree in one file to a differently named tree in another file, but there's rootcp for this (may be the subject of a PR in the near future!). So I think it would be safe to change this and require that criteria is passed:

Suggested change:
- _smk_criteria = snakemake.params.get("criteria", "true")
+ _smk_criteria = snakemake.params.criteria

where a missing params.criteria should raise a TypeError when being converted to criteria and labels.
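
To make the two behaviours in this thread concrete, here is a minimal sketch that uses a plain dict as a stand-in for snakemake.params. The helper name `require_criteria` and the error message are hypothetical, and the exception a real Snakemake run raises for a missing parameter is not shown in this thread, so it may differ.

```python
# Plain dict standing in for snakemake.params when the rule sets no criteria.
params = {}

# Behaviour under discussion: the fallback string "true" is used as both
# the filter expression and its label, so every event passes the filter.
fallback = params.get("criteria", "true")
criteria, labels = [fallback], [fallback]


def require_criteria(value):
    """Hypothetical check: reject anything that is not a str, list or dict."""
    if not isinstance(value, (str, list, dict)):
        raise TypeError(
            f"criteria must be a str, list or dict, not {type(value).__name__}"
        )
    return value


# Without a fallback, the missing value (None here) is rejected explicitly.
try:
    require_criteria(params.get("criteria"))
    rejected = False
except TypeError:
    rejected = True

print(criteria, labels, rejected)
```

With the fallback, the rule silently becomes a pass-through copy; without it, the omission is reported to the user instead.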

Collaborator

But isn't criteria optional? At least that is what it says on the meta.yaml file.

Contributor Author

criteria is currently optional, but I can't think of a use case for omitting it that isn't already covered by rootcp (for which I've opened #3251). I'll update meta.yaml to reflect this.

@GoodingJamie GoodingJamie changed the title from "feat: DRAFT Enhance phys root filter" to "feat: Enhance phys/root/filter" on Sep 17, 2024
@GoodingJamie GoodingJamie changed the title from "feat: Enhance phys/root/filter" to "feat: Enhance phys/root/filter functionality" on Sep 17, 2024
@fgvieira
Collaborator

@coderabbitai review

Contributor

coderabbitai bot commented Sep 17, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Outside diff range and nitpick comments (1)
phys/root/filter/test/Snakefile (1)

34-52: Consider increasing the number of threads.

The filter_dict rule is implemented correctly:

  • It specifies the input and output files correctly.
  • The criteria dictionary contains valid ROOT expressions with named keys that filter events based on the specified thresholds for pt and pz.
  • It saves the specified branches.
  • Enabling verbose mode can provide additional logging information.

However, the rule uses only 1 thread for processing. Consider increasing the number of threads to improve the processing performance, similar to the other filtering rules.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 14e8c8e and 46c76b6.

Files selected for processing (3)
  • phys/root/filter/test/Snakefile (1 hunks)
  • phys/root/filter/wrapper.py (2 hunks)
  • test.py (1 hunks)
Additional context used
Path-based instructions (2)
phys/root/filter/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

test.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Ruff
phys/root/filter/wrapper.py

9-9: Undefined name snakemake (F821)
10-10: Undefined name snakemake (F821)
15-15: Undefined name snakemake (F821)
28-28: Undefined name snakemake (F821)
30-30: Undefined name snakemake (F821)
30-30: Undefined name snakemake (F821)
34-34: Undefined name snakemake (F821)
46-46: Undefined name snakemake (F821)

Additional comments not posted (12)
phys/root/filter/test/Snakefile (2)

1-15: LGTM!

The filter_str rule is implemented correctly:

  • It specifies the input and output files correctly.
  • The criteria string is a valid ROOT expression that filters events based on the specified thresholds for pt and pz.
  • It saves the specified branches.
  • Using 2 threads can improve the processing performance.

17-32: LGTM!

The filter_list rule is implemented correctly:

  • It specifies the input and output files correctly.
  • The criteria list contains valid ROOT expressions that filter events based on the specified thresholds for pt and pz.
  • It saves the specified branches.
  • Using 2 threads can improve the processing performance.
  • Enabling verbose mode can provide additional logging information.
phys/root/filter/wrapper.py (7)

1-3: LGTM!

The addition of the new contributor to the __author__ and __email__ metadata is appreciated. Keeping the metadata up to date is a good practice.


6-6: LGTM!

The import statement for Dict and List types from the typing module is necessary for using them in the type hints. Using type hints is a good practice for better code readability and maintainability.


9-12: LGTM!

The refined logic for enabling or disabling implicit multi-threading based on the number of threads specified by snakemake.threads is a good optimization. It ensures that multi-threading is used only when the available resources allow for it.

Tools
Ruff

9-9: Undefined name snakemake

(F821)


10-10: Undefined name snakemake

(F821)
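
The thread handling described in this comment can be sketched roughly as follows. The wrapper's actual condition is not shown in this conversation, so the `threads > 1` threshold is an assumption, and `RecordingROOT` is a hypothetical stand-in for the ROOT module so that the sketch runs without ROOT installed. `EnableImplicitMT` and `DisableImplicitMT` are the real ROOT function names.

```python
class RecordingROOT:
    """Stand-in for the ROOT module that records calls (illustration only)."""

    def __init__(self):
        self.calls = []

    def EnableImplicitMT(self, n_threads):
        self.calls.append(("EnableImplicitMT", n_threads))

    def DisableImplicitMT(self):
        self.calls.append(("DisableImplicitMT",))


def configure_implicit_mt(root, threads):
    """Enable implicit multi-threading only if more than one thread is requested."""
    if threads > 1:  # assumed threshold, not taken from the wrapper
        root.EnableImplicitMT(threads)
    else:
        root.DisableImplicitMT()


fake_root = RecordingROOT()
configure_implicit_mt(fake_root, 2)  # e.g. a rule declaring threads: 2
configure_implicit_mt(fake_root, 1)  # e.g. a rule declaring threads: 1
print(fake_root.calls)
```

In a real wrapper the first argument would be the imported ROOT module and the second would be snakemake.threads.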


14-26: LGTM!

The modified criteria parsing logic is a great enhancement. Supporting multiple formats (string, list, dictionary) for specifying the filtering criteria makes the code more flexible and user-friendly. The code correctly handles the different types of criteria and constructs the criteria and labels lists accordingly. Raising a TypeError with a descriptive message for unsupported criteria types is a good practice for error handling.

Tools
Ruff

15-15: Undefined name snakemake

(F821)
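
As a rough illustration of the parsing described in this comment, the sketch below turns each accepted format into parallel criteria and labels lists. The function name `parse_criteria`, the example expressions, and the choice to reuse each expression as its own label for strings and lists are assumptions. Only the dictionary case (keys as names) and the TypeError for other types are stated in this PR.

```python
from typing import Dict, List, Tuple, Union

Criteria = Union[str, List[str], Dict[str, str]]


def parse_criteria(raw: Criteria) -> Tuple[List[str], List[str]]:
    """Return parallel (criteria, labels) lists for a str, list or dict input."""
    if isinstance(raw, str):
        # Single expression; it doubles as its own label (assumption).
        return [raw], [raw]
    if isinstance(raw, list):
        # One expression per entry; each doubles as its own label (assumption).
        return list(raw), list(raw)
    if isinstance(raw, dict):
        # Keys name the criteria, values hold the expressions.
        return list(raw.values()), list(raw.keys())
    raise TypeError(
        f"criteria must be a str, list or dict, not {type(raw).__name__}"
    )


# Hypothetical cut values; the thread only says the cuts are on pt and pz.
criteria, labels = parse_criteria({"pt_cut": "pt > 10", "pz_cut": "pz < 5"})
print(criteria)
print(labels)
```

Keeping the two lists parallel means the filtering step can treat all three input formats identically.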


31-32: LGTM!

The updated filtering process that iterates over the criteria and labels lists and applies each filter with its corresponding label is a great improvement. It enables more complex filtering scenarios where multiple criteria can be specified. The code correctly uses the Filter method of the RDataFrame to apply the filters.
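
A minimal sketch of the loop this comment describes, assuming criteria and labels are parallel lists. `FakeFrame` is a hypothetical stand-in for an RDataFrame node so the example runs without ROOT, and the expressions and labels are made up. `Filter(expression, name)` mirrors the real RDataFrame method, which returns a new node rather than modifying the frame in place.

```python
class FakeFrame:
    """Stand-in for an RDataFrame node; records applied filters (illustration only)."""

    def __init__(self, applied=()):
        self.applied = tuple(applied)

    def Filter(self, expression, name):
        # Like RDataFrame.Filter, return a new node instead of mutating this one.
        return FakeFrame(self.applied + ((name, expression),))


def apply_filters(df, criteria, labels):
    """Chain one named filter per criterion and return the final node."""
    for criterion, label in zip(criteria, labels):
        df = df.Filter(criterion, label)
    return df


node = apply_filters(FakeFrame(), ["pt > 10", "pz < 5"], ["pt_cut", "pz_cut"])
print(node.applied)
```

Reassigning `df` on each iteration is what chains the filters; dropping the assignment would discard every filter.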


34-35: LGTM!

The addition of the verbose reporting feature is a useful enhancement. Generating a report of the data frame operations when the verbose parameter is set to true provides users with valuable insights into the data processing pipeline. This feature is particularly helpful for debugging and understanding the transformations applied to the data. The code correctly uses the Report method of the RDataFrame to generate the report.

Tools
Ruff

34-34: Undefined name snakemake

(F821)


46-47: LGTM!

Printing the generated report when the verbose parameter is set to true is the final step in the verbose reporting feature. It ensures that the user can see the report and gain insights into the data frame operations. The code correctly uses the Print method of the report object to display the report.

Tools
Ruff

46-46: Undefined name snakemake

(F821)
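
The two halves of the verbose feature, requesting the report and printing it later, could be arranged roughly as below. `FakeFrame`, `FakeReport` and the `run_event_loop` callback are hypothetical stand-ins so the sketch runs without ROOT. `Report()` and `Print()` are the real RDataFrame and cut-flow report method names. Booking the report before the event loop runs is an assumption about the wrapper's ordering, based on the line numbers quoted in this review.

```python
class FakeReport:
    """Stand-in for the cut-flow report returned by Report() (illustration only)."""

    def __init__(self, log):
        self.log = log

    def Print(self):
        self.log.append("print")


class FakeFrame:
    """Stand-in for an RDataFrame node that logs the order of operations."""

    def __init__(self):
        self.log = []

    def Report(self):
        self.log.append("book")
        return FakeReport(self.log)


def run_with_report(df, run_event_loop, verbose):
    """Book the report first, run the processing, then print if requested."""
    report = df.Report() if verbose else None
    run_event_loop(df)  # e.g. writing the filtered tree to the output file
    if report is not None:
        report.Print()


verbose_frame = FakeFrame()
run_with_report(verbose_frame, lambda d: d.log.append("snapshot"), verbose=True)

quiet_frame = FakeFrame()
run_with_report(quiet_frame, lambda d: d.log.append("snapshot"), verbose=False)

print(verbose_frame.log, quiet_frame.log)
```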

test.py (3)

Line range hint 832-836: Looks good!

The new test_root_filter_str function properly tests running the phys/root/filter wrapper with the expected arguments and output file.


Line range hint 839-844: Looks good!

The new test_root_filter_list function properly tests running the phys/root/filter wrapper with the expected arguments and output file.


Line range hint 847-851: Looks good!

The new test_root_filter_dict function properly tests running the phys/root/filter wrapper with the expected arguments and output file.

@fgvieira fgvieira merged commit 4797d76 into snakemake:master Sep 17, 2024
7 checks passed
@GoodingJamie
Contributor Author

Thanks a lot @fgvieira! 😄

@fgvieira fgvieira self-assigned this Sep 17, 2024
johanneskoester pushed a commit that referenced this pull request Sep 20, 2024
🤖 I have created a release \*beep\* \*boop\*
---
##
[4.5.0](https://www.github.com/snakemake/snakemake-wrappers/compare/v4.4.0...v4.5.0)
(2024-09-20)


### Features

* Add wrappers for ROOT rootcp CLI tool
([#3251](https://www.github.com/snakemake/snakemake-wrappers/issues/3251))
([0be5d56](https://www.github.com/snakemake/snakemake-wrappers/commit/0be5d566f4767b7cd2ea9ba78b0d83a6f79a4803))
* Bump meryl version
([#3266](https://www.github.com/snakemake/snakemake-wrappers/issues/3266))
([448a1cb](https://www.github.com/snakemake/snakemake-wrappers/commit/448a1cb793d04f7bd280c36bc4dd37d2d06aa104))
* Enhance phys/root/filter functionality
([#3250](https://www.github.com/snakemake/snakemake-wrappers/issues/3250))
([4797d76](https://www.github.com/snakemake/snakemake-wrappers/commit/4797d76630b0cc6ea05778a49727f7917b7874dc))
* Parse threads
([#3249](https://www.github.com/snakemake/snakemake-wrappers/issues/3249))
([9e63554](https://www.github.com/snakemake/snakemake-wrappers/commit/9e63554b0cf19b2a22513566a576105c39f47e3b))


### Bug Fixes

* name of bamqc
([#1464](https://www.github.com/snakemake/snakemake-wrappers/issues/1464))
([ee04ec2](https://www.github.com/snakemake/snakemake-wrappers/commit/ee04ec22b24c8d380ef98f5cee677f4ff4730ad3))


### Performance Improvements

* autobump bio/cnv_facets
([#3253](https://www.github.com/snakemake/snakemake-wrappers/issues/3253))
([c5c8ddd](https://www.github.com/snakemake/snakemake-wrappers/commit/c5c8ddded41ba96fd8bbc69790e1e17998551734))
* autobump bio/emu/abundance
([#3256](https://www.github.com/snakemake/snakemake-wrappers/issues/3256))
([6e42aef](https://www.github.com/snakemake/snakemake-wrappers/commit/6e42aef12570e7708dedd4ed24a7406a69356d81))
* autobump bio/emu/collapse-taxonomy
([#3255](https://www.github.com/snakemake/snakemake-wrappers/issues/3255))
([969067e](https://www.github.com/snakemake/snakemake-wrappers/commit/969067e8a94210d99bb67dfb3525c076f7731d02))
* autobump bio/emu/combine-outputs
([#3254](https://www.github.com/snakemake/snakemake-wrappers/issues/3254))
([de2a1be](https://www.github.com/snakemake/snakemake-wrappers/commit/de2a1bef7e9d330c4d6484bf0f1f250d7ad6c0c9))
* autobump bio/freebayes
([#3257](https://www.github.com/snakemake/snakemake-wrappers/issues/3257))
([80630dd](https://www.github.com/snakemake/snakemake-wrappers/commit/80630dd19aa113ea94dd55f89f596b83e81ebc34))
* autobump bio/galah
([#3258](https://www.github.com/snakemake/snakemake-wrappers/issues/3258))
([285d57a](https://www.github.com/snakemake/snakemake-wrappers/commit/285d57a8dd082fb515250fdc370cca11142fff44))
* autobump bio/gdc-api/bam-slicing
([#3259](https://www.github.com/snakemake/snakemake-wrappers/issues/3259))
([27b6958](https://www.github.com/snakemake/snakemake-wrappers/commit/27b695863bc123ba93fff53a130a0d7a06b4b2c1))
* autobump bio/igv-reports
([#3260](https://www.github.com/snakemake/snakemake-wrappers/issues/3260))
([a7d57ba](https://www.github.com/snakemake/snakemake-wrappers/commit/a7d57ba191bb59060dc82b9009a11c78dbaba86e))
* autobump bio/lofreq/call
([#3262](https://www.github.com/snakemake/snakemake-wrappers/issues/3262))
([13626f0](https://www.github.com/snakemake/snakemake-wrappers/commit/13626f0b9d3d25bafd04a3253f37b6bfd91414bc))
* autobump bio/lofreq/indelqual
([#3261](https://www.github.com/snakemake/snakemake-wrappers/issues/3261))
([76c854e](https://www.github.com/snakemake/snakemake-wrappers/commit/76c854e127cd792b5f74f8dc357f09fddb07998c))
* autobump bio/multiqc
([#3263](https://www.github.com/snakemake/snakemake-wrappers/issues/3263))
([d4d1475](https://www.github.com/snakemake/snakemake-wrappers/commit/d4d14750f10aa5f10fd5b20f560e13985a0f758f))
* autobump bio/tabix/index
([#3264](https://www.github.com/snakemake/snakemake-wrappers/issues/3264))
([e39e97e](https://www.github.com/snakemake/snakemake-wrappers/commit/e39e97e96fa26ab40e34a207ed62410453d28bae))
* autobump bio/vep/annotate
([#3265](https://www.github.com/snakemake/snakemake-wrappers/issues/3265))
([7f0b02a](https://www.github.com/snakemake/snakemake-wrappers/commit/7f0b02ac64b40a5aca8bd08c90f8b7df80ea4bed))
---


This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@coderabbitai coderabbitai bot mentioned this pull request Sep 24, 2024
7 tasks