Remove support for skip_rows / num_rows options in the parquet reader. #11503
Please post the impact on the reader benchmarks.
I like these kinds of changes!
LGTM
The Python binding changes seem to be leftover cleanup.
Just realized the CI failure was caused by actual compile errors in the benchmarks. Fixed.
@gpucibot merge
Codecov Report
@@ Coverage Diff @@
## branch-22.10 #11503 +/- ##
===============================================
Coverage ? 86.48%
===============================================
Files ? 145
Lines ? 22840
Branches ? 0
===============================================
Hits ? 19753
Misses ? 3087
Partials ? 0
…t reader. (rapidsai#11503)" This reverts commit d39b957.
…r. (#11657)
Reverts: #11480 <s>https://github.com/rapidsai/cudf/pull/11503</s>
Authors:
- https://github.com/nvdbaranec
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Yunsong Wang (https://github.com/PointKernel)
URL: #11657
Removes support for the skip_rows / num_rows options in the parquet reader. Users still control what gets read by selecting row groups.
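Without skip_rows / num_rows, a caller who still wants a specific row range has to pick the row groups that cover it and trim the result afterwards. The following is a minimal pure-Python sketch of that bookkeeping. `row_groups_for_range` is a hypothetical helper, not part of cuDF, and it assumes the per-row-group row counts are already known (for example, from the file footer).

```python
from bisect import bisect_right
from itertools import accumulate


def row_groups_for_range(row_group_sizes, skip_rows, num_rows):
    """Hypothetical helper (not a cuDF API).

    Returns (row_group_indices, offset): the row groups covering global rows
    [skip_rows, skip_rows + num_rows), and the position of the first wanted
    row within the concatenation of those row groups.
    """
    if skip_rows < 0 or num_rows < 0:
        raise ValueError("skip_rows and num_rows must be non-negative")
    # starts[i] is the global index of the first row in row group i.
    starts = [0] + list(accumulate(row_group_sizes))[:-1]
    total = sum(row_group_sizes)
    # Clip the requested range to the rows that actually exist.
    end = min(skip_rows + num_rows, total)
    if skip_rows >= end:
        return [], 0
    # bisect_right skips over empty row groups that share a start index.
    first = bisect_right(starts, skip_rows) - 1
    last = bisect_right(starts, end - 1) - 1
    return list(range(first, last + 1)), skip_rows - starts[first]


# Three row groups of 100 rows each; ask for rows [150, 250).
groups, offset = row_groups_for_range([100, 100, 100], 150, 100)
print(groups, offset)
```

The offset is relative to the concatenation of the selected row groups, so the caller slices `[offset, offset + num_rows)` out of what it reads. As an assumption about wiring, row counts could come from PyArrow's `pyarrow.parquet.ParquetFile(path).metadata.row_group(i).num_rows`, and the indices could be passed to `cudf.read_parquet` through its `row_groups` argument. Check the documentation for the installed version. The trade-off is over-reading, because whole row groups are decoded even when only a few of their rows are needed.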
Did some before/after benchmarking. As expected, this doesn't change much, apart from a minor speedup when reading lists (due to the simplified preprocessing step). The row bounds mostly affected the page setup process, where they made the logic slippery to reason about, and had little effect on the actual decoding. A selection of before/after benchmarks (all input files ~512 MB):