Add skiprows and nrows to parquet reader #16214
Merged
Commits
164533e  wip  (lithomas1)
3b0427a  revert polars changes  (lithomas1)
03c1889  fixes  (lithomas1)
bf5e902  rollback changes to chunked parquet reader  (lithomas1)
56c88ed  revert changes to parquetreader  (lithomas1)
f52b606  raise notimplemented for chunked parquet reader nrows/skiprows  (lithomas1)
3eeb95a  fix docs  (lithomas1)
0c722da  Merge branch 'branch-24.08' of github.com:rapidsai/cudf into parquet-…  (lithomas1)
cc37737  notimplemented for partitioned as well  (lithomas1)
5e3037e  Merge branch 'branch-24.08' of github.com:rapidsai/cudf into parquet-…  (lithomas1)
917761f  Merge branch 'branch-24.08' of github.com:rapidsai/cudf into parquet-…  (lithomas1)
9c6a5da  buggy chunked parquet reader nrows/skiprows  (lithomas1)
a78f97f  Merge branch 'branch-24.08' of github.com:rapidsai/cudf into parquet-…  (lithomas1)
4f929e5  fix some tests  (lithomas1)
3bb52c1  more data  (lithomas1)
3900019  Merge branch 'branch-24.08' into parquet-nrows  (mhaseeb123)
5055fd0  Merge branch 'branch-24.08' of github.com:rapidsai/cudf into parquet-…  (lithomas1)
7bd2438  fix range index metadata processing  (lithomas1)
e1982fa  Update python/cudf/cudf/tests/test_parquet.py  (lithomas1)
30faf88  update  (lithomas1)
d140184  rename params  (lithomas1)
2943f74  Merge branch 'branch-24.10' of github.com:rapidsai/cudf into parquet-…  (lithomas1)
07411c1  fix typo  (lithomas1)
9ce3ceb  Merge branch 'branch-24.10' of github.com:rapidsai/cudf into parquet-…  (lithomas1)
18082c7  another missed one  (lithomas1)
fec25b0  fix pylibcudf tests  (lithomas1)
a2fad68  last fixes  (lithomas1)
6447f12  Merge branch 'parquet-nrows' of github.com:lithomas1/cudf into parque…  (lithomas1)
9ea93ba  Merge branch 'branch-24.10' into parquet-nrows  (lithomas1)
593ccd2  fix cudf-polars  (lithomas1)
Let's be consistent with either num_rows or nrows across the files. @galipremsagar I can't find the same option in pyarrow.read_table or pd.read_parquet, so I am not sure what should be preferred here. If arbitrary, my vote would be num_rows, to be consistent with the C++ counterpart, but not a blocker.
Yeah not sure which is better.
nrows would be consistent with read_csv, and num_rows would be consistent with libcudf.
Let's go with nrows then and further the PR to merge! 🙂
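For readers skimming the thread, the row-window semantics that the two option names refer to can be sketched in plain Python. This is an illustration only and not cudf's or libcudf's implementation: take_rows is a hypothetical helper, and it assumes (by analogy with the read_csv options mentioned above) that skiprows rows are dropped from the start and at most nrows rows are then returned.

```python
from itertools import islice


def take_rows(rows, skiprows=0, nrows=None):
    """Hypothetical helper: drop `skiprows` leading rows, keep at most `nrows`.

    Assumed semantics for illustration; not taken from the cudf source.
    """
    if skiprows < 0 or (nrows is not None and nrows < 0):
        raise ValueError("skiprows and nrows must be non-negative")
    # islice stops early if the input runs out, so asking for more rows
    # than remain simply returns what is left.
    stop = None if nrows is None else skiprows + nrows
    return list(islice(rows, skiprows, stop))


# Stand-in for a table of ten rows.
rows = [{"id": i} for i in range(10)]

# Skip two rows, then read three.
window = take_rows(rows, skiprows=2, nrows=3)
print([r["id"] for r in window])
```

With nrows left as None the helper returns everything after the skipped rows, which mirrors the usual "read to the end" default for this kind of option.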