[BUG] skip_rows doesn't work properly in ChunkedParquetReader #16273

Closed
lithomas1 opened this issue Jul 12, 2024 · 1 comment
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code.

Comments

@lithomas1
Contributor

Describe the bug

When skip_rows is greater than 0, the chunked Parquet reader fails with a CUDA error such as:

RuntimeError: parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal

Setting nrows on its own looks OK, though.

I don't think this is an issue with my GPU since other tests like the pylibcudf I/O tests pass.

Steps/Code to reproduce bug

Check out my branch https://github.com/lithomas1/cudf/tree/parquet-nrows-bug, which adds bindings for the chunked Parquet reader's nrows/skip_rows options, and then run the test:

pytest python/cudf/cudf/tests/test_parquet.py -v -k "test_parquet_chunked_reader_nrows_skiprows"

Expected behavior

The included test should pass.
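
For reference, the row-selection behavior the test expects can be sketched in plain Python. This is only an illustration of the intended semantics, not the cuDF or libcudf implementation: `read_chunked` and `read_all` are hypothetical helpers that operate on an in-memory list, and `chunk_size` stands in for whatever limit splits the read into chunks.

```python
from typing import Iterator, List, Optional, Sequence


def read_chunked(
    rows: Sequence[int],
    chunk_size: int,
    skip_rows: int = 0,
    nrows: Optional[int] = None,
) -> Iterator[List[int]]:
    """Hypothetical stand-in for a chunked reader (not a cuDF API).

    Yields successive chunks covering rows[skip_rows : skip_rows + nrows].
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Clamp the requested window to the available rows.
    start = min(skip_rows, len(rows))
    stop = len(rows) if nrows is None else min(start + nrows, len(rows))
    for lo in range(start, stop, chunk_size):
        yield list(rows[lo:min(lo + chunk_size, stop)])


def read_all(
    rows: Sequence[int],
    chunk_size: int,
    skip_rows: int = 0,
    nrows: Optional[int] = None,
) -> List[int]:
    """Concatenate every chunk, as a test would before comparing results."""
    out: List[int] = []
    for chunk in read_chunked(rows, chunk_size, skip_rows, nrows):
        out.extend(chunk)
    return out
```

Under these assumptions, concatenating the chunks should give the same rows as slicing the full table, regardless of how the read is split into chunks.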

Additional context

I think this is also related to chunk_read_limit/pass_read_limit: when those are set to 0, the test passes.
That may just be because a limit of 0 goes through a separate code path, though; I haven't looked at the C++ code for this.

@lithomas1 lithomas1 added bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue labels Jul 12, 2024
@lithomas1
Contributor Author

duplicate of #16186
