[BUG] parquet reader data corruption in nested schema after https://github.com/rapidsai/cudf/pull/13302 #9948

Closed
abellina opened this issue Dec 4, 2023 · 1 comment
Labels
bug: Something isn't working
cudf_dependency: An issue or PR with this label depends on a new feature in cudf

Comments

Collaborator

abellina commented Dec 4, 2023

After the string column changes in rapidsai/cudf#13302 were included, a customer with nested schemas reported data corruption: a struct<map<string, struct<...>>> column had problems with the keys in the inner map.

We bisected cuDF changes until we found the culprit and have worked with the author of that PR to produce a fix.

The symptom on our side was that the last offset in the offset buffer of the keys string column was far too large, pointing to memory that was not part of the string data buffer. This produced garbage output that was carried along and eventually written to file. compute-sanitizer did not flag the issue in our attempts.
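
To illustrate the symptom, here is a minimal sketch in plain Python of the invariant that was violated. It assumes an Arrow-style string column layout (an offsets buffer with one more entry than there are rows, plus a flat character buffer). The function name `check_string_offsets` is hypothetical and is not part of cuDF or spark-rapids; it is not how the problem was actually diagnosed.

```python
from typing import List, Sequence


def check_string_offsets(offsets: Sequence[int], chars: bytes) -> List[str]:
    """Return a list of problems found in an Arrow-style string column.

    Assumed layout: row i occupies chars[offsets[i]:offsets[i + 1]],
    so offsets must be non-negative, non-decreasing, and the last
    offset must not exceed the size of the character buffer.
    """
    if len(offsets) == 0:
        return ["offsets buffer is empty; expected at least one entry"]

    problems = []
    if offsets[0] < 0:
        problems.append(f"first offset {offsets[0]} is negative")

    # Each row's end must not come before its start.
    for i in range(1, len(offsets)):
        if offsets[i] < offsets[i - 1]:
            problems.append(
                f"offsets[{i}] = {offsets[i]} is smaller than "
                f"offsets[{i - 1}] = {offsets[i - 1]}"
            )

    # The symptom described above: the last offset points past the data.
    if offsets[-1] > len(chars):
        problems.append(
            f"last offset {offsets[-1]} exceeds character buffer size {len(chars)}"
        )
    return problems


# Healthy column holding "ab", "", "cde".
good = check_string_offsets([0, 2, 2, 5], b"abcde")

# Same data, but the final offset is far larger than the character buffer.
bad = check_string_offsets([0, 2, 2, 5_000_000], b"abcde")
```

A check like this on the host side catches the bad final offset even when the out-of-bounds read itself goes unnoticed by memory tools.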

Fix PR: rapidsai/cudf#14557

abellina added the "bug" and "? - Needs Triage" labels Dec 4, 2023
abellina changed the title from "[BUG] data corruption seen after https://github.com/rapidsai/cudf/pull/13302" to "[BUG] parquet reader data corruption in nested schema after https://github.com/rapidsai/cudf/pull/13302" Dec 4, 2023
jlowe added the "cudf_dependency" label Dec 5, 2023
mattahrens removed the "? - Needs Triage" label Dec 5, 2023
Collaborator Author

abellina commented Dec 6, 2023

The cuDF fix is merged. I tested the PR and the corruption goes away for the customer example.

abellina closed this as completed Dec 6, 2023