Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return an error instead of a panic when reading a corrupted Parquet file with mismatched column counts #5362

Merged
merged 3 commits into from
Feb 4, 2024

Conversation

mmaitre314
Copy link
Contributor

Which issue does this PR close?

Closes #5315

Rationale for this change

Data pipelines reading Parquet files may encounter corrupted Parquet files as part of their regular activities and be resilient to them through graceful handling (ex: emit telemetry and skip the files). This behavior is more natural to implement when the Parquet crate returns an error instead of a panic.

What changes are included in this PR?

Replace an assert with an if statement.

Are there any user-facing changes?

Caller receives an error instead of a panic.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Feb 3, 2024
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

parquet/src/file/metadata.rs Outdated Show resolved Hide resolved
@tustvold tustvold merged commit 0dda129 into apache:master Feb 4, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ParquetRecordBatchStreamBuilder::new() panics instead of erroring out when opening a corrupted file
4 participants