Pass empty vectors as min/max for all null pages when building ColumnIndex #6316

Merged (3 commits) on Aug 31, 2024

@etseidl etseidl commented Aug 27, 2024

Which issue does this PR close?

Closes #6315.

Rationale for this change

Pages with all null values should write an empty array for min and max to the ColumnIndex. The current behavior is to write one 0 byte for each.

What changes are included in this PR?

Pass empty vectors to ColumnIndexBuilder::append.
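
To illustrate the intent, here is a minimal, self-contained sketch. `SketchColumnIndex`, `PageStats`, `append_page`, and `build_example` are hypothetical stand-ins written for this example, not the parquet crate's types; only the idea of passing empty vectors for all-null pages comes from this PR.

```rust
/// Hypothetical stand-in for a column-index builder (not the parquet crate's type).
#[derive(Debug, Default)]
struct SketchColumnIndex {
    null_pages: Vec<bool>,
    min_values: Vec<Vec<u8>>,
    max_values: Vec<Vec<u8>>,
    null_counts: Vec<i64>,
}

impl SketchColumnIndex {
    /// Records one page's entry in the index.
    fn append(&mut self, null_page: bool, min_value: Vec<u8>, max_value: Vec<u8>, null_count: i64) {
        self.null_pages.push(null_page);
        self.min_values.push(min_value);
        self.max_values.push(max_value);
        self.null_counts.push(null_count);
    }
}

/// Hypothetical per-page statistics; `min`/`max` are `None` when every value is null.
struct PageStats {
    min: Option<Vec<u8>>,
    max: Option<Vec<u8>>,
    null_count: i64,
}

fn append_page(index: &mut SketchColumnIndex, stats: PageStats) {
    match (stats.min, stats.max) {
        (Some(min), Some(max)) => index.append(false, min, max, stats.null_count),
        // All-null page: pass empty vectors instead of a single 0 byte for min and max.
        _ => index.append(true, Vec::new(), Vec::new(), stats.null_count),
    }
}

fn build_example() -> SketchColumnIndex {
    let mut index = SketchColumnIndex::default();
    // A page with real values, followed by a page containing only nulls.
    append_page(&mut index, PageStats { min: Some(vec![1]), max: Some(vec![9]), null_count: 0 });
    append_page(&mut index, PageStats { min: None, max: None, null_count: 100 });
    index
}

fn main() {
    println!("{:?}", build_example());
}
```

In this sketch the second page is flagged as a null page and carries zero-length min and max entries, which is the behavior described above.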

Are there any user-facing changes?

No, since min/max statistics should be ignored for pages with all nulls.
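
A rough sketch of why a reader should see no difference: a page-pruning routine that checks the null-page flag first never looks at min/max for an all-null page. The function `candidate_pages` is hypothetical, and the byte-wise lexicographic comparison is a simplifying assumption; a real reader compares values according to the column's type and sort order.

```rust
/// Returns the indices of pages whose [min, max] range may contain `target`.
/// Pages flagged as all-null are skipped before their min/max bytes are read.
fn candidate_pages(
    null_pages: &[bool],
    mins: &[Vec<u8>],
    maxs: &[Vec<u8>],
    target: &[u8],
) -> Vec<usize> {
    (0..null_pages.len())
        .filter(|&i| {
            !null_pages[i]
                && mins[i].as_slice() <= target
                && target <= maxs[i].as_slice()
        })
        .collect()
}

fn main() {
    let null_pages = vec![false, true, false];
    // The all-null page (index 1) carries empty min/max entries.
    let mins = vec![vec![1u8], vec![], vec![10u8]];
    let maxs = vec![vec![5u8], vec![], vec![20u8]];
    println!("{:?}", candidate_pages(&null_pages, &mins, &maxs, &[3]));
}
```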

@github-actions bot added the "parquet" (Changes to the parquet crate) label on Aug 27, 2024

@alamb alamb left a comment

Thank you @etseidl -- would it be possible to get a test case for this?

@alamb alamb merged commit 1336973 into apache:master Aug 31, 2024
16 checks passed
@etseidl etseidl deleted the issue_6315 branch September 9, 2024 15:20
Labels
parquet Changes to the parquet crate
Development

Successfully merging this pull request may close these issues.

Parquet writer should not write any min/max data to ColumnIndex when all values are null
2 participants