Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for missing Parquet page indexes in ParquetMetadaReader #6450

Merged
merged 4 commits into from
Sep 25, 2024

Conversation

etseidl
Copy link
Contributor

@etseidl etseidl commented Sep 24, 2024

Which issue does this PR close?

Part of #6447

Rationale for this change

There is an inconsistency in how the various Parquet metadata parsing APIs handle files that lack page indexes. This PR temporarily modifies ParquetMetaDataReader to match the current behavior in parquet::file::metadata::page_index::index_reader (return empty vectors rather than None).

What changes are included in this PR?

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Sep 24, 2024
@alamb alamb changed the title Workaround for missing Parquet page indexes Workaround for missing Parquet page indexes in ParquetMetadaReader Sep 25, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @etseidl -- this change makes sense to me. Presumably this will make replacing the existing APIs without test changes possble, so I think it is ok for now that it isn't covered by tests.

I left some small suggestions for comments, but I don't think they are necessary to merge

Thanks again

parquet/src/file/metadata/reader.rs Show resolved Hide resolved
@etseidl
Copy link
Contributor Author

etseidl commented Sep 25, 2024

Presumably this will make replacing the existing APIs without test changes possble, so I think it is ok for now that it isn't covered by tests.

Yes, that's the hope. As deprecated methods are replaced by this API, their tests should cover the new API.

@alamb
Copy link
Contributor

alamb commented Sep 25, 2024

Pushed a small commit to make cargo fmt happy, then I will merge this PR so we can get #6451 in

@alamb alamb merged commit d48010e into apache:master Sep 25, 2024
16 checks passed
@etseidl etseidl deleted the missing_page_index branch September 26, 2024 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants