Allow setting write.parquet.page-row-limit #1017

Merged
merged 3 commits into from
Aug 9, 2024

Fokko
Contributor

@Fokko Fokko commented Aug 7, 2024

Noticed this when working on #1016

It is being passed down to PyArrow here:

    "write_batch_size": property_as_int(
        properties=table_properties,
        property_name=TableProperties.PARQUET_PAGE_ROW_LIMIT,
        default=TableProperties.PARQUET_PAGE_ROW_LIMIT_DEFAULT,
    ),
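
For readers unfamiliar with the lookup, here is a minimal, self-contained sketch of how a table property could be resolved into the `write_batch_size` writer argument. The `property_as_int` body, the `parquet_writer_kwargs` helper, and the default of 20000 are assumptions made for illustration; they are not copied from the PyIceberg source.

```python
from typing import Dict, Optional

# Property name taken from the PR title; the default value is an assumption.
PARQUET_PAGE_ROW_LIMIT = "write.parquet.page-row-limit"
PARQUET_PAGE_ROW_LIMIT_DEFAULT = 20000


def property_as_int(
    properties: Dict[str, str],
    property_name: str,
    default: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical stand-in: read a string-valued property as an int."""
    value = properties.get(property_name)
    if value is None:
        return default
    try:
        return int(value)
    except ValueError as exc:
        raise ValueError(
            f"Could not parse table property {property_name} as an integer: {value}"
        ) from exc


def parquet_writer_kwargs(table_properties: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Hypothetical helper: build the keyword arguments handed to the Parquet writer."""
    return {
        "write_batch_size": property_as_int(
            properties=table_properties,
            property_name=PARQUET_PAGE_ROW_LIMIT,
            default=PARQUET_PAGE_ROW_LIMIT_DEFAULT,
        ),
    }


if __name__ == "__main__":
    print(parquet_writer_kwargs({PARQUET_PAGE_ROW_LIMIT: "500"}))
    print(parquet_writer_kwargs({}))
```

Table properties are stored as strings, which is why the sketch parses the value before forwarding it and falls back to the default when the property is unset.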

@Fokko Fokko added this to the PyIceberg 0.7.1 release milestone Aug 7, 2024
@ndrluis
Collaborator

ndrluis commented Aug 7, 2024

WDYT about adding a test verifying that the configuration from write.parquet.page-row-limit is passed down through write_batch_size as expected?

@Fokko
Contributor Author

Fokko commented Aug 8, 2024

I tried coming up with a test in two ways:

  • Inspecting the written Parquet file through PyArrow to check the page layout, but the low-level page information is not exposed through the Arrow APIs.
  • Passing in something bad (like a negative number), but this is all fine with Arrow.
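
A third option, not tried in this thread, would be to assert on the arguments that reach the writer instead of on the written file. The sketch below uses only `unittest.mock`; `write_file` and `writer_factory` are hypothetical stand-ins for the real write path, so this illustrates the idea and is not a test that would run against PyIceberg as is.

```python
from typing import Dict, Optional
from unittest import mock

# Property name taken from the PR title.
PAGE_ROW_LIMIT = "write.parquet.page-row-limit"


def write_file(table_properties: Dict[str, str], writer_factory) -> None:
    """Hypothetical write path: forwards the page row limit to the writer."""
    kwargs = {}
    if PAGE_ROW_LIMIT in table_properties:
        kwargs["write_batch_size"] = int(table_properties[PAGE_ROW_LIMIT])
    writer_factory("data.parquet", **kwargs)


def captured_write_batch_size(table_properties: Dict[str, str]) -> Optional[int]:
    """Run the write path with a mocked writer and return what it received."""
    writer_factory = mock.Mock(name="writer_factory")
    write_file(table_properties, writer_factory)
    _, kwargs = writer_factory.call_args
    return kwargs.get("write_batch_size")


if __name__ == "__main__":
    # The configured value should arrive at the writer as an integer.
    assert captured_write_batch_size({PAGE_ROW_LIMIT: "10"}) == 10
    # With no property set, this sketch passes nothing through.
    assert captured_write_batch_size({}) is None
```

This only checks that the value is forwarded; it says nothing about how the writer lays out pages, which is the part the Arrow APIs do not expose.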

@Fokko Fokko merged commit 7d25bad into apache:main Aug 9, 2024
7 checks passed
@Fokko Fokko deleted the fd-allow-setting-page-size branch August 9, 2024 08:34
sungwy pushed a commit that referenced this pull request Aug 9, 2024
sungwy pushed a commit that referenced this pull request Aug 9, 2024