[FEA] Change PQ writer's row group size default from 128 MB to 1M rows. #16733

Closed
mhaseeb123 opened this issue Sep 4, 2024 · 0 comments · Fixed by #16750
Labels
feature request New feature or request

Comments

@mhaseeb123
Member

Is your feature request related to a problem? Please describe.
Currently, writing wide tables to Parquet can produce row groups shaped like very thin strips (few rows across many columns). This can noticeably hurt reader performance, because the reader has to process metadata for every row group. I would like to remove the writer's default 128 MB row group limit and rely on the 1M-row limit instead, so that wide tables end up with reasonably square row groups.
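
To make the thin-strip effect concrete, here is a rough back-of-the-envelope sketch. It is not the writer's actual logic: it assumes fixed-width, uncompressed values, treats 128 MB as 128 MiB, and assumes that whichever limit is reached first closes the row group. The helper names `rows_per_row_group` and `num_row_groups` are made up for illustration.

```python
from typing import Optional

MIB = 1024 * 1024


def rows_per_row_group(num_columns: int, bytes_per_value: int,
                       max_rows: Optional[int],
                       max_bytes: Optional[int]) -> int:
    """Estimate rows per row group; the tighter of the two limits wins.

    Assumes every value is fixed width and uncompressed (a simplification).
    """
    candidates = []
    if max_rows is not None:
        candidates.append(max_rows)
    if max_bytes is not None:
        row_bytes = num_columns * bytes_per_value
        # Always allow at least one row, even if a single row exceeds the limit.
        candidates.append(max(1, max_bytes // row_bytes))
    if not candidates:
        raise ValueError("at least one limit is required")
    return min(candidates)


def num_row_groups(total_rows: int, rows_per_group: int) -> int:
    """Ceiling division: how many row groups cover total_rows."""
    return -(-total_rows // rows_per_group)


# Hypothetical wide table: 1M rows, 10,000 columns of 8-byte values.
total_rows, num_columns, value_bytes = 1_000_000, 10_000, 8

# Both limits active: the byte limit dominates for wide tables.
with_bytes = rows_per_row_group(num_columns, value_bytes,
                                max_rows=1_000_000, max_bytes=128 * MIB)
# Row limit only, as proposed in this issue.
rows_only = rows_per_row_group(num_columns, value_bytes,
                               max_rows=1_000_000, max_bytes=None)

print("byte + row limit:", with_bytes, "rows/group,",
      num_row_groups(total_rows, with_bytes), "row groups")
print("row limit only:  ", rows_only, "rows/group,",
      num_row_groups(total_rows, rows_only), "row groups")
```

Under these assumptions the byte limit caps each row group at well under two thousand rows, so the file holds hundreds of row groups where the row limit alone would produce one.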

Describe the solution you'd like
Remove the default 128 MB row group limit, and apply a byte limit only when the user explicitly sets one through the writer options.

Describe alternatives you've considered
N/A

Additional context
We also need a wide-table benchmark to measure performance before and after the change.
