-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Expose max_row_group_length
option for Parquet writer
#589
feat: Expose max_row_group_length
option for Parquet writer
#589
Conversation
@@ -9,13 +9,16 @@ import ( | |||
"github.com/invopop/jsonschema" | |||
) | |||
|
|||
const defaultMaxRowGroupLength = 128 * 1024 * 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason this was set to 128 mibi rows but it's actually a row count.
@@ -94,5 +115,8 @@ func (s *ParquetSpec) Validate() error { | |||
if !slices.Contains(allowedRootRepetitions, s.RootRepetition) { | |||
return fmt.Errorf("invalid rootRepetition: %s. Allowed values are %s", s.RootRepetition, strings.Join(allowedRootRepetitions, ", ")) | |||
} | |||
if s.MaxRowGroupLength != nil && *s.MaxRowGroupLength <= 0 { | |||
return fmt.Errorf("invalid: maxRowGroupLength: %v. Must be zero or positive", *s.MaxRowGroupLength) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is zero or a small number ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zero would be going back to the default of the arrow parquetwriter. Updated this check to be < 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤖 I have created a release *beep* *boop* --- ## [4.5.0](v4.4.0...v4.5.0) (2024-11-07) ### Features * Expose `max_row_group_length` option for Parquet writer ([#589](#589)) ([4a4b389](4a4b389)) ### Bug Fixes * **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.65.0 ([#581](#581)) ([136b836](136b836)) * **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.66.0 ([#582](#582)) ([5ca0a10](5ca0a10)) * **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.66.1 ([#583](#583)) ([69a6156](69a6156)) * **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.67.0 ([#584](#584)) ([cab4c19](cab4c19)) * **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.67.1 ([#585](#585)) ([d590743](d590743)) * **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.68.0 ([#586](#586)) ([b19c239](b19c239)) * **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.68.1 ([#587](#587)) ([3b9f708](3b9f708)) * **deps:** Update module github.com/cloudquery/plugin-sdk/v4 to v4.68.2 ([#588](#588)) ([f2be0a6](f2be0a6)) * Optimized CSV writing ([#579](#579)) ([d688971](d688971)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
#### Summary The support for setting the group length was added in #19578 implicitly via cloudquery/filetypes#589. This PR adds documentation for it.
No description provided.