
[HUDI-5684] Fix CTAS and Insert Into to avoid combine-on-insert by default #7813

Merged
merged 6 commits into from
Feb 2, 2023



@alexeykudinkin commented Feb 1, 2023

Change Logs

Currently, `InsertIntoHoodieTable` sets the `COMBINE_BEFORE_INSERT` config by default whenever a pre-combine field is specified, and it sets it in a way that prevents the user from overriding it.

To address this, all Spark SQL feature-specific configs are split into two groups:

  • Default: settings serving as a default (or preferred) value for the feature (could be overridden by the user)
  • Overriding: settings serving as required values for the feature (could NOT be overridden by the user)
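The default/overriding split can be pictured as layering option maps in increasing order of precedence: feature defaults first, then user-supplied options, then overriding options. This is a minimal sketch under that assumption, not Hudi's actual implementation; `combineOptions` is a hypothetical helper, and the `hoodie.*` keys appear only as illustrative string entries.

```java
import java.util.HashMap;
import java.util.Map;

class SqlWriteConfigLayering {

  // Hypothetical helper (not a Hudi API): resolves the effective options for a
  // Spark SQL feature by applying three layers in increasing order of precedence.
  static Map<String, String> combineOptions(Map<String, String> featureDefaults,
                                            Map<String, String> userOptions,
                                            Map<String, String> overridingOptions) {
    Map<String, String> effective = new HashMap<>(featureDefaults); // lowest precedence
    effective.putAll(userOptions);       // user settings replace feature defaults
    effective.putAll(overridingOptions); // required settings always win
    return effective;
  }

  public static void main(String[] args) {
    // Illustrative keys only: a default the user may change, and a required value.
    Map<String, String> defaults = Map.of("hoodie.combine.before.insert", "false");
    Map<String, String> user = Map.of(
        "hoodie.combine.before.insert", "true",
        "hoodie.datasource.write.operation", "upsert");
    Map<String, String> overriding = Map.of("hoodie.datasource.write.operation", "insert");

    Map<String, String> effective = combineOptions(defaults, user, overriding);
    System.out.println(effective.get("hoodie.combine.before.insert"));      // true (user overrode the default)
    System.out.println(effective.get("hoodie.datasource.write.operation")); // insert (user value ignored)
  }
}
```

Applying the layers in this order is what makes the distinction work: a value placed in the default layer can be replaced by the user, while a value placed in the overriding layer cannot.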

Impact

Avoids combining on insertion for Insert Into and CTAS statements in Spark SQL
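For context, combine-on-insert deduplicates the incoming batch by record key before writing. Among records that share a key, the one with the largest pre-combine value is kept. The following is a minimal sketch that assumes this "largest pre-combine value wins" rule. The `Row` type and the `combine`/`insert` helpers are hypothetical, not Hudi APIs, and tie handling in this sketch (keep the first row seen) is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombineBeforeInsertSketch {

  // Hypothetical, simplified record: a record key, a numeric pre-combine value, and a payload.
  static final class Row {
    final String key;
    final long preCombine;
    final String payload;

    Row(String key, long preCombine, String payload) {
      this.key = key;
      this.preCombine = preCombine;
      this.payload = payload;
    }

    @Override
    public String toString() {
      return key + ":" + payload;
    }
  }

  // Collapses rows sharing a key to the one with the largest pre-combine value
  // (on ties, this sketch keeps the first row seen). Key order is preserved.
  static List<Row> combine(List<Row> incoming) {
    Map<String, Row> byKey = new LinkedHashMap<>();
    for (Row r : incoming) {
      byKey.merge(r.key, r, (prev, cur) -> cur.preCombine > prev.preCombine ? cur : prev);
    }
    return new ArrayList<>(byKey.values());
  }

  // Without combining, every incoming row is written as-is, duplicates included.
  static List<Row> insert(List<Row> incoming, boolean combineBeforeInsert) {
    return combineBeforeInsert ? combine(incoming) : new ArrayList<>(incoming);
  }

  public static void main(String[] args) {
    List<Row> batch = List.of(
        new Row("id1", 1, "a"),
        new Row("id1", 2, "b"),
        new Row("id2", 1, "c"));
    System.out.println(insert(batch, false)); // [id1:a, id1:b, id2:c]
    System.out.println(insert(batch, true));  // [id1:b, id2:c]
  }
}
```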

Risk level (write none, low, medium or high below)

Low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@alexeykudinkin changed the title from "[MINOR] Fix CTAS and Insert Into to avoid combine-on-insert by default" to "[MINOR][Stacked on 7821] Fix CTAS and Insert Into to avoid combine-on-insert by default" on Feb 2, 2023
@alexeykudinkin force-pushed the ak/ctas-dedup-fix branch 2 times, most recently from 22bd2b7 to cab2849 on February 2, 2023 02:35
Alexey Kudinkin added 6 commits February 1, 2023 21:35
@alexeykudinkin changed the title from "[MINOR][Stacked on 7821] Fix CTAS and Insert Into to avoid combine-on-insert by default" to "[HUDI-5684] Fix CTAS and Insert Into to avoid combine-on-insert by default" on Feb 2, 2023

hudi-bot commented Feb 2, 2023

CI report:


@codope merged commit 1459edd into apache:master on Feb 2, 2023
yihua pushed a commit that referenced this pull request Feb 2, 2023
…fault (#7813)

* Remove `COMBINE_BEFORE_INSERT` config being overridden for insert operations

* Revisited Spark SQL feature configuration to allow dichotomy of having:
  - (Feature-)specific "default" configuration (that could be overridden by the user)
  - "Overriding" configuration (that could NOT be overridden by the user)

* Restoring existing behavior for Insert Into to deduplicate by default (if pre-combine is specified)

* Fixing compilation

* Fixing compilation (one more time)

* Fixing options combination ordering
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
Projects
Archived in project

5 participants