[HUDI-5684] Fix CTAS and Insert Into to avoid combine-on-insert by default #7813
Merged
nsivabalan approved these changes on Feb 1, 2023
yihua approved these changes on Feb 1, 2023
alexeykudinkin force-pushed the ak/ctas-dedup-fix branch from bd42788 to 4f2eef7 on February 2, 2023 00:52
alexeykudinkin changed the title from "[MINOR] Fix CTAS and Insert Into to avoid combine-on-insert by default" to "[MINOR][Stacked on 7821] Fix CTAS and Insert Into to avoid combine-on-insert by default" on Feb 2, 2023
alexeykudinkin force-pushed the ak/ctas-dedup-fix branch 2 times, most recently from 22bd2b7 to cab2849 on February 2, 2023 02:35
alexeykudinkin force-pushed the ak/ctas-dedup-fix branch from a3b0274 to 3ff4e90 on February 2, 2023 05:35
alexeykudinkin changed the title from "[MINOR][Stacked on 7821] Fix CTAS and Insert Into to avoid combine-on-insert by default" to "[HUDI-5684] Fix CTAS and Insert Into to avoid combine-on-insert by default" on Feb 2, 2023
yihua pushed a commit that referenced this pull request on Feb 2, 2023:
…fault (#7813)
* Remove `COMBINE_BEFORE_INSERT` config being overridden for insert operations
* Revisited Spark SQL feature configuration to allow dichotomy of having:
  - (Feature-)specific "default" configuration (that could be overridden by the user)
  - "Overriding" configuration (that could NOT be overridden by the user)
* Restoring existing behavior for Insert Into to deduplicate by default (if pre-combine is specified)
* Fixing compilation
* Fixing compilation (one more time)
* Fixing options combination ordering
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request on Apr 5, 2023: "…fault (apache#7813)", with the same commit message as above.
Change Logs
Currently, `InsertIntoHoodieTable` sets the `COMBINE_BEFORE_INSERT` config by default whenever a pre-combine field is specified, and it does so in a way that does not allow the user to override it.
To address this, all Spark SQL feature-specific configs are split into two groups:
- (Feature-)specific "default" configuration (that can be overridden by the user)
- "Overriding" configuration (that can NOT be overridden by the user)
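As a rough illustration of this split, the following Python sketch (not Hudi's actual Scala code) merges three option layers so that user options win over defaults and overriding options win over user options. The function name `combine_options` and the key `example.locked.setting` are hypothetical. Placing `hoodie.combine.before.insert` in the defaults layer is an assumption based on the description in this PR.

```python
def combine_options(defaults, user_options, overrides):
    """Merge three option layers; later layers win.

    Precedence (lowest to highest): defaults < user_options < overrides.
    This is an illustrative sketch, not Hudi's implementation.
    """
    merged = dict(defaults)      # feature-specific defaults, user may override
    merged.update(user_options)  # whatever the user set explicitly
    merged.update(overrides)     # settings the user is not allowed to change
    return merged


# Assumed default: deduplicate on insert when a pre-combine field is specified.
defaults = {"hoodie.combine.before.insert": "true"}

# Hypothetical "overriding" setting that the feature pins regardless of user input.
overrides = {"example.locked.setting": "fixed"}

# The user turns deduplication off and also tries to change the pinned setting.
user_options = {
    "hoodie.combine.before.insert": "false",
    "example.locked.setting": "changed-by-user",
}

resolved = combine_options(defaults, user_options, overrides)
print(resolved)
```

With these inputs, the user's `false` replaces the default for `hoodie.combine.before.insert`, while `example.locked.setting` keeps the pinned value. The order of the `update` calls is what encodes the precedence, which is why the ordering of option combination matters.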
Impact
Avoids combining on insertion for Insert Into and CTAS statements in Spark SQL
Risk level (write none, low, medium or high below)
Low
Documentation Update
N/A
Contributor's checklist