[HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only #3517

Merged 1 commit into apache:master on Sep 21, 2021


dongkelun
Contributor

What is the purpose of the pull request

Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Verify this pull request

Unit test.

@hudi-bot

hudi-bot commented Aug 21, 2021

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run travis: re-run the last Travis build
  • @hudi-bot run azure: re-run the last Azure build

@dongkelun dongkelun changed the title from "Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only" to "[HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only" on Aug 21, 2021
…eCombineField of source table and target table differ in case only
@dongkelun
Contributor Author

@pengzhiwei2018 Hi, could you please take a look?

sourceExpression match {
case attr: AttributeReference if attr.name.equalsIgnoreCase(targetColumnName) => true
case Cast(attr: AttributeReference, _, _) if attr.name.equalsIgnoreCase(targetColumnName) => true
case attr: AttributeReference if sourceColNameMap(attr.name.toLowerCase).equals(targetColumnName) => true


Can we use sparkSession.sessionState.conf.resolver to compare the column name?

Contributor Author

@dongkelun dongkelun Sep 21, 2021


Hi, is it like this?

val resolver = sparkSession.sessionState.conf.resolver
case attr: AttributeReference if resolver(attr.name, targetColumnName) => true

I'm not sure I understand. The resolver is not case-sensitive when comparing for equality, but the comparison here must be case-sensitive. That is why the code uses sourceColNameMap(attr.name.toLowerCase) to look up the source table's original column name, without any case conversion, and then compares it with targetColumnName for exact equality. If the two are not equal, the column with the target's name is added later with withColumn. The comparison has to be case-sensitive because sourceDF is case-sensitive when writing data.
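
A minimal sketch of the distinction described in this comment, written in plain Scala with no Spark dependency. The object name and the helpers `buildSourceColNameMap` and `isExactMatch` are hypothetical and are not taken from the PR; `caseInsensitiveResolver` only imitates the `(String, String) => Boolean` shape of a Spark resolver and is assumed to ignore case.

```scala
object CaseOnlyDiffSketch {
  // Stand-in for a resolver: a function comparing two column names.
  // Assumption: it ignores case, as the comment above says the session resolver does.
  type Resolver = (String, String) => Boolean
  val caseInsensitiveResolver: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Maps each lower-cased source column name to its original spelling,
  // playing the role of sourceColNameMap in the diff.
  def buildSourceColNameMap(sourceColumns: Seq[String]): Map[String, String] =
    sourceColumns.map(c => c.toLowerCase -> c).toMap

  // True only when the source column is spelled exactly like the target column.
  // A false result means the column would still have to be added under the
  // target's name (the withColumn step mentioned above).
  def isExactMatch(sourceColNameMap: Map[String, String],
                   attrName: String,
                   targetColumnName: String): Boolean =
    sourceColNameMap.get(attrName.toLowerCase).contains(targetColumnName)
}
```

With a source table whose key column is `ID` and a target whose key column is `id`, the resolver-style comparison treats the names as equal, while `isExactMatch` does not. The second result is the one needed to decide whether a column with the target's spelling still has to be added before writing.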


Makes sense to me. +1 for this.

@pengzhiwei2018 pengzhiwei2018 merged commit 5a94043 into apache:master Sep 21, 2021
@dongkelun dongkelun deleted the HUDI-2343 branch June 14, 2022 12:35