[HUDI-3598] Row Data to Hoodie Record Operator parallelism needs to always be consistent with input operator for chaining purpose #5049

Merged
merged 1 commit into from
Mar 18, 2022

JerryYue-M
Contributor

…o adjust the number of parallelism

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@JerryYue-M
Contributor Author

@wangxianghu
Thanks for the review. I have fixed the problem you pointed out.

Contributor

@danny0405 danny0405 left a comment

Can you explain why this param is needed? Generally we avoid introducing too many options if they are not necessary.

The default parallelism would make the map operator chain with the source if the source also has the default parallelism.

@JerryYue-M
Contributor Author

@danny0405
In real production scenarios, the source operator often does not use the job's default parallelism. For example, a Kafka source will set its parallelism to match the number of partitions. In this case we can't chain the source operator with the map function. That's why we need the config.
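
To illustrate the case described above, here is a minimal sketch in plain Java. It is a simplified model, not Flink's API: the `ChainingSketch` class, the `Op` type, and `canChain` are hypothetical names, and the model checks only one necessary condition for chaining (equal parallelism), while the real engine checks more. The numbers 8 and 4 are made up for illustration.

```java
// Simplified model of one necessary chaining condition. Not Flink code;
// all names here are hypothetical and exist only for illustration.
final class ChainingSketch {

    /** Minimal stand-in for an operator: just a name and a parallelism. */
    static final class Op {
        final String name;
        final int parallelism;

        Op(String name, int parallelism) {
            this.name = name;
            this.parallelism = parallelism;
        }
    }

    /** Two adjacent operators can only be chained if their parallelism is equal. */
    static boolean canChain(Op upstream, Op downstream) {
        return upstream.parallelism == downstream.parallelism;
    }

    public static void main(String[] args) {
        int jobDefaultParallelism = 4;               // assumed job-level default
        Op kafkaSource = new Op("kafka_source", 8);  // assumed: 8 partitions

        // Map operator left at the job default: parallelism differs from the source.
        Op mapAtDefault = new Op("row_data_to_hoodie_record", jobDefaultParallelism);
        System.out.println(canChain(kafkaSource, mapAtDefault)); // prints false

        // Map operator set to the source's parallelism: the condition holds.
        Op mapAtSource = new Op("row_data_to_hoodie_record", kafkaSource.parallelism);
        System.out.println(canChain(kafkaSource, mapAtSource));  // prints true
    }
}
```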

@danny0405
Contributor

@danny0405 In real production scenarios, the source operator often does not use the job's default parallelism. For example, a Kafka source will set its parallelism to match the number of partitions. In this case we can't chain the source operator with the map function. That's why we need the config.

The more proper way is to use the source/input parallelism, so that the new option can be avoided.
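
A minimal sketch of this suggestion, again as a plain-Java model rather than the actual change in this PR: the map step copies its parallelism from its input instead of reading a user-facing option. `InputParallelismSketch`, `Stream`, and `mapInheritingParallelism` are hypothetical names. In Flink's DataStream API the corresponding calls would presumably be `getParallelism()` on the input stream and `setParallelism(...)` on the map result.

```java
// Plain-Java model of "use the input's parallelism". Not Flink or Hudi code;
// all names here are hypothetical and exist only for illustration.
final class InputParallelismSketch {

    /** Minimal stand-in for a stream produced by some operator. */
    static final class Stream {
        final String producer;
        final int parallelism;

        Stream(String producer, int parallelism) {
            this.producer = producer;
            this.parallelism = parallelism;
        }

        /** Adds a map step whose parallelism is copied from this input stream. */
        Stream mapInheritingParallelism(String mapName) {
            return new Stream(mapName, this.parallelism);
        }
    }

    public static void main(String[] args) {
        // Assumed: the source picked its own parallelism (e.g. one per partition).
        Stream source = new Stream("kafka_source", 8);
        Stream records = source.mapInheritingParallelism("row_data_to_hoodie_record");

        // The map step always matches its input, whatever the source chose.
        System.out.println(records.parallelism == source.parallelism); // prints true
    }
}
```

With this approach no extra option is needed, because the value is derived from the input rather than configured.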

@wangxianghu
Contributor

@wangxianghu Thanks for the review. I have fixed the problem you pointed out.

Thanks @JerryYue-M, I am OK with the change.
I will merge it as long as @danny0405 approves it.

@wangxianghu
Contributor

@danny0405 In real production scenarios, the source operator often does not use the job's default parallelism. For example, a Kafka source will set its parallelism to match the number of partitions. In this case we can't chain the source operator with the map function. That's why we need the config.

The more proper way is to use the source/input parallelism, so that the new option can be avoided.

This could be a better way, since it avoids introducing new options.

@JerryYue-M
Contributor Author

@danny0405 @wangxianghu
Indeed, I will follow this approach.
Thanks, all.

@JerryYue-M JerryYue-M changed the title [HUDI-3598] Row Data to Hoodie Record Operator should support users t… [HUDI-3598] row data to hoodie record map operator need always use the input operator parallelism to chained with source operator Mar 16, 2022
…lways be consistent with input operator

for chaining purpose
@JerryYue-M JerryYue-M changed the title [HUDI-3598] row data to hoodie record map operator need always use the input operator parallelism to chained with source operator [HUDI-3598] Row Data to Hoodie Record Operator parallelism needs to always be consistent with input operator for chaining purpose Mar 16, 2022
Contributor

@danny0405 danny0405 left a comment

+1

@wangxianghu
Contributor

@hudi-bot run azure

@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua
Contributor

yihua commented Mar 17, 2022

@JerryYue-M could you rebase this PR on the latest master? The CI failure is due to a clustering failure, which has been fixed on the latest master. Once CI passes, the PR is ready for merging.

@danny0405
Contributor

@yihua I would just merge it because this change is trivial.

@danny0405 danny0405 merged commit 6fe4d6e into apache:master Mar 18, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
…lways be consistent with input operator (apache#5049)

for chaining purpose

Co-authored-by: jerryyue <[email protected]>
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 12, 2022
…lways be consistent with input operator (apache#5049)

for chaining purpose

Co-authored-by: jerryyue <[email protected]>
Labels
flink Issues related to flink
5 participants