-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-5278] Support more conf to cluster procedure #7304
Conversation
627aedf
to
1b87e59
Compare
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java
Show resolved
Hide resolved
...ark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunClusteringProcedure.scala
Outdated
Show resolved
Hide resolved
assert(1 == metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.count()) | ||
assert(0 == metaClient.getActiveTimeline.filterPendingReplaceTimeline().getInstants.count()) | ||
|
||
spark.sql(s"call run_clustering(table => '$tableName')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing test case for scheduleandexecute and invalid op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scheduleandexecute is default, I will add invalid op case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, it already has invalid case checkExceptionContain(s"call run_clustering(table => '$tableName', op => 'null')")("Invalid value")
/** | ||
* only execute then pending clustering plans | ||
*/ | ||
EXECUTE("execute"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we execute specific pending clustering plan instead of all pending clustering plans?
|
||
pendingClustering = instantsStr match { | ||
case Some(inst) => | ||
operator = ClusteringOperator.EXECUTE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here why we need set operator to EXECUTE but in line#144 we do not need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the user does not specify the instants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need check if users specify the instants with SCHEDULE and SCHEDULE_AND_EXECUTE, we should throw exception instead of set it to EXECUTE when specify instants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please put the line operator = ClusteringOperator.EXECUTE
below the line logInfo("No op")
and please change logInfo("No op")
to logInfo("No op and set it to EXECUTE with instants specified.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the line can not put below logInfo("No op")
, operator default is scheduleAndExecute
, if user specific instants, it need be set to execute after check
@leesf can you help me resolve this issue, after lastest success ci code, the code has not change any all, and my local compile is success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@KnightChess would you please rebase to latest master as I see the master has some code changed. |
e839fd6
to
728191c
Compare
@leesf done, and I update the test to resolve Java CI compile eroor, but I still don't know why |
728191c
to
20a511d
Compare
I will reopen ci after #7319 be merged |
20a511d
to
3f8f9a4
Compare
Don't merge, I meet some question in cluster |
look like is something bug in our Internal version, open version is ok, no blocked |
@hudi-bot run azure |
@leesf @stream2000 thanks for review |
""".stripMargin) | ||
|
||
val fileNum = 20 | ||
val numRecords = 400000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @KnightChess, do we need so many files and records per file for this test? This test currently could cost much time.
Change Logs
spark sql cluster procedure support new params: op, order_strategy, options
Impact
none
Risk level (write none, low medium or high below)
low
Documentation Update
will open doc pr to update
Contributor's checklist