[HUDI-5278] Support more conf to cluster procedure #7304

Merged: 1 commit merged into apache:master on Nov 30, 2022

Conversation

KnightChess (Contributor)

Change Logs

The Spark SQL clustering procedure run_clustering now supports new parameters: op, order_strategy and options (an example call is sketched after the checklist below).

Impact

none

Risk level (write none, low, medium or high below)

low

Documentation Update

Will open a docs PR to update the documentation.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed
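
For illustration, a call exercising the new parameters might look roughly like the sketch below. The parameter names follow the change log above, while the concrete values (the op, the order strategy name, and the options key and its key=value format) are assumptions for the example, not taken from this PR.

// Hypothetical example of the extended run_clustering procedure call;
// the values and the options key/format are illustrative assumptions.
spark.sql(
  s"""call run_clustering(
     |  table => '$tableName',
     |  op => 'scheduleandexecute',
     |  order_strategy => 'linear',
     |  options => 'hoodie.clustering.plan.strategy.small.file.limit=629145600')
     |""".stripMargin)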

assert(1 == metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.count())
assert(0 == metaClient.getActiveTimeline.filterPendingReplaceTimeline().getInstants.count())

spark.sql(s"call run_clustering(table => '$tableName')")
Contributor

Missing test cases for scheduleandexecute and an invalid op?

Contributor Author

scheduleandexecute is the default; I will add an invalid op case.

Contributor Author

Oh, it already has an invalid-op case: checkExceptionContain(s"call run_clustering(table => '$tableName', op => 'null')")("Invalid value")

/**
 * Only execute the pending clustering plans.
 */
EXECUTE("execute"),
Contributor

Can we execute a specific pending clustering plan instead of all pending clustering plans?
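
For illustration, executing only one specific pending plan might look roughly like the call below; the instants parameter name echoes the instantsStr variable discussed later in this thread, and the timestamp is made up, so treat both as assumptions rather than the procedure's confirmed API.

// Hypothetical call that executes a single pending clustering plan by its instant time;
// the 'instants' parameter name and the timestamp value are illustrative assumptions.
spark.sql(s"call run_clustering(table => '$tableName', op => 'execute', instants => '20221126103000000')")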

@leesf leesf self-assigned this Nov 26, 2022

pendingClustering = instantsStr match {
  case Some(inst) =>
    operator = ClusteringOperator.EXECUTE
Contributor

Why do we need to set the operator to EXECUTE here, but at line#144 we do not?

Contributor Author

Because the user does not specify the instants

Contributor

I think we need to check whether users specify the instants together with SCHEDULE or SCHEDULE_AND_EXECUTE; in that case we should throw an exception instead of setting the operator to EXECUTE when instants are specified.
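
For illustration, the suggested check could look roughly like the sketch below; it reuses the instantsStr and operator names from the excerpt above, while the exception type, the message wording, and the name() accessor are assumptions.

// Sketch of the suggested validation: reject explicit instants unless op is 'execute'
// (exception type, message, and the name() accessor are illustrative assumptions).
if (instantsStr.isDefined && operator != ClusteringOperator.EXECUTE) {
  throw new IllegalArgumentException(
    s"Clustering instants can only be passed with op => 'execute', but got op '${operator.name()}'")
}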

Contributor

Please put the line operator = ClusteringOperator.EXECUTE below the line logInfo("No op"), and please change logInfo("No op") to logInfo("No op and set it to EXECUTE with instants specified.")

Contributor Author

That line cannot be put below logInfo("No op"): the operator defaults to scheduleAndExecute, so if the user specifies instants it needs to be set to EXECUTE after the check.
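
Roughly, the flow described above can be sketched as follows; this is a simplified, self-contained illustration whose enum and variable names only mirror the excerpts in this thread, and the instant parsing is an assumption, not the actual procedure source.

// Simplified sketch of the operator resolution discussed above (not the real implementation).
object OperatorResolutionSketch {
  object ClusteringOp extends Enumeration {
    val SCHEDULE, EXECUTE, SCHEDULE_AND_EXECUTE = Value
  }

  def resolve(instantsStr: Option[String]): (ClusteringOp.Value, Seq[String]) = {
    // the default operator applies when the caller passes no op at all
    var operator = ClusteringOp.SCHEDULE_AND_EXECUTE
    val pendingInstants = instantsStr match {
      case Some(inst) =>
        // explicit instants only make sense when executing existing plans,
        // so the override happens here, after the default has been assigned
        operator = ClusteringOp.EXECUTE
        inst.split(",").map(_.trim).toSeq
      case None =>
        Seq.empty[String]
    }
    (operator, pendingInstants)
  }
}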

@KnightChess KnightChess reopened this Nov 28, 2022
@KnightChess (Contributor Author)

@leesf can you help me resolve this issue? After the latest successful CI run the code has not changed at all, and my local compile succeeds.

[two screenshots of the CI failure]

@leesf leesf closed this Nov 29, 2022
@leesf leesf reopened this Nov 29, 2022
@leesf leesf left a comment (Contributor)

LGTM

@leesf leesf closed this Nov 29, 2022
@leesf leesf reopened this Nov 29, 2022
@leesf (Contributor)

leesf commented Nov 29, 2022

@KnightChess would you please rebase onto the latest master, as I see master has some code changes.

@KnightChess (Contributor Author)

@leesf done, and I updated the test to resolve the Java CI compile error, but I still don't know why it happened.

@KnightChess (Contributor Author)

The flink module has an error, and I found this:
[screenshot of the flink module error]

@KnightChess (Contributor Author)

I will re-run CI after #7319 is merged.

@KnightChess (Contributor Author)

KnightChess commented Nov 29, 2022

Don't merge, I ran into some issues with clustering.

@KnightChess (Contributor Author)

> Don't merge, I ran into some issues with clustering.

It looks like a bug in our internal version; the open-source version is fine, so this is not a blocker.

@codope codope added the spark-sql and priority:major (degraded perf; unable to move forward; potential bugs) labels Nov 29, 2022
@codope codope changed the title [HUDI-5278]support more conf to cluster procedure [HUDI-5278] Support more conf to cluster procedure Nov 29, 2022
@leesf (Contributor)

leesf commented Nov 29, 2022

@hudi-bot run azure

@hudi-bot

CI report:

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@leesf leesf merged commit 418091b into apache:master Nov 30, 2022
@KnightChess (Contributor Author)

@leesf @stream2000 thanks for the review.

""".stripMargin)

val fileNum = 20
val numRecords = 400000
Contributor

Hey @KnightChess, do we need so many files and records per file for this test? It could currently take a lot of time.

Labels: priority:major (degraded perf; unable to move forward; potential bugs), spark-sql

6 participants