Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35215][SQL] Update custom metric per certain rows and at the end of the task #32330

Closed
wants to merge 4 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Apr 25, 2021

What changes were proposed in this pull request?

This patch changes custom metric updating to update per certain rows (currently 100), instead of per row.

Why are the changes needed?

Based on previous discussion #31451 (comment), we should only update custom metrics per certain (e.g. 100) rows and also at the end of the task. Updating per row doesn't make too much benefit.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit test.

@SparkQA
Copy link

SparkQA commented Apr 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42432/

@SparkQA
Copy link

SparkQA commented Apr 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42432/

@SparkQA
Copy link

SparkQA commented Apr 25, 2021

Test build #137908 has finished for PR 32330 at commit 1a98660.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 26, 2021

cc @cloud-fan

s"${metric.name()}")
customMetrics(metric.name()).set(metric.value())
if (numRow % CustomMetrics.numRowsPerUpdate == 0) {
reader.currentMetricsValues.foreach { metric =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move it into a method to reuse code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a reused method.

@@ -25,6 +25,8 @@ import org.apache.spark.sql.connector.CustomMetric
object CustomMetrics {
private[spark] val V2_CUSTOM = "v2Custom"

private[spark] val numRowsPerUpdate = 100L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need to be a long?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numRow is a long, I guess this can be just int.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it as int.

customMetrics(metric.name()).set(metric.value())
if (numRow % CustomMetrics.numRowsPerUpdate == 0) {
partitionReader.currentMetricsValues.foreach { metric =>
assert(customMetrics.contains(metric.name()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how useful is the assert here. It's for internal error only and customMetrics(metric.name()) will fail too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it. I also thought it is not necessary but just added for a comment before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@dongjoon-hyun
Copy link
Member

#32348 is merged. Do we need to rebase this PR, @viirya ?

@viirya
Copy link
Member Author

viirya commented Apr 26, 2021

@dongjoon-hyun Yes, I will rebase this PR. Thanks.

@viirya
Copy link
Member Author

viirya commented May 4, 2021

Rebased and updated for the comments. @cloud-fan @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented May 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42682/

@SparkQA
Copy link

SparkQA commented May 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42682/

@SparkQA
Copy link

SparkQA commented May 5, 2021

Test build #138161 has finished for PR 32330 at commit 4f0be6c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


object CustomMetrics {
private[spark] val V2_CUSTOM = "v2Custom"

private[spark] val numRowsPerUpdate = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: NUM_ROWS_PER_UPDATE since it's a constant?

@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42719/

@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42719/

@SparkQA
Copy link

SparkQA commented May 6, 2021

Test build #138198 has finished for PR 32330 at commit 5f6cf5a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 6cd5cf5 May 6, 2021
@viirya viirya deleted the metric-update branch December 27, 2023 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants