Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3377] [SPARK-3610] Metrics can be accidentally aggregated / History server log name should not be based on user input #2432

Closed
wants to merge 74 commits into from

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Sep 17, 2014

This PR is another solution for #2250

I'm using codahale base MetricsSystem of Spark with JMX or Graphite, and I saw following 2 problems.

(1) When applications which have same spark.app.name run on cluster at the same time, some metrics names are mixed. For instance, if 2+ application is running on the cluster at the same time, each application emits the same named metric like "SparkPi.DAGScheduler.stage.failedStages" and Graphite cannot distinguish the metrics is for which application.

(2) When 2+ executors run on the same machine, JVM metrics of each executors are mixed. For instance, 2+ executors running on the same node can emit the same named metric "jvm.memory" and Graphite cannot distinguish the metrics is from which application.

And there is an similar issue. The directory for event logs is named using application name.
Application name is defined by user and the name can includes illegal character for path names.
Further more, the directory name consists of application name and System.currentTimeMillis even though each application has unique Application ID so if we run jobs which have same name, it's difficult to identify which directory is for which application.

Closes #2250
Closes #1067

…ause the instance of SparkContext is no longer used
This reverts commit e4a4593.
@SparkQA
Copy link

SparkQA commented Sep 17, 2014

QA tests have started for PR 2432 at commit efcb6e1.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 17, 2014

QA tests have finished for PR 2432 at commit efcb6e1.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 17, 2014

QA tests have started for PR 2432 at commit 4776f9e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 17, 2014

QA tests have finished for PR 2432 at commit 4776f9e.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Copy link
Member Author

sarutak commented Sep 17, 2014

retest this please.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21239/

@@ -33,8 +33,7 @@ import akka.remote.{DisassociatedEvent, RemotingLifecycleEvent}
import akka.serialization.SerializationExtension

import org.apache.spark.{Logging, SecurityManager, SparkConf, SparkException}
import org.apache.spark.deploy.{ApplicationDescription, DriverDescription, ExecutorState,
SparkHadoopUtil}
import org.apache.spark.deploy.{ApplicationDescription, DriverDescription, ExecutorState, SparkHadoopUtil}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is > 100 chars

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I know, but scalastyle for Spark except for import statements. In fact, lots of code in Spark have 100+ columns import statements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm aware... I don't know if that's a good thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O.K. For now, I indent the line with 2 spaces.

@andrewor14
Copy link
Contributor

Hey @sarutak I left a few more minor comments. Otherwise this LGTM.

@SparkQA
Copy link

SparkQA commented Oct 3, 2014

QA tests have started for PR 2432 at commit 2cc09aa.

  • This patch merges cleanly.

@sarutak
Copy link
Member Author

sarutak commented Oct 3, 2014

Thanks @andrewor14 I'm waiting for tests.

@SparkQA
Copy link

SparkQA commented Oct 3, 2014

QA tests have finished for PR 2432 at commit 2cc09aa.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case e: Exception => logError("Source class " + classPath + " cannot be instantiated", e)

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21264/Test FAILed.

Replaced getApplicationId with applicationId in SparkContext

Replaced == with === in MetricsSystemSuite
@sarutak sarutak force-pushed the metrics-structure-improvement2 branch from 2cc09aa to 3288b2b Compare October 3, 2014 19:30
@SparkQA
Copy link

SparkQA commented Oct 3, 2014

QA tests have started for PR 2432 at commit 3288b2b.

  • This patch merges cleanly.

@andrewor14
Copy link
Contributor

Ok, this is ready to go from my perspective. Merging once the tests pass. Thanks @sarutak.

@SparkQA
Copy link

SparkQA commented Oct 3, 2014

QA tests have finished for PR 2432 at commit 3288b2b.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21265/Test PASSed.

@andrewor14
Copy link
Contributor

Alright, this went into master. There were too many merge conflicts for this to also go into 1.1. @sarutak if you feel inclined feel free to open another one against that branch.

@sarutak
Copy link
Member Author

sarutak commented Oct 3, 2014

@andrewor14 O.K, I'll open another PR for 1.1. Thanks.

@asfgit asfgit closed this in 79e45c9 Oct 3, 2014
@tgravescs
Copy link
Contributor

This broken the yarn-alpha build. Please make sure to update YarnAllocationHandler for it also if you do any other prs

https://issues.apache.org/jira/browse/SPARK-3848

@sarutak
Copy link
Member Author

sarutak commented Oct 8, 2014

Oops... I'll fix soon.

@sarutak
Copy link
Member Author

sarutak commented Oct 8, 2014

@tgravescs Sorry for having you waiting. I've fixed the issue at #2715 .

asfgit pushed a commit that referenced this pull request Oct 8, 2014
yarn alpha build was broken by #2432
as it added an argument to YarnAllocator but not to yarn/alpha YarnAllocationHandler
commit 79e45c9

Author: Kousuke Saruta <[email protected]>

Closes #2715 from sarutak/SPARK-3848 and squashes the following commits:

bafb8d1 [Kousuke Saruta] Fixed parameters for the default constructor of alpha/YarnAllocatorHandler.
@sarutak sarutak deleted the metrics-structure-improvement2 branch April 11, 2015 05:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants