[SPARK-3377] [SPARK-3610] Metrics can be accidentally aggregated / History server log name should not be based on user input #2432

sarutak · 2014-09-17T18:10:07Z

This PR is an alternative solution to #2250.

I'm using Spark's Codahale-based MetricsSystem with JMX or Graphite, and I saw the following two problems.

(1) When applications that have the same spark.app.name run on a cluster at the same time, some metric names are mixed. For instance, if two or more applications are running on the cluster at the same time, each application emits the same metric name, such as "SparkPi.DAGScheduler.stage.failedStages", and Graphite cannot tell which application a metric belongs to.

(2) When two or more executors run on the same machine, the JVM metrics of the executors are mixed. For instance, two or more executors running on the same node can emit the same metric name "jvm.memory", and Graphite cannot tell which executor a metric comes from.
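To make the two collisions above concrete, here is a minimal sketch of the kind of prefixing that avoids them. It is not the code from this PR: `MetricsNaming` and `buildRegistryName` as written here are hypothetical, and the fallback behaviour when an identifier is missing is an assumption for illustration.

```scala
// Hypothetical helper, not taken from the PR: it only illustrates how
// prefixing a source name with an application ID and a driver/executor ID
// keeps otherwise identical metric names apart.
object MetricsNaming {
  def buildRegistryName(
      appId: Option[String],
      executorId: Option[String],
      sourceName: String): String = {
    (appId, executorId) match {
      // Both identifiers known: emit a fully qualified, unique name.
      case (Some(app), Some(exec)) => s"$app.$exec.$sourceName"
      // Assumption: fall back to the bare source name so that metrics are
      // still reported, only without a unique prefix.
      case _ => sourceName
    }
  }
}
```

With made-up identifiers, `MetricsNaming.buildRegistryName(Some("app-1"), Some("2"), "jvm")` yields `app-1.2.jvm`, so two executors of the same application, or two applications sharing one spark.app.name, no longer report under the same name.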

There is also a similar issue. The directory for event logs is named using the application name.
The application name is defined by the user and can include characters that are illegal in path names.
Furthermore, the directory name consists of the application name and System.currentTimeMillis even though each application has a unique application ID, so if we run jobs that have the same name, it's difficult to identify which directory belongs to which application.
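The difference between the two naming schemes can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: `EventLogNaming`, both method names, the base path, and the character whitelist are all hypothetical.

```scala
// Hypothetical sketch, not taken from the PR.
object EventLogNaming {
  // Name-based scheme described above: user input flows straight into the
  // path, so characters such as '/' or ':' end up in the directory name.
  def nameBasedDir(base: String, appName: String, timestampMs: Long): String =
    s"$base/$appName-$timestampMs"

  // ID-based alternative: the cluster-assigned application ID is the only
  // variable part. The whitelist below is an assumption for illustration.
  def idBasedDir(base: String, appId: String): String = {
    require(appId.matches("[A-Za-z0-9_.-]+"),
      s"unexpected characters in application ID: $appId")
    s"$base/$appId"
  }
}
```

With an application named `My App: v1/test`, the name-based variant produces a path that contains a space, a colon, and an extra path separator, while the ID-based variant depends only on an identifier the user does not control.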

Closes #2250
Closes #1067

SparkQA · 2014-09-17T18:14:24Z

QA tests have started for PR 2432 at commit efcb6e1.

This patch merges cleanly.

SparkQA · 2014-09-17T19:04:22Z

QA tests have finished for PR 2432 at commit efcb6e1.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2014-09-17T19:19:18Z

QA tests have started for PR 2432 at commit 4776f9e.

This patch merges cleanly.

SparkQA · 2014-09-17T20:08:24Z

QA tests have finished for PR 2432 at commit 4776f9e.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

sarutak · 2014-09-17T20:23:09Z

retest this please.

AmplabJenkins · 2014-10-03T06:18:01Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21239/

andrewor14 · 2014-10-03T18:59:14Z

core/src/main/scala/org/apache/spark/deploy/master/Master.scala

@@ -33,8 +33,7 @@ import akka.remote.{DisassociatedEvent, RemotingLifecycleEvent}
 import akka.serialization.SerializationExtension

 import org.apache.spark.{Logging, SecurityManager, SparkConf, SparkException}
-import org.apache.spark.deploy.{ApplicationDescription, DriverDescription, ExecutorState,
-  SparkHadoopUtil}
+import org.apache.spark.deploy.{ApplicationDescription, DriverDescription, ExecutorState, SparkHadoopUtil}


this is > 100 chars

Yes, I know, but the scalastyle configuration for Spark makes an exception for import statements. In fact, lots of code in Spark has import statements longer than 100 columns.

Yeah I'm aware... I don't know if that's a good thing.

O.K. For now, I indent the line with 2 spaces.

andrewor14 · 2014-10-03T19:09:43Z

Hey @sarutak I left a few more minor comments. Otherwise this LGTM.

SparkQA · 2014-10-03T19:24:48Z

QA tests have started for PR 2432 at commit 2cc09aa.

This patch merges cleanly.

sarutak · 2014-10-03T19:25:51Z

Thanks @andrewor14 I'm waiting for tests.

SparkQA · 2014-10-03T19:25:51Z

QA tests have finished for PR 2432 at commit 2cc09aa.

This patch fails unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case e: Exception => logError("Source class " + classPath + " cannot be instantiated", e)

AmplabJenkins · 2014-10-03T19:25:52Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21264/

SparkQA · 2014-10-03T19:34:40Z

QA tests have started for PR 2432 at commit 3288b2b.

This patch merges cleanly.

andrewor14 · 2014-10-03T20:03:22Z

Ok, this is ready to go from my perspective. Merging once the tests pass. Thanks @sarutak.

SparkQA · 2014-10-03T20:43:14Z

QA tests have finished for PR 2432 at commit 3288b2b.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-03T20:43:18Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21265/

andrewor14 · 2014-10-03T20:49:58Z

Alright, this went into master. There were too many merge conflicts for this to also go into 1.1. @sarutak if you feel inclined feel free to open another one against that branch.

sarutak · 2014-10-03T20:56:08Z

@andrewor14 O.K, I'll open another PR for 1.1. Thanks.

tgravescs · 2014-10-08T15:11:58Z

This broke the yarn-alpha build. Please make sure to update YarnAllocationHandler as well if you open any other PRs.

https://issues.apache.org/jira/browse/SPARK-3848

sarutak · 2014-10-08T15:21:31Z

Oops... I'll fix soon.

sarutak · 2014-10-08T15:43:16Z

@tgravescs Sorry to keep you waiting. I've fixed the issue in #2715.

yarn alpha build was broken by #2432 as it added an argument to YarnAllocator but not to yarn/alpha YarnAllocationHandler
commit 79e45c9
Author: Kousuke Saruta
Closes #2715 from sarutak/SPARK-3848 and squashes the following commits:
bafb8d1 [Kousuke Saruta] Fixed parameters for the default constructor of alpha/YarnAllocatorHandler.

sarutak added 23 commits September 3, 2014 17:23

Modified SparkContext to retain spark.unique.app.name property in SparkConf

4180993

Modified SparkContext and Executor to set spark.executor.id to identifiers

55debab

Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource

71609f5

Modified MetricsSystem to set registry name with unique application-id and driver/executor-id

868e326

Revert "Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource"

85ffc02

This reverts commit 71609f5.

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

4e057c9

Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource

6fc5560

Modified constructor of DAGSchedulerSource and BlockManagerSource because the instance of SparkContext is no longer used

6f7dcd4

Modified MetricsSystem#buildRegistryName because conf.get does not return null when correspondin entry is absent

15f88a3

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

fa7175b

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

4603a39

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

3e098d8

tmp

e4a4593

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

912a637

Merge branch 'metrics-structure-improvement' of github.com:sarutak/spark into metrics-structure-improvement

848819c

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

93e263a

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

45bd33d

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

7b67f5a

Revert "tmp"

08e627e

This reverts commit e4a4593.

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

ead8966

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

3ea7896

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement

2ec848a

Modified to add application id to metrics name

efcb6e1

sarutak mentioned this pull request Sep 17, 2014

[SPARK-3377] [Metrics] Metrics can be accidentally aggregated #2250

Closed

Modified MetricsSystemSuite.scala

4776f9e

andrewor14 reviewed Oct 3, 2014

sarutak added 2 commits October 4, 2014 04:28

Fixed style

39169e4

Replaced getApplicationId with applicationId in SparkContext Replaced == with === in MetricsSystemSuite

Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2

3288b2b

sarutak force-pushed the metrics-structure-improvement2 branch from 2cc09aa to 3288b2b October 3, 2014 19:30

asfgit closed this in 79e45c9 Oct 3, 2014

sarutak mentioned this pull request Oct 8, 2014

[SPARK-3848] yarn alpha doesn't build on master #2715

Closed

sarutak deleted the metrics-structure-improvement2 branch April 11, 2015 05:21
