[SPARK-4282][YARN] Stopping flag in YarnClientSchedulerBackend should be volatile #3143

sarutak · 2014-11-06T22:01:50Z

In YarnClientSchedulerBackend, the variable "stopping" is used as a flag and is accessed by multiple threads, so it should be volatile.
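Here is a minimal sketch of the pattern this PR fixes, under simplifying assumptions. A daemon thread polls an application's state at a fixed interval and exits once another thread sets the stop flag. `MonitoredBackend`, `pollState`, and the interval are hypothetical stand-ins, not Spark's code. Without `@volatile`, the Java memory model does not guarantee that the monitor thread ever sees the write made by `stop()`, so the loop could keep running after a stop.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, simplified stand-in for a scheduler backend; not Spark's class.
class MonitoredBackend(pollState: () => Boolean, intervalMs: Long = 100) {
  // Written by whichever thread calls stop(), read by the monitor thread.
  // @volatile makes that write visible to the monitor thread.
  @volatile private var stopping = false

  // Number of poll cycles that saw the application still running.
  val polls = new AtomicInteger(0)

  private val monitor = new Thread("application state monitor") {
    override def run(): Unit = {
      // pollState() returns true while the (simulated) application is running.
      while (!stopping && pollState()) {
        polls.incrementAndGet()
        Thread.sleep(intervalMs)
      }
    }
  }
  monitor.setDaemon(true)

  def start(): Unit = monitor.start()

  def stop(): Unit = { stopping = true }

  /** Waits up to timeoutMs for the monitor thread to exit; true if it has exited. */
  def awaitMonitor(timeoutMs: Long): Boolean = {
    monitor.join(timeoutMs)
    !monitor.isAlive
  }
}
```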

SparkQA · 2014-11-06T22:02:38Z

Test build #23019 has started for PR 3143 at commit 58fdcc9.

This patch merges cleanly.

SparkQA · 2014-11-06T23:26:57Z

Test build #23019 has finished for PR 3143 at commit 58fdcc9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-06T23:27:01Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23019/

JoshRosen · 2014-11-07T20:16:32Z

Why not use proper synchronization / waiting instead, or a semaphore?

sarutak · 2014-11-07T20:19:43Z

Ah... embarrassing. Yes, we should use synchronization.

aarondav · 2014-11-07T20:44:49Z

Why does the thread interrupt itself after it's completed the while loop? There is nothing that would see this.

aarondav · 2014-11-07T20:46:10Z

Also, I'm not certain what advantage proper synchronization has over a volatile variable here; the latter seems sufficient for the task, unless you want to wait for the thread to terminate, which seems unnecessary since it's a daemon thread.
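The sketch below illustrates both comments using a hypothetical `InterruptibleMonitor`. Interrupting the current thread after its loop has finished changes nothing, because no later code checks that thread's interrupt status. If the goal is for a stop to take effect promptly, `stop()` can instead set the volatile flag and interrupt the monitor thread, which ends a pending sleep immediately. Joining is optional, because a daemon thread does not keep the JVM alive.

```scala
// Hypothetical sketch; InterruptibleMonitor and its parameters are illustrative.
class InterruptibleMonitor(intervalMs: Long) {
  @volatile private var stopping = false

  private val thread = new Thread("application state monitor") {
    override def run(): Unit = {
      try {
        while (!stopping) {
          // One poll of the application would go here; then wait for the next cycle.
          Thread.sleep(intervalMs)
        }
      } catch {
        // stop() interrupted the sleep; the flag is already set, so just exit.
        case _: InterruptedException =>
      }
    }
  }
  thread.setDaemon(true) // a daemon thread never blocks JVM exit

  def start(): Unit = thread.start()

  /**
   * Sets the flag and interrupts the monitor thread so a pending sleep ends now.
   * Waits up to waitMs for the thread to exit only if waitMs > 0
   * (Thread.join(0) would wait forever). Returns true if the thread has exited.
   */
  def stop(waitMs: Long = 0): Boolean = {
    stopping = true
    thread.interrupt()
    if (waitMs > 0) thread.join(waitMs)
    !thread.isAlive
  }
}
```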

JoshRosen · 2014-11-07T20:49:54Z

This looked like busy-waiting to me, which seems like an antipattern compared to waiting on a condition variable / semaphore, but I guess we need to poll YARN to find out whether the application has exited. Sorry for the confusion.
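Polling YARN is unavoidable, but the pause between polls does not have to be a plain sleep. This hedged sketch combines the two ideas. The hypothetical `LatchPoller` waits on a `java.util.concurrent.CountDownLatch` with a timeout between polls. `stop()` counts the latch down, which wakes the waiting thread and also publishes the stop signal safely, so no separate volatile flag is needed.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch: periodic polling where the pause between polls is a
// timed wait on a latch rather than Thread.sleep, so stop() wakes it at once.
class LatchPoller(poll: () => Boolean, intervalMs: Long) {
  private val stopSignal = new CountDownLatch(1)

  // Number of times poll() has been called.
  val polls = new AtomicInteger(0)

  private val thread = new Thread("latch poller") {
    override def run(): Unit = {
      var running = true
      while (running) {
        polls.incrementAndGet()
        if (!poll()) {
          // The polled application has exited on its own.
          running = false
        } else if (stopSignal.await(intervalMs, TimeUnit.MILLISECONDS)) {
          // await returned true: stop() counted the latch down during the wait.
          running = false
        }
      }
    }
  }
  thread.setDaemon(true)

  def start(): Unit = thread.start()

  def stop(): Unit = stopSignal.countDown()

  /** Waits up to timeoutMs for the polling thread to exit; true if it has exited. */
  def awaitExit(timeoutMs: Long): Boolean = {
    thread.join(timeoutMs)
    !thread.isAlive
  }
}
```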

aarondav · 2014-11-07T20:51:38Z

Right, I believe this is not busy-waiting, but a periodic task executing in a separate thread. Perhaps a ScheduledExecutorService would be clearer, but this seems like a reasonable use of volatile.
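This is a sketch of the `ScheduledExecutorService` alternative. It assumes a hypothetical `poll` function that returns false once the application has exited, and an `onExit` callback that stands in for `sc.stop()`. The executor owns the thread and the timing, so stopping is just a matter of shutting the executor down, after which no further polls are scheduled.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}

// Hypothetical sketch of the ScheduledExecutorService alternative; not Spark's code.
class ScheduledMonitor(poll: () => Boolean, onExit: () => Unit, intervalMs: Long) {
  // Single daemon thread so the monitor never keeps the JVM alive.
  private val executor = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "application state monitor")
      t.setDaemon(true)
      t
    }
  })

  def start(): Unit = {
    executor.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = {
        if (!poll()) {
          executor.shutdown() // cancel further polls first...
          onExit()            // ...then react, e.g. by stopping the context
        }
      }
    }, 0, intervalMs, TimeUnit.MILLISECONDS)
  }

  // Stop polling regardless of the application's state.
  def stop(): Unit = executor.shutdownNow()

  def isStopped: Boolean = executor.isShutdown
}
```

The trade-off is one more object to manage. For a single periodic check, the explicit thread with a volatile flag that this PR keeps is shorter.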

JoshRosen · 2014-11-07T20:57:55Z

I'm not too familiar with the YARN code, but it looks like the Client has a monitorApplication() method that blocks until the application completes and returns its status. To avoid code duplication / improve understandability, couldn't we spin off a thread that calls that method (since that already implements the polling logic) and calls sc.stop() once that method returns? I think sc.stop() is thread-safe (or can be).

aarondav · 2014-11-07T21:16:18Z

That sounds good, but note that it wouldn't have quite the same semantics, because the current pattern allows someone calling sc.stop() to terminate this thread cleanly, independent of what YARN says. Perhaps this is not a useful distinction, however.

JoshRosen · 2014-11-07T21:22:48Z

> the current pattern allows someone calling sc.stop() to terminate this thread cleanly, independent of what YARN says

It looks like stop() calls client.stop(), which could use a volatile flag to interrupt any threads that called monitorApplication. So it sounds like we'll probably end up needing the @volatile in either case, but I'm suggesting that this synchronization / interruption logic should be hidden inside the Client rather than exposed here.
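This sketch shows the suggestion to hide the stop logic inside the client. It uses a hypothetical `YarnLikeClient` with simplified states, which are neither Spark's actual `Client` nor YARN's enums. `monitorApplication()` blocks until the application leaves `Running`. `stop()` sets a private volatile flag that makes the loop return `None`, so callers never see the flag.

```scala
// Hypothetical, simplified states; YARN's real application states differ.
sealed trait AppState
case object Running extends AppState
case object Finished extends AppState
case object Failed extends AppState

// Hypothetical client that owns both the polling loop and the stop flag,
// so callers never touch the flag directly.
class YarnLikeClient(fetchState: () => AppState, intervalMs: Long) {
  @volatile private var stopped = false

  /**
   * Blocks until the application leaves Running and returns its final state,
   * or returns None if stop() is called first.
   */
  def monitorApplication(): Option[AppState] = {
    while (!stopped) {
      val state = fetchState()
      if (state != Running) return Some(state)
      Thread.sleep(intervalMs)
    }
    None
  }

  def stop(): Unit = { stopped = true }
}
```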

aarondav · 2014-11-07T22:36:13Z

SGTM

sarutak · 2014-11-08T01:25:58Z

@JoshRosen Do you mean we should call ClientBase#monitorApplication() in blocking mode (returnOnRunning = false) instead of implementing similar code here? In a nutshell, something like the following code?

private def asyncMonitorApplication(): Unit = {
  assert(client != null && appId != null, "Application has not been submitted yet!")
  val t = new Thread {
    override def run(): Unit = {
      // Blocking call: returns only once the application has exited (returnOnRunning = false)
      val (state, _) = client.monitorApplication(appId, returnOnRunning = false)
      // Stop the SparkContext only if the backend is not already stopping
      if (!stopping) {
        logError(s"Yarn application has already exited with state $state!")
        sc.stop()
      }
    }
  }
  t.setName("Yarn application state monitor")
  t.setDaemon(true)
  t.start()
}
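To exercise this proposal outside Spark, the version below swaps `Client`, `SparkContext`, and `logError` for hypothetical stubs (`StubClient`, `StubContext`, and `System.err`) and represents application states as strings. Only the thread setup and the `stopping` check mirror the snippet. The stub's `monitorApplication` simply sleeps and then reports a fixed final state.

```scala
import java.util.concurrent.CountDownLatch

// Hypothetical stand-in for the YARN client; not Spark's class.
class StubClient(finalState: String, runForMs: Long) {
  /** Blocks for runForMs, then reports a fixed terminal state and a dummy report. */
  def monitorApplication(appId: String): (String, String) = {
    Thread.sleep(runForMs)
    (finalState, s"report for $appId")
  }
}

// Hypothetical stand-in for SparkContext: records whether stop() was called.
class StubContext {
  val stopped = new CountDownLatch(1)
  def stop(): Unit = stopped.countDown()
}

class StubBackend(client: StubClient, appId: String, sc: StubContext) {
  // Set by stop(); read by the monitor thread.
  @volatile private var stopping = false

  def asyncMonitorApplication(): Thread = {
    assert(client != null && appId != null, "Application has not been submitted yet!")
    val t = new Thread {
      override def run(): Unit = {
        // Blocks until the (stubbed) application reports a terminal state.
        val (state, _) = client.monitorApplication(appId)
        // Only tear down the context if nobody stopped the backend first.
        if (!stopping) {
          System.err.println(s"Yarn application has already exited with state $state!")
          sc.stop()
        }
      }
    }
    t.setName("Yarn application state monitor")
    t.setDaemon(true)
    t.start()
    t
  }

  def stop(): Unit = { stopping = true }
}
```

If `stop()` is called before the blocking call returns, the monitor thread exits without calling `sc.stop()`. The volatile `stopping` flag exists to guarantee that behavior.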

tgravescs · 2014-11-11T15:17:13Z

I agree that all we need is to make it volatile; no other synchronization is required. It sounds good to commonize the monitor logic inside Client/ClientBase. Let's make sure monitorApplication does not spin off an extra thread in cluster mode, as that would just add overhead.

If we want to get this into 1.2, I would rather have this PR just fix the volatile flag, and then file a separate JIRA for the other cleanup.

JoshRosen · 2014-11-11T15:25:01Z

@tgravescs That seems fine to me; you can go ahead and merge this, if you'd like, or I'll do it when I get back later this morning.

tgravescs · 2014-11-11T18:33:37Z

+1, looks good. Filed SPARK-4346 to do the cleanup and commonization

… be volatile

In YarnClientSchedulerBackend, a variable "stopping" is used as a flag and it's accessed by some threads so it should be volatile.

Author: Kousuke Saruta

Closes #3143 from sarutak/stopping-flag-volatile and squashes the following commits:

58fdcc9 [Kousuke Saruta] Marked stoppig flag as volatile

(cherry picked from commit 7f37188)
Signed-off-by: Thomas Graves

1. YarnClientSchedulerBack.asyncMonitorApplication use Client.monitorApplication so that commonize the monitor logic
2. Support changing the yarn client monitor interval, see #5292
3. More details see discussion on #3143

Author: unknown
Author: Sephiroth-Lin

Closes #5305 from Sephiroth-Lin/SPARK-4346_3596 and squashes the following commits:

47c0014 [unknown] Edit conflicts
52b29fe [unknown] Interrupt thread when we call stop()
d4298a1 [unknown] Unused, don't push
aaacb42 [Sephiroth-Lin] don't wrap the entire block in the try
ee2b2fd [Sephiroth-Lin] update
6483a2a [unknown] Catch exception
6b47ff7 [unknown] Update code
568f46f [unknown] YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication

Marked stoppig flag as volatile

58fdcc9

sarutak changed the title ~~[SPARK-4282] Stopping flag in YarnClientSchedulerBackend should be volatile~~ [SPARK-4282][YARN] Stopping flag in YarnClientSchedulerBackend should be volatile Nov 6, 2014

asfgit closed this in 7f37188 Nov 11, 2014

Sephiroth-Lin mentioned this pull request Apr 1, 2015

[SPARK-4346][SPARK-3596][YARN] Commonize the monitor logic #5305

Closed

sarutak deleted the stopping-flag-volatile branch April 11, 2015 05:19

Sephiroth-Lin mentioned this pull request Aug 1, 2015

[SPARK-9519][Yarn] Confirm stop sc successfully when application was killed #7846

Closed
