[SPARK-4282][YARN] Stopping flag in YarnClientSchedulerBackend should be volatile #3143

Closed
sarutak wants to merge 1 commit from sarutak/stopping-flag-volatile


sarutak
Member

@sarutak sarutak commented Nov 6, 2014

In YarnClientSchedulerBackend, the variable "stopping" is used as a flag and is accessed by more than one thread, so it should be volatile.
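
To make the concern concrete, here is a minimal sketch of the pattern, not the actual YarnClientSchedulerBackend code. The class name PollingMonitor, its methods, and the poll counter are hypothetical and exist only for illustration; the only part taken from this PR is the idea of a @volatile stopping flag that one thread writes and a monitor thread reads.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the backend; this is not Spark's actual code.
class PollingMonitor(pollIntervalMs: Long) {
  // Written by whichever thread calls requestStop(), read by the monitor thread.
  // @volatile makes the write visible to the reader without extra locking.
  @volatile private var stopping = false

  // Counts loop iterations; stands in for "ask YARN for the application state".
  val polls = new AtomicInteger(0)

  private val monitorThread = new Thread("polling-monitor-sketch") {
    override def run(): Unit = {
      while (!stopping) {
        polls.incrementAndGet()
        Thread.sleep(pollIntervalMs)
      }
    }
  }
  monitorThread.setDaemon(true)

  def start(): Unit = monitorThread.start()

  def requestStop(): Unit = { stopping = true }

  // Waits up to timeoutMs for the monitor thread to exit; true if it did.
  def awaitExit(timeoutMs: Long): Boolean = {
    monitorThread.join(timeoutMs)
    !monitorThread.isAlive
  }
}
```

Without @volatile, the JVM memory model does not guarantee that the monitor thread ever observes the write made by requestStop(), so the loop could in principle keep running.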

@sarutak sarutak changed the title from "[SPARK-4282] Stopping flag in YarnClientSchedulerBackend should be volatile" to "[SPARK-4282][YARN] Stopping flag in YarnClientSchedulerBackend should be volatile" Nov 6, 2014
@SparkQA

SparkQA commented Nov 6, 2014

Test build #23019 has started for PR 3143 at commit 58fdcc9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 6, 2014

Test build #23019 has finished for PR 3143 at commit 58fdcc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23019/

@JoshRosen
Contributor

Why not use proper synchronization / waiting instead, or a semaphore?

@sarutak
Member Author

sarutak commented Nov 7, 2014

Ah... embarrassing. Yes, we should use synchronization.

@aarondav
Contributor

aarondav commented Nov 7, 2014

Why does the thread interrupt itself after it's completed the while loop? There is nothing that would see this.

@aarondav
Contributor

aarondav commented Nov 7, 2014

Also, I'm not certain of the advantage of using proper synchronization over a volatile variable here. The latter seems sufficient for the task, unless you want to wait for the thread to terminate, which seems unnecessary since it's a daemon thread.

@JoshRosen
Contributor

This looked like busy-waiting to me, which seems like an antipattern compared to waiting on a condition variable / semaphore, but I guess we need to poll YARN to find out whether the application has exited. Sorry for the confusion.

@aarondav
Contributor

aarondav commented Nov 7, 2014

Right, I believe this is not busy-waiting, but a periodic task executing in a separate thread. Perhaps a ScheduledExecutorService would be clearer, but this seems like a reasonable use of volatile.
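
For comparison, a rough sketch of what the ScheduledExecutorService variant might look like. Everything here is an assumption made for illustration: the object name PeriodicPollSketch, the startPolling helper, and the counter that stands in for the real poll are all hypothetical, and this is not code from the PR.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch; names are illustrative and not part of Spark.
object PeriodicPollSketch {
  // Counts executions; stands in for a real status poll.
  val polls = new AtomicInteger(0)

  // Daemon threads, so the scheduler never keeps the JVM alive on its own.
  private val daemonFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "periodic-poll-sketch")
      t.setDaemon(true)
      t
    }
  }

  // Schedules the poll to run repeatedly with a fixed delay between runs.
  def startPolling(intervalMs: Long): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor(daemonFactory)
    val task = new Runnable {
      override def run(): Unit = { polls.incrementAndGet(); () }
    }
    scheduler.scheduleWithFixedDelay(task, 0L, intervalMs, TimeUnit.MILLISECONDS)
    scheduler
  }
}
```

In this form the caller would stop polling with scheduler.shutdownNow() rather than by setting a flag, so the explicit volatile variable is no longer needed.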

@JoshRosen
Contributor

I'm not too familiar with the YARN code, but it looks like the Client has a monitorApplication() method that blocks until the application completes and returns its status. To avoid code duplication / improve understandability, couldn't we spin off a thread that calls that method (since that already implements the polling logic) and calls sc.stop() once that method returns? I think sc.stop() is thread-safe (or can be).

@aarondav
Contributor

aarondav commented Nov 7, 2014

That sounds good, but do note that that wouldn't be quite the same semantics, as the current pattern allows someone calling sc.stop() to terminate this thread cleanly, independent of what YARN says. Perhaps this is not a useful distinction, however.

@JoshRosen
Contributor

"the current pattern allows someone calling sc.stop() to terminate this thread cleanly, independent of what YARN says"

It looks like stop() calls client.stop(), which could use the volatile variable to interrupt any threads that called monitorApplication. So it sounds like we'll probably end up with the @volatile flag in either case, but I'm suggesting that this synchronization / interruption logic should be hidden inside the Client rather than exposed here.
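
A rough sketch of that idea, under stated assumptions: SketchClient and SketchClientDemo are hypothetical and do not reproduce the real YARN Client API or its monitorApplication signature. The point is only that the flag lives inside the client, and callers see just a blocking monitorApplication() and a stop().

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical client; it does not reproduce Spark's YARN Client API.
class SketchClient(pollIntervalMs: Long) {
  // The flag is private to the client; callers never touch it directly.
  @volatile private var stopped = false

  // Blocks until stop() is called. A real client would also return from
  // this loop once the cluster reports that the application has finished.
  def monitorApplication(): String = {
    while (!stopped) {
      Thread.sleep(pollIntervalMs)
    }
    "STOPPED_BY_CALLER"
  }

  def stop(): Unit = { stopped = true }
}

object SketchClientDemo {
  // Runs the monitor on a daemon thread, stops the client from the calling
  // thread, and returns whatever the monitor thread reported.
  def demo(): String = {
    val client = new SketchClient(5L)
    val result = new AtomicReference[String]("")
    val t = new Thread("sketch-monitor") {
      override def run(): Unit = result.set(client.monitorApplication())
    }
    t.setDaemon(true)
    t.start()
    Thread.sleep(20)
    client.stop()
    t.join(2000L)
    result.get()
  }
}
```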

@aarondav
Contributor

aarondav commented Nov 7, 2014

SGTM

@sarutak
Member Author

sarutak commented Nov 8, 2014

@JoshRosen You mean we should call ClientBase#monitorApplication() so that it blocks, instead of implementing similar code? In a nutshell, something like the following?

private def asyncMonitorApplication(): Unit = {
  assert(client != null && appId != null, "Application has not been submitted yet!")
  val t = new Thread {
    override def run(): Unit = {
      // Blocks until the application completes, then returns its state.
      val (state, _) = client.monitorApplication(appId, returnOnRunning = false)
      // If we did not request the stop ourselves, the application exited on its own.
      if (!stopping) {
        logError(s"Yarn application has already exited with state $state!")
        sc.stop()
      }
    }
  }
  t.setName("Yarn application state monitor")
  t.setDaemon(true)
  t.start()
}

@tgravescs
Contributor

I agree all we need is to make it volatile; no other synchronization is required. It sounds good to commonize the monitor logic inside of Client/ClientBase. Let's make sure not to have monitorApplication spin off an extra thread in cluster mode, as it would just add extra overhead.

If we want to get this into 1.2, I would rather have this bug fix cover only the volatile flag, then file a separate JIRA to do the other cleanup.

@JoshRosen
Contributor

@tgravescs That seems fine to me; you can go ahead and merge this, if you'd like, or I'll do it when I get back later this morning.

@tgravescs
Contributor

+1, looks good. Filed SPARK-4346 to do the cleanup and commonization

asfgit pushed a commit that referenced this pull request Nov 11, 2014
… be volatile

In YarnClientSchedulerBackend, a variable "stopping" is used as a flag and it's accessed by some threads so it should be volatile.

Author: Kousuke Saruta

Closes #3143 from sarutak/stopping-flag-volatile and squashes the following commits:

58fdcc9 [Kousuke Saruta] Marked stoppig flag as volatile

(cherry picked from commit 7f37188)
Signed-off-by: Thomas Graves
@asfgit asfgit closed this in 7f37188 Nov 11, 2014
asfgit pushed a commit that referenced this pull request Apr 8, 2015
1. YarnClientSchedulerBack.asyncMonitorApplication use Client.monitorApplication so that commonize the monitor logic
2. Support changing the yarn client monitor interval, see #5292
3. More details see discussion on #3143

Author: unknown
Author: Sephiroth-Lin

Closes #5305 from Sephiroth-Lin/SPARK-4346_3596 and squashes the following commits:

47c0014 [unknown] Edit conflicts
52b29fe [unknown] Interrupt thread when we call stop()
d4298a1 [unknown] Unused, don't push
aaacb42 [Sephiroth-Lin] don't wrap the entire block in the try
ee2b2fd [Sephiroth-Lin] update
6483a2a [unknown] Catch exception
6b47ff7 [unknown] Update code
568f46f [unknown] YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication
@sarutak sarutak deleted the stopping-flag-volatile branch April 11, 2015 05:19