[SPARK-27065][CORE] avoid more than one active task set managers for a stage #23927

Closed · wants to merge 6 commits
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -212,14 +212,20 @@ private[spark] class TaskSchedulerImpl(
val stage = taskSet.stageId
val stageTaskSets =
taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
stageTaskSets(taskSet.stageAttemptId) = manager
val conflictingTaskSet = stageTaskSets.exists { case (_, ts) =>
ts.taskSet != taskSet && !ts.isZombie
}
if (conflictingTaskSet) {
throw new IllegalStateException(s"more than one active taskSet for stage $stage:" +
s" ${stageTaskSets.toSeq.map{_._2.taskSet.id}.mkString(",")}")
}

// Mark all the existing TaskSetManagers of this stage as zombie, as we are adding a new one.
// This is necessary to handle a corner case. Let's say a stage has 10 partitions and has 2
// TaskSetManagers: TSM1(zombie) and TSM2(active). TSM1 has a running task for partition 10
and it completes. TSM2 finishes tasks for partitions 1-9, and thinks it is still active
// because partition 10 is not completed yet. However, DAGScheduler gets task completion
// events for all the 10 partitions and thinks the stage is finished. If it's a shuffle stage
// and somehow it has missing map outputs, then DAGScheduler will resubmit it and create a
TSM3 for it. As a stage can't have more than one active task set manager, we must mark
// TSM2 as zombie (it actually is).
Member:
If TSM3 is created just after TSM2 finishes partition 10, how does TSM3 know that partition 10 is already finished?

Contributor Author:
It doesn't need to know; Spark will just waste resources running unnecessary tasks. The cluster will not crash.

That's why I said:

After this PR, #21131 becomes a pure optimization, to avoid launching unnecessary tasks. #22806 and #23871 are still valuable to improve this optimization.
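
For illustration, a small self-contained sketch of what that optimization amounts to, using hypothetical names (SketchTask, SketchAttempt, markPartitionCompleted) rather than the actual #21131 code: if a new attempt is told which partitions earlier attempts already finished, it can skip launching tasks for them; if it is not told, those tasks still run, they are just wasted work.

import scala.collection.mutable

// Hypothetical stand-in for a task: it only carries the partition it computes.
case class SketchTask(partitionId: Int)

// Hypothetical stand-in for a newly submitted attempt of the same stage.
class SketchAttempt(allTasks: Seq[SketchTask]) {
  private val finishedPartitions = mutable.Set[Int]()

  // Called when any attempt (including an older, zombie one) finishes a partition.
  def markPartitionCompleted(partitionId: Int): Unit = {
    finishedPartitions += partitionId
  }

  // Without the optimization, tasks for already-finished partitions would be
  // launched again and their results thrown away (wasted resources, not a crash).
  def tasksStillToLaunch: Seq[SketchTask] =
    allTasks.filterNot(t => finishedPartitions.contains(t.partitionId))
}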

Member:
If the TSM here is for a result stage, then when TSM2 finishes partition 10 and commits its output to HDFS, TSM3 would hit a TaskCommitDeniedException when it launches a task for partition 10. I think this is what #22806 and #23871 try to fix.
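
As background, a toy model of the "first committer wins" rule, with hypothetical names and not Spark's actual OutputCommitCoordinator API: the first attempt authorized to commit a partition wins, and later attempts asking to commit the same partition are refused, which is what a task sees as a TaskCommitDeniedException.

import scala.collection.mutable

// Toy "first committer wins" coordinator, for illustration only.
class SketchCommitCoordinator {
  // partitionId -> attempt number that was authorized to commit it
  private val authorized = mutable.Map[Int, Int]()

  def canCommit(partitionId: Int, attemptNumber: Int): Boolean = synchronized {
    authorized.get(partitionId) match {
      case None =>
        authorized(partitionId) = attemptNumber // first asker wins
        true
      case Some(winner) =>
        winner == attemptNumber // any other attempt is denied
    }
  }
}

object SketchCommitDemo extends App {
  val coordinator = new SketchCommitCoordinator
  assert(coordinator.canCommit(partitionId = 10, attemptNumber = 0))  // first attempt commits
  assert(!coordinator.canCommit(partitionId = 10, attemptNumber = 1)) // later attempt is denied
}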

Contributor Author:
This PR focuses on fixing the potential occurrence of java.lang.IllegalStateException: more than one active taskSet for stage, which is described in https://issues.apache.org/jira/browse/SPARK-23433.

https://issues.apache.org/jira/browse/SPARK-25250 remains unfixed and will be addressed in #22806 or #23871.

Note that SPARK-23433 can crash the cluster. Even though #22806 or #23871 could fix it as well, we need a simple fix that can be backported to 2.3/2.4.

SPARK-25250 is just a matter of wasting resources, so we can keep that fix in master only.

Member:
That makes sense, thanks for the update.

Yep, it makes sense to fix the issue that this PR addresses along with the other PRs for SPARK-25250.

stageTaskSets.foreach { case (_, ts) =>
ts.isZombie = true
}
stageTaskSets(taskSet.stageAttemptId) = manager
schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)

if (!isLocal && !hasReceivedTask) {
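To recap the change above in a self-contained form, here is a minimal sketch of the new submitTasks behavior, using simplified hypothetical classes (SketchScheduler, SketchManager) rather than the real TaskSchedulerImpl and TaskSetManager: submitting a new task set for a stage first marks every existing manager of that stage as zombie, instead of throwing IllegalStateException when a non-zombie one already exists.

import scala.collection.mutable.HashMap

// Simplified stand-in for TaskSetManager: only the zombie flag matters here.
class SketchManager(val stageId: Int, val stageAttemptId: Int) {
  @volatile var isZombie: Boolean = false
}

class SketchScheduler {
  // stageId -> (stageAttemptId -> manager), mirroring taskSetsByStageIdAndAttempt
  private val managersByStageAndAttempt = new HashMap[Int, HashMap[Int, SketchManager]]

  def submit(manager: SketchManager): Unit = synchronized {
    val stageManagers = managersByStageAndAttempt
      .getOrElseUpdate(manager.stageId, new HashMap[Int, SketchManager])
    // New behavior: retire every existing manager of this stage before registering
    // the new one, so at most one manager per stage is ever active.
    stageManagers.values.foreach { ts => ts.isZombie = true }
    stageManagers(manager.stageAttemptId) = manager
  }

  def activeAttempts(stageId: Int): Seq[Int] =
    managersByStageAndAttempt.get(stageId) match {
      case Some(stageManagers) =>
        stageManagers.values.filter(m => !m.isZombie).map(m => m.stageAttemptId).toSeq
      case None => Seq.empty
    }
}

object SketchSchedulerDemo extends App {
  val scheduler = new SketchScheduler
  scheduler.submit(new SketchManager(stageId = 0, stageAttemptId = 0))
  scheduler.submit(new SketchManager(stageId = 0, stageAttemptId = 1))
  // Only the latest attempt remains active; the earlier one is now zombie.
  assert(scheduler.activeAttempts(0) == Seq(1))
}

This mirrors the design choice in the diff: instead of trying to prevent DAGScheduler from ever resubmitting a stage whose previous attempt still looks active, the scheduler accepts the new attempt and retires the older ones.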
core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
@@ -201,28 +201,39 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
// Even if one of the task sets has not-serializable tasks, the other task set should
// still be processed without error
taskScheduler.submitTasks(FakeTask.createTaskSet(1))
taskScheduler.submitTasks(taskSet)
Contributor Author:
we can't have 2 active task set managers at the same time.

Contributor:
Maybe we should just give it another stageId?

val taskSet2 = new TaskSet(
Array(new NotSerializableFakeTask(1, 0), new NotSerializableFakeTask(0, 1)), 1, 0, 0, null)
taskScheduler.submitTasks(taskSet2)
taskDescriptions = taskScheduler.resourceOffers(multiCoreWorkerOffers).flatten
assert(taskDescriptions.map(_.executorId) === Seq("executor0"))
}

test("refuse to schedule concurrent attempts for the same stage (SPARK-8103)") {
Contributor Author:
This part of the code is reverted in this PR, so the test is removed as well.

Contributor:
This is fine, but do we also want to add a test case to ensure the new behavior will not break?

test("concurrent attempts for the same stage only have one active taskset") {
val taskScheduler = setupScheduler()
def isTasksetZombie(taskset: TaskSet): Boolean = {
taskScheduler.taskSetManagerForAttempt(taskset.stageId, taskset.stageAttemptId).get.isZombie
}

val attempt1 = FakeTask.createTaskSet(1, 0)
val attempt2 = FakeTask.createTaskSet(1, 1)
taskScheduler.submitTasks(attempt1)
intercept[IllegalStateException] { taskScheduler.submitTasks(attempt2) }
// The first submitted taskset is active
assert(!isTasksetZombie(attempt1))

// OK to submit multiple if previous attempts are all zombie
taskScheduler.taskSetManagerForAttempt(attempt1.stageId, attempt1.stageAttemptId)
.get.isZombie = true
val attempt2 = FakeTask.createTaskSet(1, 1)
taskScheduler.submitTasks(attempt2)
// The first submitted taskset is zombie now
assert(isTasksetZombie(attempt1))
// The newly submitted taskset is active
assert(!isTasksetZombie(attempt2))

val attempt3 = FakeTask.createTaskSet(1, 2)
intercept[IllegalStateException] { taskScheduler.submitTasks(attempt3) }
taskScheduler.taskSetManagerForAttempt(attempt2.stageId, attempt2.stageAttemptId)
.get.isZombie = true
taskScheduler.submitTasks(attempt3)
assert(!failedTaskSet)
// The first submitted taskset remains zombie
assert(isTasksetZombie(attempt1))
// The second submitted taskset is zombie now
assert(isTasksetZombie(attempt2))
// The newly submitted taskset is active
assert(!isTasksetZombie(attempt3))
}

test("don't schedule more tasks after a taskset is zombie") {