taskreaper: Wait for tasks to stop running #1948

Merged

Conversation

aaronlehmann
Collaborator

Currently, the task reaper only waits for a task's desired state to go
past "running" before deleting it. This can cause problems with rolling
updates when task-history-limit is set to 1. The old task can get
deleted by the reaper before the agent has a chance to update its actual
state to "shutdown", so the update never sees that the task has
completed, and has to wait awhile for a timeout.

This fixes it by making the task reaper only delete tasks whose desired
and actual states have both moved past "running". It's also necessary to
keep slots in the "dirty" list until there is only one task in that slot
with desired state or actual state <= "running", so that the old task
still gets cleaned up once its actual state moves past "running". Finally,
"deleteTasks" is changed to a map so that a task which is both part of a
dirty slot and orphaned won't cause two delete attempts (one of which
would fail).
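
For illustration, here is a minimal sketch of the two checks described above
(not the actual diff; the package and helper names are made up, but the task
fields and the TaskStateRunning constant follow swarmkit's api package):

    package reaper // illustrative package name, not the real location

    import "github.com/docker/swarmkit/api"

    // shouldReap reports whether a task can be deleted: both its desired
    // state and its observed (actual) state must have moved past "running".
    func shouldReap(t *api.Task) bool {
        return t.DesiredState > api.TaskStateRunning &&
            t.Status.State > api.TaskStateRunning
    }

    // slotStillDirty reports whether a slot should stay in the "dirty" set.
    // The slot is kept until at most one of its tasks still has desired
    // state or actual state <= "running", so the old task is still cleaned
    // up once its actual state finally advances.
    func slotStillDirty(slotTasks []*api.Task) bool {
        live := 0
        for _, t := range slotTasks {
            if t.DesiredState <= api.TaskStateRunning ||
                t.Status.State <= api.TaskStateRunning {
                live++
            }
        }
        return live > 1
    }

The deleteTasks collection itself becomes a map keyed by task ID, so a task
that is both part of a dirty slot and orphaned is only queued for deletion
once.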

Note that this means tasks on unavailable nodes will stay around for a
while, until the "orphaned" state is reached.

Tested by vendoring into docker and using the repro steps from moby/moby#28291.

cc @aluzzardi @dongluochen

@dongluochen
Contributor

Flaky test?

--- FAIL: TestDemoteToSingleManager (89.78s)
	Error Trace:	integration_test.go:118
			integration_test.go:351
	Error:		Received unexpected error worker node mrqdmyhbrpofzeljref7wt4v6 should not have manager status, returned &ManagerStatus{RaftID:8991989376972981745,Addr:127.0.0.1:33826,Leader:false,Reachability:UNREACHABLE,}
			github.com/docker/swarmkit/manager/state/raft/testutils.PollFuncWithTimeout
				/home/ubuntu/.go_workspace/src/github.com/docker/swarmkit/manager/state/raft/testutils/testutils.go:76: polling failed

@aaronlehmann
Collaborator Author

Yeah, definitely unrelated.

#1939 might help.

@codecov-io

codecov-io commented Feb 11, 2017

Codecov Report

Merging #1948 into master will increase coverage by 0.46%.

@@            Coverage Diff             @@
##           master    #1948      +/-   ##
==========================================
+ Coverage   54.05%   54.51%   +0.46%     
==========================================
  Files         108      108              
  Lines       18547    18548       +1     
==========================================
+ Hits        10026    10112      +86     
+ Misses       7289     7198      -91     
- Partials     1232     1238       +6

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ccdf5e0...2d575ef. Read the comment docs.

@dongluochen
Contributor

LGTM

@aluzzardi
Member

LGTM
