[manager/orchestrator/reaper] Fix the condition used for skipping over running tasks #2677

anshulpundir · 2018-06-27T21:43:42Z

Addresses the following from #2672 (comment):

The previous logic for skipping over running tasks in tick() was:

if desired=running AND state <= running then don't delete else delete

For example, if a task is (desired=complete, state=running) then this code will delete it from SwarmKit, causing SwarmKit to believe that its resources are no longer in use, which is not correct.

This fixes the logic to ignore tasks which are actually running (including running tasks whose desired state is shutdown), as well as tasks whose desired state is running.
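To make the example above concrete, here is a minimal sketch comparing the two skip conditions. The `TaskState` type is a simplified stand-in for SwarmKit's `api.TaskState` (an assumption: only the relative ordering of states matters here, and the real enum has more values). `skipOld` and `skipNew` are hypothetical names used for illustration, not functions from this PR.

```go
package main

import "fmt"

// TaskState is a simplified stand-in for SwarmKit's api.TaskState.
// Assumption: only the relative ordering of the states matters.
type TaskState int

const (
	StateNew TaskState = iota
	StateAssigned
	StateStarting
	StateRunning
	StateCompleted
	StateShutdown
)

// skipOld mirrors the previous condition as described above:
// skip only when desired=running AND state <= running.
func skipOld(desired, state TaskState) bool {
	return desired == StateRunning && state <= StateRunning
}

// skipNew mirrors the condition first proposed in this PR:
// skip when the task is running, or when it is desired to be running.
func skipNew(desired, state TaskState) bool {
	return state == StateRunning || desired <= StateRunning
}

func main() {
	// The problematic case: desired=complete, state=running.
	desired, state := StateCompleted, StateRunning
	fmt.Println("old logic skips:", skipOld(desired, state))
	fmt.Println("new logic skips:", skipNew(desired, state))
}
```

With the old condition the task is not skipped, so it would be deleted while still holding resources; the proposed condition skips it.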

codecov · 2018-06-27T21:55:38Z

Codecov Report

Merging #2677 into master will increase coverage by 0.26%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2677      +/-   ##
==========================================
+ Coverage   61.99%   62.25%   +0.26%     
==========================================
  Files         134      134              
  Lines       21771    21746      -25     
==========================================
+ Hits        13496    13539      +43     
+ Misses       6818     6750      -68     
  Partials     1457     1457

dperny · 2018-06-27T22:14:54Z

manager/orchestrator/taskreaper/task_reaper_test.go

-		})
-		return nil
-	}))
-


why did all of this get removed?

I don't know why it was removed, but I'd also like to point out that 1 isn't a negative number...

	// set TaskHistoryRetentionLimit to a negative value, so
	// that tasks are cleaned up right away.
	TaskHistoryRetentionLimit: 1,

cluster object is not needed for this test

Non-blocking: I think it will default to this anyway, but just for good test hygiene, would it make sense to set taskReaper.taskHistory = 0 immediately after the task reaper is created, just like with your new test?

Shouldn't taskReaper.taskHistory be 0 by default? The rest of the test requires it to be 1.

Yep, it is definitely the default. I was just suggesting that setting it would be more explicit/easier to read/consistent style-wise as it is in the other test. Not important though.

dperny · 2018-06-27T22:15:48Z

LGTM except comment on removed code.

cyli

LGTM

talex5 · 2018-06-28T09:11:03Z

manager/orchestrator/taskreaper/task_reaper.go

-					// Don't delete running tasks
+				// Ignore tasks which are running (including tasks which are desired to be shutdown),
+				// or which are desired to be running (desired state running).
+				if t.Status.State == api.TaskStateRunning || t.DesiredState <= api.TaskStateRunning {


What about e.g. (state=starting, desired=shutdown)? We shouldn't reap that.

As I understand it, the reaper is only allowed to remove a task when:

state >= completed (resources freed), OR

state < assigned AND desired >= completed (resources will never be assigned on a node)
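The two cases listed above can be sketched as a single predicate and checked against the (state=starting, desired=shutdown) example. This is only an illustration: `TaskState` is a simplified stand-in for `api.TaskState` (assumed ordering, fewer values than the real enum), and `skipProposed` and `mayReap` are hypothetical names.

```go
package main

import "fmt"

// TaskState is a simplified stand-in for SwarmKit's api.TaskState.
// Assumption: only the relative ordering of the states matters.
type TaskState int

const (
	StateNew TaskState = iota
	StatePending
	StateAssigned
	StateStarting
	StateRunning
	StateCompleted
	StateShutdown
)

// skipProposed is the condition from the diff above: the reaper
// ignores the task when this returns true.
func skipProposed(state, desired TaskState) bool {
	return state == StateRunning || desired <= StateRunning
}

// mayReap encodes the two cases listed in the review comment.
func mayReap(state, desired TaskState) bool {
	// Case 1: state >= completed, so resources have been freed.
	// Case 2: state < assigned AND desired >= completed, so resources
	// will never be assigned on a node.
	return state >= StateCompleted ||
		(state < StateAssigned && desired >= StateCompleted)
}

func main() {
	state, desired := StateStarting, StateShutdown
	fmt.Println("diff condition would reap:", !skipProposed(state, desired))
	fmt.Println("rule above allows reaping:", mayReap(state, desired))
}
```

The condition in the diff does not skip a starting task that is desired to be shut down, so the reaper would consider it for deletion, whereas the rule above keeps it.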

Fair point. A task that has been marked for shutdown but has not started yet can still start, so it shouldn't be reaped in that case.

For readability, perhaps these should be split over two if-statements; that allows documenting each condition separately, which may help in understanding the logic.

	if t.Status.State == api.TaskStateRunning {
		runningTasks++
		continue
	}
	if t.DesiredState <= api.TaskStateRunning {
		runningTasks++
		continue
	}

or a switch, whatever looks best 👍

	switch {
	case t.Status.State == api.TaskStateRunning:
		runningTasks++
		continue
	case t.DesiredState <= api.TaskStateRunning:
		runningTasks++
		continue
	}

thaJeztah · 2018-07-27T19:09:13Z

manager/orchestrator/taskreaper/task_reaper.go

+				// 1. The task has reached a terminal state i.e. actual state beyond TaskStateRunning.
+				// 2. The task has not yet become running and desired state is a terminal state i.e.
+				// actual state not yet TaskStateAssigned and desired state beyond TaskStateRunning.
+				if t.Status.State > api.TaskStateRunning ||


This is becoming hard to grasp (also a reason we now need a long comment to explain)

Perhaps we should split these up (which may duplicate some code) or extract the check into a function, e.g. okToCleanup(task) bool

Inside that function we can do an early return for each check to reduce complexity
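One possible shape for such a helper, with one early return per check, is sketched below. This is an illustration of the suggestion rather than the code that was merged: `Task` and `TaskState` are pared-down, hypothetical stand-ins for the SwarmKit types, and the two checks follow the numbered comment in the diff above.

```go
package main

import "fmt"

// TaskState is a simplified stand-in for SwarmKit's api.TaskState.
// Assumption: only the relative ordering of the states matters.
type TaskState int

const (
	StateNew TaskState = iota
	StatePending
	StateAssigned
	StateStarting
	StateRunning
	StateCompleted
	StateShutdown
)

// Task is a hypothetical, pared-down task record.
type Task struct {
	State        TaskState // observed (actual) state
	DesiredState TaskState
}

// okToCleanup reports whether the reaper may delete the task.
func okToCleanup(t Task) bool {
	// 1. The task has reached a terminal state, i.e. its actual
	// state is beyond running.
	if t.State > StateRunning {
		return true
	}
	// 2. The task has not been assigned yet and its desired state is
	// terminal, so it will never run.
	if t.State < StateAssigned && t.DesiredState > StateRunning {
		return true
	}
	return false
}

func main() {
	tasks := []Task{
		{State: StateCompleted, DesiredState: StateShutdown},
		{State: StateStarting, DesiredState: StateShutdown},
		{State: StateNew, DesiredState: StateShutdown},
		{State: StateRunning, DesiredState: StateCompleted},
	}
	for _, t := range tasks {
		fmt.Printf("state=%d desired=%d okToCleanup=%v\n",
			t.State, t.DesiredState, okToCleanup(t))
	}
}
```

Each check carries its own comment, so the long explanation is attached to the branch it describes instead of one compound condition.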

ahh fair point. I'll try to make it easier to read.

BTW the big comment is needed.

Yes, the comment itself is useful; sorry, I didn't mean to imply otherwise. My point was just that it's easy to make mistakes in these conditions; breaking them up makes the code easier to read and mistakes less likely.

thaJeztah

LGTM, thanks for updating!

I can't merge (don't have permissions to merge in protected branches)

cyli

Non-blocking comments, LGTM though.

cyli · 2018-07-30T23:06:48Z

manager/orchestrator/taskreaper/task_reaper_test.go

-		})
-		return nil
-	}))
-


Non-blocking: I think it will default to this anyway, but just for good test hygiene, would it make sense to set taskReaper.taskHistory = 0 immediately after the task reaper is created, just like with your new test?

cyli · 2018-07-30T23:19:13Z

manager/orchestrator/taskreaper/task_reaper_test.go

+		assert.Equal(t, "id1task3", deletedTask1.ID)
+	}
+
+	// desired = TaskStateRunning, actual = TaskStateNew


Non-blocking: These all seem to be mostly repeated blocks of code. Would it make sense for this to just range over the variable parts (desired state, actual state, cleaned up)?

	for _, testcase := range []struct {
		desired, actual api.TaskState
		cleanedUp       bool
	}{
		{desired: api.TaskStateRunning, actual: api.TaskStateNew, cleanedUp: false},
		...
	} {
		testfunc(testcase.desired, testcase.actual)
		assert.Zero(t, len(taskReaper.dirty))
		if testcase.cleanedUp {
			waitForTaskDelete(api.TaskStateRunning, api.TaskStateCompleted)
		}
		s.View(func(tx store.ReadTx) {
			task := store.GetTask(tx, "id1task3")
			if testcase.cleanedUp {
				assert.Nil(t, task)
			} else {
				assert.NotNil(t, task)
			}
		})
	}


anshulpundir requested review from cyli, talex5, dperny and MagnusS June 27, 2018 21:43

dperny reviewed Jun 27, 2018


cyli approved these changes Jun 27, 2018


talex5 reviewed Jun 28, 2018


anshulpundir force-pushed the reaper branch from 8fc2e5e to fcb6519 Compare July 26, 2018 23:21

anshulpundir mentioned this pull request Jul 26, 2018

Services in global mode do not reschedule when stopped #2705

Closed

thaJeztah reviewed Jul 27, 2018


anshulpundir force-pushed the reaper branch from fcb6519 to 00b11eb Compare July 27, 2018 22:07

thaJeztah approved these changes Jul 28, 2018


cyli approved these changes Jul 30, 2018


[manager/orchestrator/reaper] Fix the condition used for skipping over running tasks. Signed-off-by: Anshul Pundir <[email protected]>

8c5d353

anshulpundir force-pushed the reaper branch from 00b11eb to 8c5d353 Compare July 31, 2018 23:40

cyli merged commit 9f35cb5 into moby:master Aug 1, 2018

This was referenced Aug 3, 2018

Bump SwarmKit to 8852e8840e30d69db0b39a4a3d6447362e17c64f moby/moby#37586

Merged

[18.06] Bump SwarmKit to 8852e88 docker-archive/engine#32

Merged

[18.03] [manager/orchestrator/reaper] Fix the condition used for skipping over running tasks. #2724

Merged
