-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[manager/orchestrator/reaper] Fix the condition used for skipping over running tasks #2677
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2677 +/- ##
==========================================
+ Coverage 61.99% 62.25% +0.26%
==========================================
Files 134 134
Lines 21771 21746 -25
==========================================
+ Hits 13496 13539 +43
+ Misses 6818 6750 -68
Partials 1457 1457 |
}) | ||
return nil | ||
})) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did all of this get removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why it was removed, but I'd also like to point out that 1
isn't a negative number...
// set TaskHistoryRetentionLimit to a negative value, so
// that tasks are cleaned up right away.
TaskHistoryRetentionLimit: 1,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cluster object is not needed for this test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: I think it will default to this anyway, but just for good test hygiene, would it make sense to set taskReaper.taskHistory = 0
immediately after the task reaper is created, just like with your new test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taskReaper.taskHistory should be 0 by default ? The rest of the test requires it to be 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it is definitely the default. I was just suggesting that setting it would be more explicit/easier to read/consistent style-wise as it is in the other test. Not important though.
LGTM except comment on removed code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// Don't delete running tasks | ||
// Ignore tasks which are running (including tasks which are desired to be shutdown), | ||
// or which are desired to be running (desired state running). | ||
if t.Status.State == api.TaskStateRunning || t.DesiredState <= api.TaskStateRunning { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about e.g. (state=starting, desired=shutdown)? We shouldn't reap that.
As I understand it, the reaper is only allowed to remove a task when:
state >= completed
(resources freed), ORstate < assigned
ANDdesired >= completed
(resources will never be assigned on a node)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair point. I think a task can still start even though it hasn't yet and even though it has been marked for shutdown, so it shouldn't be reaped in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readability, perhaps these should be split over two if-statements; it allows describing/documenting per condition, which may help understanding the logic)
if t.Status.State == api.TaskStateRunning {
runningTasks++
continue
}
if t.DesiredState <= api.TaskStateRunning {
runningTasks++
continue
}
or a switch
, whatever looks best 👍
switch {
case t.Status.State == api.TaskStateRunning:
runningTasks++
continue
case t.DesiredState <= api.TaskStateRunning:
runningTasks++
continue
}
// 1. The task has reached a terminal state i.e. actual state beyond TaskStateRunning. | ||
// 2. The task has not yet become running and desired state is a terminal state i.e. | ||
// actual state not yet TaskStateAssigned and desired state beyond TaskStateRunning. | ||
if t.Status.State > api.TaskStateRunning || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is becoming hard to grasp (also a reason we now need a long comment to explain)
Perhaps we should split these up (may give some duplicated code) or extract the check to a function; e.g okToCleanup(task) bool
Inside that function we can do an early return for each check to reduce complexity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh fair point. I'll try to make it easier to read.
BTW the big comment is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the comment itself is useful, sorry, didn't meant to imply that, just that it's easy to make mistakes in these conditions; breaking them up makes the code easier to read, and less likely to make mistakes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for updating!
I can't merge (don't have permissions to merge in protected branches)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking comments, LGTM though.
}) | ||
return nil | ||
})) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: I think it will default to this anyway, but just for good test hygiene, would it make sense to set taskReaper.taskHistory = 0
immediately after the task reaper is created, just like with your new test?
assert.Equal(t, "id1task3", deletedTask1.ID) | ||
} | ||
|
||
// desired = TaskStateRunning, actual = TaskStateNew |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: These all seem to be mostly repeated blocks of code. Would it make sense for this to just a range over the variable objects (desired state, actual state, cleaned up)?
for _, testcase := range []struct{
desired, actual api.TaskState
cleanedUp bool
} {
{desired: api.TaskStateRunning, actual: api.TaskStateNew, cleanedUp: False},
...
} {
testfunc(testcase.desired, testcase.actual)
assert.Zero(t, len(taskReaper.dirty))
if testcase.cleanedUp {
waitForTaskDelete(api.TaskStateRunning, api.TaskStateCompleted)
}
s.View(func(tx store.ReadTx) {
task := store.GetTask(tx, "id1task3")
if testcase.cleanedUp {
assert.Nil(t, task)
} else {
assert.NotNil(t, task)
}
}
}
…r running tasks. Signed-off-by: Anshul Pundir <[email protected]>
Addresses the following from #2672 (comment):
The previous logic for skipping over running tasks in tick() was:
if desired=running AND state <= running then don't delete else delete
For example, if a task is (desired=complete, state=running) then this code will delete it from SwarmKit, causing SwarmKit to believe that its resources are no longer in use, which is not correct.
This fixes the logic to ignore tasks which are running (including tasks which are desired to be shutdown), or which are desired to be running (desired state running).