Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[manager/orchestrator/reaper] Fix the condition used for skipping over running tasks #2677

Merged
merged 1 commit into from
Aug 1, 2018

Conversation

anshulpundir
Copy link
Contributor

Addresses the following from #2672 (comment):

The previous logic for skipping over running tasks in tick() was:

if desired=running AND state <= running then don't delete else delete

For example, if a task is (desired=complete, state=running) then this code will delete it from SwarmKit, causing SwarmKit to believe that its resources are no longer in use, which is not correct.

This fixes the logic to ignore tasks which are running (including tasks which are desired to be shutdown), or which are desired to be running (desired state running).

@codecov
Copy link

codecov bot commented Jun 27, 2018

Codecov Report

Merging #2677 into master will increase coverage by 0.26%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2677      +/-   ##
==========================================
+ Coverage   61.99%   62.25%   +0.26%     
==========================================
  Files         134      134              
  Lines       21771    21746      -25     
==========================================
+ Hits        13496    13539      +43     
+ Misses       6818     6750      -68     
  Partials     1457     1457

})
return nil
}))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did all of this get removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why it was removed, but I'd also like to point out that 1 isn't a negative number...

					// set TaskHistoryRetentionLimit to a negative value, so
					// that tasks are cleaned up right away.
					TaskHistoryRetentionLimit: 1,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster object is not needed for this test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: I think it will default to this anyway, but just for good test hygiene, would it make sense to set taskReaper.taskHistory = 0 immediately after the task reaper is created, just like with your new test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taskReaper.taskHistory should be 0 by default ? The rest of the test requires it to be 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it is definitely the default. I was just suggesting that setting it would be more explicit/easier to read/consistent style-wise as it is in the other test. Not important though.

@dperny
Copy link
Collaborator

dperny commented Jun 27, 2018

LGTM except comment on removed code.

Copy link
Contributor

@cyli cyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// Don't delete running tasks
// Ignore tasks which are running (including tasks which are desired to be shutdown),
// or which are desired to be running (desired state running).
if t.Status.State == api.TaskStateRunning || t.DesiredState <= api.TaskStateRunning {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about e.g. (state=starting, desired=shutdown)? We shouldn't reap that.

As I understand it, the reaper is only allowed to remove a task when:

  • state >= completed (resources freed), OR
  • state < assigned AND desired >= completed (resources will never be assigned on a node)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point. I think a task can still start even though it hasn't yet and even though it has been marked for shutdown, so it shouldn't be reaped in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, perhaps these should be split over two if-statements; it allows describing/documenting per condition, which may help understanding the logic)

if t.Status.State == api.TaskStateRunning {
	runningTasks++
	continue
}
if t.DesiredState <= api.TaskStateRunning {
	runningTasks++
	continue
}

or a switch, whatever looks best 👍

switch {
case t.Status.State == api.TaskStateRunning:
	runningTasks++
	continue

case  t.DesiredState <= api.TaskStateRunning:
	runningTasks++
	continue
}

// 1. The task has reached a terminal state i.e. actual state beyond TaskStateRunning.
// 2. The task has not yet become running and desired state is a terminal state i.e.
// actual state not yet TaskStateAssigned and desired state beyond TaskStateRunning.
if t.Status.State > api.TaskStateRunning ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is becoming hard to grasp (also a reason we now need a long comment to explain)

Perhaps we should split these up (may give some duplicated code) or extract the check to a function; e.g okToCleanup(task) bool

Inside that function we can do an early return for each check to reduce complexity

Copy link
Contributor Author

@anshulpundir anshulpundir Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh fair point. I'll try to make it easier to read.

BTW the big comment is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the comment itself is useful, sorry, didn't meant to imply that, just that it's easy to make mistakes in these conditions; breaking them up makes the code easier to read, and less likely to make mistakes

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for updating!

I can't merge (don't have permissions to merge in protected branches)

Copy link
Contributor

@cyli cyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking comments, LGTM though.

})
return nil
}))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: I think it will default to this anyway, but just for good test hygiene, would it make sense to set taskReaper.taskHistory = 0 immediately after the task reaper is created, just like with your new test?

assert.Equal(t, "id1task3", deletedTask1.ID)
}

// desired = TaskStateRunning, actual = TaskStateNew
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: These all seem to be mostly repeated blocks of code. Would it make sense for this to just a range over the variable objects (desired state, actual state, cleaned up)?

for _, testcase := range []struct{
    desired, actual api.TaskState
    cleanedUp bool
} {
    {desired: api.TaskStateRunning, actual: api.TaskStateNew, cleanedUp: False},
    ...
} {
    testfunc(testcase.desired, testcase.actual)
    assert.Zero(t, len(taskReaper.dirty))
    if testcase.cleanedUp {
        waitForTaskDelete(api.TaskStateRunning, api.TaskStateCompleted)
    }
    s.View(func(tx store.ReadTx) {
        task := store.GetTask(tx, "id1task3")
        if testcase.cleanedUp {
            assert.Nil(t, task)
        } else {
            assert.NotNil(t, task)
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants