Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

orchestrator: Flag tasks that shouldn't be restarted #2327

Closed
wants to merge 1 commit into from

Conversation

aaronlehmann
Copy link
Collaborator

Previously, restart conditions other than "OnAny" were honored on a
best-effort basis. A service-level reconciliation, for example after a
leader election, would see that not enough tasks were running, and start
replacement tasks regardless of the restart policy. This limited the
usefulness of the other restart conditions.

This change adds a DontRestart flag to Task. It can be set by the
restart supervisor when it shuts down a task and decides not to start a
replacement task. The orchestrators look for the presence of this flag
and honor it when doing service-level reconciliation. If the flag is
set, the dead task is passed to the updater along with the running
tasks, so the updater can start a replacement if and only if the service
definition has changed relative to the dead task.

The task reaper has been modified so it will never delete the last task
in a slot, if that task has the DontRestart flag set.

Fixes #932
Counterproposal to #2290, informed by corner cases encountered there

cc @cyli @aluzzardi

@codecov
Copy link

codecov bot commented Jul 21, 2017

Codecov Report

Merging #2327 into master will decrease coverage by 0.06%.
The diff coverage is 9.43%.

@@            Coverage Diff             @@
##           master    #2327      +/-   ##
==========================================
- Coverage   60.31%   60.24%   -0.07%     
==========================================
  Files         128      128              
  Lines       26002    26032      +30     
==========================================
  Hits        15683    15683              
- Misses       8929     8959      +30     
  Partials     1390     1390

@diogomonica
Copy link
Contributor

High-level seems like a decent solution. Following #2290 I don't see a better alternative, but I'm also no longer super familiar with this portion of the codebase.

}

service.UpdateStatus.State = api.UpdateStatus_ROLLBACK_STARTED
service.UpdateStatus.Message = message
err = batch.Update(func(tx store.Tx) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking nitpick: maybe just return batch.Update here?

Copy link
Contributor

@cyli cyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As someone not very familiar with the intricacies of the orchestrator, reaper, updater, etc., this solution makes more sense to me and is easier to understand.

It seems fine to me to rely on the task history, for --task-history-limit 0 to be a best effort sort of thing (maybe if that's specified, we can just hide the task history in the API but actually still keep it around)?

Previously, restart conditions other than "OnAny" were honored on a
best-effort basis. A service-level reconciliation, for example after a
leader election, would see that not enough tasks were running, and start
replacement tasks regardless of the restart policy. This limited the
usefulness of the other restart conditions.

This change adds a DontRestart flag to Task. It can be set by the
restart supervisor when it shuts down a task and decides not to start a
replacement task. The orchestrators look for the presence of this flag
and honor it when doing service-level reconciliation. If the flag is
set, the dead task is passed to the updater along with the running
tasks, so the updater can start a replacement if and only if the service
definition has changed relative to the dead task.

The task reaper has been modified so it will never delete the last task
in a slot, if that task has the DontRestart flag set.

Signed-off-by: Aaron Lehmann <[email protected]>
@aaronlehmann
Copy link
Collaborator Author

Superseded by #2332

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants