Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle "cleared when running" more gracefully #16680

Closed
yuqian90 opened this issue Jun 27, 2021 · 0 comments · Fixed by #16681
Closed

Handle "cleared when running" more gracefully #16680

yuqian90 opened this issue Jun 27, 2021 · 0 comments · Fixed by #16681
Assignees
Labels
kind:feature Feature Requests

Comments

@yuqian90
Copy link
Contributor

yuqian90 commented Jun 27, 2021

Description
There are many scenarios a user may clear running TaskInstances. The current behavior is to set the TaskInstance state to SHUTDOWN and then SIGTERM is sent to the task, causing it to fail and its on_failure_callback is called. This can be noisy. It usually makes more sense to silently clear + terminate the running instance and retry. Here are some example scenarios:

  1. A long running task needs to pick up a code change. The user makes the change and clears the task. The user probably doesn't want on_failure_callback to be called when he clears it. He just wants the task to be restarted with his code change, gracefully.
  2. A task failed for some external reason. The user fixed the underlying issue and cleared the failed task to retry. Soon after he cleared it, he realizes that the fix he introduced is not good enough so he introduced another fix. Then he cleared the task again while it's still running. This kills the task and makes it fail. A better behavior is to silently kill the task and retry gracefully.

The point I'm making is that clearing a running TaskInstance is different from marking a running TaskInstance failed. At the moment, both operations do the same thing: the TaskInstance is first set to SHUTDOWN and then FAILED.

One suggestion is to introduce a new State called CLEARED_WHEN_RUNNING. As the name suggests, a TaskInstance should be set to this state when it's cleared while running. Most of the places can handle this state the same way SHUTDOWN is handled, except in TaskInstance.is_eligible_to_retry, where it should always be treated as eligible for retry.

Are you willing to submit a PR?

Yes!

@yuqian90 yuqian90 added the kind:feature Feature Requests label Jun 27, 2021
@yuqian90 yuqian90 self-assigned this Jun 27, 2021
@yuqian90 yuqian90 changed the title Handled "cleared when running" more gracefully Handle "cleared when running" more gracefully Jun 27, 2021
yuqian90 added a commit that referenced this issue Jul 31, 2021
closes: #16680

This PR makes sure that when a user clears a running task, the task does not fail. Instead it is killed and retried gracefully.

This is done by introducing a new State called RESTARTING. As the name suggests, a TaskInstance is set to this state when it's cleared while running. Most of the places handles RESTARTING the same way SHUTDOWN is handled, except in TaskInstance.is_eligible_to_retry, where it is always be treated as eligible for retry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feature Feature Requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant