-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Workflow] Simplify recovery algorithm #24594
Conversation
@suquark could you add more details in description area about what you did here? |
Also, does this also support cancellation if we store the execution queue in the workflow manager? |
@iycheng what kind of cancellation you want? If you would like to cancel the "construct the workflow for recovery" process, then yes this PR makes it possible; but it does not directly enables cancelling of a "already recovering" workflow. However, this PR would still be a necessary starting point for fully supporting workflow cancelling. Later I would also refactor the workflow execution code like this, and they would share a few code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think we probably should pass the execution_queue to the execution engine with what we'll be able to have better support of wait/cancel/get and even suspension (workflow event)
The CI errors are not related. I'll merge it. |
Why are these changes needed?
The workflow recovery algorithm was originally implemented with recursions, this makes it harder to check, maintain and optimize, because you only debug variables in the current stack frames, and there are potential risks of stack overflow due to recursion. This also results in duplicated code between
_reconstruct_wait_step
and_construct_resume_workflow_from_step
.This PR simplifies the workflow recovery algorithm. It's now easier to check, maintain and optimize. The recovery algorithm is divided into 3 steps in a clean way:
Checks
scripts/format.sh
to lint the changes in this PR.