-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Streaming generator task waits for all object report acks before finishing the task #44092
[core] Streaming generator task waits for all object report acks before finishing the task #44092
Conversation
…re finishing the task (ray-project#44079) This fixes a bug in handling task (worker process) failures for streaming generator tasks. Streaming generator executors report the values of the dynamic returns to the caller asynchronously during the task execution. If a report is lost, then the caller will not be able to get the value of the yield'ed ObjectRef. Therefore, we need to wait until all in-flight object reports have been acked before we can finish the streaming generator task. Otherwise, it is possible for the caller to think that the task has finished, but then the executor dies before it can report the dynamic return object's value. Also cleans up some generator_waiter.cc headers, etc. Signed-off-by: Stephanie Wang <[email protected]>
@stephanie-wang we are pretty late into the release cycle. How critical is it tp idc this one into 2.10? |
We need it for Ray Data recovery. If we don't cherry-pick, I think we need to do a minor release at the very least. @c21 @raulchen can weigh in too. |
Yes, let's cherry pick the PR. |
yes, it's a critical stability fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khluu Let's pick but be careful about possible issues (e.g. release test failures)
e032d7c
into
ray-project:releases/2.10.0
merged thanks all |
Why are these changes needed?
Cherry-pick #44079, fixes for #43914 and #43978.
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.