
[core] Streaming generator executor waits for item report to complete before continuing #44257

Merged
3 commits merged into ray-project:master on Mar 26, 2024

Conversation

stephanie-wang (Contributor) commented Mar 23, 2024

Why are these changes needed?

#42260 updated streaming generator tasks to asynchronously report generator returns, instead of synchronously reporting each generator return before yielding the next one. However, this has a couple of problems:

  • If the task still has a reference to the yielded value, it may modify the value. The serialized and reported return will then have a different value than expected.
  • As per #44079 ([core] Streaming generator task waits for all object report acks before finishing the task), we need to track the number of in-flight RPCs that report generator returns, so that we can wait for all of them to reply before the task finishes. If we increment the count of in-flight RPCs asynchronously, we can end up returning from the task while there are still in-flight RPCs.

This PR therefore reverts some of the logic in #42260 so that the executor waits for each generator return to be serialized into the protobuf sent back to the caller. Note that we do not wait for the reply (unless under backpressure).

We can later re-introduce asynchronous generator reports, but we will need to evaluate the performance benefit of a new implementation that also addresses both of the above points.
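
For readers less familiar with the executor loop, here is a minimal sketch of the behavior described above. This is illustrative Python rather than Ray's actual implementation; the class name, the `report_rpc` callback, and the `backpressure_threshold` parameter are hypothetical names introduced only for the example.

```python
import pickle
import threading


class StreamingGeneratorExecutor:
    """Simplified model of the executor loop after this PR: each yielded item
    is serialized and handed to the reporting RPC *before* the generator is
    resumed, while the RPC reply itself is not awaited unless the caller
    applies backpressure."""

    def __init__(self, report_rpc, backpressure_threshold=None):
        self._report_rpc = report_rpc            # hypothetical: sends one item report; calls done_cb on ack
        self._backpressure_threshold = backpressure_threshold
        self._num_inflight = 0                   # reports sent but not yet acked
        self._all_acked = threading.Condition()

    def run(self, generator_fn, *args, **kwargs):
        for index, item in enumerate(generator_fn(*args, **kwargs)):
            # Serialize the item now, before resuming the generator, so a later
            # mutation of `item` by the task cannot change what the caller
            # receives (problem 1 above).
            payload = pickle.dumps(item)

            # Increment the in-flight count synchronously so the end-of-task
            # wait below can never miss a report (problem 2 above).
            with self._all_acked:
                self._num_inflight += 1

            # Fire off the report; we do not wait for the reply here...
            self._report_rpc(index, payload, done_cb=self._on_ack)

            # ...unless we are under backpressure.
            if self._backpressure_threshold is not None:
                with self._all_acked:
                    self._all_acked.wait_for(
                        lambda: self._num_inflight <= self._backpressure_threshold
                    )

        # Per #44079: do not finish the task until every report has been acked.
        with self._all_acked:
            self._all_acked.wait_for(lambda: self._num_inflight == 0)

    def _on_ack(self):
        with self._all_acked:
            self._num_inflight -= 1
            self._all_acked.notify_all()
```

Serializing eagerly gives up a little pipelining, but the reply is still handled asynchronously, so the common (non-backpressured) case stays non-blocking.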

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Comment on lines 1446 to 1447
# we do not wait for the reply here (unless we are under
# backpressure).

Please clarify that back-pressure doesn't apply to RPC Actor calls (only Ray Data).

Signed-off-by: Stephanie Wang <[email protected]>
stephanie-wang added a commit that referenced this pull request Mar 26, 2024
…44197)

When a streaming generator task is cancelled, we should mark the end of the stream at the caller's current pointer into the stream. Otherwise, if we receive out-of-order item reports, we may end up hanging, because the cancelled task will never report the intermediate items. We may end up dropping some values that were already reported, but this is OK since the user already cancelled the task.

Closes #43852.

Note: #44257 is also enough to make the relevant flaky test stable, probably because it makes out-of-order item reports less likely. Meanwhile, this PR addresses the root cause of the flakiness: out-of-order item reports during task cancellation.

---------

Signed-off-by: Stephanie Wang <[email protected]>
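
The cancellation fix is easier to see with a small caller-side model of the stream. This is an illustrative sketch under assumed names (`CallerSideStream`, `report_item`, `mark_cancelled`), not Ray's actual stream implementation:

```python
class CallerSideStream:
    """Toy model of the caller's view of a streaming generator: items are
    reported by index (possibly out of order) and consumed at next_index."""

    def __init__(self):
        self._items = {}          # index -> value; reports may arrive out of order
        self._next_index = 0      # caller's current read pointer into the stream
        self._end_index = None    # index at which the stream ends

    def report_item(self, index, value):
        if self._end_index is not None and index >= self._end_index:
            return  # stream already ended (e.g. the task was cancelled); drop it
        self._items[index] = value

    def mark_cancelled(self):
        # Mark the end of the stream at the caller's *current* pointer rather
        # than at the highest index reported so far. A cancelled task never
        # reports the missing intermediate items, so waiting for them would
        # hang; items already reported past this point are dropped, which is
        # acceptable because the user cancelled the task.
        self._end_index = self._next_index

    def read_next(self):
        if self._end_index is not None and self._next_index >= self._end_index:
            raise StopIteration
        # A real implementation would block or await here until the item at
        # next_index has been reported; the toy version assumes it has arrived.
        value = self._items.pop(self._next_index)
        self._next_index += 1
        return value
```

Placing the end marker at the consumer's pointer, rather than at the highest reported index, is what prevents the hang: everything before the marker has already been consumed, so nothing the cancelled task will never deliver is ever waited on.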
stephanie-wang merged commit 11506ca into ray-project:master on Mar 26, 2024
5 checks passed
stephanie-wang deleted the revert-42260 branch on March 26, 2024 at 02:05
stephanie-wang added a commit to stephanie-wang/ray that referenced this pull request Mar 27, 2024
…ay-project#44197)
stephanie-wang added a commit to stephanie-wang/ray that referenced this pull request Mar 27, 2024
… before continuing (ray-project#44257)