
CI test linux://python/ray/tests:test_streaming_generator_regression is flaky #43852

Closed
can-anyscale opened this issue Mar 9, 2024 · 5 comments · Fixed by #44197
Labels
bug Something that is supposed to be working; but isn't ci-test core Issues that should be addressed in Ray Core flaky-tracker Issue created via Flaky Test Tracker https://flaky-tests.ray.io/ P0 Issues that should be fixed in short order ray-test-bot Issues managed by OSS test policy stability weekly-release-blocker Issues that will be blocking Ray weekly releases

Comments

@can-anyscale (Collaborator)

CI test linux://python/ray/tests:test_streaming_generator_regression is flaky. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/3402#018e20bd-8ad0-4db6-86a9-ea7eeb68d97e

DataCaseName-linux://python/ray/tests:test_streaming_generator_regression-END
Managed by OSS Test Policy

@can-anyscale can-anyscale added bug Something that is supposed to be working; but isn't ci-test core Issues that should be addressed in Ray Core flaky-tracker Issue created via Flaky Test Tracker https://flaky-tests.ray.io/ ray-test-bot Issues managed by OSS test policy stability triage Needs triage (eg: priority, bug/not-bug, and owning component) weekly-release-blocker Issues that will be blocking Ray weekly releases labels Mar 9, 2024
@jjyao jjyao added P0 Issues that should be fixed in short order and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 11, 2024
@can-anyscale (Collaborator, Author)

This one was opened due to a bug in CI (it failed only once, so it's not that flaky); let's close it and see if it happens again.

@can-anyscale can-anyscale reopened this Mar 15, 2024
@stephanie-wang stephanie-wang self-assigned this Mar 18, 2024
@alexeykudinkin alexeykudinkin self-assigned this Mar 21, 2024
@alexeykudinkin (Contributor) commented Mar 21, 2024

@can-anyscale the test is failing because of the filename:

[Errno 36] File name too long: '/artifact-mount/failed_test_logs/bazel-out::k8-opt::bin::python::ray::tests::test_streaming_generator_regression.runfiles::com_github_ray_project_ray::python::ray::tests::test_streaming_generator_regression.py::test_segfault_report_streaming_generator_output[ray_start_cluster0-0]_1709949220.7908.zip'

I don't think it's hanging at all

--- UPDATED

I stand corrected. The artifact upload did fail because of the length of the test name, but the test itself also fails (it hangs).
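(For context: Linux limits a single filename component to 255 bytes, which the zip name above exceeds. Below is a minimal sketch of one way a CI pipeline could keep artifact names under that limit; the `safe_artifact_name` helper is hypothetical, not the actual Buildkite/CI code.)

```python
import hashlib


def safe_artifact_name(test_id: str, timestamp: float, max_len: int = 255) -> str:
    """Build a zip filename that stays under the filesystem's 255-byte
    component limit by truncating the test id and appending a short hash."""
    base = f"{test_id}_{timestamp}.zip"
    if len(base.encode()) <= max_len:
        return base
    digest = hashlib.sha1(test_id.encode()).hexdigest()[:10]
    # Keep the tail of the test id (usually the most specific part) plus a hash
    # so distinct tests still map to distinct artifact names.
    suffix = f"_{digest}_{timestamp}.zip"
    keep = max_len - len(suffix.encode())
    return test_id[-keep:] + suffix


# Example with the failing name from the log above:
name = safe_artifact_name(
    "bazel-out::k8-opt::bin::python::ray::tests::test_streaming_generator_regression"
    ".runfiles::com_github_ray_project_ray::python::ray::tests::"
    "test_streaming_generator_regression.py::"
    "test_segfault_report_streaming_generator_output[ray_start_cluster0-0]",
    1709949220.7908,
)
assert len(name.encode()) <= 255
```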

stephanie-wang added a commit that referenced this issue Mar 26, 2024
…44197)

When a streaming generator task is cancelled, we should mark the end of the stream at the caller's current pointer into the stream. Otherwise, if we receive out-of-order item reports, we may end up hanging, because the cancelled task will never report the intermediate items. We may end up dropping some values that were already reported, but this is OK since the user already cancelled the task.

Closes #43852.

Note: #44257 is also enough to make the relevant flaky test stable, probably because it makes it less likely to produce out-of-order item reports. Meanwhile, this PR addresses the root cause of the flaky test, i.e. out-of-order item reports during task cancellation.

---------

Signed-off-by: Stephanie Wang <[email protected]>
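(To illustrate the approach described in the commit message, here is a simplified, self-contained sketch of the caller-side bookkeeping; it is not Ray's actual internals, and the `StreamState` class and its fields are hypothetical.)

```python
from dataclasses import dataclass, field


@dataclass
class StreamState:
    """Caller-side view of a streaming generator's output stream."""
    next_read_index: int = 0       # next item the caller will consume
    end_of_stream_index: int = -1  # -1 means "end not known yet"
    items: dict = field(default_factory=dict)  # index -> value, may arrive out of order

    def report_item(self, index: int, value) -> None:
        # Item reports can arrive out of order; drop anything past a known end.
        if self.end_of_stream_index != -1 and index >= self.end_of_stream_index:
            return
        self.items[index] = value

    def mark_cancelled(self) -> None:
        # The fix: on cancellation, close the stream at the caller's current
        # pointer instead of waiting for the cancelled task to report the
        # remaining items (it never will). Out-of-order items beyond this
        # point are dropped, which is acceptable since the task was cancelled.
        self.end_of_stream_index = self.next_read_index

    def try_read(self):
        # Reading past the marked end stops immediately instead of hanging.
        if self.end_of_stream_index != -1 and self.next_read_index >= self.end_of_stream_index:
            raise StopIteration
        if self.next_read_index not in self.items:
            return None  # the real implementation would block/wait here
        value = self.items.pop(self.next_read_index)
        self.next_read_index += 1
        return value
```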
stephanie-wang added a commit to stephanie-wang/ray that referenced this issue Mar 27, 2024