[Core][Release test] microbenchmark_38.aws failed #34915

Closed
rkooo567 opened this issue May 1, 2023 · 14 comments · Fixed by #35494
Labels
bug: Something that is supposed to be working; but isn't
core: Issues that should be addressed in Ray Core
P0: Issues that should be fixed in short order
release-blocker: P0 Issue that blocks the release
release-test: release test

Comments

@rkooo567
Contributor

rkooo567 commented May 1, 2023

What happened + What you expected to happen

multi client put calls (Plasma Store) per second 12224.22 +- 138.19
single client put gigabytes per second 19.33 +- 5.13
single client tasks and get batch per second 9.54 +- 0.71
multi client put gigabytes per second 32.88 +- 2.09
single client get object containing 10k refs per second 12.29 +- 0.02
single client wait 1k refs per second 4.09 +- 0.06
single client tasks sync per second 1320.24 +- 34.96
single client tasks async per second 10520.47 +- 403.57
multi client tasks async per second 31509.83 +- 2904.41
1:1 actor calls sync per second 2508.25 +- 43.64
1:1 actor calls async per second 8679.05 +- 167.69
1:1 actor calls concurrent per second 5051.74 +- 204.91
1:n actor calls async per second 10362.16 +- 180.06
n:n actor calls async per second 30507.68 +- 413.78
n:n actor calls with arg async per second 2588.27 +- 99.46
1:1 async-actor calls sync per second 1659.86 +- 34.58
1:1 async-actor calls async per second 3014.39 +- 152.17
Traceback (most recent call last):
  File "run_microbenchmark.py", line 16, in <module>
    results = main() or []
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/ray_perf.py", line 258, in main
    results += timeit("1:1 async-actor calls with args async", async_actor, 1000)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/ray_microbenchmark_helpers.py", line 26, in timeit
    fn()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/ray_perf.py", line 256, in async_actor
    ray.get([a.small_value_with_arg.remote(i) for i in range(1000)])
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/worker.py", line 2534, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RecursionError): ray::AsyncActor.small_value_with_arg() (pid=66650, ip=10.0.5.225, repr=<ray._private.ray_perf.AsyncActor object at 0x7fe544f8b6a0>)
  File "/home/ray/anaconda3/lib/python3.8/asyncio/tasks.py", line 918, in run_coroutine_threadsafe
    future = concurrent.futures.Future()
  File "/home/ray/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 318, in __init__
    self._condition = threading.Condition()
  File "/home/ray/anaconda3/lib/python3.8/threading.py", line 224, in __init__
    lock = RLock()
RecursionError: maximum recursion depth exceeded
(AsyncActor pid=66650) sys:1: RuntimeWarning: coroutine 'execute_task.<locals>.deserialize_args' was never awaited
Subprocess return code: 1

https://console.anyscale-staging.com/o/anyscale-internal/jobs/prodjob_f6k46srheek1fh6v8hc24q6992
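The last two lines of the actor's traceback are consistent with each other. In CPython 3.8, `asyncio.run_coroutine_threadsafe` constructs a `concurrent.futures.Future()` before it schedules the coroutine on the loop, so when that constructor raised `RecursionError`, the coroutine object passed in was dropped without ever running, and Python later reported it as "never awaited". The sketch below reproduces that sequence with plain `asyncio` only. It assumes nothing about Ray internals beyond what the traceback shows: `start_background_loop`, `submit`, and the `fail_before_scheduling` flag are hypothetical, `deserialize_args` merely borrows the name from the warning, and the `RecursionError` is raised by hand rather than by a real stack overflow.

```python
import asyncio
import gc
import threading
import warnings


def start_background_loop():
    """Run a fresh asyncio event loop forever in a daemon thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def deserialize_args(value):
    # Hypothetical stand-in for the coroutine named in the RuntimeWarning.
    await asyncio.sleep(0)
    return value


def submit(loop, value, fail_before_scheduling=False):
    """Create a coroutine and hand it to `loop` from the calling thread."""
    coro = deserialize_args(value)
    if fail_before_scheduling:
        # Simulates an error raised inside run_coroutine_threadsafe before the
        # coroutine reaches the loop (in the issue's traceback, the
        # RecursionError came from concurrent.futures.Future()).
        raise RecursionError("simulated: maximum recursion depth exceeded")
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result(timeout=5)


if __name__ == "__main__":
    loop, thread = start_background_loop()
    print(submit(loop, 42))  # normal path: runs on the loop thread, prints 42

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            submit(loop, 7, fail_before_scheduling=True)
        except RecursionError:
            pass
        gc.collect()  # finalize the dropped, never-started coroutine
    # Should include: coroutine 'deserialize_args' was never awaited
    print([str(w.message) for w in caught])

    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()
```

In the normal path the loop thread awaits the coroutine and the caller gets its result. In the failing path the exception escapes before anything is scheduled, so the only trace left behind is the "never awaited" warning. If `deserialize_args` were defined inside a function named `execute_task`, its qualified name in that warning would read `execute_task.<locals>.deserialize_args`, matching the log line above.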

Versions / Dependencies

master

Reproduction script

N/A

Issue Severity

None

@rkooo567 rkooo567 added bug Something that is supposed to be working; but isn't P0 Issues that should be fixed in short order core Issues that should be addressed in Ray Core labels May 1, 2023
@jjyao jjyao self-assigned this May 1, 2023
@jjyao
Collaborator

jjyao commented May 1, 2023

Might be related: #30498

@jjyao
Collaborator

jjyao commented May 1, 2023

@can-anyscale can-anyscale added the release-test release test label May 2, 2023
@fishbone
Contributor

fishbone commented May 3, 2023

Could this issue be related to #35015?

@jjyao
Collaborator

jjyao commented May 3, 2023

Hmm, probably not. I was not able to reproduce this one, and it's passing on master now.

@rkooo567 rkooo567 assigned scv119 and unassigned jjyao May 3, 2023
@scv119
Contributor

scv119 commented May 3, 2023

#30476 also might be related.

@scv119
Contributor

scv119 commented May 5, 2023

@can-anyscale
Collaborator

It seems the issue is now much more persistent: https://buildkite.com/ray-project/release-tests-branch/builds/1647#01880bf9-1862-4b17-a764-d1cc1438e596.

I was unable to bisect this because the test is so flaky.

CC: @scv119
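Flakiness defeats plain bisection because a bad commit that happens to pass is marked good, and the search then moves past the real culprit. A common mitigation is to rerun the test several times per commit and count a commit as good only if every run passes; if a bad commit fails with probability p per run, it still slips through k runs with probability (1 - p)^k. The sketch below is a generic illustration of that idea, not the Buildkite bisect tooling used in this thread; `run_test`, `make_flaky_test`, and the `c-<i>` commit names are hypothetical.

```python
from typing import Callable, Dict, Sequence

RunTest = Callable[[str], bool]


def is_good(commit: str, run_test: RunTest, runs: int) -> bool:
    """Count a commit as good only if the test passes on every one of `runs` attempts."""
    return all(run_test(commit) for _ in range(runs))


def bisect_flaky(commits: Sequence[str], run_test: RunTest, runs: int) -> str:
    """Return the first commit judged bad.

    Assumes commits are ordered oldest to newest, commits[0] is good and
    commits[-1] is bad (neither endpoint is re-tested).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid], run_test, runs):
            lo = mid
        else:
            hi = mid
    return commits[hi]


def miss_probability(fail_rate: float, runs: int) -> float:
    """Chance that a truly bad commit passes all `runs` attempts and is misjudged as good."""
    return (1.0 - fail_rate) ** runs


def make_flaky_test(first_bad: int, fail_every: int) -> RunTest:
    """Hypothetical deterministic stand-in for a flaky release test.

    Commits "c-<i>" with i >= first_bad fail on every `fail_every`-th attempt;
    earlier commits always pass.
    """
    attempts: Dict[str, int] = {}

    def run_test(commit: str) -> bool:
        attempts[commit] = attempts.get(commit, 0) + 1
        if int(commit.split("-")[1]) < first_bad:
            return True
        return attempts[commit] % fail_every != 0

    return run_test


if __name__ == "__main__":
    commits = [f"c-{i}" for i in range(10)]
    # Three runs per commit catch the culprit (c-6 is the first bad commit).
    print(bisect_flaky(commits, make_flaky_test(6, 3), runs=3))  # c-6
    # A single run lets bad commits pass, so the search overshoots.
    print(bisect_flaky(commits, make_flaky_test(6, 3), runs=1))  # c-9
    print(miss_probability(0.5, 3))  # 0.125
```

The trade-off is cost: every extra run per commit multiplies the number of release-test executions a bisect needs.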

@scv119
Contributor

scv119 commented May 11, 2023

@scv119 scv119 added release-blocker P0 Issue that blocks the release and removed release-test release test labels May 15, 2023
@can-anyscale
Collaborator

OK, after many bisects, I think this is the culprit: cc3fa33

Could you help verify, @vitsai, @scv119?

Bisect job: https://buildkite.com/ray-project/release-tests-bisect/builds/152#01882575-5a1e-4c22-8671-24cb54da02dc

@can-anyscale can-anyscale added the release-test release test label May 16, 2023
@scv119
Contributor

scv119 commented May 17, 2023

Thanks @can-anyscale. I think that PR probably just makes this bug more frequent as a side effect.

@can-anyscale
Collaborator

FYI, this test is reliably passing on the release branch, and I think the only relevant cherry-pick is this one: https://buildkite.com/ray-project/oss-ci-build-branch/builds/3954. That PR was reverted on master because it broke other things, but I wanted to let you know that it somehow fixed the test.

@rkooo567 rkooo567 assigned rkooo567 and unassigned scv119 May 18, 2023
@can-anyscale
Collaborator

Since this is a release blocker, can you also help with a cherry-pick, @rkooo567? Thanks.

@rkooo567
Contributor Author

There's already a PR out!

@rkooo567
Contributor Author

#35532


5 participants