[Core] investigate why Ray hangs with grpcio==1.48.0 #27299
Comments
Repro'd. I see an interesting error message when installing
It looks like grpcio yanked 1.48.0 from PyPI, see grpc/grpc#30446 (comment). If I install grpcio 1.48.1, the hang no longer reproduces. @scv119 @jjyao I think the hang is not due to Ray: grpc shipped a bug that causes hangs in multithreaded libraries. Want to brainstorm ways to harden our processes against this in the future? The bug was originally seen and fixed for Apache Beam; the grpc folks suggested they test against the latest grpcio rc in the future to catch these issues. It's not clear to me how that would help us. Also, it seems that Ray's inconsistent dependencies between runtime envs and the system are orthogonal to this. How should I look into that?
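One way to guard against a known-bad release is to check the installed grpcio version at startup. The following is only a rough sketch: the `KNOWN_BAD_GRPCIO` set and both helper names are hypothetical, and the only version listed is 1.48.0 because that is the release named in this thread.

```python
from importlib import metadata

# Assumption: only 1.48.0 is listed, since it is the release named in this thread.
KNOWN_BAD_GRPCIO = {"1.48.0"}


def is_known_bad(version: str) -> bool:
    """Return True if the given grpcio version string is on the known-bad list."""
    return version.strip() in KNOWN_BAD_GRPCIO


def installed_grpcio_version():
    """Return the installed grpcio version string, or None if it is not installed."""
    try:
        return metadata.version("grpcio")
    except metadata.PackageNotFoundError:
        return None


version = installed_grpcio_version()
if version is None:
    print("grpcio is not installed")
elif is_known_bad(version):
    print(f"grpcio {version} is on the known-bad list; try another release, e.g. 1.48.1")
else:
    print(f"grpcio {version} is not on the known-bad list")
```

This does not explain the hang; it only reports whether the environment contains the release that was yanked.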
Signed-off-by: Cade Daniel <[email protected]>
Can we revert #27244 then, given that 1.48.0 is yanked? Since we don't pin our dependencies, I don't know how we can keep this kind of thing from happening in the future.
Yes, I think we can widen the range of grpcio versions to include 1.48.1. Do you know why the original I'll work on this in parallel with current AIR benchmarking work.
FWIW, [1.45 - 1.48) suffers from a spammy logs issue: https://github.com/grpc/grpc/commit/e8ca82b9a4374c8f87c2d190509c39055deda42a
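For readers unfamiliar with the interval notation, [1.45 - 1.48) is half-open: 1.45 is included and 1.48 is excluded. A minimal sketch of that membership test, assuming plain dotted numeric versions (no rc or post suffixes); both function names are hypothetical:

```python
def parse_version(version: str) -> tuple:
    """Parse a dotted numeric version like '1.47.3' into a tuple of ints.

    Assumption: only plain numeric components; short versions are
    zero-padded to three components so '1.45' compares equal to '1.45.0'.
    """
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_half_open_range(version: str, low: str, high: str) -> bool:
    """Return True if low <= version < high (the [low, high) interval)."""
    return parse_version(low) <= parse_version(version) < parse_version(high)


for candidate in ["1.43.0", "1.45.0", "1.47.3", "1.48.0", "1.48.1"]:
    affected = in_half_open_range(candidate, "1.45", "1.48")
    print(candidate, "in [1.45, 1.48):", affected)
```

Real dependency metadata would normally express this with a version specifier rather than hand-written comparisons; the tuple comparison here only illustrates the notation.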
I don't think spammy logs are a good enough reason to forbid a version, much less such a big range of them. On the conda-forge side, we try to be very faithful to the restrictions specified by the project, but non-essential restrictions like this just make that work a lot harder. For context, we need to rebuild packages for newer host dependencies, and grpc (together with abseil + protobuf) are a lot of work to keep consistent across the ecosystem. We cannot seriously maintain more than 2-3 versions of any package at a time, and pinning ray to grpcio <=1.43 means it will not be co-installable with all the other packages (e.g. tensorflow, etc.) that have already moved on (and been rebuilt for newer protobuf 3.21, abseil 20220623, grpc 1.46/1.47, etc.)
I think you have a good point @h-vetinari. Let me check to see if there were any other reasons (besides the spammy logs) which caused us to limit versions [1.45, 1.48). If there's nothing else then I'll create a PR for people to comment on.
#22518 has most of the context. The extra logs show up in the otherwise sparse stdout as well, which is distracting to users and they opened issues in the Ray repo. But if the restriction for [1.45, 1.48) is removed now, maybe most of the new Ray installations will not get these versions anyway.
Cool, thanks! So we'll open up the range to <=1.47 for now. If people have issues with the spammy logs, we can patch the grpc within conda-forge to include that fix.
Hi @jjyao @cadedaniel: could you help provide an update on this one (since it's P0)? Thanks for working on this!
@cadedaniel will work on it on Monday. We already found the root cause and know how to fix it.
What happened + What you expected to happen
We need to dig into why ray.init hangs with grpcio==1.48.0. Bonus point: we need to figure out our strategy moving forward, since grpcio has caused several issues.
502c3e1
e3051eb
fda3453
Versions / Dependencies
latest ray
Reproduction script
pip install ray
pip install grpcio==1.48.0
python
This will hang.
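A hang like this is awkward to confirm by hand, so one option is to run the suspect call in a child interpreter with a timeout. This is a rough sketch, not the method used in this thread; `runs_within` is a hypothetical helper built on the standard `subprocess` module.

```python
import subprocess
import sys


def runs_within(code: str, timeout_s: float) -> bool:
    """Run `code` in a fresh Python interpreter.

    Returns False if the child is still running after timeout_s seconds
    (subprocess.run kills it in that case), True otherwise. Note that True
    only means "it finished"; the child may still have exited with an error.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return True
```

Assuming the hanging call is ray.init, as described above, `runs_within("import ray; ray.init()", 60)` would be expected to return False in the affected environment. The 60-second limit is an arbitrary choice, and any helper processes started by the child may need separate cleanup.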
Issue Severity
Medium: It is a significant difficulty but I can work around it.