cluster manager fixes #30172
@amitmurthy could you please review? Thanks.
This looks okay, but it would be good to have a test.
(force-pushed from 10de262 to 433a17a)
@amitmurthy may not be watching.
Why don't we just remove "failed" from `julia/stdlib/Distributed/src/cluster.jl`, line 301 (at commit a50d5af)?
(force-pushed from 7592ab8 to ef52319)
Lastly, I've added documentation on how to launch workers asynchronously. It'd be great if this PR could be reviewed again. Thanks for the previous comments; it's much better now.
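For readers following along, here is a minimal sketch of launching workers asynchronously from user code. It assumes only the standard `Distributed` API (`addprocs`, `workers`) plus Julia's `@async` and `fetch`, and it is not the text of the documentation added in this PR. Two local workers stand in for a cluster manager whose jobs may sit in a pending queue; that substitution is an assumption of the sketch.

```julia
using Distributed

# addprocs blocks until the new workers have connected. Wrapping it in
# @async runs the launch in a background task, so the caller (e.g. the
# REPL) gets control back immediately. Local workers are used here as a
# stand-in for a real cluster manager (assumption of this sketch).
launch_task = @async addprocs(2)

# ... other work can run here while the workers start ...

# fetch waits for the task to finish and returns addprocs' result: the
# ids of the newly added workers. If the launch failed, fetch throws.
new_pids = fetch(launch_task)
println("workers ready: ", new_pids)
```

Since `fetch` blocks, it is best called only at the point where the workers are actually needed.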
This looks good to me.
(force-pushed from b8aa79f to 271e501)
(force-pushed from 271e501 to e47a65e)
LGTM
* kill workers which don't launch properly
* don't emit spurious error messages
* document how to asynchronously launch workers

* kill workers which don't launch properly
* don't emit spurious error messages
* document how to asynchronously launch workers
(cherry picked from commit 121e814)
Thanks! @amitmurthy, could you please review the related JuliaParallel/ClusterManagers.jl#74 too?
We can backport this to 1.0 if it's critical for somebody's cluster environment, but for now we won't plan to.
Fixes #30031
The two commit messages say it all.
I'll happily add a third commit to fix the REPL being blocked during worker startup if someone gives me a hint as to what to change. This is a problem when the remote worker ends up in the pending queue on a busy cluster. In Julia 0.6 the remote workers were launched in the background. I've tried making a few changes to the various `@async`s that are sprinkled through the relevant code, but nothing has worked so far.