[feature request] Support elastic #278
Comments
Issue-Label Bot is automatically applying the labels:
Please mark this comment with 👍 or 👎 to give our bot feedback! |
That's interesting. Have any of you tried it out yet? We'll need to refactor the launcher logic and then support
|
Tried it locally, not on k8s. We should handle discover_hosts.sh for it if we want to support it. |
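For context, the Horovod elastic docs linked in this issue describe the discovery script as something that prints the currently available hosts, one per line, optionally as `hostname:slots`. A minimal sketch of such a script in Python follows. It assumes some external component keeps a plain-text hosts file up to date; the file path, the environment variable names, and the helper `format_hosts` are hypothetical and not part of Horovod or mpi-operator.

```python
#!/usr/bin/env python3
"""Sketch of a host discovery script for elastic Horovod.

Assumption: an external component maintains HOSTS_FILE with one hostname
per line. The path and the slot count below are hypothetical.
"""
import os

HOSTS_FILE = os.environ.get("HOSTS_FILE", "/etc/mpi/discover_hosts.txt")  # hypothetical path
SLOTS_PER_HOST = int(os.environ.get("SLOTS_PER_HOST", "1"))  # hypothetical variable


def format_hosts(hostnames, slots):
    """Return 'hostname:slots' lines, skipping blank lines and duplicates."""
    seen = []
    for name in hostnames:
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return [f"{name}:{slots}" for name in seen]


def main():
    try:
        with open(HOSTS_FILE) as f:
            lines = f.readlines()
    except FileNotFoundError:
        lines = []  # no hosts known yet, so print nothing
    for line in format_hosts(lines, SLOTS_PER_HOST):
        print(line)


if __name__ == "__main__":
    main()
```

On k8s the open question is who writes that file (or whatever replaces it) as pods come and go.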
A simple idea for Are there any shortcuts in the StatefulSet features that we can exploit so that no pod-to-operator communication is needed? I believe we discussed using a ConfigMap to store and update the status of all pods in a StatefulSet. The concern is ConfigMap latency. |
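One StatefulSet property that might help here: pods are named `<statefulset-name>-<ordinal>`, with ordinals running from 0 to replicas-1 by default, so the host list can be derived from the name and the replica count alone. The sketch below only illustrates that naming rule. The function `statefulset_hosts` and its parameters are hypothetical, and it assumes every replica is ready, which a real discovery script would still have to verify.

```python
def statefulset_hosts(name, replicas, service=None, slots=1):
    """Build 'host:slots' lines from StatefulSet naming rules.

    Pods of a StatefulSet are named <name>-<ordinal>, ordinals 0..replicas-1.
    If a governing headless service is given, <pod>.<service> is used as the
    host name; otherwise the bare pod name is used.
    Assumption: all replicas are up; readiness is not checked here.
    """
    hosts = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        host = f"{pod}.{service}" if service else pod
        hosts.append(f"{host}:{slots}")
    return hosts
```

For example, `statefulset_hosts("worker", 2, service="worker-svc")` yields `worker-0.worker-svc:1` and `worker-1.worker-svc:1`. The current replica count still has to come from somewhere, which is where the ConfigMap (or another mechanism) would come in.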
How can I pass horovodrun parameters such as --host-discovery-script and --min-np when using the mpirun command? |
Do you want to use it in mpijob or just in horovod? |
@gaocegege I want to use it in mpijob. I tried it like this:
It did not fail, but --host-discovery-script and --min-np did not work. |
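For reference, `--host-discovery-script` and `--min-np` are options of `horovodrun` itself, as described in the elastic docs linked in this issue, so they are not interpreted when the launcher runs plain `mpirun`. The sketch below only shows how an elastic `horovodrun` command line could be assembled. The helper `elastic_horovodrun_cmd` is hypothetical, and whether the mpi-operator launcher can run such a command is exactly what this issue is tracking.

```python
import shlex


def elastic_horovodrun_cmd(np, min_np, max_np, discovery_script, train_cmd):
    """Assemble an elastic horovodrun argv list (hypothetical helper).

    np, min_np, max_np: initial, minimum, and maximum number of workers.
    discovery_script:   path to an executable that prints available hosts.
    train_cmd:          the training command as a list of arguments.
    """
    return [
        "horovodrun",
        "-np", str(np),
        "--min-np", str(min_np),
        "--max-np", str(max_np),
        "--host-discovery-script", discovery_script,
        *train_cmd,
    ]


cmd = elastic_horovodrun_cmd(8, 4, 12, "./discover_hosts.sh", ["python", "train.py"])
print(shlex.join(cmd))
```

The script path and the training command here are placeholders.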
#332 is working on this issue. |
TODO list:
|
https://github.com/horovod/horovod/blob/master/docs/elastic.rst
It would be better if we supported elastic training.