Fluid distributed training TODO #10279
Some extra items that might be worth adding: a distributed data reader (which should be unified with the single-machine reader).
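To illustrate what unifying the distributed reader with the single-machine reader could look like, here is a minimal Python sketch. It assumes a reader is a zero-argument callable that returns an iterable of samples. The helper `make_distributed_reader` and its `trainer_id` / `num_trainers` parameters are hypothetical, not an existing Fluid API. Each trainer wraps the same single-machine reader and keeps only the samples whose index modulo `num_trainers` equals its own `trainer_id`.

```python
from typing import Callable, Iterable, Iterator

# Assumption for this sketch: a "reader" is a zero-argument callable that
# returns an iterable of samples, i.e. the same shape as a single-machine reader.
Reader = Callable[[], Iterable]


def make_distributed_reader(reader: Reader, trainer_id: int, num_trainers: int) -> Reader:
    """Hypothetical helper: wrap a single-machine reader so each trainer gets a disjoint shard.

    Trainer `trainer_id` keeps every sample whose position i in the stream
    satisfies i % num_trainers == trainer_id.
    """
    if num_trainers <= 0:
        raise ValueError("num_trainers must be positive")
    if not 0 <= trainer_id < num_trainers:
        raise ValueError("trainer_id must be in [0, num_trainers)")

    def shard() -> Iterator:
        # Iterate the unchanged single-machine reader and keep this trainer's share.
        for i, sample in enumerate(reader()):
            if i % num_trainers == trainer_id:
                yield sample

    return shard


if __name__ == "__main__":
    def single_machine_reader():
        return range(10)

    # Each of 3 trainers sees a different, non-overlapping subset of the samples.
    for tid in range(3):
        print(tid, list(make_distributed_reader(single_machine_reader, tid, 3)()))
```

Round-robin sharding by index leaves the single-machine reader untouched, at the cost of every trainer still iterating over the full stream; sharding at the file level would avoid that overhead.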
Fault tolerance is a basic distributed training feature that probably doesn't belong to EDL only.
Maybe we should divide
The overall future roadmap should include the following parts:
Thanks, @panyx0718 @seiriosPlus @typhoonzero. I have updated this issue according to your comments.
Do we need to design an abstract interface for the communication backend so that it is compatible with various implementations:
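One way to picture the abstract communication interface asked about above is the minimal Python sketch below: training code depends only on an abstract base class, and each communication library is plugged in as a subclass. The names `CommBackend`, `send_var`, `get_var`, `InMemoryBackend`, and `push_and_pull` are hypothetical and are not Fluid's actual RPC classes; a backend built on gRPC or brpc would implement the same two methods.

```python
from abc import ABC, abstractmethod
from typing import Dict


class CommBackend(ABC):
    """Hypothetical transport-agnostic interface for exchanging named variables.

    Concrete backends (for example one on gRPC, another on brpc) would subclass
    this; trainer code only ever calls these two methods.
    """

    @abstractmethod
    def send_var(self, endpoint: str, name: str, value: bytes) -> None:
        """Send the serialized variable `name` to `endpoint`."""

    @abstractmethod
    def get_var(self, endpoint: str, name: str) -> bytes:
        """Fetch the serialized variable `name` from `endpoint`."""


class InMemoryBackend(CommBackend):
    """Toy single-process backend, handy for unit-testing logic written against CommBackend."""

    def __init__(self) -> None:
        # endpoint -> {variable name -> serialized value}
        self._store: Dict[str, Dict[str, bytes]] = {}

    def send_var(self, endpoint: str, name: str, value: bytes) -> None:
        self._store.setdefault(endpoint, {})[name] = value

    def get_var(self, endpoint: str, name: str) -> bytes:
        try:
            return self._store[endpoint][name]
        except KeyError:
            raise KeyError(f"{name!r} not found on {endpoint!r}") from None


def push_and_pull(backend: CommBackend, endpoint: str, name: str, value: bytes) -> bytes:
    """Trainer-side logic written only against the abstract interface."""
    backend.send_var(endpoint, name, value)
    return backend.get_var(endpoint, name)
```

Because `push_and_pull` never names a concrete transport, swapping libraries means adding one subclass rather than touching the training logic.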
I think there may be many things to do, so we had better organize them by order, category, and priority.
Closing this issue: most of the work is done, except for the brpc- and EDL-related items.
Fluid Distributed Training Features
EDL
Support different communication libraries
Experiment
CE
Future