Mpicc #135
snoweye commented Feb 16, 2020 (edited)
- MPI C implementations, with an example tested with OpenMPI and CI on Linux and macOS.
- The application can be found in pbdXGB, with CI tested on Linux (OpenMPI) and Windows (MS-MPI).
Thanks for the PR and sorry for the wait! Feel free to ping me in the future ;-). Is it possible to completely replace the original C++ implementation?
Yes, the implementation is a complete replacement, apart from a few differences where I could get rid of C++ and simplify unnecessary steps. However, this was not discussed in issue #127. Would others agree with the changes?
@snoweye I think there's an implementation with an MPI backend here. Can we just replace that without any architectural change? Feel free to point out the issues here.
That is difficult without an architectural change. If the MPI namespace has to be kept, it would likely end up rewriting or redefining most of what the deprecated MPI C++ binding did. This PR keeps the MPI namespace where no MPI is needed, but switches the binding where MPI or speed is needed.
I have another branch. However, it is not completely free of architectural changes, because I have trouble getting rid of these lines regarding
@trivialfis Would you mind taking a quick look at the new changes in the branch
@trivialfis Can we merge this for now, given that it doesn't modify existing code but only adds new functionality? We can even mark it as "experimental". On a related note, we can consider adding MPI to the XGBoost testing pipeline.
No. I don't want to add a new API to rabit. Allow me to explain a little. Right now I have a huge headache over rabit's model recovery functionality. It saves the model and only the model, so nothing in DMatrix is checkpointed. That's why it doesn't work for dask, because we require DMatrix to handle itself. There is also unclear behaviour regarding repeated calls before training. Adding more APIs will only make model recovery more difficult to work with.
Recent development of Rabit has moved into the XGBoost repository. See the discussion in dmlc/xgboost#5995.