
C vs C++ MPI Usage #127

Closed
wrathematics opened this issue Nov 4, 2019 · 8 comments

Comments

@wrathematics

As you know, the MPI standard deprecated C++ bindings. Some implementations still offer them (e.g. OpenMPI) while others do not (e.g. MSMPI). Would you be receptive to a re-write of the MPI-backend internals to use the MPI C API? Looking at #31 it seems like you're open to the change at least in principle. I am willing to do this, but I wanted to make sure before I started.
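
For context, the change is largely mechanical. Here is a minimal sketch of what the switch looks like for a sum-allreduce, with the deprecated C++ bindings shown alongside the portable C API (illustrative only, not the project's actual internals; buffer names and sizes are made up):

```cpp
// Illustrative only, not rabit code: the deprecated MPI C++ bindings
// next to the equivalent MPI C API for a simple sum-allreduce.
#include <mpi.h>
#include <vector>

// Deprecated C++ bindings (removed in MPI-3.0, unavailable in MS-MPI):
//   MPI::Init();
//   MPI::COMM_WORLD.Allreduce(local.data(), global.data(),
//                             count, MPI::DOUBLE, MPI::SUM);
//   MPI::Finalize();

// Portable MPI C API equivalent:
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  std::vector<double> local(16, 1.0), global(16, 0.0);
  MPI_Allreduce(local.data(), global.data(),
                static_cast<int>(local.size()),
                MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}
```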

@hcho3
Contributor

hcho3 commented Nov 4, 2019

@trivialfis

@trivialfis
Member

trivialfis commented Nov 4, 2019

Thanks! PRs are welcome! Currently the MPI backend doesn't support fault tolerance, so we don't really use it. @chenqin may provide more input here.

@chenqin
Contributor

chenqin commented Nov 4, 2019

Thanks Travis. MPI was not prioritized because it doesn't support the fault recovery we expect from the socket implementation. I put some thought into building an overlay on top and found it's not very straightforward. Meanwhile, we are happy to work with you on this if that's what you're passionate about.

@thvasilo

thvasilo commented Nov 4, 2019

If it's just a straightforward translation from the C++ calls to the C ones, I don't see why we wouldn't do it.

Note that, as far as I understand, my research use cases still make use of the sockets version for communication; I'm only using MPI/SLURM as a tracker.

Re: what @chenqin suggested, I've recently started looking into LightGBM, and it has a nice thin layer between MPI and their sockets implementation, with collectives internally making heavy use of a sendrecv operation, meaning it doesn't take much duplication to support the two.

For the future, it might be worth taking a look.
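
In case it helps, here is a rough sketch of that sendrecv-based pattern: a naive ring sum-allreduce built on MPI_Sendrecv. It is purely illustrative (not LightGBM's or rabit's actual code, and not the bandwidth-optimal chunked ring), just to show how little a collective needs beyond a sendrecv primitive:

```cpp
// Illustrative sketch only: a naive ring sum-allreduce built on MPI_Sendrecv.
// Each rank forwards a buffer around the ring size-1 times and accumulates
// whatever it receives from its left neighbour.
#include <mpi.h>
#include <vector>

void RingAllreduceSum(std::vector<double> &data, MPI_Comm comm) {
  int rank = 0, size = 1;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  const int next = (rank + 1) % size;         // right neighbour in the ring
  const int prev = (rank - 1 + size) % size;  // left neighbour in the ring
  const int n = static_cast<int>(data.size());

  std::vector<double> send = data;  // buffer forwarded in the current step
  std::vector<double> recv(n);

  for (int step = 0; step < size - 1; ++step) {
    MPI_Sendrecv(send.data(), n, MPI_DOUBLE, next, 0,
                 recv.data(), n, MPI_DOUBLE, prev, 0,
                 comm, MPI_STATUS_IGNORE);
    for (int i = 0; i < n; ++i) data[i] += recv[i];  // add neighbour's contribution
    send.swap(recv);  // pass along what we just received
  }
}
```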

@wrathematics
Author

Great, thanks!

One thing I would add is that changing to manual send/recv patterns will likely come at the cost of performance. I think most MPI implementations use recursive doubling for allreduce, which is a communication-avoiding strategy. And on HPC systems with boutique interconnects, collectives are optimized even further to take advantage of the network topology.
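
For reference, here is a minimal sketch of that recursive-doubling pattern, assuming a power-of-two number of ranks (real MPI implementations handle the general case and typically switch algorithms based on message size):

```cpp
// Illustrative sketch only: recursive-doubling sum-allreduce, assuming the
// number of ranks is a power of two. In round i, rank r exchanges its running
// partial sum with partner r XOR 2^i, finishing in log2(size) rounds.
#include <mpi.h>
#include <vector>

void RecursiveDoublingAllreduceSum(std::vector<double> &data, MPI_Comm comm) {
  int rank = 0, size = 1;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);  // assumed to be a power of two in this sketch

  const int n = static_cast<int>(data.size());
  std::vector<double> recv(n);

  for (int mask = 1; mask < size; mask <<= 1) {
    const int partner = rank ^ mask;
    MPI_Sendrecv(data.data(), n, MPI_DOUBLE, partner, 0,
                 recv.data(), n, MPI_DOUBLE, partner, 0,
                 comm, MPI_STATUS_IGNORE);
    for (int i = 0; i < n; ++i) data[i] += recv[i];  // merge partner's partial sum
  }
}
```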

@thvasilo

thvasilo commented Nov 5, 2019

Agreed, there's no point in reinventing the wheel; we should keep it as high-level as possible.

@snoweye

snoweye commented Feb 16, 2020

Please see PR #135 for the MPI C implementation, tested with CI using OpenMPI on Linux and macOS. An application, pbdXGB (xgboost for distributed learning), is also packaged and tested with CI on both Linux and Windows (MS-MPI).

@hcho3
Contributor

hcho3 commented Nov 5, 2020

Closing, as Rabit has been moved into dmlc/xgboost. See discussion in dmlc/xgboost#5995.

hcho3 closed this as completed Nov 5, 2020