
Multi-threading and Rabit allreduce/broadcast ops #103

Closed
nateagr opened this issue Sep 11, 2019 · 2 comments

Comments


nateagr commented Sep 11, 2019

Hi everyone,

I've developed allreduce, broadcast and allgather ops for TensorFlow based on Rabit. While digging into Rabit, I realized that its ops are not thread-safe, so, for now, I have limited TensorFlow to a single thread when computing the graph.

Now, I wonder if there is a way to execute several allreduce/broadcast/allgather operations in parallel. I've looked into the XGBoost code for hints, but I did not manage to find any parallel calls to Rabit ops. Is there any plan to make Rabit ops thread-safe?
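A common workaround in this situation (a sketch only, not Rabit's actual API; `_native_allreduce_sum` below is a hypothetical stand-in for the non-thread-safe native call) is to funnel every collective call through one process-wide lock. The rest of the graph can still run on multiple threads; only the collectives are serialized:

```python
import threading

# Stand-in for a non-thread-safe native collective op (e.g. a
# Rabit-style allreduce); the real call must never run concurrently
# with itself. Here it just reduces a list with a sum.
def _native_allreduce_sum(buf):
    return [sum(buf)] * len(buf)

_collective_lock = threading.Lock()

def safe_allreduce_sum(buf):
    """Serialize every collective behind one lock, so the framework
    may still use multiple threads for the rest of the graph while
    the non-thread-safe op runs one call at a time."""
    with _collective_lock:
        return _native_allreduce_sum(buf)

# Several graph threads can now issue collectives concurrently; the
# lock guarantees the underlying op only ever sees sequential calls.
results = []
threads = [
    threading.Thread(target=lambda: results.append(safe_allreduce_sum([1, 2, 3])))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note this only removes the data race; with a single Rabit instance the collectives still execute one after another, so it does not by itself give true parallelism across ops.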

Thanks in advance for your help.


chenqin commented Sep 11, 2019

@nateagr Can you elaborate on your use case a bit more? At least in XGBoost, those calls are sequential and are expected to block. Underneath, yes, it's possible to init several Rabit instances, but you would need to spawn multiple trackers as well.


hcho3 commented Nov 5, 2020

Closing, as Rabit has been moved into dmlc/xgboost. See discussion in dmlc/xgboost#5995.

@hcho3 hcho3 closed this as completed Nov 5, 2020