Support bitwise allreduce in NCCL communicator #9300

rongou · 2023-06-14T19:48:17Z

Use AllGather to collect all the data first, then do the bitwise allreduce locally.
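The gather-then-reduce approach described above can be sketched on the host as follows. This is a minimal illustration, not the code from this PR: `SimulatedAllGather`, `LocalBitwiseReduce`, and `BitOp` are hypothetical names, the "gather" is plain concatenation standing in for a real AllGather, and every rank is assumed to contribute a buffer of the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical enum of the bitwise reductions being emulated.
enum class BitOp { kAnd, kOr, kXor };

// Host-side stand-in for an AllGather: concatenates every rank's buffer in
// rank order. In a real communicator each rank would end up holding this
// same concatenated buffer.
std::vector<std::uint8_t> SimulatedAllGather(
    const std::vector<std::vector<std::uint8_t>>& per_rank) {
  std::vector<std::uint8_t> gathered;
  for (const auto& buf : per_rank) {
    gathered.insert(gathered.end(), buf.begin(), buf.end());
  }
  return gathered;
}

// Local reduction: for each byte position i, fold together the i-th byte of
// every rank's chunk using the requested bitwise operation.
std::vector<std::uint8_t> LocalBitwiseReduce(
    const std::vector<std::uint8_t>& gathered, std::size_t world_size,
    BitOp op) {
  // Assumption: equal-sized contributions from at least one rank.
  assert(world_size > 0 && gathered.size() % world_size == 0);
  const std::size_t n = gathered.size() / world_size;  // bytes per rank
  // Start from rank 0's chunk, then fold in the remaining ranks.
  std::vector<std::uint8_t> out(gathered.begin(), gathered.begin() + n);
  for (std::size_t r = 1; r < world_size; ++r) {
    for (std::size_t i = 0; i < n; ++i) {
      const std::uint8_t v = gathered[r * n + i];
      switch (op) {
        case BitOp::kAnd: out[i] &= v; break;
        case BitOp::kOr:  out[i] |= v; break;
        case BitOp::kXor: out[i] ^= v; break;
      }
    }
  }
  return out;
}
```

The trade-off in this sketch is memory and bandwidth: each rank holds `world_size` times the buffer size after the gather, in exchange for not needing a bitwise operator in the collective library itself.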

rongou · 2023-06-14T19:49:25Z

@trivialfis

trivialfis · 2023-06-14T21:04:20Z

Apologies for the oversight; no_sync is a CUDA Toolkit 12 (CTK 12) feature.

rongou · 2023-06-15T00:41:00Z

Switched to using LaunchN.

trivialfis · 2023-06-16T16:28:22Z

src/collective/nccl_device_communicator.cu

+  nccl_unique_id_ = GetUniqueId();
+  dh::safe_cuda(cudaSetDevice(device_ordinal_));
+  dh::safe_nccl(ncclCommInitRank(&nccl_comm_, world, nccl_unique_id_, rank));
+  dh::safe_cuda(cudaStreamCreate(&cuda_stream_));


Just a side note: there's dh::CUDAStream, which is an RAII wrapper around an asynchronous stream.

Switched to it.
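For readers unfamiliar with the RAII pattern mentioned in this thread, here is a minimal host-only sketch. It is not XGBoost's dh::CUDAStream: `FakeStream`, `FakeStreamCreate`, `FakeStreamDestroy`, and `ScopedStream` are hypothetical stand-ins for a C-style create/destroy handle API, chosen so the example compiles without CUDA.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an opaque C-style handle (e.g. a stream).
using FakeStream = int*;

int g_live_streams = 0;  // number of handles currently alive (illustration only)

FakeStream FakeStreamCreate() {
  ++g_live_streams;
  return new int{0};
}

void FakeStreamDestroy(FakeStream s) {
  assert(s != nullptr);
  delete s;
  --g_live_streams;
}

// RAII wrapper: acquires the handle in the constructor and releases it in
// the destructor, so no explicit destroy call is needed at the use site.
class ScopedStream {
 public:
  ScopedStream() : handle_{FakeStreamCreate()} {}
  ~ScopedStream() {
    if (handle_ != nullptr) FakeStreamDestroy(handle_);
  }

  // Non-copyable: two owners would destroy the same handle twice.
  ScopedStream(const ScopedStream&) = delete;
  ScopedStream& operator=(const ScopedStream&) = delete;

  // Movable: ownership transfers and the source is left empty.
  ScopedStream(ScopedStream&& other) noexcept
      : handle_{std::exchange(other.handle_, nullptr)} {}
  ScopedStream& operator=(ScopedStream&& other) noexcept {
    if (this != &other) {
      if (handle_ != nullptr) FakeStreamDestroy(handle_);
      handle_ = std::exchange(other.handle_, nullptr);
    }
    return *this;
  }

  FakeStream Get() const { return handle_; }

 private:
  FakeStream handle_;
};
```

Holding such a wrapper as a class member, rather than a raw handle created with a bare create call, means the handle is released automatically when the owning object is destroyed, including on early returns and exceptions.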

Support bitwise allreduce in NCCL communicator

3954028

rongou mentioned this pull request Jun 14, 2023

Vertical Federated Learning RFC #8424

Open

trivialfis approved these changes Jun 14, 2023

use launchn instead of thrust transform

978d506

rongou added 3 commits June 15, 2023 09:31

support nccl off

abeade4

fix cpplint error

4cc5e71

Merge remote-tracking branch 'upstream/master' into bitwise-allreduce

f891ed7

trivialfis reviewed Jun 16, 2023

trivialfis merged commit d8beb51 into dmlc:master Jun 16, 2023
1 check passed

rongou mentioned this pull request Jun 28, 2023

Add bitwise reduce operator NVIDIA/nccl#240

Open
