“multi-gpu error” dist.all_gather(gathered_samples, sample) # gather not supported with NCCL #46

Open
fikry102 opened this issue Oct 3, 2023 · 1 comment


fikry102 commented Oct 3, 2023

mpiexec -n 8 python scripts/image_sample.py --batch_size 32 --training_mode consistency_distillation --sampler multistep --ts 0,62,150 --steps 151 --model_path ./ct_cat256.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --image_size 256 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_samples 500 --resblock_updown True --use_fp16 True --weight_schedule uniform

"home/anaconda3/envs/consistency/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2433, in all_gather
    work = default_pg.allgather([tensor_list], [tensor])
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Cuda failure 'peer access is not supported between these two devices'

fikry102 commented Oct 3, 2023

Traceback (most recent call last):
  File "scripts/image_sample.py", line 143, in <module>
    main()
  File "scripts/image_sample.py", line 91, in main
    dist.all_gather(gathered_samples, sample)  # gather not supported with NCCL
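
A possible workaround, offered as a sketch rather than a confirmed fix: the `peer access is not supported between these two devices` message in the first log suggests that NCCL tried to use direct GPU-to-GPU (P2P) transfers on a machine where some device pairs do not support them. NCCL reads the `NCCL_P2P_DISABLE` environment variable, and setting it to `1` turns off the P2P transport so that traffic goes through host memory instead. The helper name `disable_nccl_p2p` below is hypothetical and not part of this repository. Whether this resolves the `all_gather` failure on this particular setup is an assumption.

```python
import os


def disable_nccl_p2p(env=None):
    """Return a copy of `env` (default: os.environ) with NCCL P2P disabled.

    Hypothetical helper for illustration; not part of the repository.
    """
    new_env = dict(os.environ if env is None else env)
    # Turn off NCCL's peer-to-peer transport (direct GPU-to-GPU access).
    new_env["NCCL_P2P_DISABLE"] = "1"
    # Ask NCCL for verbose logs unless the caller already chose a level.
    new_env.setdefault("NCCL_DEBUG", "INFO")
    return new_env
```

To try it, call `os.environ.update(disable_nccl_p2p())` at the very top of `scripts/image_sample.py`, before the process group is created, so that each rank started by `mpiexec` picks up the setting. Exporting `NCCL_P2P_DISABLE=1` in the shell may also work, but whether `mpiexec` forwards the variable to every rank depends on the MPI implementation. Disabling P2P usually costs some communication bandwidth.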

fikry102 changed the title from "multi-gpu error" to "“multi-gpu error” dist.all_gather(gathered_samples, sample) # gather not supported with NCCL" on Oct 3, 2023