AssertionError: Default process group is not initialized #8

Closed
HaoweiGis opened this issue Jul 11, 2020 · 8 comments

Comments

@HaoweiGis

Describe the bug
Running python tools/train.py configs/danet/danet_r50-d8_512x1024_40k_cityscapes.py to train on custom data fails with AssertionError: Default process group is not initialized.
The GPU is currently running two object detection networks; could that be the cause? mmdetection can train multiple networks simultaneously.
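For context, a minimal sketch of where this assertion comes from, assuming PyTorch 1.5 (where the check is a plain assert in torch.distributed):

```python
import torch.distributed as dist

# SyncBN layers query the default process group during forward(); in a plain
# non-distributed run no group exists, so the same assertion fires:
try:
    dist.get_world_size()
except AssertionError as e:
    print(e)  # Default process group is not initialized
```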

Environment info
sys.platform: linux
Python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0: Tesla V100-PCIE-32GB
GCC: gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.2.0
MMCV: 1.0.2
MMSegmentation: 0.5.0+b72a6d0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1

@HaoweiGis
Author

Can mmdetection and mmsegmentation be installed in the same Docker container?

@xvjiarui
Collaborator

Hi @HaoweiGis
If you would like to debug with non-distributed training, you need to change SyncBN to BN, since PyTorch's SyncBN requires distributed training.
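A minimal sketch of that change, assuming the norm_cfg layout used by the configs in this repo (e.g. configs/_base_/models/danet_r50-d8.py; the exact file depends on your model):

```python
# configs/_base_/models/danet_r50-d8.py (excerpt, structure approximated)
# Before: SyncBN requires an initialized torch.distributed process group.
# norm_cfg = dict(type='SyncBN', requires_grad=True)

# After: plain BN works for single-GPU, non-distributed training.
norm_cfg = dict(type='BN', requires_grad=True)

model = dict(
    backbone=dict(norm_cfg=norm_cfg),        # backbone, decode head and
    decode_head=dict(norm_cfg=norm_cfg),     # auxiliary head all share the
    auxiliary_head=dict(norm_cfg=norm_cfg))  # same norm_cfg setting
```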

@HaoweiGis
Author

HaoweiGis commented Jul 11, 2020

Hello @xvjiarui,
Thank you for your reply. After changing SyncBN to BN, the model is training.

@xvjiarui
Collaborator

Hi @HaoweiGis
Thanks for letting us know. Please note that all models in this repo are trained with 4 GPUs.
If you are training for the same number of iterations on a single GPU, performance may drop.
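If you do train on a single GPU, one common (unofficial) adjustment is the linear scaling rule: scale the base learning rate by the ratio of your total batch size to the 4-GPU batch size the configs assume. A hypothetical sketch, assuming 2 images per GPU as in the cityscapes configs:

```python
# Hypothetical single-GPU tweak in the config; the numbers follow the linear
# scaling heuristic, not an official recommendation from this thread.
# Reference setup: 4 GPUs x 2 imgs/GPU = total batch 8, lr = 0.01.
# Single GPU:      1 GPU  x 2 imgs/GPU = total batch 2.
optimizer = dict(type='SGD', lr=0.01 * 2 / 8, momentum=0.9, weight_decay=0.0005)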

@AbdullahRJafar

I am having the same problem. How do I change SyncBN to BN?

@xvjiarui
Collaborator

xvjiarui commented Feb 5, 2021

Hi @AbdullahRJafar
You may modify it in the config file.

@PriyankaJain-1998

Facing the same issue. Which file under the configs folder?

@xiexinch
Collaborator

xiexinch commented Apr 20, 2021

> Facing the same issue. Which file under the configs folder?

Hi @PriyankaJain-1998
In each configs/_base_/models/xxx.py.
Alternatively, you can run tools/dist_train.sh with GPUS=1, like
./tools/dist_train.sh config.py 1
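For reference, the reason GPUS=1 works: dist_train.sh goes through PyTorch's distributed launcher, which initializes the default process group even for a single process, so SyncBN's world-size query succeeds. A sketch of that initialization, assuming the launcher's env:// setup:

```python
import torch.distributed as dist

# torch.distributed.launch exports RANK, WORLD_SIZE, MASTER_ADDR and
# MASTER_PORT; the training script then creates the default process group.
dist.init_process_group(backend='nccl', init_method='env://')
print(dist.get_world_size())  # 1 when launched with --nproc_per_node=1
```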
