LVIS maskrcnn training bug: TypeError: can't pickle _thread.RLock objects #4112

Closed
nemonameless opened this issue Nov 14, 2020 · 3 comments
@nemonameless

Thanks for your error report and we appreciate it a lot.

Describe the bug
Training on the COCO dataset works, but training on LVIS fails with this error. Has the LVIS training code not been updated for a long time?

Reproduction

  1. What command or script did you run?
CUDA_VISIBLE_DEVICES=6,7  bash ./tools/dist_train.sh configs/lvis/mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py 2 --work-dir work_dirs/mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1 --no-validate
  2. Did you make any modifications on the code or config? No
  3. What dataset did you use? LVIS

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21) [GCC 7.3.0]
CUDA available: False
GCC: gcc (GCC) 5.2.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.5.0
OpenCV: 4.4.0
MMCV: 1.2.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.6.0+7ec0f03
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch ? pip

Error traceback
If applicable, paste the error traceback here.

2020-11-14 18:52:53,609 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
Traceback (most recent call last):
  File "./tools/train.py", line 178, in <module>
    main()
  File "./tools/train.py", line 174, in main
    meta=meta)
  File "/data/cdp_algo_ceph_ssd/users/georgeni/mmdetnf/mmdet/apis/train.py", line 150, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
Traceback (most recent call last):
  File "./tools/train.py", line 178, in <module>
    main()
  File "./tools/train.py", line 174, in main
    meta=meta)
  File "/data/cdp_algo_ceph_ssd/users/georgeni/mmdetnf/mmdet/apis/train.py", line 150, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
Traceback (most recent call last):
  File "/data/anaconda3/envs/mmdet/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/data/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/data/anaconda3/envs/mmdet/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/lvis/mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py', '--launcher', 'pytorch', '--work-dir', 'work_dirs/mask_rcnn_r50_fpn_sample1e-3_mstrain_1x_lvis_v1', '--no-validate']' returned non-zero exit status 1.
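
Editor's note: the traceback above goes through popen_spawn_posix.py and reduction.dump(process_obj), so the DataLoader worker is being started by pickling the process object, and something reachable from it holds a lock. The thread does not say which attribute that is. A minimal sketch for locating it follows; `FakeDataset` and `find_unpicklable_attrs` are hypothetical names used only for illustration and are not part of mmdet or the LVIS API.

```python
import pickle
import threading


class FakeDataset:
    """Hypothetical stand-in for a dataset object; not an mmdet class."""

    def __init__(self):
        self.ann_file = "annotations.json"  # plain data pickles fine
        self.lock = threading.RLock()       # lock objects cannot be pickled


def find_unpicklable_attrs(obj):
    """Return {attribute name: error text} for attributes that fail to pickle."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # e.g. TypeError or pickle.PicklingError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


bad = find_unpicklable_attrs(FakeDataset())
for name, message in bad.items():
    print(name, "->", message)
```

Running the same helper on the real dataset object just before the data loader is built should narrow the failure down to one attribute, which can then be inspected the same way.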
@Chauncy-Cai

I hit the same problem and hope it can be solved.
Before that, I got the following error:
"""
assert not self.custom_classes, 'LVIS custom classes is not supported'
AttributeError: 'LVISV05Dataset' object has no attribute 'custom_classes'
"""
I then found that `self` has no attribute custom_classes
(if the attribute existed and were None, the assert would pass),
so I skipped this assert.
After that, I got the error reported in this issue.

@nemonameless
Author

@Chauncy-Cai I just commented out the line "assert not self.custom_classes, 'LVIS custom classes is not supported'". I also hit the "TypeError: can't pickle _thread.RLock objects" error when using mmdet 2.4.0 and mmcv 1.1.2.
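
Editor's note: instead of commenting the assert out, the check can tolerate a missing attribute. This is only a sketch under assumptions: `BaseDataset`, `LvisLikeDataset`, and `check_classes` are hypothetical names, and this is not necessarily the fix that was later applied in mmdet.

```python
class BaseDataset:
    """Hypothetical base class that never sets `custom_classes`."""


class LvisLikeDataset(BaseDataset):
    """Hypothetical dataset; mirrors only the assert quoted in this thread."""

    def check_classes(self):
        # getattr with a default avoids AttributeError when the attribute
        # was never set, while still rejecting a truthy custom_classes.
        custom = getattr(self, "custom_classes", False)
        assert not custom, "LVIS custom classes is not supported"
        return True


ok = LvisLikeDataset().check_classes()
print(ok)
```

The assert still fires if `custom_classes` is set to a non-empty value, so the original guard is kept rather than silently dropped.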

@xvjiarui
Collaborator

xvjiarui commented Jan 4, 2021

Hi @nemonameless, @Chauncy-Cai
You may try Python 3.7. I just fixed the custom_classes issue.
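
Editor's note: one plausible reason the Python 3.7 suggestion helps (an assumption; the thread does not confirm the cause) is that `logging.Logger` objects became picklable in Python 3.7, while on 3.6 pickling a logger reaches its handlers, each of which holds an RLock. The sketch below only demonstrates that standard-library behaviour; the logger name is hypothetical.

```python
import logging
import pickle
import sys

logger = logging.getLogger("lvis_demo")     # hypothetical logger name
logger.addHandler(logging.StreamHandler())  # handlers hold an RLock internally

if sys.version_info >= (3, 7):
    # Since 3.7 a logger is pickled by name and looked up again on load,
    # so the handler lock is never serialized.
    restored = pickle.loads(pickle.dumps(logger))
    same_logger = restored is logger
else:
    # On 3.6 the dumps() call above would be expected to raise TypeError.
    same_logger = None

print(sys.version_info[:2], same_logger)
```

If the unpicklable attribute turns out to be something other than a logger, upgrading Python alone may not be enough, and the attribute would need to be dropped or recreated inside each worker.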
