
RuntimeError: cannot join current thread #98

Open
MichaelCong opened this issue Jul 22, 2019 · 3 comments

Comments

@MichaelCong

shuffling indices...
0%| | 0/500000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 249, in <module>
main(None, ngpus_per_node, args)
File "train.py", line 233, in main
train(training_dbs, validation_db, system_config, model, args)
File "train.py", line 165, in train
training_loss = nnet.train(**training)
File "/home/rencong/CornerNet-Lite/core/nnet/py_factory.py", line 93, in train
loss = self.network(xs, ys)
File "/home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/rencong/CornerNet-Lite/core/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/home/rencong/CornerNet-Lite/core/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/home/rencong/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/home/rencong/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/rencong/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/rencong/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(*map(scatter_map, obj))))
File "/home/rencong/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
File "/home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/cuda/comm.py", line 148, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: invalid device ordinal (exchangeDevice at /opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/ATen/cuda/detail/CUDAGuardImpl.h:28)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fe841f45cc5 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x4f291f (0x7fe882aac91f in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: + 0x4f0222 (0x7fe842647222 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #3: + 0x68bcd5 (0x7fe8427e2cd5 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #4: at::TypeDefault::copy(at::Tensor const&, bool, c10::optional<c10::Device>) const + 0x56 (0x7fe842920d16 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #5: + 0x5fa057 (0x7fe842751057 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #6: at::native::to(at::Tensor const&, at::TensorOptions const&, bool, bool) + 0x295 (0x7fe842752cd5 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #7: at::TypeDefault::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x17 (0x7fe8428e6d27 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #8: torch::autograd::VariableType::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x17a (0x7fe83f491b2a in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #9: torch::cuda::scatter(at::Tensor const&, c10::ArrayRef, c10::optional<std::vector<long, std::allocator<long> > > const&, long, c10::optional<std::vector<c10::optional<at::cuda::CUDAStream>, std::allocator<c10::optional<at::cuda::CUDAStream> > > > const&) + 0x491 (0x7fe882aaf161 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: + 0x4fae71 (0x7fe882ab4e71 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #11: + 0x112176 (0x7fe8826cc176 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #19: THPFunction_apply(_object*, _object*) + 0x5a1 (0x7fe8828c7bf1 in /home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

Exception ignored in: <function tqdm.__del__ at 0x7fe83e2a0b70>
Traceback (most recent call last):
File "/home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/tqdm/_tqdm.py", line 885, in __del__
self.close()
File "/home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/tqdm/_tqdm.py", line 1090, in close
self._decr_instances(self)
File "/home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/tqdm/_tqdm.py", line 454, in _decr_instances
cls.monitor.exit()
File "/home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/tqdm/_monitor.py", line 52, in exit
self.join()
File "/home/rencong/anaconda3/envs/CornerNet_Lite/lib/python3.7/threading.py", line 1029, in join
raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
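The `RuntimeError: cannot join current thread` in the title is reported as "Exception ignored", raised while objects are cleaned up after the CUDA error above: tqdm's `__del__` calls `close()`, which ends in the monitor's `exit()` calling `self.join()`, and Python's `threading` module refuses to let a thread join itself. The sketch below uses only the standard library and is not tqdm's actual code; the `Monitor` class is a hypothetical stand-in that reproduces the check and shows an `exit()` that skips the join when it runs on the monitor thread itself.

```python
import threading


class Monitor(threading.Thread):
    """Hypothetical stand-in for a background monitor thread (not tqdm's TMonitor)."""

    def __init__(self):
        super().__init__(daemon=True)
        self._stop_event = threading.Event()
        self.join_skipped = False

    def run(self):
        # Idle until asked to stop.
        self._stop_event.wait()

    def exit(self):
        self._stop_event.set()
        # Thread.join() raises RuntimeError("cannot join current thread")
        # when a thread tries to join itself, so only join from other threads.
        if self is threading.current_thread():
            self.join_skipped = True
        else:
            self.join()


def join_self_error():
    """Return the message Python raises when the current thread joins itself."""
    try:
        threading.current_thread().join()
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(join_self_error())

    m = Monitor()
    m.start()
    m.exit()  # called from the main thread, so it joins normally
    print(m.is_alive())
```

This second exception is only a symptom; the training run itself stopped at the `CUDA error: invalid device ordinal`.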

@H-Wenfeng

I have the same problem. Did you solve it?

@lmmmmmmmmm

I also have the same problem. Did you solve it?

@jerrywgz

jerrywgz commented Oct 30, 2019

#17 (comment)
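The first error in the traceback, `CUDA error: invalid device ordinal`, means a CUDA device index was requested that the process cannot see. It surfaces inside `scatter`, which splits each batch across GPUs according to `chunk_sizes`, so a config written for more GPUs than the machine has is a plausible (unconfirmed) cause. The following is a minimal pre-flight sketch, assuming the training config has `batch_size` and `chunk_sizes` entries, with one chunk per GPU and the chunks summing to the batch size. `check_chunk_sizes` is a hypothetical helper, not part of CornerNet-Lite, and not necessarily what the linked comment suggests.

```python
from typing import List


def check_chunk_sizes(batch_size: int, chunk_sizes: List[int], num_gpus: int) -> List[str]:
    """Return problems found in a hypothetical batch_size/chunk_sizes config.

    Assumes one entry in chunk_sizes per GPU and that the entries should sum
    to batch_size. An empty list means no problems were found.
    """
    problems = []
    if num_gpus < 1:
        problems.append("no CUDA devices are visible to this process")
    if len(chunk_sizes) > num_gpus:
        problems.append(
            f"chunk_sizes has {len(chunk_sizes)} entries but only {num_gpus} GPU(s) are visible"
        )
    if any(size <= 0 for size in chunk_sizes):
        problems.append("every chunk size must be positive")
    if sum(chunk_sizes) != batch_size:
        problems.append(
            f"chunk_sizes sum to {sum(chunk_sizes)}, but batch_size is {batch_size}"
        )
    return problems


if __name__ == "__main__":
    # A config written for 4 GPUs, run on a single-GPU machine.
    for problem in check_chunk_sizes(48, [12, 12, 12, 12], num_gpus=1):
        print(problem)
```

On a real machine, `num_gpus` would come from `torch.cuda.device_count()`, which only counts the devices exposed through `CUDA_VISIBLE_DEVICES`.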


4 participants