Hi, I have this problem in training CornerNet! #88

Closed

BCWang93 opened this issue Jun 12, 2019 · 1 comment

Comments

@BCWang93

I get this error when training CornerNet:
```
shuffling indices...
  0%|          | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 249, in <module>
    main(None, ngpus_per_node, args)
  File "train.py", line 233, in main
    train(training_dbs, validation_db, system_config, model, args)
  File "train.py", line 165, in train
    training_loss = nnet.train(**training)
  File "/home/a/Bcw_data/CornerNet-Lite/core/nnet/py_factory.py", line 93, in train
    loss = self.network(xs, ys)
  File "/home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "/home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/cuda/comm.py", line 148, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: invalid device ordinal (exchangeDevice at /pytorch/aten/src/ATen/cuda/detail/CUDAGuardImpl.h:28)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f8c30015021 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f8c300148ea in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0x4e414f (0x7f8c6a72514f in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #3: + 0x8cdfa2 (0x7f8c30af3fa2 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: + 0xa14ae5 (0x7f8c30c3aae5 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::TypeDefault::copy(at::Tensor const&, bool, c10::optional<c10::Device>) const + 0x56 (0x7f8c30d77c76 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: + 0x977f47 (0x7f8c30b9df47 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #7: at::native::to(at::Tensor const&, at::TensorOptions const&, bool, bool) + 0x295 (0x7f8c30b9faf5 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #8: at::TypeDefault::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x17 (0x7f8c30d3e4f7 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #9: torch::autograd::VariableType::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x17a (0x7f8c2f27ebaa in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #10: torch::cuda::scatter(at::Tensor const&, c10::ArrayRef, c10::optional<std::vector<long, std::allocator<long> > > const&, long, c10::optional<std::vector<c10::optional<at::cuda::CUDAStream>, std::allocator<c10::optional<at::cuda::CUDAStream> > > > const&) + 0x391 (0x7f8c6a7274d1 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #11: + 0x4ebc2f (0x7f8c6a72cc2f in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #12: + 0x11642e (0x7f8c6a35742e in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #23: THPFunction_apply(_object*, _object*) + 0x581 (0x7f8c6a553ab1 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
```
Can you help me solve this problem? Thanks!
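For readers hitting the same trace: a CUDA "invalid device ordinal" during scatter usually means the process is asked to place a chunk on a GPU index that does not exist on the machine, e.g. a config written for 4 GPUs run on a box with 1. This is a hedged sketch (not from the original thread) of a pre-flight check; `validate_chunk_sizes` is a hypothetical helper, and the assumption is that CornerNet-Lite splits each batch across devices according to a `chunk_sizes` list, so its length must not exceed what `torch.cuda.device_count()` reports.

```python
# Sketch of a pre-flight check for the "invalid device ordinal" error.
# Assumption: one entry in chunk_sizes per GPU, so len(chunk_sizes) must
# not exceed the number of CUDA devices visible to the process.

def validate_chunk_sizes(chunk_sizes, visible_gpus):
    """Return (ok, message) for a per-GPU batch split.

    chunk_sizes  -- per-GPU batch sizes, e.g. [12, 12, 12, 13] for 4 GPUs
    visible_gpus -- what torch.cuda.device_count() reports on this machine
    """
    if visible_gpus == 0:
        return False, "no CUDA devices visible (check drivers / CUDA_VISIBLE_DEVICES)"
    if len(chunk_sizes) > visible_gpus:
        return False, (
            f"config requests {len(chunk_sizes)} GPUs but only {visible_gpus} "
            f"are visible -- scatter will raise 'invalid device ordinal'"
        )
    return True, "ok"


if __name__ == "__main__":
    # A config written for 4 GPUs run on a 1-GPU machine fails the check:
    ok, msg = validate_chunk_sizes([12, 12, 12, 13], visible_gpus=1)
    print(ok, msg)
```

If the check fails, either expose only valid devices (e.g. `export CUDA_VISIBLE_DEVICES=0`) and shrink the batch split in the training config accordingly, or run on a machine with as many GPUs as the config expects.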

@Looson

Looson commented Aug 22, 2019

Have you solved this problem? Thank you!
