Hi, I have this problem in training CornerNet! #88

Closed

BCWang93 opened this issue Jun 12, 2019 · 1 comment

Comments

@BCWang93

I get this error when training CornerNet:
```
shuffling indices...
  0%|          | 0/500000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 249, in <module>
    main(None, ngpus_per_node, args)
  File "train.py", line 233, in main
    train(training_dbs, validation_db, system_config, model, args)
  File "train.py", line 165, in train
    training_loss = nnet.train(**training)
  File "/home/a/Bcw_data/CornerNet-Lite/core/nnet/py_factory.py", line 93, in train
    loss = self.network(xs, ys)
  File "/home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/a/Bcw_data/CornerNet-Lite/core/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "/home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/cuda/comm.py", line 148, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: invalid device ordinal (exchangeDevice at /pytorch/aten/src/ATen/cuda/detail/CUDAGuardImpl.h:28)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f8c30015021 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f8c300148ea in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0x4e414f (0x7f8c6a72514f in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #3: + 0x8cdfa2 (0x7f8c30af3fa2 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: + 0xa14ae5 (0x7f8c30c3aae5 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::TypeDefault::copy(at::Tensor const&, bool, c10::optional<c10::Device>) const + 0x56 (0x7f8c30d77c76 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: + 0x977f47 (0x7f8c30b9df47 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #7: at::native::to(at::Tensor const&, at::TensorOptions const&, bool, bool) + 0x295 (0x7f8c30b9faf5 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #8: at::TypeDefault::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x17 (0x7f8c30d3e4f7 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #9: torch::autograd::VariableType::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x17a (0x7f8c2f27ebaa in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #10: torch::cuda::scatter(at::Tensor const&, c10::ArrayRef, c10::optional<std::vector<long, std::allocator<long> > > const&, long, c10::optional<std::vector<c10::optional<at::cuda::CUDAStream>, std::allocator<c10::optional<at::cuda::CUDAStream> > > > const&) + 0x391 (0x7f8c6a7274d1 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #11: + 0x4ebc2f (0x7f8c6a72cc2f in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #12: + 0x11642e (0x7f8c6a35742e in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #23: THPFunction_apply(_object*, _object*) + 0x581 (0x7f8c6a553ab1 in /home/a/anaconda3/envs/bcw_env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
```
Can you help me solve this problem? Thanks!
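For readers hitting the same trace: a CUDA "invalid device ordinal" during scatter usually means the process is asked to place a chunk on a GPU index that does not exist on the machine, e.g. a config written for 4 GPUs run on a box with 1. This is a hedged sketch (not from the original thread) of a pre-flight check; `validate_chunk_sizes` is a hypothetical helper, and the assumption is that CornerNet-Lite splits each batch across devices according to a `chunk_sizes` list, so its length must not exceed what `torch.cuda.device_count()` reports.

```python
# Sketch of a pre-flight check for the "invalid device ordinal" error.
# Assumption: one entry in chunk_sizes per GPU, so len(chunk_sizes) must
# not exceed the number of CUDA devices visible to the process.

def validate_chunk_sizes(chunk_sizes, visible_gpus):
    """Return (ok, message) for a per-GPU batch split.

    chunk_sizes  -- per-GPU batch sizes, e.g. [12, 12, 12, 13] for 4 GPUs
    visible_gpus -- what torch.cuda.device_count() reports on this machine
    """
    if visible_gpus == 0:
        return False, "no CUDA devices visible (check drivers / CUDA_VISIBLE_DEVICES)"
    if len(chunk_sizes) > visible_gpus:
        return False, (
            f"config requests {len(chunk_sizes)} GPUs but only {visible_gpus} "
            f"are visible -- scatter will raise 'invalid device ordinal'"
        )
    return True, "ok"


if __name__ == "__main__":
    # A config written for 4 GPUs run on a 1-GPU machine fails the check:
    ok, msg = validate_chunk_sizes([12, 12, 12, 13], visible_gpus=1)
    print(ok, msg)
```

If the check fails, either expose only valid devices (e.g. `export CUDA_VISIBLE_DEVICES=0`) and shrink the batch split in the training config accordingly, or run on a machine with as many GPUs as the config expects.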

@Looson

Looson commented Aug 22, 2019

Have you solved this problem? Thank you!
