RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck! #9

dogydev · 2020-02-25T21:08:13Z

RuntimeError: one of the variables needed for gradient computation has been modified by an in-place operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
This occurred while running the training code using the command provided in the README. Further research showed that the error can be debugged with PyTorch's anomaly detection, which traced it to a call inside a forward function.
Full traceback:
/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
File "scripts/trainval.py", line 401, in <module>
main()
File "scripts/trainval.py", line 173, in main
main_train(train_dataset, validation_dataset, extra_dataset)
File "scripts/trainval.py", line 291, in main_train
train_epoch(epoch, trainer, train_dataloader, meters)
File "scripts/trainval.py", line 346, in train_epoch
loss, monitors, output_dict, extra_info = trainer.step(feed_dict, cast_tensor=False)
File "/home/user/Jacinle/jactorch/train/env.py", line 135, in step
loss, monitors, output_dict = self._model(feed_dict)
File "/home/user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "experiments/clevr/desc_nscl_derender.py", line 40, in forward
f_sng = self.scene_graph(f_scene, feed_dict.objects, feed_dict.objects_length)
File "/home/weichen/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data2/PycharmProjects/NSCL-PyTorch-Release/nscl/nn/scene_graph/scene_graph.py", line 125, in forward
this_object_features[sub_id], this_object_features[obj_id],

Traceback (most recent call last):
File "scripts/trainval.py", line 401, in <module>
main()
File "scripts/trainval.py", line 173, in main
main_train(train_dataset, validation_dataset, extra_dataset)
File "scripts/trainval.py", line 291, in main_train
train_epoch(epoch, trainer, train_dataloader, meters)
File "scripts/trainval.py", line 346, in train_epoch
loss, monitors, output_dict, extra_info = trainer.step(feed_dict, cast_tensor=False)
File "/home/user/Jacinle/jactorch/train/env.py", line 155, in step
loss.backward()
File "/home/user/.local/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/user/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
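
For anyone trying to reproduce this class of error outside the repository, here is a minimal sketch of how anomaly detection can be switched on. It uses a toy computation, not the NSCL model: the function name `run_step` and the tensors `x` and `y` are made up for illustration. The only real API it relies on is `torch.autograd.set_detect_anomaly`, used here as a context manager.

```python
import torch


def run_step(detect_anomaly: bool) -> str:
    """Run one toy forward/backward pass that triggers the in-place error.

    Returns the RuntimeError message, or an empty string if no error occurred.
    """
    # With anomaly detection on, autograd also prints the traceback of the
    # forward call that created the failing backward node.
    with torch.autograd.set_detect_anomaly(detect_anomaly):
        x = torch.randn(3, 4, requires_grad=True)
        y = x.exp()   # exp() saves its output for use in the backward pass
        y.add_(1)     # in-place write bumps y's version counter from 0 to 1
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return ""


message = run_step(detect_anomaly=True)
print(message)
```

In a real training script the same effect can be had by calling `torch.autograd.set_detect_anomaly(True)` once before the training loop. Anomaly mode slows training noticeably, so it is probably best left on only while debugging.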

The error occurred in epoch 6. The loss value seemed to be decreasing normally.
Epoch 6 acc/qa=0.687500 loss=0.708516 loss/qa=0.708516 time/data=0.542045 time/step=1.222588: 0%| | 1/469 [00:01<13:45, 1.76s/i
Epoch 6 acc/qa=0.562500 loss=0.713307 loss/qa=0.713307 time/data=0.009401 time/step=1.012083: 0%| | 1/469 [00:02<13:45, 1.76s/i
Epoch 6 acc/qa=0.562500 loss=0.713307 loss/qa=0.713307 time/data=0.009401 time/step=1.012083: 0%| | 2/469 [00:02<11:59, 1.54s/i
Epoch 6 acc/qa=0.812500 loss=0.566920 loss/qa=0.566920 time/data=0.011552 time/step=1.050238: 0%| | 2/469 [00:03<11:59, 1.54s/i
Epoch 6 acc/qa=0.812500 loss=0.566920 loss/qa=0.566920 time/data=0.011552 time/step=1.050238: 1%| | 3/469 [00:03<10:51, 1.40s/it]/

Any ideas?
Thanks
