RuntimeError: Caught RuntimeError in replica 1 on device 1. #27

17853313621 · 2021-05-30T13:30:18Z

python3 train.py /media/disk1/xgl/cc-pil/formatted --dispnet DispResNet6 --posenet PoseNetB6 --masknet MaskNet6 --flownet Back2Future --pretrained-disp /media/disk1/xgl/cc-pil/geometry/dispnet_k.pth.tar --pretrained-pose /media/disk1/xgl/cc-pil/geometry/posenet.pth.tar --pretrained-flow /media/disk1/xgl/cc-pil/geometry/back2future.pth.tar --pretrained-mask /media/disk1/xgl/cc-pil/geometry/masknet.pth.tar -b4 -m0.1 -pf 0.5 -pc 1.0 -s0.1 -c0.3 --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997 --with-flow-gt --with-depth-gt --epochs 100 --smoothness-type edgeaware --fix-masknet --fix-flownet --log-terminal --name EXPERIMENT_NAME
=> will save everything to checkpoints/EXPERIMENT_NAME
=> fetching scenes in '/media/disk1/xgl/cc-pil/formatted'
588 samples found in 5 train scenes
154 samples found in 1 valid scenes
=> creating model
=> using pre-trained weights for explainabilty and pose net
=> using pre-trained weights for explainabilty and pose net
=> using pre-trained weights from /media/disk1/xgl/cc-pil/geometry/dispnet_k.pth.tar
=> using pre-trained weights for FlowNet
=> setting adam solver

N/A% (0 of 100) | | Elapsed Time: 0:00:00 ETA: --:--:--

N/A% (0 of 147) | | Elapsed Time: 0:00:00 ETA: --:--:--

N/A% (0 of 38) | | Elapsed Time: 0:00:00 ETA: --:--:--

/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:2941: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:3384: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn("Default grid_sample and affine_grid behavior has changed "
Traceback (most recent call last):
File "train.py", line 784, in
main()
File "train.py", line 353, in main
train_loss = train(train_loader, disp_net, pose_net, mask_net, flow_net, optimizer, args.epoch_size, logger, training_writer)
File "train.py", line 463, in train
flow_fwd, flow_bwd, _ = flow_net(tgt_img_var, ref_imgs_var[1:3])
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
100% (100 of 100) |###################################################################################################################################################| Elapsed Time: 0:00:04 Time: 0:00:04
output = module(*input, **kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
100% (147 of 147) |###################################################################################################################################################| Elapsed Time: 0:00:04 Time: 0:00:04
File "/media/disk1/xgl/cc-pil/models/back2future.py", line 174, in forward
corr6_fwd = corr6_fwd.index_select(1,self.idx_fwd)
100% (38 of 38) |#####################################################################################################################################################| Elapsed Time: 0:00:04 Time: 0:00:04

Excuse me, can you help me solve this problem? thank you very much.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RuntimeError: Caught RuntimeError in replica 1 on device 1. #27

RuntimeError: Caught RuntimeError in replica 1 on device 1. #27

17853313621 commented May 30, 2021

RuntimeError: Caught RuntimeError in replica 1 on device 1. #27

RuntimeError: Caught RuntimeError in replica 1 on device 1. #27

Comments

17853313621 commented May 30, 2021