RuntimeError: Caught RuntimeError in replica 1 on device 1. #27

Open
17853313621 opened this issue May 30, 2021 · 0 comments

python3 train.py /media/disk1/xgl/cc-pil/formatted --dispnet DispResNet6 --posenet PoseNetB6 --masknet MaskNet6 --flownet Back2Future --pretrained-disp /media/disk1/xgl/cc-pil/geometry/dispnet_k.pth.tar --pretrained-pose /media/disk1/xgl/cc-pil/geometry/posenet.pth.tar --pretrained-flow /media/disk1/xgl/cc-pil/geometry/back2future.pth.tar --pretrained-mask /media/disk1/xgl/cc-pil/geometry/masknet.pth.tar -b4 -m0.1 -pf 0.5 -pc 1.0 -s0.1 -c0.3 --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997 --with-flow-gt --with-depth-gt --epochs 100 --smoothness-type edgeaware --fix-masknet --fix-flownet --log-terminal --name EXPERIMENT_NAME
=> will save everything to checkpoints/EXPERIMENT_NAME
=> fetching scenes in '/media/disk1/xgl/cc-pil/formatted'
588 samples found in 5 train scenes
154 samples found in 1 valid scenes
=> creating model
=> using pre-trained weights for explainabilty and pose net
=> using pre-trained weights for explainabilty and pose net
=> using pre-trained weights from /media/disk1/xgl/cc-pil/geometry/dispnet_k.pth.tar
=> using pre-trained weights for FlowNet
=> setting adam solver

N/A% (0 of 100) | | Elapsed Time: 0:00:00 ETA: --:--:--

N/A% (0 of 147) | | Elapsed Time: 0:00:00 ETA: --:--:--

N/A% (0 of 38) | | Elapsed Time: 0:00:00 ETA: --:--:--

/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:2941: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:3384: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn("Default grid_sample and affine_grid behavior has changed "
Traceback (most recent call last):
File "train.py", line 784, in <module>
main()
File "train.py", line 353, in main
train_loss = train(train_loader, disp_net, pose_net, mask_net, flow_net, optimizer, args.epoch_size, logger, training_writer)
File "train.py", line 463, in train
flow_fwd, flow_bwd, _ = flow_net(tgt_img_var, ref_imgs_var[1:3])
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
100% (100 of 100) |###################################################################################################################################################| Elapsed Time: 0:00:04 Time: 0:00:04
output = module(*input, **kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
100% (147 of 147) |###################################################################################################################################################| Elapsed Time: 0:00:04 Time: 0:00:04
File "/media/disk1/xgl/cc-pil/models/back2future.py", line 174, in forward
corr6_fwd = corr6_fwd.index_select(1,self.idx_fwd)
100% (38 of 38) |#####################################################################################################################################################| Elapsed Time: 0:00:04 Time: 0:00:04

Could you help me solve this problem? Thank you very much.
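
The final error message is cut off in the log above, so the cause cannot be confirmed from it. One common reason for `index_select` failing only in replica 1 under `nn.DataParallel` is that the index tensor (here `self.idx_fwd`) is stored as a plain attribute and therefore stays on device 0. The following is a minimal sketch under that assumption. `CorrSelect` and `keep_channels` are hypothetical names used for illustration and are not taken from `back2future.py`.

```python
import torch
import torch.nn as nn


class CorrSelect(nn.Module):
    """Hypothetical stand-in for the index_select step in back2future.py."""

    def __init__(self, keep_channels):
        super().__init__()
        idx = torch.tensor(keep_channels, dtype=torch.long)
        # A registered buffer is moved by .cuda()/.to() and copied to every
        # DataParallel replica; a plain tensor attribute is not.
        self.register_buffer("idx_fwd", idx)

    def forward(self, corr):
        # Moving the index to the input's device is an extra safeguard.
        return corr.index_select(1, self.idx_fwd.to(corr.device))


layer = CorrSelect([0, 2, 4])
corr = torch.randn(2, 6, 8, 8)  # batch, channels, height, width
out = layer(corr)
print(out.shape)  # torch.Size([2, 3, 8, 8])
```

If the assumption holds, running the original command with a single visible GPU (for example by setting `CUDA_VISIBLE_DEVICES=0`) should also avoid the error, since no second replica is created.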
