I am running an Ubuntu 22.04 host with dual RTX 4090 GPUs, and Ubuntu 18.04 inside Docker. After configuring the environment and modifying the code, I tried to train on a dataset with 3 frames per group, but training aborts with an NCCL error.
Does anyone know how to solve this? Thank you very much! The command and its full output are as follows:
(flavr_env) root@22727250d64b :/dataset/FLAVR# python main.py --batch_size 32 --test_batch_size 32 --dataset vimeo90K_septuplet --loss 1L1 --max_epoch 200 --lr 0.0002 --data_root /dataset/vimeo_triplet --n_outputs 1
CUDA version: 10.1
CuDNN version: 7603
Is CUDA available: True
Namespace(batch_size=32, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/dataset/vimeo_triplet', dataset='vimeo90K_septuplet', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1L1', lr=0.0002, max_epoch=200, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained=None, random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=32, upmode='transpose', use_tensorboard=False, val_freq=1)
Building model: unet_18
Preparing loss function:
1.000 * L1
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL Error 1: unhandled cuda error
Aborted (core dumped)
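One thing that may be worth checking, offered as a guess and not a confirmed diagnosis: the log reports CUDA 10.1, while the RTX 4090 is a compute capability 8.9 device that postdates that toolkit. The sketch below prints what the installed PyTorch build was compiled for and compares it with each visible GPU. The helper names `arch_supported` and `report` are hypothetical, and `torch.cuda.get_arch_list()` may be missing from older PyTorch releases, so the code falls back to "unknown" in that case.

```python
# Diagnostic sketch. Assumption: a GPU/toolkit mismatch is one possible
# cause of the NCCL error; this script only reports, it fixes nothing.

def arch_supported(capability, arch_list):
    """Return True if a build's arch list can run on a device.

    capability: (major, minor) tuple, e.g. (8, 9) for an RTX 4090.
    arch_list:  strings such as "sm_75" (binary) or "compute_75" (PTX).
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        if not num.isdigit():
            continue  # skip entries this sketch does not understand
        a_major, a_minor = int(num[:-1]), int(num[-1])
        # Binaries run on devices with the same major and an equal or newer minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("PyTorch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("No CUDA device visible")
        return
    # get_arch_list() does not exist in some older PyTorch releases.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    arch_list = get_arch_list() if get_arch_list else []
    print("Compiled architectures:", arch_list or "unknown")
    for i in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        ok = arch_supported(cap, arch_list)
        print(f"GPU {i}: {name}, capability {cap}:",
              "covered" if ok else "not covered by this build")


if __name__ == "__main__":
    report()
```

If the GPUs turn out not to be covered, rerunning the training command with the environment variable `NCCL_DEBUG=INFO` set should make NCCL print more detail about the underlying CUDA failure.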