Errors that occur during training #68

Open
shenshaowei opened this issue Jun 11, 2024 · 3 comments

Comments

@shenshaowei

(sam3d) a@a-Super-Server:/media/a/DATA/ssw-baselines/SAM-Med3D$ python3 train.py
Loaded checkpoint from ckpt/sam_med3d.pth (epoch 0)
Epoch: 0/199
0%| | 0/150 [00:00<?, ?it/s]/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/nn/modules/conv.py:605: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv3d(
1%|▌ | 1/150 [00:05<14:11, 5.72s/it]
Traceback (most recent call last):
File "/media/a/DATA/ssw-baselines/SAM-Med3D/train.py", line 520, in <module>
main()
File "/media/a/DATA/ssw-baselines/SAM-Med3D/train.py", line 479, in main
trainer.train()
File "/media/a/DATA/ssw-baselines/SAM-Med3D/train.py", line 374, in train
epoch_loss, epoch_iou, epoch_dice, pred_list = self.train_epoch(epoch, num_clicks)
File "/media/a/DATA/ssw-baselines/SAM-Med3D/train.py", line 294, in train_epoch
for step, (image3D, gt3D) in enumerate(tbar):
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/prefetch_generator/__init__.py", line 116, in __next__
raise next_item
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/prefetch_generator/__init__.py", line 98, in run
for item in self.generator: self.queue.put((True , item))
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
return self._process_data(data)
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/_utils.py", line 705, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 316, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 173, in collate
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 173, in <listcomp>
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 141, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/home/a/miniconda3/envs/sam3d/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 213, in collate_tensor_fn
return torch.stack(batch, 0, out=out)
RuntimeError: torch.cat(): input types can't be cast to the desired output type Int

I've been stuck on this for a long time. Is there a solution?
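
The traceback ends in `torch.stack(batch, 0, out=out)` inside the default collate function, and the message says the result cannot be cast to Int. That is consistent with samples in the same batch having different dtypes, although nothing in this thread confirms it. A minimal sketch for checking that assumption follows. `group_indices_by_dtype` and `report` are hypothetical helper names, and the code assumes `dataset[i]` returns an `(image, label)` pair, as the `for step, (image3D, gt3D)` loop in the traceback suggests.

```python
from collections import defaultdict


def group_indices_by_dtype(dataset, limit=None):
    """Map each (image dtype, label dtype) pair to the sample indices that have it.

    Assumption: dataset[i] returns an (image, label) pair whose items expose
    a .dtype attribute, which holds for torch tensors and NumPy arrays.
    """
    total = len(dataset) if limit is None else min(limit, len(dataset))
    groups = defaultdict(list)
    for index in range(total):
        image, label = dataset[index]
        groups[(str(image.dtype), str(label.dtype))].append(index)
    return dict(groups)


def report(groups):
    """Print one line per dtype combination found in the dataset."""
    for (image_dtype, label_dtype), indices in groups.items():
        print(
            f"image={image_dtype} label={label_dtype}: "
            f"{len(indices)} samples, first index {indices[0]}"
        )
```

Calling `report(group_indices_by_dtype(train_dataset))` on the training dataset object (the variable name here is a placeholder) prints one line per dtype combination. More than one line would mean the dataset mixes dtypes, which would explain why stacking a batch fails.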

@shenshaowei

[Image: 1718125464770]
This is a screenshot of the error.

@RRouhi

RRouhi commented Jun 22, 2024

I got the same error: "RuntimeError: torch.cat(): input types can't be cast to the desired output type Int". Setting --batch_size 1 solved the issue.
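
With a batch size of 1 there is only one sample to stack, so samples with different dtypes never meet in the same batch. If larger batches are needed, one possible alternative is to cast every sample to fixed dtypes before collating. This is a sketch under assumptions, not the project's own fix: `cast_then_collate` is a hypothetical name, each sample is assumed to be an `(image, label)` pair, and the target dtypes `float32` and `int64` are illustrative choices that may need to be adapted to what the training code expects.

```python
import torch
from torch.utils.data import default_collate


def cast_then_collate(batch):
    """Cast every (image, label) pair to fixed dtypes, then collate as usual.

    Assumption: each sample is an (image, label) pair of tensors or arrays.
    The target dtypes below are illustrative, not taken from the project.
    """
    fixed = [
        (
            torch.as_tensor(image).to(torch.float32),
            torch.as_tensor(label).to(torch.int64),
        )
        for image, label in batch
    ]
    return default_collate(fixed)
```

The function can be passed to a loader as `DataLoader(dataset, batch_size=2, collate_fn=cast_then_collate)`. Casting inside the dataset's `__getitem__` would achieve the same result and keeps the default collate function in place.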

@tuan-ld

tuan-ld commented Jul 9, 2024

> I got the same error: "RuntimeError: torch.cat(): input types can't be cast to the desired output type Int". Setting --batch_size 1 solved the issue.

Thank you, it works!
