Hi, when I try to run spisonet using the following command:

I get the following output, terminating with an error:
```
04-16 14:16:07, INFO The isonet_maps folder already exists, outputs will write into this folder
04-16 14:16:08, INFO voxel_size 1.125
04-16 14:16:11, WARNING The isonet_maps/J248_006_volume_map_half_A_data folder already exists. The old isonet_maps/J248_006_volume_map_half_A_data folder will be moved to isonet_maps/J248_006_volume_map_half_A_data~
04-16 14:16:11, WARNING The isonet_maps/J248_006_volume_map_half_B_data folder already exists. The old isonet_maps/J248_006_volume_map_half_B_data folder will be moved to isonet_maps/J248_006_volume_map_half_B_data~
04-16 14:16:11, INFO spIsoNet correction until resolution 3.95A!
Information beyond 3.95A remains unchanged
04-16 14:16:21, INFO Start preparing subvolumes!
04-16 14:16:54, INFO Done preparing subvolumes!
04-16 14:16:54, INFO Start training!
04-16 14:17:02, INFO Port number: 44237
learning rate 0.0003
['isonet_maps/J248_006_volume_map_half_A_data', 'isonet_maps/J248_006_volume_map_half_B_data']
Traceback (most recent call last):
  File "/home/user/software/miniconda3/envs/spisonet/bin/spisonet.py", line 8, in <module>
    sys.exit(main())
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/spisonet.py", line 549, in main
    fire.Fire(ISONET)
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/spisonet.py", line 182, in reconstruct
    map_refine_n2n(halfmap1,halfmap2, mask_vol, fsc3d, alpha = alpha,beta=beta, voxel_size=voxel_size, output_dir=output_dir,
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/map_refine.py", line 145, in map_refine_n2n
    network.train([data_dir_1,data_dir_2], output_dir, alpha=alpha,beta=beta, output_base=output_base0, batch_size=batch_size, epochs = epochs, steps_per_epoch = 1000,
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/network_n2n.py", line 265, in train
    mp.spawn(ddp_train, args=(self.world_size, self.port_number, self.model,alpha,beta,
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 241, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 158, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
    fn(i, *args)
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/network_n2n.py", line 68, in ddp_train
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size_gpu, persistent_workers=True,
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 356, in __init__
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)
  File "/home/user/software/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 267, in __init__
    raise ValueError(f"batch_size should be a positive integer value, but got batch_size={batch_size}")
ValueError: batch_size should be a positive integer value, but got batch_size=0
```
When I explicitly set the batch size to 4 (`--batch_size 4`), it still fails with the same error. Am I doing something wrong, or is this a bug of some kind? Happy to provide inputs if helpful.
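From the traceback, the error itself comes from PyTorch's `DataLoader` rejecting `batch_size_gpu=0`, so presumably the per-GPU batch size is derived from `--batch_size` by integer division and rounds down to zero. A minimal sketch of that failure mode (hypothetical arithmetic, not spIsoNet's actual code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical sketch: if the global batch size is divided across GPUs
# (and accumulation steps) with integer division, a small --batch_size
# can round down to zero per GPU.
batch_size = 4   # --batch_size 4
world_size = 4   # four GPUs selected
acc_batch = 2    # assumed single-GPU accumulation setting
batch_size_gpu = batch_size // (world_size * acc_batch)
print(batch_size_gpu)  # 0

# PyTorch's DataLoader then fails with exactly the error in the traceback:
dataset = TensorDataset(torch.zeros(8, 1))
try:
    DataLoader(dataset, batch_size=batch_size_gpu)
except ValueError as e:
    print(e)  # batch_size should be a positive integer value, but got batch_size=0
```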
EDIT:
Ah, it seems I was using the `--acc_batch` and `--batch_size` parameters recommended for a single GPU, but selecting 4 GPUs. When I use the recommended params for 4 GPUs (batch size 8, acc_batch 1), it runs out of GPU RAM, though, even though all four GPUs have 11 GB... increasing acc_batch to 2 seems to make it run successfully now.
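For anyone hitting the same out-of-memory failure: increasing `acc_batch` plausibly helps because gradient accumulation keeps the effective batch size while shrinking the micro-batch (and its activations) held in GPU memory per forward/backward pass. A generic PyTorch sketch of the idea, assuming this is what `acc_batch` controls (not spIsoNet's actual implementation):

```python
import torch

def accumulated_step(model, optimizer, loss_fn, micro_batches, acc_steps):
    """Split one optimizer step across acc_steps smaller micro-batches."""
    optimizer.zero_grad()
    for x, y in micro_batches:                   # acc_steps micro-batches
        loss = loss_fn(model(x), y) / acc_steps  # scale so gradients average
        loss.backward()                          # gradients accumulate in .grad
    optimizer.step()                             # one update per acc_steps passes

# Toy usage: with acc_steps=2, each pass holds half the samples in memory,
# but the parameter update still reflects the full effective batch.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
data = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(2)]
accumulated_step(model, opt, torch.nn.functional.mse_loss, data, acc_steps=2)
```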