An error occurred while I was training model #1
Comments
I got this error too. |
What a coincidence that you replied to me.
My GPUs are NVIDIA GeForce GTX 1080 Ti; there are 4 of them, and the computer's memory is 64 GB.
I changed batch_size and num_epoch, but I still get the same error.
------------------ Original Message ------------------
From: "Weixin Luo (罗伟鑫)"<[email protected]>;
Sent: Monday, June 8, 2020, 4:16 PM
To: "1zgh/st-gcn"<[email protected]>;
Cc: "张小媛"<[email protected]>;"Author"<[email protected]>;
Subject: Re: [1zgh/st-gcn] An error occurred while I was training model (#1)
I got this error too.
The problem is "OverflowError: cannot serialize a bytes object larger than 4 GiB";
maybe you need a better GPU.
Or you can try changing the code, for example the batch size or something else. I am doing this now.
|
Hello, when I run the demo with a model I trained myself, the following error occurs:
Traceback (most recent call last):
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\site-packages\torchlight-1.0-py3.6.egg\torchlight\io.py", line 82, in load_weights
__doc__ = _io._TextIOBase.__doc__
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
size mismatch for A: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for data_bn.weight: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([54]).
size mismatch for data_bn.bias: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([54]).
size mismatch for data_bn.running_mean: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([54]).
size mismatch for data_bn.running_var: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([54]).
size mismatch for edge_importance.0: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.1: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.2: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.3: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.4: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.5: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.6: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.7: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.8: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.9: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 31, in <module>
p = Processor(sys.argv[2:])
File "D:\zxy\st-gcn\processor\io.py", line 28, in __init__
self.load_weights()
File "D:\zxy\st-gcn\processor\io.py", line 75, in load_weights
self.arg.ignore_weights)
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\site-packages\torchlight-1.0-py3.6.egg\torchlight\io.py", line 89, in load_weights
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
size mismatch for A: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for data_bn.weight: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([54]).
size mismatch for data_bn.bias: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([54]).
size mismatch for data_bn.running_mean: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([54]).
size mismatch for data_bn.running_var: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([54]).
size mismatch for edge_importance.0: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.1: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.2: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.3: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.4: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.5: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.6: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.7: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.8: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
size mismatch for edge_importance.9: copying a param with shape torch.Size([3, 25, 25]) from checkpoint, the shape in current model is torch.Size([3, 18, 18]).
Do you know where the problem is and how I should fix it? Thank you very much!
|
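A note on the size-mismatch traceback above: every reported shape differs only in the joint dimension (25 in the checkpoint, 18 in the current model), and 75 and 54 are both 3 times that number. This suggests the checkpoint was saved from a model built on a 25-joint skeleton graph while the demo builds an 18-joint one, so the graph layout in the demo config probably does not match the one used for training. The sketch below only works on the shapes copied from the traceback; the helper names are made up for illustration, and the relation "data_bn size = input channels x joints" is an assumption read off the numbers, not something confirmed in this thread.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for entries that differ."""
    return {
        name: (ckpt_shape, model_shapes[name])
        for name, ckpt_shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != ckpt_shape
    }


def infer_joints_and_channels(shapes):
    """Guess (num_joints, in_channels) from the 'A' and 'data_bn.weight' shapes.

    Assumes data_bn.weight has in_channels * num_joints entries (an assumption).
    """
    num_joints = shapes["A"][-1]
    in_channels = shapes["data_bn.weight"][0] // num_joints
    return num_joints, in_channels


# Shapes copied from the traceback (only a few of the entries are listed).
checkpoint = {"A": (3, 25, 25), "data_bn.weight": (75,), "edge_importance.0": (3, 25, 25)}
current = {"A": (3, 18, 18), "data_bn.weight": (54,), "edge_importance.0": (3, 18, 18)}

if __name__ == "__main__":
    for name, (ckpt, model) in shape_mismatches(checkpoint, current).items():
        print(f"{name}: checkpoint {ckpt} vs model {model}")
    print("checkpoint (joints, channels):", infer_joints_and_channels(checkpoint))
    print("current    (joints, channels):", infer_joints_and_channels(current))
```

If both sides come out with the same channel count and different joint counts, as they do here, the first thing to check is probably whether the demo is started with the same graph settings as the training run.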
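Have you solved it? I looked into it yesterday and found that the cause is that the training set is too large: 29 GB is read in at once, while my machine has 16 GB of RAM, so there is not enough memory. You may need more RAM. |
@XieLinMofromsomewhere Hello! I also encountered this problem when training the model. My laptop has only 16 GB of RAM. How did you solve it? I am looking forward to your reply. Thank you! |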
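To check whether a training file can fit in RAM before loading it, the size of a dense array can be estimated from its shape alone. The sketch below is just that arithmetic. The shape used is a hypothetical example (samples, channels, frames, joints, persons) stored as 4-byte floats; it is not taken from this thread, so substitute the shape of your own data file.

```python
def array_nbytes(shape, itemsize=4):
    """Bytes needed for a dense array of the given shape (itemsize=4 for float32)."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


# HYPOTHETICAL shape, for illustration only: (samples, channels, frames, joints, persons).
EXAMPLE_SHAPE = (240_000, 3, 300, 18, 2)

if __name__ == "__main__":
    nbytes = array_nbytes(EXAMPLE_SHAPE)
    print(f"{nbytes} bytes = {nbytes / 2**30:.2f} GiB")
```

With this example shape the array would need roughly 29 GiB, which would not fit in 16 GB of RAM if it were loaded in one piece.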
[06.08.20|11:09:56] Training epoch: 0
Traceback (most recent call last):
File "main.py", line 33, in <module>
p.start()
File "D:\zxy\st-gcn\processor\processor.py", line 113, in start
self.train()
File "D:\zxy\st-gcn\processor\recognition.py", line 84, in train
for data, label in loader:
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __iter__
return _DataLoaderIter(self)
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 560, in __init__
w.start()
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Can you help me?
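A note on the traceback above: it ends in `reduction.dump(process_obj, to_child)` inside `popen_spawn_win32.py`, so the failure seems to happen while Windows pickles the data-loader worker process (which carries the dataset object) to send it to the child, rather than on the GPU. If the dataset object holds the whole training array in memory, that pickle goes over the 4 GiB limit named in the error. Two things that may help: run the loader without worker processes (`num_workers=0` on a PyTorch `DataLoader`, however that option is named in your config), or make the dataset object cheap to pickle by storing only a file path and opening the file lazily. The sketch below illustrates the second idea with the standard library only; `LazyFileDataset` and its fixed-size binary records are hypothetical and are not part of st-gcn.

```python
import os
import pickle
import tempfile


class LazyFileDataset:
    """Hypothetical dataset that pickles as a path, not as its contents."""

    def __init__(self, path, record_size):
        self.path = path
        self.record_size = record_size
        self._file = None  # opened on first access, in whichever process uses it

    def __len__(self):
        return os.path.getsize(self.path) // self.record_size

    def __getitem__(self, index):
        if self._file is None:
            self._file = open(self.path, "rb")
        self._file.seek(index * self.record_size)
        return self._file.read(self.record_size)

    def __getstate__(self):
        # Drop the open file handle so only the path and record size are pickled.
        state = self.__dict__.copy()
        state["_file"] = None
        return state

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


def demo():
    """Write a 1 MiB file, then show that the pickled dataset stays tiny."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(bytes(range(256)) * 4096)
        path = handle.name
    dataset = LazyFileDataset(path, record_size=256)
    clone = None
    try:
        dataset[0]  # force the file to be opened before pickling
        payload = pickle.dumps(dataset)
        clone = pickle.loads(payload)  # stands in for the worker process
        same = clone[3] == dataset[3]
        return len(dataset), len(payload), same
    finally:
        dataset.close()
        if clone is not None:
            clone.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The pickled payload contains only the path and the record size, so its size does not grow with the data file. Reading the data on demand this way also avoids having the whole file in RAM at once.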