An error occurred while I was training model #1

Open
a726 opened this issue Jun 8, 2020 · 5 comments

a726 commented Jun 8, 2020

[06.08.20|11:09:56] Training epoch: 0
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    p.start()
  File "D:\zxy\st-gcn\processor\processor.py", line 113, in start
    self.train()
  File "D:\zxy\st-gcn\processor\recognition.py", line 84, in train
    for data, label in loader:
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __iter__
    return _DataLoaderIter(self)
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 560, in __init__
    w.start()
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\admin\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Can you help me?

@XieLinMofromsomewhere

I got this error too.
The key line is "OverflowError: cannot serialize a bytes object larger than 4 GiB".
You may need a better GPU.
Or you can try changing the code, for example the batch size or something else. I am trying this now.
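One code change that may be worth trying follows from the traceback itself: the error is raised inside `popen_spawn_win32.py` while `multiprocessing` pickles the worker process object, so whatever the dataset object holds in memory is likely being pickled along with it. The sketch below uses only the standard library to show the difference between a dataset that keeps its data in memory and one that keeps only a file path. `EagerDataset` and `LazyDataset` are hypothetical names for illustration, not classes from st-gcn, and the 1 MiB temporary file is a stand-in for the real training data.

```python
import os
import pickle
import tempfile


class EagerDataset:
    """Holds every sample in memory, so pickling it copies all the data."""

    def __init__(self, path):
        with open(path, "rb") as f:
            self.data = f.read()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]


class LazyDataset:
    """Holds only the file path; the file is opened on first access."""

    def __init__(self, path):
        self.path = path
        self.size = os.path.getsize(path)
        self._file = None  # opened lazily, once per process

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        if self._file is None:
            self._file = open(self.path, "rb")
        self._file.seek(i)
        return self._file.read(1)[0]

    def __getstate__(self):
        # Drop the open handle so only the path and size are pickled.
        state = self.__dict__.copy()
        state["_file"] = None
        return state


# Hypothetical stand-in for a large training file (1 MiB here).
fd, path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(256)) * 4096)

eager = EagerDataset(path)
lazy = LazyDataset(path)
first = lazy[10]  # forces the file handle open before pickling

eager_size = len(pickle.dumps(eager))  # roughly the size of the file
lazy_size = len(pickle.dumps(lazy))    # a few hundred bytes at most
restored = pickle.loads(pickle.dumps(lazy))
print(eager_size, lazy_size, restored[300])
```

If the real dataset class follows the eager pattern, the pickled payload grows with the data, which would be consistent with the 4 GiB limit in the error message. Whether this is what happens in st-gcn is an assumption that would need checking against its data feeder.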


a726 commented Jun 8, 2020 via email


a726 commented Jun 14, 2020 via email

XieLinMofromsomewhere commented Jun 24, 2020

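Have you solved it? I looked into it yesterday and found that the cause is that the training set is too large: 29GB is read in at once, while my machine has 16GB of RAM, so there is not enough memory. More RAM may be needed.
You could try renaming the validation set (2GB) to the training set's file name (the label file needs to be renamed too), then run training again and see whether this error still appears.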
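If the training data is stored as a NumPy `.npy` file (an assumption here; the thread does not state the format), memory mapping may be an alternative to buying more RAM. The sketch below is a minimal illustration with a small stand-in array, not the st-gcn loading code: `np.load` with `mmap_mode="r"` maps the file rather than reading all of it, and only the rows that are indexed get copied into memory. The file name `train_data.npy` and the array shape are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a large training array saved as .npy.
path = os.path.join(tempfile.mkdtemp(), "train_data.npy")
np.save(path, np.arange(1000 * 64, dtype=np.float32).reshape(1000, 64))

# mmap_mode="r" maps the file instead of reading it all into RAM;
# pages are loaded only for the rows that are actually indexed.
data = np.load(path, mmap_mode="r")

# Copy just one small batch into ordinary memory.
batch = np.array(data[10:14])
print(type(data).__name__, data.shape, batch.shape)
```

With this pattern the resident memory depends on the batches that are touched, not on the full file size, so a 29GB file would not need to fit in 16GB of RAM at once.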

@Thomas-yx

@XieLinMofromsomewhere Hello! I also ran into this problem when training the model. My laptop has only 16GB of RAM. How did you solve it? I look forward to your reply. Thank you!
