OSError: [Errno 12] Cannot allocate memory #241

Closed
daibin88 opened this issue Jul 29, 2021 · 1 comment

Comments

@daibin88
I am training a yolox-m model on our own VOC-format dataset. At epoch 75, the following error occurred:
2021-07-29 09:47:41.510 | INFO | yolox.core.trainer:after_iter:237 - epoch: 75/150, iter: 1440/1445, mem: 12955Mb, iter_time: 0.961s, data_time: 0.563s, total_loss: 4.0, iou_loss: 1.8, l1_loss: 0.0, conf_loss: 1.4, cls_loss: 0.8, lr: 2.339e-03, size: 576, ETA: 1 day, 0:24:16
2021-07-29 09:47:45.812 | INFO | yolox.core.trainer:save_ckpt:307 - Save weights to ./YOLOX_outputs/yolox_voc_m
2021-07-29 09:47:50.812 | INFO | yolox.core.trainer:after_train:183 - Training of experiment is done and the best AP is 46.55
2021-07-29 09:47:50.812 | ERROR | yolox.core.launch:_distributed_worker:104 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (1941), thread 'MainThread' (140297766901504):
Traceback (most recent call last):

File "<string>", line 1, in <module>
File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
│ │ └ 3
│ └ 24
└ <function _main at 0x7f999cea74c0>
File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/spawn.py", line 129, in _main
return self._bootstrap(parent_sentinel)
│ │ └ 3
│ └ <function BaseProcess._bootstrap at 0x7f999cf8d670>

File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
│ └ <function BaseProcess.run at 0x7f999cfa4ca0>

File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └
│ │ │ └ (<function _distributed_worker at 0x7f98d82c8e50>, 0, (<function main at 0x7f98caa14d30>, 4, 4, 0, 'nccl', 'tcp://127.0.0.1:3...
│ │ └
│ └ <function _wrap at 0x7f98dc8f0ca0>

File "/home/anaconda3/envs/yolox/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
│ │ └ (<function main at 0x7f98caa14d30>, 4, 4, 0, 'nccl', 'tcp://127.0.0.1:38695', (╒══════════════════╤══════════════════════════...
│ └ 0
└ <function _distributed_worker at 0x7f98d82c8e50>

File "/home/dane/project/detection/YOLOX/yolox/core/launch.py", line 104, in _distributed_worker
main_func(*args)
│ └ (╒══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x7f98caa14d30>

File "/home/**/dane/project/detection/YOLOX/tools/train.py", line 101, in main
trainer.train()
│ └ <function Trainer.train at 0x7f98cc94fe50>
└ <yolox.core.trainer.Trainer object at 0x7f98ca97efa0>

File "/home/dane/project/detection/YOLOX/yolox/core/trainer.py", line 70, in train
self.train_in_epoch()
│ └ <function Trainer.train_in_epoch at 0x7f98ca9e5f70>
└ <yolox.core.trainer.Trainer object at 0x7f98ca97efa0>

File "/home/dane/project/detection/YOLOX/yolox/core/trainer.py", line 80, in train_in_epoch
self.after_epoch()
│ └ <function Trainer.after_epoch at 0x7f98ca9f54c0>
└ <yolox.core.trainer.Trainer object at 0x7f98ca97efa0>

File "/home/dane/project/detection/YOLOX/yolox/core/trainer.py", line 210, in after_epoch
self.evaluate_and_save_model()
│ └ <function Trainer.evaluate_and_save_model at 0x7f98ca9f5790>
└ <yolox.core.trainer.Trainer object at 0x7f98ca97efa0>

File "/home/dane/project/detection/YOLOX/yolox/core/trainer.py", line 293, in evaluate_and_save_model
ap50_95, ap50, summary = self.exp.eval(evalmodel, self.evaluator, self.is_distributed)
│ │ │ │ │ │ │ └ True
│ │ │ │ │ │ └ <yolox.core.trainer.Trainer object at 0x7f98ca97efa0>
│ │ │ │ │ └ <yolox.evaluators.voc_evaluator.VOCEvaluator object at 0x7f992f3da460>
│ │ │ │ └ <yolox.core.trainer.Trainer object at 0x7f98ca97efa0>
│ │ │ └ YOLOX(
│ │ │ (backbone): YOLOPAFPN(
│ │ │ (backbone): CSPDarknet(
│ │ │ (stem): Focus(
│ │ │ (conv): BaseConv(
│ │ │ (conv): ...
│ │ └ <function Exp.eval at 0x7f98caa153a0>
│ └ ╒══════════════════╤═════════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <yolox.core.trainer.Trainer object at 0x7f98ca97efa0>

File "/home/dane/project/detection/YOLOX/yolox/exp/yolox_base.py", line 250, in eval
return evaluator.evaluate(model, is_distributed, half)
│ │ │ │ └ False
│ │ │ └ True
│ │ └ YOLOX(
│ │ (backbone): YOLOPAFPN(
│ │ (backbone): CSPDarknet(
│ │ (stem): Focus(
│ │ (conv): BaseConv(
│ │ (conv): ...
│ └ <function VOCEvaluator.evaluate at 0x7f98ca9e18b0>
└ <yolox.evaluators.voc_evaluator.VOCEvaluator object at 0x7f992f3da460>

File "/home/dane/project/detection/YOLOX/yolox/evaluators/voc_evaluator.py", line 82, in evaluate
for cur_iter, (imgs, _, info_imgs, ids) in enumerate(progress_bar(self.dataloader)):
│ │ │ └ <torch.utils.data.dataloader.DataLoader object at 0x7f992f3da610>
│ │ └ <yolox.evaluators.voc_evaluator.VOCEvaluator object at 0x7f992f3da460>
│ └ <class 'tqdm.std.tqdm'>
└ []

File "/home/anaconda3/envs/yolox/lib/python3.8/site-packages/tqdm/std.py", line 1185, in __iter__
for obj in iterable:
└ <torch.utils.data.dataloader.DataLoader object at 0x7f992f3da610>
File "/home/anaconda3/envs/yolox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
return self._get_iterator()
│ └ <function DataLoader._get_iterator at 0x7f98dc500820>
└ <torch.utils.data.dataloader.DataLoader object at 0x7f992f3da610>
File "/home/anaconda3/envs/yolox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
│ └ <torch.utils.data.dataloader.DataLoader object at 0x7f992f3da610>
└ <class 'torch.utils.data.dataloader._MultiProcessingDataLoaderIter'>
File "/home/anaconda3/envs/yolox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
w.start()
│ └ <function BaseProcess.start at 0x7f999cfa4d30>

File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
│ │ │ │ └
│ │ │ └ <staticmethod object at 0x7f999cfdd880>
│ │ └
│ └ None

File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
│ │ └
│ └ <function DefaultContext.get_context at 0x7f999cf39dc0>
└ <multiprocessing.context.DefaultContext object at 0x7f999cfa6bb0>
File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
│ └
└ <class 'multiprocessing.popen_spawn_posix.Popen'>
File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)

File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
│ │ └
│ └ <function Popen._launch at 0x7f98ca97d4c0>
└ <multiprocessing.popen_spawn_posix.Popen object at 0x7f993dc61730>
File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 58, in _launch
self.pid = util.spawnv_passfds(spawn.get_executable(),
│ │ │ │ └ <function get_executable at 0x7f999cea71f0>
│ │ │ └ <module 'multiprocessing.spawn' from '/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/spawn.py'>
│ │ └ <function spawnv_passfds at 0x7f999cea7040>
│ └ <module 'multiprocessing.util' from '/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/util.py'>
└ <multiprocessing.popen_spawn_posix.Popen object at 0x7f993dc61730>
File "/home/anaconda3/envs/yolox/lib/python3.8/multiprocessing/util.py", line 452, in spawnv_passfds
return _posixsubprocess.fork_exec(
│ └
└ <module '_posixsubprocess' from '/home/anaconda3/envs/yolox/lib/python3.8/lib-dynload/posixsubprocess.cpython-38-x86...

OSError: [Errno 12] Cannot allocate memory

@GOATmessi8
Member

Please update to our latest version. See #216 and #224.
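
For readers who cannot update right away: the traceback shows the error being raised while the evaluation DataLoader starts its worker processes (`w.start()` ending in `fork_exec`). The linked issues are not reproduced here, so the following is not the fix they describe. It is only a minimal sketch of a generic workaround, assuming the host is short on memory at the moment the workers are spawned. The helper name `build_with_worker_fallback` and the `build_iterator` callable are hypothetical and not part of YOLOX or PyTorch.

```python
import errno


def build_with_worker_fallback(build_iterator, num_workers):
    """Call build_iterator(num_workers), halving num_workers on ENOMEM.

    build_iterator is a hypothetical callable that creates the loader and
    starts its workers, for example:
        lambda n: iter(DataLoader(dataset, num_workers=n))
    Returns a tuple (iterator, num_workers_actually_used).
    """
    while True:
        try:
            return build_iterator(num_workers), num_workers
        except OSError as exc:
            # Only handle "[Errno 12] Cannot allocate memory". Re-raise any
            # other error, and give up once we are already at zero workers
            # (zero means data is loaded in the main process).
            if exc.errno != errno.ENOMEM or num_workers == 0:
                raise
            num_workers //= 2
```

With a PyTorch loader, `build_iterator` would wrap the `iter(DataLoader(...))` call, since that is where the worker processes are spawned in the traceback above. Retrying with fewer workers trades evaluation speed for a lower chance of hitting the allocation failure; it does not address whatever caused memory to run low in the first place.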
