Out of Memory #24

Open
hbkooo opened this issue Jul 28, 2019 · 1 comment

@hbkooo commented Jul 28, 2019

While training on my own data the job fails randomly, for example around epoch 291. Watching the GPU memory, it is steady at first and then suddenly increases, after which the following errors occur. How can I solve this problem? Thank you.

Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.27GiB. Current allocation summary follows.
2019-07-28 14:29:37.479031: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 4, Chunks in use: 0. 1.0KiB allocated for chunks. 19B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-28 14:29:37.479056: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 3, Chunks in use: 0. 1.5KiB allocated for chunks. 210B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-28 14:29:37.479072: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 1, Chunks in use: 0. 1.0KiB allocated for chunks. 80B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
...
2019-07-28 14:29:37.541280: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 21607936 totalling 41.21MiB
2019-07-28 14:29:37.541286: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 9 Chunks of size 23040000 totalling 197.75MiB
2019-07-28 14:29:37.541293: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 29791744 totalling 28.41MiB
2019-07-28 14:29:37.541302: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 30857472 totalling 29.43MiB
2019-07-28 14:29:37.541312: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 44332032 totalling 42.28MiB
2019-07-28 14:29:37.541321: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 51380224 totalling 147.00MiB
2019-07-28 14:29:37.541330: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 1.99GiB
2019-07-28 14:29:37.541345: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 10990990132
InUse: 2133677824
MaxInUse: 6298388480
NumAllocs: 1694962
MaxAllocSize: 4294967296

2019-07-28 14:29:37.541752: W tensorflow/core/common_runtime/bfc_allocator.cc:277] _________________________________________________________________*********************************
2019-07-28 14:29:38.079996: W tensorflow/core/kernels/queue_base.cc:294] _0_get_batch/input_producer: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080480: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080751: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080771: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080789: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080805: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080858: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080873: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080886: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080937: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080997: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081011: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081027: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081041: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081054: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081067: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081078: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "train.py", line 299, in
train()
File "train.py", line 260, in train
fast_rcnn_total_loss, total_loss, train_op])
File "/home/hbk/miniconda3/envs/mytensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/hbk/miniconda3/envs/mytensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/hbk/miniconda3/envs/mytensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/hbk/miniconda3/envs/mytensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: rpn_losses/rpn_minibatch/rpn_find_positive_negative_samples/PyFunc/_3605 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_20948_rpn_losses/rpn_minibatch/rpn_find_positive_negative_samples/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
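For context, "Dst tensor is not initialized" in TensorFlow 1.x usually means the GPU ran out of memory while copying a tensor from host to device, which is consistent with the allocator dump above. One common mitigation is to stop the session from pre-reserving the whole GPU and to cap how much memory it may take; the sketch below uses the standard TF1 session options (whether train.py already builds its session this way is an assumption):

import tensorflow as tf

# Let the BFC allocator grow on demand instead of grabbing all GPU memory up front,
# and optionally cap the fraction of device memory this process may use.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9  # optional hard cap

x = tf.constant([1.0, 2.0])
with tf.Session(config=config) as sess:
    print(sess.run(x))  # the real fetches from train.py (total_loss, train_op, ...) would go here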

@Avant-Gardiste

Try to reduce the batch size!
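The padding_fifo_queue warnings in the log suggest the input pipeline is built with tf.train.batch(..., dynamic_pad=True); lowering the batch size (and the queue capacity) there reduces the peak memory the pipeline can hold. A minimal sketch, assuming a get_batch helper along these lines (the actual signature in this repository may differ):

import tensorflow as tf

def get_batch(image, label, batch_size=1, capacity=32):
    # Smaller batch_size and capacity mean fewer padded tensors queued at once,
    # which lowers peak memory during training.
    return tf.train.batch([image, label],
                          batch_size=batch_size,
                          capacity=capacity,
                          dynamic_pad=True,
                          name='get_batch')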
