When I train on my data, errors like the ones below occur at random points (for example around epoch 291). Watching the GPU memory, it stays steady at first, then suddenly increases and the following errors appear. How can I solve this problem? Thank you.
Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.27GiB. Current allocation summary follows.
2019-07-28 14:29:37.479031: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 4, Chunks in use: 0 1.0KiB allocated for chunks. 19B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-28 14:29:37.479056: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 3, Chunks in use: 0 1.5KiB allocated for chunks. 210B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-28 14:29:37.479072: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 1, Chunks in use: 0 1.0KiB allocated for chunks. 80B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
...
2019-07-28 14:29:37.541280: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 21607936 totalling 41.21MiB
2019-07-28 14:29:37.541286: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 9 Chunks of size 23040000 totalling 197.75MiB
2019-07-28 14:29:37.541293: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 29791744 totalling 28.41MiB
2019-07-28 14:29:37.541302: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 30857472 totalling 29.43MiB
2019-07-28 14:29:37.541312: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 44332032 totalling 42.28MiB
2019-07-28 14:29:37.541321: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 51380224 totalling 147.00MiB
2019-07-28 14:29:37.541330: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 1.99GiB
2019-07-28 14:29:37.541345: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 10990990132
InUse: 2133677824
MaxInUse: 6298388480
NumAllocs: 1694962
MaxAllocSize: 4294967296
2019-07-28 14:29:37.541752: W tensorflow/core/common_runtime/bfc_allocator.cc:277] _________________________________________________________________*********************************
2019-07-28 14:29:38.079996: W tensorflow/core/kernels/queue_base.cc:294] _0_get_batch/input_producer: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080480: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080751: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080771: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080789: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080805: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080858: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080873: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080886: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080937: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.080997: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081011: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081027: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081041: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081054: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081067: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2019-07-28 14:29:38.081078: W tensorflow/core/kernels/queue_base.cc:294] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "train.py", line 299, in <module>
    train()
  File "train.py", line 260, in train
    fast_rcnn_total_loss, total_loss, train_op])
  File "/home/hbk/miniconda3/envs/mytensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/hbk/miniconda3/envs/mytensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/hbk/miniconda3/envs/mytensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/hbk/miniconda3/envs/mytensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: rpn_losses/rpn_minibatch/rpn_find_positive_negative_samples/PyFunc/_3605 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_20948_rpn_losses/rpn_minibatch/rpn_find_positive_negative_samples/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
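For reference, in TensorFlow 1.x the "Dst tensor is not initialized" InternalError is typically raised when a CPU-to-GPU copy cannot allocate destination memory on the device, so it is effectively another face of the out-of-memory condition reported by the BFC allocator above. One common mitigation is to stop the session from pre-allocating the whole card and/or to cap how much of it may be used, which leaves headroom for sudden spikes (for example from an unusually large image or proposal count). The following is only a minimal sketch, assuming a TF 1.x session-based train.py like the one in the traceback; the config options are standard TF 1.x API, but the session and loop variables are placeholders rather than the repository's actual code.

# Minimal TF 1.x sketch (assumption: session-based training as in train.py).
import tensorflow as tf

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of grabbing the whole card up front.
config.gpu_options.allow_growth = True
# Optionally cap usage (e.g. ~80% of the ~11 GB limit shown in the log)
# to keep headroom for temporary allocation spikes.
config.gpu_options.per_process_gpu_memory_fraction = 0.8

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # Placeholder training step; the real script fetches
    # fast_rcnn_total_loss, total_loss and train_op here, e.g.:
    # _, loss_val = sess.run([train_op, total_loss])

If memory still climbs as training progresses, it is also worth checking that no new operations are added to the graph inside the training loop (for example by calling graph-building functions once per step), since that steadily grows memory until one large allocation finally fails.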