model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? #2676

wpq3142 · 2017-11-01T07:36:50Z

System information

What is the top-level directory of the model you are using: /home/wpq/workspace/models-master/research
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 1.4.0-rc1
Bazel version (if compiling from source):
CUDA/cuDNN version: cuDNN v7.0.3 (Sept 28, 2017), CUDA 9.0
GPU model and memory: GTX 650, 2 GB
Exact command to reproduce:
python3 object_detection/train.py
--clone_on_cpu true
--logtostderr
--pipeline_config_path /home/wpq/data/potato/model/rfcn_resnet101_coco.config
--train_dir /home/wpq/data/potato/model/train

Describe the problem

I downloaded the new checkpoint archive: faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017.tar.gz

rfcn_resnet101_coco.config :
model {
faster_rcnn {
num_classes: 37
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_inception_resnet_v2'
first_stage_features_stride: 8
}

Source code / logs

2017-11-01 15:11:40.186072: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /home/wpq/data/potato/data/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
File "/home/wpq/workspace/models-master/research/object_detection/train.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/wpq/workspace/models-master/research/object_detection/train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/wpq/workspace/models-master/research/object_detection/trainer.py", line 254, in train
var_map, train_config.fine_tune_checkpoint))
File "/home/wpq/workspace/models-master/research/object_detection/utils/variables_helper.py", line 122, in get_variables_available_in_checkpoint
ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 150, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern), status)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/wpq/data/potato/data/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

Process finished with exit code 1

wpq3142 · 2017-11-01T11:26:37Z

The file format is inconsistent. See this post:
http://votec.top/2016/12/24/tensorflow-r12-tf-train-Saver/

Change slim.get_or_create_global_step() to tf.train.get_or_create_global_step().

scotthuang1989 · 2017-11-01T14:30:41Z

@wpq3142
This exception is raised here:

ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)

I haven't dug into the implementation of this API, but I suppose it is meant for the new format.

jart · 2017-11-01T17:28:22Z

I'm assuming the model code here would need to be updated to determine which format the checkpoint is written in and then use the correct API? If so, that sounds like a straightforward change, and we'd welcome contributions helping to clean up the model.
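
As a rough illustration of the format check suggested above, the following sketch guesses the format from the files on disk. It assumes that a V2 checkpoint consists of `<prefix>.index` plus one or more `<prefix>.data-*` shards and that a V1 checkpoint is a single file at the prefix itself. The function name guess_checkpoint_format is hypothetical and is not part of TensorFlow or the object detection code.

```python
import glob
import os
import tempfile


def guess_checkpoint_format(prefix):
    """Heuristically guess the checkpoint format for a prefix (assumption-based)."""
    # Assumed V2 layout: <prefix>.index plus at least one <prefix>.data-* shard.
    has_index = os.path.isfile(prefix + ".index")
    has_shards = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    if has_index and has_shards:
        return "V2"
    # Assumed V1 layout: a single file stored at the prefix itself.
    if os.path.isfile(prefix):
        return "V1"
    return "unknown"


def _touch(path):
    # Create an empty placeholder file for the demo below.
    open(path, "w").close()


# Demo on empty placeholder files (no real checkpoint data is involved).
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "model.ckpt")
    _touch(prefix + ".index")
    _touch(prefix + ".data-00000-of-00001")
    v2_guess = guess_checkpoint_format(prefix)
    missing_guess = guess_checkpoint_format(os.path.join(tmp, "other.ckpt"))

print(v2_guess, missing_guess)
```

This only inspects file names, so it cannot tell whether the files themselves are valid checkpoints.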

tombstone · 2017-11-03T15:26:42Z

@wpq3142 Can you tell us how you are configuring this particular entry in the config:
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt".

It should look like
fine_tune_checkpoint: "/home/wpq/data/potato/data/model.ckpt"

Moreover, it also looks like you are using rfcn_resnet101_coco.config with a faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017 checkpoint. These two are not compatible. You need to use rfcn_resnet101_coco_11_06_2017.tar.gz with rfcn_resnet101_coco.config.

wpq3142 · 2017-11-03T15:48:21Z

@tombstone

I downloaded the latest model, and it is working now. The configuration is as follows:
--clone_on_cpu true
--logtostderr
--pipeline_config_path /home/wpq/data/potato/model/faster_rcnn_nas_coco.config
--train_dir /home/wpq/data/potato/model/train

One cause seems to be that I was missing a space between keys and values.

paulrich1234 · 2018-09-16T13:24:52Z

You just need to restore the checkpoint prefix (.ckpt), not the .ckpt.meta file.
Something like this:
sess = tf.Session()
saver.restore(sess, 'mymodel/model100-500-0.998.ckpt')

pbashivan · 2018-12-04T20:15:53Z

Apparently in V2 checkpoints, you should only include the filename up to ".ckpt". For instance if the checkpoint filename is model.ckpt.data-00000-of-00001 then you should only use model.ckpt. Using the full filename leads to getting a DataLossError.
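
A minimal sketch of that rule, assuming the checkpoint files end in .data-XXXXX-of-XXXXX, .index, or .meta as they do in this thread. The helper name checkpoint_prefix is hypothetical; it only manipulates the string and does not touch TensorFlow.

```python
import re

# Suffixes that belong to individual checkpoint files rather than to the prefix.
_FILE_SUFFIX = re.compile(r"\.(?:data-\d+-of-\d+|index|meta)$")


def checkpoint_prefix(path):
    """Return the path with any trailing per-file checkpoint suffix removed."""
    return _FILE_SUFFIX.sub("", path)


print(checkpoint_prefix("/home/wpq/data/potato/data/model.ckpt.data-00000-of-00001"))
print(checkpoint_prefix("model.ckpt-1000000.index"))
```

The returned string is what would be passed as the checkpoint path, for example to fine_tune_checkpoint or saver.restore. A path that is already a prefix is returned unchanged.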

praneethpj · 2019-02-08T14:28:22Z

Apparently in V2 checkpoints, you should only include the filename up to ".ckpt". For instance if the checkpoint filename is model.ckpt.data-00000-of-00001 then you should only use model.ckpt. Using the full filename leads to getting a DataLossError.

@pbashivan thank you so much

shellyfung · 2019-03-21T08:26:34Z

I fixed the issue like this:
replace model.ckpt with model.ckpt-200000,
where 200000 is your checkpoint number.
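
If the checkpoint number is not known in advance, it can be read off the file names. The sketch below assumes the files follow the `<name>-<step>.index` naming seen in this thread; latest_numbered_prefix is a hypothetical helper, not a TensorFlow function. (TensorFlow itself offers tf.train.latest_checkpoint, which reads the `checkpoint` state file in the directory instead.)

```python
import re

# Matches names such as "model.ckpt-200000.index" and captures prefix and step.
_INDEX_NAME = re.compile(r"^(?P<prefix>.+-(?P<step>\d+))\.index$")


def latest_numbered_prefix(filenames):
    """Return the prefix with the highest step number, or None if none match."""
    best = None
    for name in filenames:
        match = _INDEX_NAME.match(name)
        if match is None:
            continue
        step = int(match.group("step"))
        if best is None or step > best[0]:
            best = (step, match.group("prefix"))
    return best[1] if best else None


names = [
    "checkpoint",
    "model.ckpt-1000.index",
    "model.ckpt-200000.data-00000-of-00001",
    "model.ckpt-200000.index",
    "model.ckpt-200000.meta",
]
print(latest_numbered_prefix(names))
```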

codexponent · 2019-04-08T05:55:03Z

Solved on #7696

Rajamohanreddyai · 2019-05-12T16:33:45Z

Hello all, just follow the video below and export your own model within 10 seconds.

https://youtu.be/w0Ebsbz7HYA

phosseini · 2019-06-17T22:11:18Z

Apparently in V2 checkpoints, you should only include the filename up to ".ckpt". For instance if the checkpoint filename is model.ckpt.data-00000-of-00001 then you should only use model.ckpt. Using the full filename leads to getting a DataLossError.

This works. In my case, I used the longest common prefix among my checkpoint-related files, which was model.ckpt-1000000, and it worked for me. I had the following three files in my folder:

model.ckpt-1000000.data-00000-of-00001
model.ckpt-1000000.index
model.ckpt-1000000.meta

I just thought this might be the case for some folks.
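
The longest-common-prefix idea can be sketched with the standard library. This assumes the list contains the files of a single checkpoint only; longest_common_prefix is a hypothetical helper name.

```python
import os


def longest_common_prefix(filenames):
    """Return the longest shared prefix of the names, without a trailing dot."""
    # os.path.commonprefix compares character by character, so the result
    # still ends with the "." that precedes "data", "index", and "meta".
    prefix = os.path.commonprefix(list(filenames))
    return prefix.rstrip(".")


files = [
    "model.ckpt-1000000.data-00000-of-00001",
    "model.ckpt-1000000.index",
    "model.ckpt-1000000.meta",
]
prefix = longest_common_prefix(files)
print(prefix)
```

If the folder holds several checkpoints (say, steps 1000 and 2000), the shared prefix collapses to something like "model.ckpt-", so the list should be filtered to one checkpoint first.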

patspeis · 2019-06-23T17:10:09Z

I was running into this, and the fix above worked for me. All I had to do was run the following on my Windows 10 x64 machine:

python export_inference_graph.py --input_type image_tensor --pipeline_config_path ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix models\model.ckpt-1000 --output_directory tuned_model

Instead of:

python export_inference_graph.py --input_type image_tensor --pipeline_config_path ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix models\model.ckpt-1000.data-###-### --output_directory tuned_model

tl;dr Don't reference single files in the --trained_checkpoint_prefix flag. Just reference the prefix shared by those three files.

Hope it helps.

anjani-dhrangadhariya · 2019-10-16T08:59:09Z

@phosseini is correct. The model itself is made up of three different files with three different extensions showing what kind of model data each file stores.

For me too, using the longest shared file name prefix solved the issue.

model.ckpt-1000000.data-00000-of-00001
model.ckpt-1000000.index
model.ckpt-1000000.meta

kamrankausar · 2019-12-18T08:32:47Z

tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file ./model_dir/model.ckpt-1000000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

snrnsrk06 · 2020-04-23T09:01:40Z

I am trying to run an open-source project properly. The code saves its files as model-10.data-0000-of-0001, .index, and .meta.
The part of the code that saves the files is shown below:

saver = tf.train.Saver(max_to_keep=50)

if self.pretrained_model is not None:
    print("Start training with pretrained Model..")
    saver.restore(sess, self.pretrained_model)

if (e + 1) % self.save_every == 0:
    saver.save(sess, self.model_path + 'model', global_step=e + 1)
    print("model-%s saved." % (e + 1))

One of the solutions in this issue is to change the file name.

model.ckpt-1000000.data-00000-of-00001
model.ckpt-1000000.index
model.ckpt-1000000.meta

How should I change the code in my situation? How do I change the file name? It looks like the save method determines the file name automatically. Or should I change the file name manually?

/////////////////////////////////////////////////////////////////////////////////////////////

The save call can be changed to:

if (e + 1) % self.save_every == 0:
    saver.save(sess, self.model_path + 'model.ckpt', global_step=e + 1)
    print("model-%s saved." % (e + 1))

but that alone is not enough. The restore call must also be given the checkpoint prefix rather than a full file name.

Here cur_model is 'model.ckpt-50.data-0000-of-0001' (alongside the .index and .meta files):

# Keep everything up to the first '.' that follows the first '-'.
cur_model2 = cur_model[0:cur_model.find('-') + cur_model[cur_model.find('-'):].find('.')]
saver.restore(sess, self.model_path + cur_model2)

cur_model2 is 'model.ckpt-50', so restore receives only the prefix.

Rajput245 · 2020-05-22T09:44:39Z

none of the above worked.
model.ckpt-1000000
model.ckpt-1000000.index
model.ckpt-1000000.meta
solved this problem for me..

dome272 · 2020-08-25T18:07:24Z

Apparently in V2 checkpoints, you should only include the filename up to ".ckpt". For instance if the checkpoint filename is model.ckpt.data-00000-of-00001 then you should only use model.ckpt. Using the full filename leads to getting a DataLossError.

you are a legend

mikelty · 2020-11-17T05:59:28Z

In some models, it could also be caused by a missing .meta file and/or .index file.
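
A quick way to check for missing companion files before restoring might look like the sketch below. It assumes the three-file layout shown earlier in this thread (.index, .data-* shards, .meta); missing_checkpoint_files is a hypothetical helper name.

```python
import glob
import os
import tempfile


def missing_checkpoint_files(prefix):
    """List which of the expected companion files are absent for a prefix."""
    missing = []
    if not os.path.isfile(prefix + ".index"):
        missing.append(".index")
    if not glob.glob(glob.escape(prefix) + ".data-*"):
        missing.append(".data-*")
    if not os.path.isfile(prefix + ".meta"):
        missing.append(".meta")
    return missing


# Demo with empty placeholder files: the .index file is deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    demo_prefix = os.path.join(tmp, "model.ckpt-1000000")
    for suffix in (".data-00000-of-00001", ".meta"):
        open(demo_prefix + suffix, "w").close()
    demo_missing = missing_checkpoint_files(demo_prefix)

print(demo_missing)
```

Whether a missing .meta file matters likely depends on how the model is loaded, since it stores the graph definition rather than the variable values.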

BassantTolba1234 · 2020-12-15T16:21:31Z

Hello all,
After training in my TensorFlow session, my files are not named with .ckpt, as in
model.ckpt-1000000.data-00000-of-00001
model.ckpt-1000000.index
model.ckpt-1000000.meta
but are instead named
Pretrained.data-00000-of-00001
Pretrained.index
Pretrained.meta
What should I do to solve the data loss problem above with these saved files?

saramsv · 2021-04-03T00:34:41Z

none of the above worked.
model.ckpt-1000000
model.ckpt-1000000.index
model.ckpt-1000000.meta
solved this problem for me..

@Rajput245 I have the same problem. Were you able to fix it?

joan-yanqiong · 2022-01-19T21:05:10Z

Hi guys, I don't know if it is still a problem for you, but I had the following files:
model.ckpt-100000.data-00000-of-00001
model.ckpt-100000.index
model.ckpt-100000.meta

It worked when I used the following code:

import tensorflow.compat.v1 as tf
import tf_slim as slim

# Pass the checkpoint prefix, not one of the three files.
checkpoint_path = "absolute_path_to/model.ckpt-100000"

init_fn = slim.assign_from_checkpoint_fn(
    checkpoint_path, slim.get_model_variables(model_variables))
sess = tf.Session()
init_fn(sess)

I hope this helps you!

pinzhi000 · 2022-05-11T02:22:14Z

In my situation I don't have "ckpt" at all.

I just have the following 2 files:

What do I do?

joan-yanqiong · 2022-05-11T06:02:54Z

I would maybe try to just add the ckpt after 'variables'.

pinzhi000 · 2022-05-11T19:00:58Z

I just resolved this issue. I saved the model as a .h5 file and that worked.

yohannesSM · 2022-07-23T21:47:43Z

import tensorflow as tf
from tensorflow.python.training import checkpoint_utils as cp
print(cp.list_variables('path/model_name.ckpt'))
# Use only the model name up to the .ckpt part. Do not include the other magic numbers.


jart added the help wanted label Nov 1, 2017

tombstone added the stat:awaiting response Waiting on input from the contributor label Nov 3, 2017

tombstone removed the help wanted label Nov 3, 2017

tombstone mentioned this issue Nov 3, 2017

Data loss: not an sstable (bad magic number) #2675

Closed

aselle removed the stat:awaiting response Waiting on input from the contributor label Nov 3, 2017

bignamehyp closed this as completed Nov 7, 2017

