Loading Model from GCS Fails With Data loss: not an sstable (bad magic number) #1441

Closed
stephen-lazaro opened this issue Sep 17, 2019 · 4 comments


stephen-lazaro commented Sep 17, 2019

Bug Report

System information

Standard Docker Container version of Tensorflow Serving

Describe the problem

I am able to load models locally, but when loading them from GCS the model load fails with:

11:19:54.133457: I tensorflow_serving/util/retrier.cc:33] Retrying of Loading servable: {name: itc_credit_health_qingdao_v2__20190815 version: 1} retry: 1
2019-09-17 11:19:54.668447: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: gs://bar/models/tensorflow/foo/1
2019-09-17 11:19:54.668560: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: gs://prediction-management-app/models/tensorflow/foo/1
2019-09-17 11:19:55.062650: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-09-17 11:19:55.625766: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2019-09-17 11:19:56.597807: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Data loss: not an sstable (bad magic number)
2019-09-17 11:19:56.600123: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: fail. Took 1931572 microseconds.
2019-09-17 11:19:56.600213: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: foo version: 1} failed: Data loss: not an sstable (bad magic number)
	 [[{{node save/RestoreV2}}]]

Logs have been mildly anonymized.
All of these models load successfully when served from the local filesystem rather than from GCS.

Exact Steps to Reproduce

Point TensorFlow Serving at a model in Google Cloud Storage (built with TensorFlow 1.12) and attempt to load it.

@rmothukuru rmothukuru self-assigned this Sep 17, 2019
@rmothukuru

@stephen-lazaro ,
In order to expedite the troubleshooting process, please provide a code snippet (all the commands used) to reproduce the issue reported here. Thanks!

stephen-lazaro commented Sep 17, 2019

@rmothukuru In what sense? Here's the content of my entrypoint:

tensorflow_model_server \
  --port=8500 \
  --tensorflow_intra_op_parallelism=12 \
  --tensorflow_inter_op_parallelism=12 \
  --rest_api_port=8501 \
  --model_config_file=/home/karmanix/empty_model.config \
  --enable_model_warmup &

I have an env variable:
ENV GOOGLE_APPLICATION_CREDENTIALS /etc/configs/credentials.json
which contains a service account token. I can confirm that both privileges and network connectivity work correctly.
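
For context, a rough sketch of how the image is put together (the base image tag, script name, and paths below are illustrative, not my literal Dockerfile):

FROM tensorflow/serving
# Model config and entrypoint script are baked into the image
COPY empty_model.config /home/karmanix/empty_model.config
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
# Service account credentials are mounted into /etc/configs at deploy time
ENV GOOGLE_APPLICATION_CREDENTIALS /etc/configs/credentials.json
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]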

The contents of empty_model.config are, for example:

model_config_list {
  config {
    name: "DNNEstimatorBuilder_jt__cta_recsys_amr_skip_test_fixed_badge__20190909"
    base_path: "gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid"
    model_platform: "tensorflow"
  }
}

where BUCKET is my bucket name.
It is clear that the model is being detected and read, but upon attempting to restore it we see the data loss error.
If I instead mount the model into my image's filesystem, I do not see this problem. However, due to the constraints of my setup, I cannot do that and must load the model from the GCS bucket.

In the bucket at gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid you would find:

1/
1/assets.extras/tf_serving_warmup_requests
1/saved_model.pb
1/variables/variables.data-00000-of-00001
1/variables/variables.index

Let me know if you need more information of any kind.

rmothukuru commented Sep 18, 2019

@stephen-lazaro ,
Can you please check issues 1, 2 and 3 and let us know if any of them resolves your problem.

If those links don't resolve your problem, can you please try performing inference on a single model instead of a config file in GCS, and let us know if it is successful.
Thanks!
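
For example, something along these lines, reusing the bucket path from your config (the model name here is just illustrative):

tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=DNNEstimatorBuilder__cta_recsys_amr_custom_valid \
  --model_base_path=gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid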

@stephen-lazaro

Update, @rmothukuru: it was none of those. The issue was that our files were compressed and TF Serving was not respecting the compression format header. Closing as resolved.
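
For anyone who hits the same error: a rough way to check whether the objects carry gzip content-encoding and to re-upload them uncompressed (paths are placeholders):

# Check the object metadata; a "Content-Encoding: gzip" line indicates the file
# was uploaded compressed (e.g. via gsutil cp -z/-Z)
gsutil stat gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid/1/variables/variables.data-00000-of-00001

# Re-upload the export without gzip transcoding (no -z/-Z flags)
gsutil cp -r /local/export/1 gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid/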
