Loading Model from GCS Fails With Data loss: not an sstable (bad magic number) #1441

Closed
stephen-lazaro opened this issue Sep 17, 2019 · 4 comments


stephen-lazaro commented Sep 17, 2019

Bug Report

System information

Standard Docker Container version of Tensorflow Serving

Describe the problem

I am able to load models locally, but when loading them from GCS the model load fails with:

11:19:54.133457: I tensorflow_serving/util/retrier.cc:33] Retrying of Loading servable: {name: itc_credit_health_qingdao_v2__20190815 version: 1} retry: 1
2019-09-17 11:19:54.668447: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: gs://bar/models/tensorflow/foo/1
2019-09-17 11:19:54.668560: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: gs://prediction-management-app/models/tensorflow/foo/1
2019-09-17 11:19:55.062650: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-09-17 11:19:55.625766: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2019-09-17 11:19:56.597807: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Data loss: not an sstable (bad magic number)
2019-09-17 11:19:56.600123: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: fail. Took 1931572 microseconds.
2019-09-17 11:19:56.600213: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: foo version: 1} failed: Data loss: not an sstable (bad magic number)
	 [[{{node save/RestoreV2}}]]

Logs have been mildly anonymized.
All of these models load successfully when served from the local filesystem rather than from GCS.

Exact Steps to Reproduce

Point TensorFlow Serving at a model in Google Cloud Storage (built with TensorFlow 1.12) and attempt to load it.

@rmothukuru rmothukuru self-assigned this Sep 17, 2019
@rmothukuru

@stephen-lazaro ,
In order to expedite the troubleshooting process, please provide a code snippet (all the commands used) to reproduce the issue reported here. Thanks!

stephen-lazaro commented Sep 17, 2019

@rmothukuru In what sense? Here's the content of my entrypoint:

tensorflow_model_server \
  --port=8500 \
  --tensorflow_intra_op_parallelism=12 \
  --tensorflow_inter_op_parallelism=12 \
  --rest_api_port=8501 \
  --model_config_file=/home/karmanix/empty_model.config \
  --enable_model_warmup &

I have an env variable:
ENV GOOGLE_APPLICATION_CREDENTIALS /etc/configs/credentials.json
which contains a service account token. I can confirm that both privileges and network connectivity work correctly.
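
For context, a rough sketch of how the image is put together (the base image tag, script name, and paths below are illustrative, not my literal Dockerfile):

FROM tensorflow/serving
# Model config and entrypoint script are baked into the image
COPY empty_model.config /home/karmanix/empty_model.config
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
# Service account credentials are mounted into /etc/configs at deploy time
ENV GOOGLE_APPLICATION_CREDENTIALS /etc/configs/credentials.json
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]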

The contents of empty_model.config are, for example:

model_config_list {
  config {
    name: "DNNEstimatorBuilder_jt__cta_recsys_amr_skip_test_fixed_badge__20190909"
    base_path: "gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid"
    model_platform: "tensorflow"
  }
}

where BUCKET is my bucket name.
It is clear that the model is being detected and read, but upon attempting to restore it we see the data loss error.
If I instead mount the model into my image's filesystem, I do not see this problem. However, due to the constraints of my setup, I cannot do that and must load the model from the GCS bucket.

In the bucket at gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid you would find:

1/
1/assets.extras/tf_serving_warmup_requests
1/saved_model.pb
1/variables/variables.data-00000-of-00001
1/variables/variables.index

Let me know if you need more information of any kind.

rmothukuru commented Sep 18, 2019

@stephen-lazaro ,
Can you please check issues 1, 2 and 3 and let us know if any of them resolves your problem.

If those links don't resolve your problem, can you please try performing inference on a single model instead of a config file in GCS, and let us know if it is successful.
Thanks!
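
For example, something along these lines, reusing the bucket path from your config (the model name here is just illustrative):

tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=DNNEstimatorBuilder__cta_recsys_amr_custom_valid \
  --model_base_path=gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid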

@stephen-lazaro

Update, @rmothukuru: it was none of those. The issue was that our files were compressed and TF Serving was not respecting the compression format header. Closing as resolved.
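
For anyone who hits the same error: a rough way to check whether the objects carry gzip content-encoding and to re-upload them uncompressed (paths are placeholders):

# Check the object metadata; a "Content-Encoding: gzip" line indicates the file
# was uploaded compressed (e.g. via gsutil cp -z/-Z)
gsutil stat gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid/1/variables/variables.data-00000-of-00001

# Re-upload the export without gzip transcoding (no -z/-Z flags)
gsutil cp -r /local/export/1 gs://BUCKET/models/tensorflow/DNNEstimatorBuilder__cta_recsys_amr_custom_valid/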
