saved_model_warmup.cc is not in the start log #1679

Closed
wangyunxiaa opened this issue Jun 29, 2020 · 10 comments
Labels: stale (marks the issue/PR stale, to be closed automatically if no activity), stat:awaiting response, type:support

Comments

@wangyunxiaa

#911 Bug Report
If this is a bug report, please fill out the following form in full:

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS 7
  • TensorFlow Serving installed from (source or binary): source
  • TensorFlow Serving version: 1.15

Describe the problem

I want to use tf_serving_warmup_requests to warm up my SavedModels. I followed https://www.tensorflow.org/tfx/serving/saved_model_warmup and https://stackoverflow.com/questions/51980407/how-to-prepare-warmup-request-file-for-tensorflow-serving, but I could not get the expected result: the first request still takes roughly 10x as long as the following ones.

Exact Steps to Reproduce

Please include all steps necessary for someone to reproduce this issue on their
own machine. If not, skip this section.

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem.
If including tracebacks, please include the full traceback. Large logs and files
should be attached. Try to provide a reproducible test case that is the bare
minimum necessary to generate the problem.
When I start tensorflow_model_server, I get the following log:
2020-06-29 16:12:57.675438: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2020-06-29 16:12:57.675495: I tensorflow_serving/model_servers/server_core.cc:573] (Re-)adding model: aicallcenter
2020-06-29 16:12:57.675521: I tensorflow_serving/model_servers/server_core.cc:573] (Re-)adding model: nnlm_fp16
2020-06-29 16:12:57.778949: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: nnlm_fp16 version: 1}
2020-06-29 16:12:57.779034: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: nnlm_fp16 version: 1}
2020-06-29 16:12:57.779093: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: nnlm_fp16 version: 1}
2020-06-29 16:12:57.779183: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /mnt/lustre/aifs/home/yxw30/tfs_server_v01/tfs_models/nnlm_fp16/1
2020-06-29 16:12:57.779331: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /mnt/lustre/aifs/home/yxw30/tfs_server_v01/tfs_models/nnlm_fp16/1
2020-06-29 16:12:57.824410: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-06-29 16:12:57.869870: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-29 16:12:57.877642: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: aicallcenter version: 1}
2020-06-29 16:12:57.877712: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: aicallcenter version: 1}
2020-06-29 16:12:57.877758: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: aicallcenter version: 1}
2020-06-29 16:12:57.877830: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /mnt/lustre/aifs/home/yxw30/tfs_server_v01/tfs_models/aicallcenter/1
2020-06-29 16:12:57.877858: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /mnt/lustre/aifs/home/yxw30/tfs_server_v01/tfs_models/aicallcenter/1
2020-06-29 16:12:58.043891: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-06-29 16:13:06.252903: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335 pciBusID: 0000:02:00.0
2020-06-29 16:13:06.253040: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-06-29 16:13:06.255059: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-06-29 16:13:07.244030: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-29 16:13:07.244094: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-06-29 16:13:07.244109: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-06-29 16:13:07.246671: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8119 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1)
2020-06-29 16:13:07.250357: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335 pciBusID: 0000:02:00.0
2020-06-29 16:13:07.250430: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-06-29 16:13:07.253447: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-06-29 16:13:07.253549: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-29 16:13:07.253646: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-06-29 16:13:07.253687: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-06-29 16:13:07.257264: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8119 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1)
2020-06-29 16:13:07.332860: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2020-06-29 16:13:07.344369: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2020-06-29 16:13:07.344492: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:212] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /mnt/lustre/aifs/home/yxw30/tfs_server_v01/tfs_models/aicallcenter/1/variables/variables.index
2020-06-29 16:13:07.344531: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 9466665 microseconds.
2020-06-29 16:13:07.344564: I tensorflow_serving/servables/tensorflow/saved_model_bundle_factory.cc:169] Wrapping session to perform batch processing
2020-06-29 16:13:07.344597: I tensorflow_serving/servables/tensorflow/bundle_factory_util.cc:153] Wrapping session to perform batch processing
2020-06-29 16:13:07.345864: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: aicallcenter version: 1}
2020-06-29 16:13:11.997699: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: /mnt/lustre/aifs/home/yxw30/tfs_server_v01/tfs_models/nnlm_fp16/1
2020-06-29 16:13:12.091786: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 14312453 microseconds.
2020-06-29 16:13:12.091860: I tensorflow_serving/servables/tensorflow/saved_model_bundle_factory.cc:169] Wrapping session to perform batch processing
2020-06-29 16:13:12.091978: I tensorflow_serving/servables/tensorflow/bundle_factory_util.cc:153] Wrapping session to perform batch processing
2020-06-29 16:13:12.094657: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: nnlm_fp16 version: 1}
I cannot find a log line like:
tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:143] Finished reading warmup data for model at /tensorflow-serving/model/1/assets.extra/tf_serving_warmup_requests

Why is the log message from saved_model_warmup.cc not available? Adding "--enable_model_warmup=true" makes no difference either.

@rmothukuru

In order to expedite the trouble-shooting process, please provide a code snippet to reproduce the issue reported here. Thanks!

@wangyunxiaa
Author

wangyunxiaa commented Jun 30, 2020

The saved_model_cli output is as follows:

signature_def['ac_forward']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1)
        name: Placeholder:0
    inputs['length'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 1)
        name: Placeholder_1:0
    inputs['pre_state'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 102400)
        name: pre_state:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['cur_state'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1)
        name: ofsmnmt/concat:0
    outputs['output'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, 2966)
        name: ofsmnmt/label_counts/sub:0
  Method name is: tensorflow/serving/predict

The warmup data is written with:

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import tensor_util
from tensorflow_serving.apis import predict_pb2, prediction_log_pb2

with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
    predict_request = predict_pb2.PredictRequest()
    predict_request.model_spec.name = 'testmodel'
    predict_request.model_spec.signature_name = 'ac_forward'
    state_size = 102400
    # Shapes match the signature: input (-1, -1), length (-1, 1), pre_state (-1, 102400)
    predict_request.inputs['input'].CopyFrom(
        tensor_util.make_tensor_proto(
            np.random.randn(1, 20 * 1320).astype(np.float32), tf.float32))
    predict_request.inputs['length'].CopyFrom(
        tensor_util.make_tensor_proto(
            np.array([[20]], dtype=np.int32), tf.int32))
    predict_request.inputs['pre_state'].CopyFrom(
        tensor_util.make_tensor_proto(
            np.zeros((1, state_size), dtype=np.float32), tf.float32))

    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=predict_request))
    writer.write(log.SerializeToString())

and I put the file in "testmodel/1/assets.extra/tf_serving_warmup_requests"
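As a stdlib-only sanity check of that placement (a sketch; warmup_file_path is a hypothetical helper, not a TF Serving API), the expected layout can be verified like this:

```python
from pathlib import Path

def warmup_file_path(model_base_path, version="1"):
    """Return the warmup file path TF Serving expects under a model base
    directory, or None if the file is missing. Expected layout:
    <model_base_path>/<version>/assets.extra/tf_serving_warmup_requests
    """
    p = Path(model_base_path) / version / "assets.extra" / "tf_serving_warmup_requests"
    return p if p.is_file() else None
```

If warmup_file_path("testmodel") returns None while the server's model base path points at testmodel, the file is simply not where the loader looks for it.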

Then I ran tensorflow_model_server with "--enable_model_warmup=true" added, but the first request got no speedup. Also, the start log has no message about tensorflow_serving/servables/tensorflow/saved_model_warmup.cc.

@wangyunxiaa
Author

I am also confused that I cannot even get the message
I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at ****

It is strange. Are some options missing when building tensorflow_model_server, or do other BUILD rules need to be added?
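Before suspecting the build, a malformed warmup file is worth ruling out. A TFRecord file is a sequence of frames (8-byte little-endian payload length, 4-byte masked CRC of the length, payload, 4-byte payload CRC), so a stdlib-only sketch (count_tfrecords is a hypothetical helper; CRCs are skipped, not validated) can at least confirm the framing parses and how many records the file holds:

```python
import struct

def count_tfrecords(path):
    """Count records in a TFRecord file by walking its framing:
    8-byte little-endian payload length, 4-byte length CRC,
    payload bytes, 4-byte payload CRC. CRCs are skipped, not checked."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # clean EOF (or truncated header)
            (length,) = struct.unpack("<Q", header)
            f.seek(4, 1)           # skip length CRC
            f.seek(length + 4, 1)  # skip payload and its CRC
            count += 1
    return count
```

A tf_serving_warmup_requests file written by the snippet above should report exactly one record; zero records or an exception would point at the file rather than the server build.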

@wangyunxiaa
Author

Any suggestions or ideas about this issue?

@shadowdragon89
Contributor

Could you provide steps to reproduce?

@wangyunxiaa
Author

You can test with the tensorflow/serving:1.15.0-gpu Docker image. Starting tensorflow_model_server via Docker also does not produce the message
I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at ****

@wangyunxiaa
Author

@shadowdragon89, the start command is as follows:

nvidia-docker run -it -p 9501:9501 \
  -v ${basepath}:${docker_model_path} \
  --env CUDA_VISIBLE_DEVICES=1 \
  tensorflow/serving:1.15.0-gpu \
  --model_config_file=${docker_model_path}/cfg/model.cfg \
  --platform_config_file=${docker_model_path}/cfg/platform.cfg \
  --port=9501 \
  --grpc_socket_path=/tmp/f.yx

--enable_model_warmup=true is the default, and explicitly adding "--enable_model_warmup=true" gives the same result.

@wangyunxiaa
Author

@shadowdragon89 I can get the log messages on TFS 1.13. Does TFS 1.15 not support this?

@gaikwadrahul8

Hi, @wangyunxiaa

Apologies for the delay. I would request you to please try the latest TF Serving version (2.11.0). For your reference, I have added an article showing how to handle the warmup data file issue; please also refer to our official documentation for TensorFlow Serving configuration and model warmup.

Could you please try the latest TF Serving version (2.11.0) and let us know whether it resolves your issue? If the issue still persists, in order to expedite the troubleshooting process, please provide a code snippet to reproduce the issue reported here.

Thank you!

@gaikwadrahul8 gaikwadrahul8 self-assigned this Dec 21, 2022
@singhniraj08 singhniraj08 added the stale label Feb 15, 2023
@gaikwadrahul8

Hi, @wangyunxiaa

Closing this issue due to lack of recent activity for a couple of weeks. Please feel free to reopen the issue or post comments if you need any further assistance or updates. Thank you!

5 participants