
tensorflow serving batch inference slow !!!! #1483

Closed
sevenold opened this issue Nov 8, 2019 · 13 comments

Assignees: rmothukuru
Labels: needs prio, stale, stat:awaiting response, type:performance

Comments


sevenold commented Nov 8, 2019

Excuse me, how can I solve this slow inference speed problem?
shape:(1, 32, 387, 1)
data time: 0.005219221115112305
post time: 0.24771547317504883
end time: 0.2498164176940918
shape:(2, 32, 387, 1)
data time: 0.0056378841400146484
post time: 0.4651315212249756
end time: 0.4693586826324463

docker run --runtime=nvidia -it --rm -p 8501:8501 \
  -v "$(pwd)/densenet_ctc:/models/docker_test" \
  -e MODEL_NAME=docker_test tensorflow/serving:latest-gpu \
  --tensorflow_intra_op_parallelism=8 \
  --tensorflow_inter_op_parallelism=8 \
  --enable_batching=true \
  --batching_parameters_file=/models/docker_test/batching_parameters.conf

batching_parameters.conf:

num_batch_threads { value: 4 }
batch_timeout_micros { value: 2000 }
max_batch_size { value: 48 }
max_enqueued_batches { value: 48 }

GPU:1080Ti
Thanks.
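
For reference, a minimal sketch of how timings like the ones above are typically taken against the REST endpoint; the host, port, and model name follow the docker command above, and the random input is only a stand-in for the real data (this is not the author's actual client):

# Hypothetical client-side timing sketch, not the reporter's real client.
import json
import time

import numpy as np
import requests

batch = np.random.rand(2, 32, 387, 1).astype(np.float32)

t0 = time.time()
payload = json.dumps({"instances": batch.tolist()})
t1 = time.time()
resp = requests.post("http://localhost:8501/v1/models/docker_test:predict", data=payload)
t2 = time.time()

print("data time:", t1 - t0)  # request serialization
print("post time:", t2 - t1)  # HTTP round trip, including model execution
print("end time:", t2 - t0)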

rmothukuru self-assigned this Nov 8, 2019
rmothukuru added the type:performance label Nov 8, 2019
@rmothukuru

@sevenold,
Can you please let us know what the GPU utilization is during serving? The problem might be low GPU utilization.

Can you please try running the container with the parameters below and let us know if that resolves your issue. Thanks!

--grpc_channel_arguments=grpc.max_concurrent_streams=1000
--per_process_gpu_memory_fraction=0.7
--enable_batching=true
--max_batch_size=10
--batch_timeout_micros=1000
--max_enqueued_batches=1000
--num_batch_threads=6
--batching_parameters_file=/models/flow2_batching.config
--tensorflow_session_parallelism=2

For more information, please refer to #1440.


sevenold commented Nov 8, 2019

@rmothukuru
I tried running the container with the parameters below, but got the same result.


docker run --runtime=nvidia -it --rm -p 8501:8501 \
  -v "$(pwd)/densenet_ctc:/models/docker_test" \
  -e MODEL_NAME=docker_test tensorflow/serving:latest-gpu \
  --grpc_channel_arguments=grpc.max_concurrent_streams=1000 \
  --per_process_gpu_memory_fraction=0.7 \
  --enable_batching=true \
  --max_batch_size=128 \
  --batch_timeout_micros=1000 \
  --max_enqueued_batches=1000 \
  --num_batch_threads=8 \
  --batching_parameters_file=/models/docker_test/batching_parameters.conf \
  --tensorflow_session_parallelism=2


[screenshot: GPU utilization]
It also shows low GPU utilization.


@rmothukuru

@sevenold,
Can you please confirm that you have gone through issue #1440 and that the issue still persists?
If so, can you please share your model so that we can reproduce the issue on our side. Thanks!


sevenold commented Nov 11, 2019

@rmothukuru Thanks.
google drive
This is my model and client.

@sevenold

@rmothukuru
I tested my other models, such as a verification-code (CAPTCHA) recognition model, with the same parameters, and GPU prediction works normally there. Thanks!

@leo-XUKANG

Maybe you can try the gRPC channel.

@sevenold

> Maybe you can try the gRPC channel.

I tried, but got the same result.

@RainZhang1990

Same question. It seems like TF Serving predicts images serially even when I post multiple images at one time.

@misterpeddy

What happens when you load up the model with TF directly? Do you get significantly better inference latency? If your TF runtime requires X time to do a forward pass on your model for a batch of examples, X becomes a lower bound for your inference latency with TF Serving.
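
For comparison, a rough sketch of the measurement being suggested here, i.e. timing a forward pass with the TF runtime directly; the SavedModel path, signature lookup, and batch shape are assumptions based on this thread:

# Hypothetical sketch: measure a direct TF forward pass to establish the
# lower bound described above. The path and shapes are assumptions.
import time

import numpy as np
import tensorflow as tf

model = tf.saved_model.load("./densenet_ctc/1")   # assumed export path
infer = model.signatures["serving_default"]
input_name = list(infer.structured_input_signature[1].keys())[0]

batch = tf.constant(np.random.rand(2, 32, 387, 1).astype(np.float32))

infer(**{input_name: batch})                       # warm-up run (excludes tracing)

t0 = time.time()
infer(**{input_name: batch})
print("TF runtime forward pass:", time.time() - t0)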


ganler commented Apr 2, 2020

I found that serialization (of FP16 data) adds significant overhead in the gRPC client API, and this heavily reduces QPS. In my case, the data being transferred is 3x224x244,
and the serialization cost is two times the server processing time on the ResNet50 model.
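
A sketch of how that split can be measured with a gRPC client; the host, port, model name, signature, input key, and the FP16 tensor shape are all assumptions standing in for the real setup:

# Hypothetical sketch: separate client-side serialization time from the RPC itself.
import time

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

image = np.random.rand(1, 3, 224, 224).astype(np.float16)  # stand-in data

t0 = time.time()
request = predict_pb2.PredictRequest()
request.model_spec.name = "resnet50"                        # assumed model name
request.model_spec.signature_name = "serving_default"
request.inputs["input"].CopyFrom(tf.make_tensor_proto(image))
t1 = time.time()
stub.Predict(request, timeout=10.0)
t2 = time.time()

print("client-side serialization:", t1 - t0)
print("rpc round trip (includes server processing):", t2 - t1)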


owenljn commented Sep 15, 2021

Is this issue solved?
I'm having the same problem when serving an OpenNMT TensorFlow model. I have configured --rest_api_num_threads=1000 and --grpc_channel_arguments=grpc.max_concurrent_streams=1000,
but they just won't work somehow; the TensorFlow server keeps saying gRPC resource exhausted, and I can't send more than 15 concurrent requests.
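
One client-side setting worth checking when gRPC reports RESOURCE_EXHAUSTED is the message size limit; a sketch with illustrative (assumed) values, which may or may not be the limit being hit here:

# Hypothetical sketch: raise client-side gRPC message size limits, a common
# source of RESOURCE_EXHAUSTED errors. The 64 MB values are assumptions.
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

options = [
    ("grpc.max_send_message_length", 64 * 1024 * 1024),
    ("grpc.max_receive_message_length", 64 * 1024 * 1024),
]
channel = grpc.insecure_channel("localhost:8500", options=options)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)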

@singhniraj08

@oohx,

Could you please provide some more information for us to debug this issue?
We would like to understand how the same model with the same batched data performs in TensorFlow. Could you please share the latency of your model doing inference in the TF runtime and of the same model doing inference in TF Serving?

If your TF runtime requires X time to do a forward pass on your model for a batch of examples, X becomes a lower bound for your inference latency with TF Serving. Also, please refer to the performance guide.

Thank you!

singhniraj08 added the stale label Feb 28, 2023
@github-actions

This issue was closed due to lack of activity after being marked stale for the past 14 days.
