TensorFlow Serving batch inference is slow #1483
Comments
@sevenold, could you please try running the container with the parameters below and let us know if it resolves your issue? Thanks!
For more information, please refer to #1440.
@rmothukuru docker run --runtime=nvidia -it --rm -p 8501:8501
@rmothukuru Thanks.
@rmothukuru
Maybe you can try the gRPC channel.
I tried, but got the same result.
Same question. It seems like TF Serving predicts images one after another even when I post multiple images in a single request.
What happens when you load the model directly with TF? Do you get significantly better inference latency? If your TF runtime requires X time to do a forward pass on a batch of examples, then X becomes a lower bound for your inference latency with TF Serving.
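One way to measure that lower bound is to time a raw forward pass against the SavedModel, bypassing TF Serving entirely. A minimal sketch (the model path, signature name, and input shape are assumptions based on this thread, not code from it):

import time

import numpy as np
import tensorflow as tf

# Load the SavedModel directly; the path mirrors the volume mounted in the
# docker command below and is an assumption for illustration.
model = tf.saved_model.load("./densenet_ctc/1")
infer = model.signatures["serving_default"]
input_name = list(infer.structured_input_signature[1].keys())[0]

for batch_size in (1, 2, 8, 48):
    batch = tf.constant(np.random.rand(batch_size, 32, 387, 1).astype(np.float32))
    infer(**{input_name: batch})  # warm-up so tracing is not measured
    start = time.time()
    infer(**{input_name: batch})
    print(f"batch {batch_size}: forward pass took {time.time() - start:.4f}s")

If these numbers already grow roughly linearly with batch size, the bottleneck is the model itself rather than Serving or the client.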
I found that serialization (of FP16 data) adds significant overhead in the gRPC client API.
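For anyone comparing the REST and gRPC paths, a minimal gRPC client sketch follows (host/port, model name, and input tensor name are assumptions; also note the docker command in this issue only publishes the REST port 8501, so the gRPC port 8500 would need to be exposed as well). The tf.make_tensor_proto call is where the request payload is serialized, which is the overhead mentioned above:

import time

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Assumed endpoint and names; adjust to match your deployment.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

batch = np.random.rand(2, 32, 387, 1).astype(np.float32)

request = predict_pb2.PredictRequest()
request.model_spec.name = "docker_test"
request.model_spec.signature_name = "serving_default"

start = time.time()
# Client-side serialization into a TensorProto happens here; with large
# inputs this step alone can take a noticeable fraction of the latency.
request.inputs["input"].CopyFrom(tf.make_tensor_proto(batch))
print(f"serialize time: {time.time() - start:.4f}s")

start = time.time()
result = stub.Predict(request, timeout=10.0)
print(f"predict time:   {time.time() - start:.4f}s")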
Has this issue been solved?
@oohx, could you please provide some more information to help us debug this issue? If your TF runtime requires X time to do a forward pass on your model on a batch of examples, then X becomes a lower bound for your inference latency with TF Serving. Also, please refer to the performance guide. Thank you!
This issue was closed due to lack of activity after being marked stale for the past 14 days.
Excuse me, how can I solve this slow-speed problem?
shape:(1, 32, 387, 1)
data time: 0.005219221115112305
post time: 0.24771547317504883
end time: 0.2498164176940918
shape:(2, 32, 387, 1)
data time: 0.0056378841400146484
post time: 0.4651315212249756
end time: 0.4693586826324463
docker run --runtime=nvidia -it --rm -p 8501:8501 \
  -v "$(pwd)/densenet_ctc:/models/docker_test" \
  -e MODEL_NAME=docker_test tensorflow/serving:latest-gpu \
  --tensorflow_intra_op_parallelism=8 \
  --tensorflow_inter_op_parallelism=8 \
  --enable_batching=true \
  --batching_parameters_file=/models/docker_test/batching_parameters.conf

batching_parameters.conf:
num_batch_threads { value: 4 }        # threads that execute formed batches
batch_timeout_micros { value: 2000 }  # wait at most 2 ms to fill a batch
max_batch_size { value: 48 }          # largest batch the server will form
max_enqueued_batches { value: 48 }    # queue depth before requests are rejected
GPU: 1080 Ti
Thanks.
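For context, a client along these lines would produce timings like those reported above (this is a sketch, not the original poster's code; the REST endpoint follows from the docker command, everything else is assumed):

import json
import time

import numpy as np
import requests

# REST endpoint implied by "-p 8501:8501" and MODEL_NAME=docker_test above.
url = "http://localhost:8501/v1/models/docker_test:predict"

for batch_size in (1, 2):
    batch = np.random.rand(batch_size, 32, 387, 1).astype(np.float32)
    t0 = time.time()
    payload = json.dumps({"instances": batch.tolist()})  # batched request body
    t1 = time.time()
    response = requests.post(url, data=payload)
    t2 = time.time()
    predictions = response.json()["predictions"]
    t3 = time.time()
    print(f"shape: {batch.shape}")
    print(f"data time: {t1 - t0}")
    print(f"post time: {t2 - t1}")
    print(f"end time: {t3 - t0}")

In the numbers reported above, the post time roughly doubles from one image to two, which is the behaviour this issue is about.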