tf serving performance is so slow #1989
Comments
Hi @liumilan. One gotcha I ran into is that the CPU usage of the TF Serving container is rather spiky, and the spikes do not show up in 1-minute aggregates (so it can momentarily use 100%+ CPU while averaging under 50% in some cases). I'm not sure about your serving environment, but if it is Kubernetes I'd recommend plotting CPU throttling to make sure you are not running into it; there is a helpful video on CPU throttling worth watching. Increasing CPU limits will allow your application to burst into spikes. In addition, you can look into serving your application with more CPU (though that's costly since you are already only at 20% CPU usage). Apart from that, you can look into attaching TensorBoard to find costly operations; it is fairly easy to set up. I've not found any other parameters that have helped much with this problem, only changing resources and changing batch size.
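A minimal sketch of the TensorBoard profiling idea mentioned above, assuming a recent TF Serving build that exposes the profiler service on its gRPC port; the address `localhost:8500` and the log directory are placeholders, not values from this thread.

```python
# Hedged sketch: capture a short profile from a running TF Serving instance so
# the trace can be inspected in TensorBoard's Profile tab. Assumes the model
# server's gRPC port (8500 here) exposes the profiler service and that load is
# being sent to the server while the trace runs.
import tensorflow as tf

SERVING_GRPC_ADDR = "localhost:8500"   # assumed tensorflow_model_server address
LOGDIR = "/tmp/tfserving_profile"      # assumed output directory for the trace

# Sample 3 seconds of activity on the server.
tf.profiler.experimental.client.trace(
    service_addr=SERVING_GRPC_ADDR,
    logdir=LOGDIR,
    duration_ms=3000,
)

# Afterwards, run `tensorboard --logdir /tmp/tfserving_profile` and open the
# Profile tab to see per-op cost, including embedding lookups.
```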
I have attached TensorBoard to look at costly operations offline, and I found that a lot of time is spent looking up embedding features. Is it possible to reduce this time? @salliewalecka
Hey, I don't have any more tips for you if the embedding feature lookup is the bottleneck. Sorry!
I don't think it is the same issue. My bottleneck is the embedding lookup, which costs a lot of time according to the TensorFlow timeline.
timeline-1.txt
Who can help to check this timeline?
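For a first pass over the attached trace, a rough sketch like the following can summarize where the time goes, assuming timeline-1.txt is a standard Chrome-trace JSON (the kind produced by TensorFlow's timeline export) with `traceEvents` entries whose durations are in microseconds.

```python
# Sketch: sum wall time per op name in a TensorFlow timeline (Chrome trace
# JSON) so the most expensive ops (e.g. embedding lookups) stand out.
import json
from collections import Counter

with open("timeline-1.txt") as f:
    trace = json.load(f)

totals = Counter()
for event in trace.get("traceEvents", []):
    # Complete events ("X") carry a duration; metadata events do not.
    if event.get("ph") == "X" and "dur" in event:
        totals[event.get("name", "unknown")] += event["dur"]

# Print the 15 most expensive op names, converted to milliseconds.
for name, micros in totals.most_common(15):
    print(f"{micros / 1000.0:10.3f} ms  {name}")
```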
In fact, other applications have a similar performance issue: #1991
My scenario is recommendation, not CV.
@pindinagesh @christisg could you help to check timeline.json?
Can you please compare the time taken to generate predictions using the TensorFlow runtime versus TensorFlow Serving? Under the hood, TensorFlow Serving uses the TensorFlow runtime to do the actual inference on your requests, so the average latency of serving a request with TensorFlow Serving is usually at least that of doing inference directly with TensorFlow. If the tail latency (the time TensorFlow Serving takes to do inference) is high, you can try the gRPC API surface, which is slightly more performant. You can also experiment with command-line flags (most notably --tensorflow_intra_op_parallelism and --tensorflow_inter_op_parallelism) to find the right configuration for your specific workload and environment. Thank you!
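A sketch of a gRPC timing client along the lines suggested above. The model name (`recommend_model`), input key (`inputs`), server address, and input shape are placeholders since the actual SavedModel signature is not shown in this thread.

```python
# Hedged sketch: measure PredictRequest latency over gRPC against TF Serving.
# The server is assumed to be started with something like:
#   tensorflow_model_server --port=8500 --model_name=recommend_model \
#       --tensorflow_intra_op_parallelism=8 --tensorflow_inter_op_parallelism=8
import time

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

def make_request(batch):
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "recommend_model"          # placeholder model name
    request.model_spec.signature_name = "serving_default"
    request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(batch))  # placeholder input key
    return request

# Dummy batch of 100 rows x 167 feature ids, matching the shapes described in the issue.
batch = np.random.randint(0, 1000, size=(100, 167)).astype(np.int64)

latencies = []
for _ in range(200):
    start = time.perf_counter()
    stub.Predict(make_request(batch), 5.0)  # 5-second deadline
    latencies.append((time.perf_counter() - start) * 1000.0)

latencies.sort()
print(f"p50={latencies[len(latencies) // 2]:.1f} ms  "
      f"p99={latencies[int(len(latencies) * 0.99)]:.1f} ms")
```

Comparing these numbers with a direct in-process run of the same SavedModel separates RPC/serving overhead from the cost of the graph itself.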
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.
I trained a recommendation NN model offline, and now I serve it with TF Serving on a CPU machine online. I applied for 8 cores and found prediction is slow: more than 0.4% of requests take over 100 ms. The request batch size is 100. The model has 167 one-hot features and 3 fully connected layers. CPU usage is also low, only about 20%.
How can I analyze the bottleneck of serving, and is it possible to reduce the time cost by adjusting some parameters?
I have tried many of the approaches from https://www.tensorflow.org/tfx/serving/performance, but they did not improve the performance. I suspect that because I have so many one-hot features, much of the time is spent looking up the hash feature embeddings.
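One way to start isolating the bottleneck is to time the exported SavedModel directly with the TensorFlow runtime, with no serving layer in between. The sketch below is hedged: the export path, the `inputs` key, and the int64 feature-id batch are all placeholders for whatever the real model expects.

```python
# Hedged sketch: benchmark the SavedModel in-process to see how much of the
# 100 ms tail comes from the graph itself versus RPC/serving overhead.
import time

import numpy as np
import tensorflow as tf

model = tf.saved_model.load("/models/recommend_model/1")   # placeholder export path
infer = model.signatures["serving_default"]

# Dummy batch of 100 rows x 167 feature ids, matching the shapes in the issue.
batch = tf.constant(np.random.randint(0, 1000, size=(100, 167)), dtype=tf.int64)

# Warm up once so one-time initialization is excluded from the timing.
infer(inputs=batch)  # "inputs" is a placeholder signature key

latencies = []
for _ in range(200):
    start = time.perf_counter()
    infer(inputs=batch)
    latencies.append((time.perf_counter() - start) * 1000.0)

latencies.sort()
print(f"p50={latencies[len(latencies) // 2]:.2f} ms  "
      f"p99={latencies[int(len(latencies) * 0.99)]:.2f} ms")
```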