
tf serving performance is so slow #1989

Closed
liumilan opened this issue Mar 21, 2022 · 14 comments

Labels: stale (to be closed automatically if no activity), stat:awaiting response, type:performance (Performance Issue)

Comments

@liumilan

I train a recommendation NN model offline and then serve predictions with TF Serving on a CPU-only online machine. I have allocated 8 cores, and prediction is slow: more than 0.4% of requests take 100 ms or more. The request batch size is 100. The model has 167 one-hot features and 3 fully-connected layers. CPU usage is also low, only about 20%.
How can I analyze the serving bottleneck, and is it possible to reduce the tail latency by adjusting some parameters?
I have tried many of the suggestions in https://www.tensorflow.org/tfx/serving/performance, but they did not improve performance. I suspect that, because there are so many one-hot features, much of the time is spent looking up the hashed feature embeddings.
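As a starting point, a small client-side script can confirm how much of the tail comes from the serving path itself rather than from the network. This is only a sketch; the endpoint, model name, and input layout below are hypothetical placeholders:

import json
import time

import numpy as np
import requests

# Hypothetical REST endpoint and input layout: 100 rows x 167 one-hot feature ids.
URL = "http://localhost:8501/v1/models/recommend_model:predict"
batch = np.random.randint(0, 1000, size=(100, 167)).tolist()
payload = json.dumps({"inputs": batch})

latencies_ms = []
for _ in range(500):
    start = time.perf_counter()
    requests.post(URL, data=payload).raise_for_status()
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

# The issue reports >0.4% of requests above 100 ms, so look at the far tail.
for p in (50, 99, 99.6):
    print(f"p{p}: {np.percentile(latencies_ms, p):.1f} ms")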

@pindinagesh pindinagesh self-assigned this Mar 21, 2022
@pindinagesh pindinagesh added the type:performance Performance Issue label Mar 21, 2022
@pindinagesh

Hi @liumilan

Can you take a look at the workaround proposed in this thread and see if it helps resolve your issue? You can also refer to link1 and link2, which discuss similar problems. Thanks!

@salliewalecka

salliewalecka commented Mar 22, 2022

Hi @liumilan

One gotcha I ran into is that the CPU usage of the TF Serving container is rather spiky and does not show up in 1-minute aggregates (so it can use 100%+ CPU but show under 50% on average in some cases). I'm not sure what your serving environment is, but if it is Kubernetes I'd recommend plotting CPU throttling to make sure you are not running into that (there is a helpful video on CPU throttling). Increasing limits will allow your application to burst into spikes. In addition, you can look into serving your application with more CPU (though that's costly since you are already at only 20% CPU usage).

Apart from that, you can look into attaching TensorBoard to find costly operations -- it is fairly easy to set up. I've not found any other parameters that have helped much with this problem, only changing resources and changing batch size.
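For the TensorBoard route, one way to capture a trace from a running TF Serving instance is the TensorFlow profiler client pointed at the server's gRPC port. A minimal sketch, assuming the server supports profiling, is reachable at localhost:8500, and a recent TF 2.x is installed on the client machine:

import tensorflow as tf

# Capture a 2-second trace from the running model server (the address is an assumption);
# send prediction traffic in parallel so the trace contains real requests.
tf.profiler.experimental.client.trace(
    service_addr="grpc://localhost:8500",
    logdir="/tmp/tfserving_profile",
    duration_ms=2000,
)

# Inspect the costly ops afterwards with:
#   tensorboard --logdir /tmp/tfserving_profile   (open the Profile tab)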

@liumilan
Author

liumilan commented Mar 28, 2022

> Hi @liumilan
>
> One gotcha I ran into is that the CPU usage of the TF Serving container is rather spiky and does not show up in 1-minute aggregates (so it can use 100%+ CPU but show under 50% on average in some cases). I'm not sure what your serving environment is, but if it is Kubernetes I'd recommend plotting CPU throttling to make sure you are not running into that (there is a helpful video on CPU throttling). Increasing limits will allow your application to burst into spikes. In addition, you can look into serving your application with more CPU (though that's costly since you are already at only 20% CPU usage).
>
> Apart from that, you can look into attaching TensorBoard to find costly operations -- it is fairly easy to set up. I've not found any other parameters that have helped much with this problem, only changing resources and changing batch size.

I have attached TensorBoard offline to look at costly operations, and I found that most of the time is spent looking up embedding features. Is it possible to reduce this time? @salliewalecka

@salliewalecka

Hey, I don't have any more tips for you if the embedding feature lookup is the bottleneck. Sorry!

@liumilan
Author

> Hi @liumilan
>
> Can you take a look at the workaround proposed in this thread and see if it helps resolve your issue? You can also refer to link1 and link2, which discuss similar problems. Thanks!

I don't think it is the same issue. My bottleneck is that the embedding lookup costs a lot of time, according to the TensorFlow timeline.
@pindinagesh

@liumilan
Author

timeline-1.txt
@pindinagesh here is my timeline, could you help check it? Just rename it to timeline-1.json and open it in Chrome.

@liumilan
Author

liumilan commented Apr 5, 2022

Who can help check this timeline?

@vscv

vscv commented Apr 6, 2022

In fact, other applications have a similar performance issue: #1991

@liumilan
Author

liumilan commented Apr 6, 2022

> I also have the same low-performance issue. I guess it mainly comes from two parts:
>
>   1. It takes time to convert the image into a JSON payload and POST it.
>   2. TF Serving itself is delayed (several POSTs had already been made in advance as a warm-up).
>
> In my POST test, the remote side (MBP + WiFi) takes 16 ~ 20 seconds to print res.json, while the local side takes 5 ~ 7 seconds. Also, I observed GPU usage, and it only ran (~70%) for less than a second during the entire POST.
>
> # 1024x1024x3 image to JSON and POST
> import sys, json, requests
> import numpy as np
> import PIL.Image
>
> image = PIL.Image.open(sys.argv[1])
> image_np = np.array(image)  # convert the PIL image to an array so it can be serialized
> payload = {"inputs": [image_np.tolist()]}
> res = requests.request("POST", "http://2444.333.222.111:8501/v1/models/maskrcnn:predict", data=json.dumps(payload))
> print(res.json())

My scenario is recommendation, not CV.

@liumilan
Author

@pindinagesh @christisg could you help check timeline.json?

@singhniraj08 singhniraj08 assigned nniuzft and unassigned christisg Feb 17, 2023
@singhniraj08 singhniraj08 self-assigned this Apr 6, 2023
@singhniraj08

@liumilan,

Can you please compare the time taken to generate predictions using the TensorFlow runtime directly with the time taken through TensorFlow Serving? Under the hood, TensorFlow Serving uses the TensorFlow runtime to do the actual inference on your requests, which means the average latency of serving a request with TensorFlow Serving is usually at least that of doing inference directly with TensorFlow.
That comparison would help us understand whether the real issue is with TensorFlow Serving or with the model. If embedding lookup is your bottleneck, I would suggest re-designing your model with inference latency as a design constraint in mind.
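One way to get the direct-runtime number is to load the exported SavedModel and time its serving signature. A rough sketch, where the SavedModel path, the signature input key "inputs", and the input shape are all hypothetical placeholders:

import time

import numpy as np
import tensorflow as tf

# Load the exported model directly (path is a placeholder).
model = tf.saved_model.load("/models/recommend_model/1")
infer = model.signatures["serving_default"]

# Hypothetical batch of 100 rows x 167 feature ids; use the signature's real input key.
batch = tf.constant(np.random.randint(0, 1000, size=(100, 167)), dtype=tf.int64)

infer(inputs=batch)  # warm-up call

start = time.perf_counter()
for _ in range(100):
    infer(inputs=batch)
print("direct TF runtime: %.2f ms/request" % ((time.perf_counter() - start) * 10))

Comparing this number against the client-side latency measured through TF Serving shows how much of the cost comes from serialization, networking, and the server itself rather than the model.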

If the tail latency (the time taken by TensorFlow Serving itself to do inference) turns out to be high, you can try the gRPC API surface, which is slightly more performant. You can also experiment with command-line flags (most notably tensorflow_intra_op_parallelism and tensorflow_inter_op_parallelism) to find the right configuration for your specific workload and environment.
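For the gRPC route, a minimal client sketch (it assumes the tensorflow-serving-api pip package is installed; the model name and signature input key are placeholders):

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# TF Serving's gRPC port (8500 by default when started with --port=8500).
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "recommend_model"            # placeholder model name
request.model_spec.signature_name = "serving_default"

batch = np.random.randint(0, 1000, size=(100, 167))
request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(batch))  # "inputs" = signature input key

response = stub.Predict(request, 10.0)  # 10-second deadline
print(list(response.outputs.keys()))

The tensorflow_intra_op_parallelism and tensorflow_inter_op_parallelism settings mentioned above are startup flags of tensorflow_model_server, so changing them requires restarting the server.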

Thank you!

@github-actions

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Apr 14, 2023
@github-actions

This issue was closed due to lack of activity after being marked stale for the past 7 days.
