
[Serve] resnet50 benchmarking #29096

Merged · 10 commits merged into ray-project:master on Oct 17, 2022

Conversation

@sihanwang41 (Contributor) commented Oct 5, 2022

Signed-off-by: Sihan Wang [email protected]

Why are these changes needed?

  • ResNet50 is used, following MLCommons.
  • While MLCommons tests only model inference, this benchmark also covers the image download and image-to-tensor conversion steps, which is closer to a real-world use case.
  • In the release tests, the CPU handles downloading and tensor conversion, and the GPU handles model inference. Sending all images to a single replica showed high latency due to the large data transfer, so the images are intentionally split across different replicas for downloading and tensor conversion, which boosts throughput considerably (see the sketch below).
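
Below is a minimal sketch of the CPU-preprocess / GPU-inference split described above, assuming the modern Ray Serve deployment and handle APIs. The deployment names, replica counts, and preprocessing pipeline are illustrative stand-ins, not the PR's actual benchmark code.

import asyncio
import io

import requests
import torch
from PIL import Image
from torchvision import models, transforms

from ray import serve


@serve.deployment(num_replicas=4, ray_actor_options={"num_cpus": 1})
class Preprocessor:
    """CPU replicas: download an image and convert it to a tensor."""

    def __init__(self):
        self.transform = transforms.Compose(
            [transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()]
        )

    def __call__(self, uri: str) -> torch.Tensor:
        raw = requests.get(uri, timeout=60).content
        image = Image.open(io.BytesIO(raw)).convert("RGB")
        return self.transform(image)


@serve.deployment(ray_actor_options={"num_gpus": 1})
class Resnet50:
    """GPU replica: runs model inference on preprocessed tensors."""

    def __init__(self):
        self.model = models.resnet50(pretrained=True).eval().cuda()

    def __call__(self, tensors: list) -> list:
        with torch.no_grad():
            logits = self.model(torch.stack(tensors).cuda())
        return logits.argmax(dim=1).cpu().tolist()


@serve.deployment
class Ingress:
    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model

    async def __call__(self, request):
        uris = await request.json()
        # Fan the URIs out across the Preprocessor replicas so no single
        # replica pays the full download/convert cost, then run inference.
        tensors = await asyncio.gather(*[self.preprocessor.remote(u) for u in uris])
        return await self.model.remote(list(tensors))


app = Ingress.bind(Preprocessor.bind(), Resnet50.bind())
# serve.run(app)  # then send JSON lists of image URIs to localhost:8000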

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sihanwang41 changed the title from "[Serve] restnet 50 benchmarking" to "[Serve] restnet50 benchmarking" on Oct 5, 2022
@architkulkarni (Contributor) left a comment:

Looks good to me, just had a few questions.

Do you mind adding a short description in a comment at the top of benchmark.py? It could just be what you have in the PR description.

release/serve_tests/workloads/serve_resnet_benchmark.py (outdated)

async def fetch(session):
    async with session.get(
        "http://localhost:8000/", json=input_uris * int(data_size / len(input_uris))
Contributor:

Looks like we're repeating the images here. Probably a dumb question, but the inference doesn't do any caching of the results anywhere, right? If it did, the benchmark wouldn't be correct.

@sihanwang41 (Contributor, author):

Yes, we don't do any caching. Caching results is out of scope for this PR.
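
For reference, a runnable version of the fetch helper quoted above might look like the sketch below. The input_uris, data_size, and endpoint payload are illustrative stand-ins, not the PR's exact values.

import asyncio

import aiohttp

input_uris = ["https://example.com/cat.jpg", "https://example.com/dog.jpg"]
data_size = 100  # total number of (possibly repeated) images per request


async def fetch(session: aiohttp.ClientSession):
    # Repeat the small URI list until it reaches data_size images; since no
    # results are cached server-side, repeats still exercise the full pipeline.
    async with session.get(
        "http://localhost:8000/", json=input_uris * int(data_size / len(input_uris))
    ) as response:
        return await response.json()


async def main():
    # Assumes the Serve app from the PR is running on localhost:8000.
    async with aiohttp.ClientSession() as session:
        return await fetch(session)


if __name__ == "__main__":
    print(asyncio.run(main()))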

release/serve_tests/workloads/serve_resnet_benchmark.py (outdated)
release/serve_tests/workloads/serve_resnet_benchmark.py (outdated)

save_test_results(
    {test_name: result},
    default_output_file="/tmp/serve_resent_benchmark.json",
@architkulkarni (Contributor):

Do you know how the release test infra finds this file? It might have to be named /tmp/release_test_out.json or use the env var TEST_OUTPUT_JSON in order for the "fetch results" step to work. Do you mind double-checking this?

Contributor:

+1, please use TEST_OUTPUT_JSON and follow other files for the JSON schema.

Contributor:
So that it can be shown in our perf dashboard.
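
For context, a minimal sketch of the pattern the reviewers are asking for: honor the TEST_OUTPUT_JSON environment variable so the release-test fetch-results step can find the file. The result payload here is illustrative; the real schema should follow existing release tests.

import json
import os


def save_test_results(results: dict, default_output_file: str) -> None:
    # The release-test infra reads TEST_OUTPUT_JSON when set, so prefer it
    # over the hard-coded default path.
    path = os.environ.get("TEST_OUTPUT_JSON", default_output_file)
    with open(path, "w") as f:
        json.dump(results, f)


save_test_results(
    {"serve_resnet_benchmark": {"throughput_qps": 123.4}},
    default_output_file="/tmp/release_test_out.json",
)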

@simon-mo (Contributor) left a comment:

Should we send URIs, or should we send the images directly? What exactly is this benchmark built to test/evaluate?

release/serve_tests/workloads/serve_resnet_benchmark.py (outdated)
release/serve_tests/workloads/serve_resnet_benchmark.py (outdated)
release/serve_tests/compute_tpl_gpu_node.yaml (outdated)
)

async def _get_tensor_from_img(self, uri: str):
    return await asyncio.coroutine(self.utils.prepare_input_from_uri)(uri)
Contributor:

If prepare_input_from_uri is blocking, making it async won't help here.

@sihanwang41 (Contributor, author):

Makes sense.
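
To illustrate the reviewer's point: asyncio.coroutine only wraps the blocking call, so the event loop still stalls while it runs; offloading to a thread pool actually frees the loop. A sketch, assuming utils.prepare_input_from_uri is the blocking download/convert helper quoted above:

import asyncio


async def get_tensor_from_img(utils, uri: str):
    loop = asyncio.get_running_loop()
    # Run the blocking download + tensor conversion in the default thread
    # pool so the replica's event loop stays free to serve other requests.
    return await loop.run_in_executor(None, utils.prepare_input_from_uri, uri)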

release/serve_tests/workloads/serve_resnet_benchmark.py (outdated)
@sihanwang41 (Contributor, author) commented:

> Should we send URIs, or should we send the images directly? What exactly is this benchmark built to test/evaluate?

Test: CPU + GPU + ResNet50 performance.

I think it is more practical to download the images and convert them to tensors inside the deployment code instead of passing the images directly in the HTTP request.

@simon-mo (Contributor) commented Oct 6, 2022:

I see. If downloading is on the critical path, we should definitely put the images in S3.

@simon-mo (Contributor) left a comment:

Please see Archit's comment about TEST_OUTPUT_JSON.

@simon-mo (Contributor) commented Oct 7, 2022:

We can merge this after a successful release test demo run.

@simon-mo (Contributor) commented Oct 7, 2022:

(As a stretch goal, a smoke test would be preferred so that we can run it in CI as well: https://github.com/ray-project/ray/blob/master/release/BUILD#L5-L39)

@c21 added the Ray 2.1 label on Oct 7, 2022
@c21 (Contributor) commented Oct 8, 2022:

Can it be merged?

@simon-mo (Contributor) commented:

Lint failed:

release/serve_tests/workloads/serve_resnet_benchmark.py:85:65: F841 local variable 'fp' is assigned to but never used

Signed-off-by: Sihan Wang <[email protected]>
@richardliaw changed the title from "[Serve] restnet50 benchmarking" to "[Serve] resnet50 benchmarking" on Oct 14, 2022
@sihanwang41 added the release-blocker P0 (Issue that blocks the release) label on Oct 14, 2022
@simon-mo merged commit 08fbdfb into ray-project:master on Oct 17, 2022
sihanwang41 added a commit to sihanwang41/ray that referenced this pull request Oct 20, 2022
sihanwang41 added a commit to sihanwang41/ray that referenced this pull request Oct 20, 2022
rickyyx pushed a commit that referenced this pull request Oct 21, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
Labels: release-blocker P0 (Issue that blocks the release)
5 participants