
Introduce restate_benchmarks package with sequential and parallel throughput benchmark #561

Merged · 12 commits merged into restatedev:main on Jul 6, 2023

Conversation

@tillrohrmann (Contributor)

This PR introduces the restate_benchmarks package with the sequential and parallel throughput benchmark.

The sequential throughput benchmark invokes counter.Counter/GetAndAdd sequentially with the same key.

The parallel throughput benchmark invokes counter.Counter/GetAndAdd concurrently with random keys.

The benchmarks are set up to support profiling via pprof. You can run them via cargo bench --bench throughput_parallel -- --profile-time=30. This will generate a flamegraph under target/criterion/throughput/parallel/profile/flamegraph.svg.
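For reference, the profiler hook-up looks roughly like the following sketch (illustrative only; the group and function names, element count, and sampling frequency are placeholders, and the actual benchmark code in this PR may differ):

use criterion::{criterion_group, criterion_main, Criterion, Throughput};
use pprof::criterion::{Output, PProfProfiler};

fn throughput_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("throughput");
    // Placeholder element count; criterion uses it to report Kelem/s.
    group.throughput(Throughput::Elements(1_000));
    group.bench_function("parallel", |b| {
        b.iter(|| {
            // issue the counter.Counter/GetAndAdd requests here
        })
    });
    group.finish();
}

criterion_group! {
    name = benches;
    // pprof (with the "criterion" and "flamegraph" features) samples at 100 Hz;
    // running with `-- --profile-time=30` then writes
    // target/criterion/<group>/<function>/profile/flamegraph.svg.
    config = Criterion::default().with_profiler(PProfProfiler::new(100, Output::Flamegraph(None)));
    targets = throughput_benchmark
}
criterion_main!(benches);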

This PR also includes a couple of version upgrades to resolve dependency conflicts.

This PR fixes #560.

@tillrohrmann (Contributor, Author)

@slinkydeveloper it would be great if you could try to create the flamegraphs on your machine. I am not 100% sure whether the frame-pointer feature of the pprof-rs package works on your machine. If not, then we probably need to activate the feature conditionally depending on the actual platform.

@tillrohrmann (Contributor, Author)

Using the Java-based counter.Counter service, I obtain the following results on my machine:

throughput/parallel     time:   [277.50 ms 283.08 ms 289.40 ms]
                        thrpt:  [6.9109 Kelem/s 7.0652 Kelem/s 7.2073 Kelem/s]
throughput/sequential   time:   [472.02 µs 483.38 µs 494.62 µs]
                        thrpt:  [2.0218 Kelem/s 2.0688 Kelem/s 2.1185 Kelem/s]

@slinkydeveloper (Contributor) left a comment

Looks good to me, I just left a couple of comments with some ideas on benchmarks we can add in the future. I'll try to run it locally now.

src/restate/benches/proto/counter.proto (outdated review comment, resolved)
Comment on lines 61 to 74
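/// Sends `num_requests` GetAndAdd requests one after another, all targeting the same counter key.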
async fn send_sequential_counter_requests(
mut counter_client: CounterClient<tonic::transport::Channel>,
num_requests: u64,
) {
for _ in 0..num_requests {
counter_client
.get_and_add(CounterAddRequest {
counter_name: "single".into(),
value: 10,
})
.await
.expect("counter.Counter::get_and_add should not fail");
}
}
@slinkydeveloper (Contributor)

An interesting variant of this benchmark could be to send many requests in parallel (always with the same key), and then await on them all at once. Perhaps this might be interesting to test if there is some contention going on in the ingress_grpc? (e.g. for the registry?)
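Something roughly like this (untested sketch, reusing the names from this PR's generated client):

async fn send_parallel_same_key_requests(
    counter_client: CounterClient<tonic::transport::Channel>,
    num_requests: u64,
) {
    let requests = (0..num_requests).map(|_| {
        // Cloning a tonic client is cheap; all clones share the same channel.
        let mut client = counter_client.clone();
        async move {
            client
                .get_and_add(CounterAddRequest {
                    counter_name: "single".into(),
                    value: 10,
                })
                .await
                .expect("counter.Counter::get_and_add should not fail")
        }
    });

    // Await all in-flight requests at once; contention (e.g. in ingress_grpc)
    // would show up as latency that does not improve with concurrency.
    futures::future::join_all(requests).await;
}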

@tillrohrmann (Contributor, Author)

True. Maybe this could also be a more component-targeted benchmark that only tests the ingress_grpc component.

src/restate/benches/sequential_throughput.rs (outdated review comment, resolved)
.await
.expect("counter.Counter::get_and_add should not fail");
}
}
@slinkydeveloper (Contributor)

Another bench variant could be to send requests using connect/HTTP rather than gRPC, to see whether JSON encoding/decoding adds a noticeable cost per request.
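Roughly something like this (untested sketch; the ingress address and the JSON field names are placeholders):

use serde_json::json;

// Requires reqwest with the "json" feature enabled.
async fn send_json_counter_request(client: &reqwest::Client) -> reqwest::Result<()> {
    client
        .post("http://localhost:8080/counter.Counter/GetAndAdd") // hypothetical ingress address
        .json(&json!({ "counterName": "single", "value": 10 }))
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}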

@tillrohrmann (Contributor, Author)

True. This could also be a more component-targeted benchmark.

@slinkydeveloper (Contributor)

Using the Java counter as well:

throughput/parallel     time:   [843.31 ms 878.67 ms 919.21 ms]
                        thrpt:  [2.1758 Kelem/s 2.2762 Kelem/s 2.3716 Kelem/s]
throughput/sequential   time:   [41.169 ms 41.230 ms 41.293 ms]
                        thrpt:  [24.217  elem/s 24.254  elem/s 24.290  elem/s]

Wondering why my sequential performance is so low?!

@slinkydeveloper (Contributor)

I did not manage to get the flamegraphs working on my machine :( I suggest we go ahead and merge this PR, and then I'll take on the issue of figuring out why it doesn't work on my machine.

@slinkydeveloper (Contributor)

I guess it's also worth figuring out why the performance of the sequential case is so terrible on my machine.

@tillrohrmann (Contributor, Author)

> Using the Java counter as well:
>
> throughput/parallel     time:   [843.31 ms 878.67 ms 919.21 ms]
>                         thrpt:  [2.1758 Kelem/s 2.2762 Kelem/s 2.3716 Kelem/s]
> throughput/sequential   time:   [41.169 ms 41.230 ms 41.293 ms]
>                         thrpt:  [24.217  elem/s 24.254  elem/s 24.290  elem/s]
>
> Wondering why my sequential performance is so low?!

These numbers are indeed quite low. I'm not sure what is happening on your machine; this deserves some further investigation.

> I suggest we go ahead and merge this PR, and then I'll take on the issue of figuring out why it doesn't work on my machine.

Alright, sounds good to me.

@tillrohrmann (Contributor, Author)

Thanks for the review @slinkydeveloper. I've addressed your comments. Rebasing and then merging this PR.

This fixes #560.

Add exception for inferno library which is under CDDL-1.0

We can use the inferno library (CDDL-1.0) because we don't modify its source code. Any such modifications would have to be made available as source code under CDDL-1.0 as well.
Builders are generated via the derive_builder crate, which creates a builder via a procedural macro.
This resolves a clash between different transitive dependencies.

Add exception for duplicate regex-syntax dependency

The regex-syntax dependency is a transitive dependency of tracing-subscriber
and datafusion.

Add exception for duplicate quick-xml dependency

The dependency is pulled in by datafusion and pprof, which is being used for profiling our benchmarks.
tillrohrmann merged commit 497fbf7 into restatedev:main on Jul 6, 2023
tillrohrmann deleted the benchmark branch on July 6, 2023 at 14:43
Successfully merging this pull request may close these issues: Add benchmarks for measuring throughput