
Introduce restate_benchmarks package with sequential and parallel throughput benchmark #561

Merged · 12 commits merged into restatedev:main on Jul 6, 2023

Conversation

@tillrohrmann (Contributor)

This PR introduces the restate_benchmarks package with the sequential and parallel throughput benchmark.

The sequential throughput benchmark invokes counter.Counter/GetAndAdd sequentially with the same key.

The parallel throughput benchmark invokes counter.Counter/GetAndAdd concurrently with random keys.

The benchmarks are set up to support profiling via pprof. You can run them via cargo bench --bench throughput_parallel -- --profile-time=30. This will generate a flamegraph under target/criterion/throughput/parallel/profile/flamegraph.svg.
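For reference, the profiler hook-up looks roughly like the following sketch (illustrative only; the group and function names, element count, and sampling frequency are placeholders, and the actual benchmark code in this PR may differ):

use criterion::{criterion_group, criterion_main, Criterion, Throughput};
use pprof::criterion::{Output, PProfProfiler};

fn throughput_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("throughput");
    // Placeholder element count; criterion uses it to report Kelem/s.
    group.throughput(Throughput::Elements(1_000));
    group.bench_function("parallel", |b| {
        b.iter(|| {
            // issue the counter.Counter/GetAndAdd requests here
        })
    });
    group.finish();
}

criterion_group! {
    name = benches;
    // pprof (with the "criterion" and "flamegraph" features) samples at 100 Hz;
    // running with `-- --profile-time=30` then writes
    // target/criterion/<group>/<function>/profile/flamegraph.svg.
    config = Criterion::default().with_profiler(PProfProfiler::new(100, Output::Flamegraph(None)));
    targets = throughput_benchmark
}
criterion_main!(benches);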

This PR also includes a couple of version upgrades to resolve dependency conflicts.

This PR fixes #560.

@tillrohrmann (Contributor, Author)

@slinkydeveloper it would be great if you could try to create the flamegraphs on your machine. I am not 100% sure whether the frame-pointer feature of the pprof-rs package works on your machine. If not, then we probably need to activate the feature conditionally depending on the actual platform.

@tillrohrmann (Contributor, Author)

Using the Java-based counter.Counter service, I obtain the following results on my machine:

throughput/parallel     time:   [277.50 ms 283.08 ms 289.40 ms]
                        thrpt:  [6.9109 Kelem/s 7.0652 Kelem/s 7.2073 Kelem/s]
throughput/sequential   time:   [472.02 µs 483.38 µs 494.62 µs]
                        thrpt:  [2.0218 Kelem/s 2.0688 Kelem/s 2.1185 Kelem/s]

@slinkydeveloper (Contributor) left a comment

Looks good to me, I just left a couple of comments with some ideas on benchmarks we can add in the future. I'll try to run it locally now.

src/restate/benches/proto/counter.proto (outdated review comment, resolved)
Comment on lines 61 to 74
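/// Sends `num_requests` GetAndAdd requests one after another, all targeting the same counter key.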
async fn send_sequential_counter_requests(
mut counter_client: CounterClient<tonic::transport::Channel>,
num_requests: u64,
) {
for _ in 0..num_requests {
counter_client
.get_and_add(CounterAddRequest {
counter_name: "single".into(),
value: 10,
})
.await
.expect("counter.Counter::get_and_add should not fail");
}
}
@slinkydeveloper (Contributor)

An interesting variant of this benchmark could be to send many requests in parallel (always with the same key), and then await on them all at once. Perhaps this might be interesting to test if there is some contention going on in the ingress_grpc? (e.g. for the registry?)
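Something roughly like this (untested sketch, reusing the names from this PR's generated client):

async fn send_parallel_same_key_requests(
    counter_client: CounterClient<tonic::transport::Channel>,
    num_requests: u64,
) {
    let requests = (0..num_requests).map(|_| {
        // Cloning a tonic client is cheap; all clones share the same channel.
        let mut client = counter_client.clone();
        async move {
            client
                .get_and_add(CounterAddRequest {
                    counter_name: "single".into(),
                    value: 10,
                })
                .await
                .expect("counter.Counter::get_and_add should not fail")
        }
    });

    // Await all in-flight requests at once; contention (e.g. in ingress_grpc)
    // would show up as latency that does not improve with concurrency.
    futures::future::join_all(requests).await;
}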

@tillrohrmann (Contributor, Author)

True. Maybe this could also be a more component-targeted benchmark that only tests the ingress_grpc component.

src/restate/benches/sequential_throughput.rs (outdated review comment, resolved)
.await
.expect("counter.Counter::get_and_add should not fail");
}
}
@slinkydeveloper (Contributor)

Another bench variant could be to send requests using connect/HTTP rather than gRPC, to see whether JSON encoding/decoding adds a noticeable cost per request.
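Roughly something like this (untested sketch; the ingress address and the JSON field names are placeholders):

use serde_json::json;

// Requires reqwest with the "json" feature enabled.
async fn send_json_counter_request(client: &reqwest::Client) -> reqwest::Result<()> {
    client
        .post("http://localhost:8080/counter.Counter/GetAndAdd") // hypothetical ingress address
        .json(&json!({ "counterName": "single", "value": 10 }))
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}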

@tillrohrmann (Contributor, Author)

True. This could also be a more component-targeted benchmark.

@slinkydeveloper (Contributor)

Using the Java counter as well:

throughput/parallel     time:   [843.31 ms 878.67 ms 919.21 ms]
                        thrpt:  [2.1758 Kelem/s 2.2762 Kelem/s 2.3716 Kelem/s]
throughput/sequential   time:   [41.169 ms 41.230 ms 41.293 ms]
                        thrpt:  [24.217  elem/s 24.254  elem/s 24.290  elem/s]

Wondering why my sequential performance is so low?!

@slinkydeveloper (Contributor)

I did not manage to get the flamegraphs working on my machine :( I suggest we go ahead and merge this PR, and then I'll take on the issue of figuring out why it doesn't work on my machine.

@slinkydeveloper (Contributor)

I guess it's also worth figuring out why the performance of the sequential case is so terrible on my machine.

@tillrohrmann (Contributor, Author)

> Using the Java counter as well:
>
> throughput/parallel     time:   [843.31 ms 878.67 ms 919.21 ms]
>                         thrpt:  [2.1758 Kelem/s 2.2762 Kelem/s 2.3716 Kelem/s]
> throughput/sequential   time:   [41.169 ms 41.230 ms 41.293 ms]
>                         thrpt:  [24.217  elem/s 24.254  elem/s 24.290  elem/s]
>
> Wondering why my sequential performance is so low?!

These numbers are indeed quite low. I'm not sure what is happening on your machine; this deserves some further investigation.

> I suggest we go ahead and merge this PR, and then I'll take on the issue of figuring out why it doesn't work on my machine.

Alright, sounds good to me.

@tillrohrmann (Contributor, Author)

Thanks for the review @slinkydeveloper. I've addressed your comments. Rebasing and then merging this PR.

This fixes #560.

Add exception for inferno library which is under CDDL-1.0

We can use the inferno library (CDDL-1.0) because we don't modify its source code. Any such modifications would have to be made available as source code under CDDL-1.0 as well.
Builders are generated via the derive_builder crate, which creates a builder via a procedural macro.
This resolves a clash between different transitive dependencies.

Add exception for duplicate regex-syntax dependency

The regex-syntax dependency is a transitive dependency of tracing-subscriber
and datafusion.

Add exception for duplicate quick-xml dependency

The dependency is pulled in by datafusion and pprof, which is being used for profiling our benchmarks.
tillrohrmann merged commit 497fbf7 into restatedev:main on Jul 6, 2023
tillrohrmann deleted the benchmark branch on July 6, 2023 at 14:43
Successfully merging this pull request may close these issues: Add benchmarks for measuring throughput