Router::nest has poor performance on 0.6 (#1624)
That is hard to say with such a large amount of code. Are you able to make a smaller repro?
Well, that will take some work (and time), but I will try...
We do have some benchmarks here that you can try to poke at and see if you notice anything.
Ok, I will try to work something out from that...
One thing that could be relevant: are you nesting routers? If yes, can you try adding the prefix to all sub-routes individually and merging the routers instead?
Also @davidpdrsn, could we change the implementation of nest to do that internally?
@jplatte: I just tried it, and although the merged routes were not routed to in the benchmark, I DID get back the original performance! Amazing! So, that solves my problem! Thanks a lot!
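For readers following along, here is a minimal sketch of the workaround under discussion, assuming the axum 0.6 API (the handler and paths are illustrative):

```rust
use axum::{routing::get, Router};

async fn get_user() -> &'static str {
    "user"
}

// Nested: the outer router matches "/v1", then strips the prefix and
// re-routes the remainder through the inner router at request time.
fn nested() -> Router {
    Router::new().nest("/v1", Router::new().route("/users", get(get_user)))
}

// Merged: every route carries its full path, so the combined router is
// one flat route table and "/v1/users" is matched in a single lookup.
fn merged() -> Router {
    Router::new().merge(Router::new().route("/v1/users", get(get_user)))
}
```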
That's basically what we did in 0.5, but in a problematic way where we only had one router internally. First thing should be to add a benchmark for it. Would also be good to see if we can improve the general routing performance.
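To illustrate what flattening means here (a conceptual sketch only, not axum's internals): instead of keeping the inner router behind a prefix and re-routing at request time, the nested paths are concatenated and registered up front.

```rust
// Conceptual sketch only: "flattening" a nested route table.
fn flatten(prefix: &str, inner_paths: &[&str]) -> Vec<String> {
    inner_paths
        .iter()
        .map(|path| format!("{prefix}{path}"))
        .collect()
}

fn main() {
    // nest("/v1", router with "/users" and "/posts") becomes two flat routes.
    assert_eq!(
        flatten("/v1", &["/users", "/posts"]),
        vec!["/v1/users".to_string(), "/v1/posts".to_string()]
    );
}
```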
We've seen an even worse drop from 1500 req/s to 75 req/s (that is for an endpoint that only returns some static data). Will check if removing nesting fixes it.
That sounds really bad 😬 I'm curious to hear if it is just because of nesting.
A repro of that would be great! Off the top of my head I don't think there were any intentional changes.
We're running into the same problem: nesting heavily reduces the performance. I made a little example that demonstrates the problem. Changing from 0.5 to 0.6 reduces the throughput from ~20'000 req/s to ~1'500 req/s, and further nesting makes it worse.

EDIT: Forgot to mention that those numbers were in debug mode. I've also tested in release now; it's just different numbers but the same problem: 27'000 req/s down to 2'500 req/s.

```rust
use axum::{
routing::get,
http::StatusCode,
response::IntoResponse,
Json, Router,
};
use serde::Serialize;
use std::net::SocketAddr;
#[tokio::main]
async fn main() {
// ab -c 15 -n 100 http://localhost:3001/v1/v2/v3/v4/v5/users -> 1500 req/s on 0.6 | 20'000 req/s on 0.5
/******************************************************************/
let users = Router::new().route("/users", get(get_user));
let v5 = Router::new().nest("/v5", users);
let v4 = Router::new().nest("/v4", v5);
let v3 = Router::new().nest("/v3", v4);
let v2 = Router::new().nest("/v2", v3);
let app = Router::new().nest("/v1", v2);
/******************************************************************/
// ab -c 15 -n 100 http://localhost:3001/v1/v2/v3/v4/v5/users -> 20'000 req/s for both 0.5 + 0.6
/******************************************************************/
// let app = Router::new().route("/v1/v2/v3/v4/v5/users", get(get_user));
/******************************************************************/
let addr = SocketAddr::from(([127, 0, 0, 1], 3001));
tracing::debug!("listening on {}", addr);
axum::Server::bind(&addr)
.serve(app.into_make_service())
.await
.unwrap();
}
async fn get_user() -> impl IntoResponse {
let user = User {
id: 1337,
username: "payload.username".to_owned(),
};
(StatusCode::OK, Json(user))
}
#[derive(Serialize)]
struct User {
id: u64,
username: String,
}
```
Final numbers are in (release builds this time 😅):
Note that I got very inconsistent results with outliers of ~30% in some test runs, so take the above with a very big grain of salt. But I would say those results are consistent with this being only a nesting issue without any other regression.
@FSMaxB thanks for providing those numbers! Definitely something going on that we should look into.
With 0.5 on the left and 0.6 on the right side (sorry, I first wrote it the other way around). On a single call of the URL, this is what we get with 0.6: [screenshots not reproduced]
A silly question here: given how nest is designed, would it not always be slower than merge? Since nested routers essentially need to be called after the "outer" router has worked out the path, would it not need to allocate a new String to hold the "tail" portion of the path? (See line 440 in 1959658.)
With that in mind, would it not be reasonable to have some better way to handle that? An extra allocation per request is not such a great idea for performance, AFAIK... PS: apologies if I am talking nonsense here...
@alexpyattaev yes it would. The plan is to flatten the routes like we did in 0.5. See #1711. That'll give the same performance as regular routes because that's what it'll be internally.
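To make the cost concrete, here is an illustrative sketch (not axum's actual code) of the per-request work nesting implies: after the outer router matches the prefix, the tail of the path has to be handed to the inner router, which involves building a new String.

```rust
// Illustrative only: stripping the matched prefix allocates per request.
fn strip_matched_prefix(path: &str, prefix: &str) -> String {
    // The tail is copied into a fresh String so the inner router can
    // treat it as the full path; this allocation happens on every request.
    path.strip_prefix(prefix).unwrap_or(path).to_owned()
}

fn main() {
    assert_eq!(strip_matched_prefix("/v1/v2/users", "/v1"), "/v2/users");
}
```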
Not sure that is actually fixed.
@nicolaspernoud I can't reproduce those numbers with the benchmarks we have in this repo. Can you provide a benchmark? Edit: I've added the benchmarks I use in 377d73a
@davidpdrsn: well, my use case, as noted before, is quite complicated, atrium being an axum-based reverse proxy. Still, the benchmark I use is here: https://github.com/nicolaspernoud/atrium/blob/development/scripts/benchmark/benchmark.sh. The file to alter to switch between nest/merge is here: https://github.com/nicolaspernoud/atrium/blob/development/backend/src/server.rs. When reverting to merge, the performance impact is clear just by altering Cargo.toml and pinning axum to 0.6.12 vs 0.6.15. Maybe the performance hit is not due to the change on nest and comes from somewhere else altogether…
That's quite a lot of setup indeed. Are you able to try and make a smaller reproduction script? I'm afraid I can't really debug that. No other changes in 0.6.15 come to mind.
As for my case, there doesn't seem to be a regression in performance, or if so only a small one. The codebase has evolved a bit since then, but let's redo the numbers:
I also don't have the nested code anymore, so I can't test with that. Given that the values varied between 620 and 780 req/s, I would say that the above is not statistically significant (without having done any statistical analysis on the data, but I've seen very large outliers in all cases and they tended to settle differently between runs).
@davidpdrsn I will try to set up a minimal test project when I have some time.
@FSMaxB thanks for testing! I'm a bit less worried now 😅 I also tried making a micro benchmark that doesn't use hyper (and therefore no networking) and just calls the router directly. Code for that is here: https://gist.github.com/davidpdrsn/4d60c2a7a16047f4eff76b6e71fe055d To send 10k requests, I get these numbers:
So the performance does seem comparable between 0.6.15 and 0.5, which is good. I'm curious if there is anything we can do to make nesting faster in general. It does additional processing, such as stripping the matched prefix, so it does make sense that it'd be slower.
Thanks!
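The linked gist is not reproduced here, but a benchmark in that spirit can be sketched as follows (assuming axum 0.6 and tower's `ServiceExt::oneshot`; the route shapes are illustrative, mirroring the repro earlier in the thread):

```rust
use axum::{body::Body, http::Request, routing::get, Router};
use std::time::Instant;
use tower::ServiceExt; // provides `oneshot`

#[tokio::main]
async fn main() {
    // Deeply nested router, like the repro above.
    let app = Router::new().nest(
        "/v1",
        Router::new().nest("/v2", Router::new().route("/users", get(|| async { "ok" }))),
    );

    let start = Instant::now();
    for _ in 0..10_000 {
        let request = Request::builder()
            .uri("/v1/v2/users")
            .body(Body::empty())
            .unwrap();
        // Call the router directly as a tower Service; no hyper, no sockets.
        let response = app.clone().oneshot(request).await.unwrap();
        assert!(response.status().is_success());
    }
    println!("10k requests in {:?}", start.elapsed());
}
```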
I ran the tests (in release mode) from my earlier comment:
There is still a ~10% difference between nesting and merging, though.
@davidpdrsn Hello, I tried to put together a simplified version of my app here: https://github.com/nicolaspernoud/axum_benchmark. It does not proxy anything but only answers hello world; the webdav handler is gone, and the complex TLS/Let's Encrypt configuration as well. I kept some of the custom extractors, middlewares, and global state to keep it realistic. It could be simplified some more if needed, but that might lose some of the specificities that cause the performance hit. You normally just need to clone the repo and run benchmark.sh to see the results. There are some included in the reports directory. The switching between 0.6.12 and 0.6.15 is brutally done with a sed on Cargo.toml... The nested performance between 0.6.12 and 0.6.15 is quite similar, but the merged performance is a lot better in 0.6.12.
There's gotta be something else going on with your setup if merge alone got that much slower between 0.6.12 and 0.6.15. Are you able to reproduce a similar difference without doing any networking and instead just calling the axum router directly, like my micro benchmark does? axum doesn't handle any networking stuff like parsing the incoming stream or dealing with connections; hyper does all that. So it's easier to evaluate axum's performance by bypassing all that.
My guess is that this is not a merge vs. nest problem anymore but a 0.6.12 vs. 0.6.15 problem with something else, which is easier to see now that merge and nest have the same performance. It happens on an app that has some extractors, middlewares, etc., which could be why isolated or micro benchmarks do not show it. I will try (next week) to narrow down which version causes the performance gap between 0.6.12 and 0.6.15, to see when something was changed other than the nest improvement.
The overhead of middleware or extractors should be constant regardless of how you build your router. I changed my benchmark to instead print requests per second and compare 0.5.17, 0.6.12, and 0.6.15:
This shows that 0.6.15 is roughly in line with 0.5. So I do believe this issue has been fixed. I'll close this again, but please do comment if you find more issues!
I ran several more tests that you can find here: https://github.com/nicolaspernoud/axum_benchmark/tree/main/reports
0.6.13 contained a relevant change. I updated my numbers to include 0.6.12, 13, 14, and 15 (and a test with a middleware):
Doesn't seem like there is an issue to me.
Well, these are the results I get:
So I am pretty positive that "something" makes it run slower. It may or may not be related to the merge/nest change; it could be a side effect of something else (?).
I can't run it:
This is after I comment out the offending part. This is why I made a script that you can just run.
@nicolaspernoud are you in the tokio discord? Might be quicker to chat there 😅 and we avoid spamming people with notifications while we figure stuff out.
@davidpdrsn: yes, joining you there!
It turns out this was a decrease in the performance of something @nicolaspernoud used rather than in axum's routing. I don't think there is anything we need to do in axum, but I documented it.
Bug Report (more of a question, actually)
I know it is kind of a blind shot, but maybe someone can give some insight into this...
Version
```
├── axum v0.6.1
│   ├── axum-core v0.3.0
├── axum-extra v0.4.2
│   ├── axum v0.6.1 (*)
├── axum-server v0.4.2
│   ├── axum-server v0.4.2 (*)
```
Platform
```
Linux *** 5.19.0-26-generic #27-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov 23 20:44:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
```
Crates
```toml
axum = { version = "0.6.1", features = ["query", "json", "http2", "tokio", "headers"], default-features = false }
axum-extra = { version = "0.4.2", features = ["cookie-private"], default-features = false }
```
Description
I migrated my app from axum 0.5 to axum 0.6, as detailed here: nicolaspernoud/atrium@d002700#diff-28504f0a0ed99657d5f5bee11b72eb6ef2db4faf68b0f6e94bfe28c77e34e35a.
The old code manages approximately 24,500 req/sec in a reverse proxy use case.
The new code manages approximately 16,500 req/sec, which is a huge drop.
The new code also uses state instead of extensions, but I am not sure that is the problem; it seems to be related to axum itself: a temp branch that kept extensions for configuration showed roughly the same performance.
Any idea or advice on how to improve that ?
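Since the migration swapped extensions for state, here is a minimal sketch of the two styles side by side (axum 0.6 API; the `Config` type is illustrative), mainly to show that both hand the handler the same shared data:

```rust
use axum::{extract::State, routing::get, Extension, Router};
use std::sync::Arc;

#[derive(Clone)]
struct Config {
    upstream: String,
}

// axum 0.6 style: configuration shared via `State`.
async fn via_state(State(cfg): State<Arc<Config>>) -> String {
    cfg.upstream.clone()
}

// axum 0.5 style, still available in 0.6: configuration via `Extension`.
async fn via_extension(Extension(cfg): Extension<Arc<Config>>) -> String {
    cfg.upstream.clone()
}

fn app(cfg: Arc<Config>) -> Router {
    Router::new()
        .route("/state", get(via_state))
        .with_state(cfg.clone())
        .route("/extension", get(via_extension))
        .layer(Extension(cfg))
}
```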
Test code
The test code is here: https://github.com/nicolaspernoud/go_vs_rust_reverse_proxy_benchmark/blob/main/test_atrium.sh.
By the way, in a super simple use case, as shown here: https://github.com/nicolaspernoud/go_vs_rust_reverse_proxy_benchmark/blob/main/test.sh, that is not the case; axum 0.6 is actually slightly faster.
I suppose it is related to the way axum extracts things from the request (?).