
Router::nest has poor performance on 0.6 #1624

Closed
nicolaspernoud opened this issue Dec 7, 2022 · 35 comments · Fixed by #1711
Labels
A-axum, C-performance (Category: Performance related issues and PRs)

Comments

@nicolaspernoud

Bug Report (more of a question, actually)

I know this is a bit of a shot in the dark, but maybe someone can give some insight into this...

Version

├── axum v0.6.1
│   ├── axum-core v0.3.0
├── axum-extra v0.4.2
│   ├── axum v0.6.1 (*)
├── axum-server v0.4.2
│   ├── axum-server v0.4.2 (*)

Platform

Linux *** 5.19.0-26-generic #27-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov 23 20:44:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Crates

axum = { version="0.6.1", features = ["query", "json", "http2", "tokio", "headers"], default-features = false }
axum-extra = { version = "0.4.2", features = ["cookie-private"], default-features = false }

Description

I migrated my app from axum 0.5 to axum 0.6, as detailed here: nicolaspernoud/atrium@d002700#diff-28504f0a0ed99657d5f5bee11b72eb6ef2db4faf68b0f6e94bfe28c77e34e35a.
The old code manages approximately 24,500 req/sec in a reverse proxy use case.
The new code manages approximately 16,500 req/sec, which is a huge drop.
The new code also uses state instead of extensions, but I am not sure that is the problem; it seems to be related to axum itself: a temporary branch that kept extensions for configuration showed roughly the same performance.

Any idea or advice on how to improve that?

Test code

The test code is here: https://github.com/nicolaspernoud/go_vs_rust_reverse_proxy_benchmark/blob/main/test_atrium.sh.

By the way, in a super simple use case, as shown here: https://github.com/nicolaspernoud/go_vs_rust_reverse_proxy_benchmark/blob/main/test.sh, that is not the case; axum 0.6 is actually slightly faster.

I suppose it is related to the way axum extracts something from the request (?).

@davidpdrsn
Member

That is hard to say with such a large amount of code. Are you able to make a smaller repro?

@nicolaspernoud
Author

Well, that will take some work (and time), but I will try...

@davidpdrsn
Member

We do have some benchmarks here that you can try poking at to see if you notice anything.

@nicolaspernoud
Author

Ok, I will try to work something out from that...

@jplatte
Member

jplatte commented Dec 7, 2022

One thing that could be relevant: are you nesting routers? If yes, can you try adding the prefix to each sub-route individually and .merge-ing the previously nested routers into the main one instead?
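
For illustration, a minimal sketch of the suggested restructuring (the get_users handler is hypothetical, not from this thread):

use axum::{routing::get, Router};

async fn get_users() -> &'static str {
    "users"
}

// Nested: the outer router matches the "/api" prefix and forwards the
// tail of the path to the inner router on every request.
fn nested() -> Router {
    let api = Router::new().route("/users", get(get_users));
    Router::new().nest("/api", api)
}

// Merged: the prefix is written into each route up front, so the final
// router resolves "/api/users" in a single routing step.
fn merged() -> Router {
    let api = Router::new().route("/api/users", get(get_users));
    Router::new().merge(api)
}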

@jplatte
Member

jplatte commented Dec 7, 2022

Also @davidpdrsn, could we change the implementation of nest (not nest_service) to use merge instead of nest_service? Regardless of performance, I think this could actually simplify things, and notably it should mean that the "sometimes can't extract matched path in middleware" issue only affects non-Router services.

@nicolaspernoud
Author

@jplatte: I just tried it, and although the merged routes were not routed to in the benchmark, I DID get back the original performance! Amazing! So that solves my problem! Thanks a lot!
@davidpdrsn: I'll leave it to you whether to close the issue right now or wait for a change in the nest implementation... but as far as I'm concerned, it's resolved.

@davidpdrsn
Member

I'll leave it to you whether to close the issue right now or wait for a change in the nest implementation... but as far as I'm concerned, it's resolved.

If nest is slow then we should try and fix that. So I'll leave this open for now.

Also @davidpdrsn could we change the implementation of nest (not nest_service) to use merge instead of nest_service? Regardless of performance, I think this could actually simplify things and notably, it should mean that the "sometimes can't extract matched path in middleware" thing only affects non-Router services.

That's basically what we did in 0.5, but in a problematic way where we only had one nest function with two code paths. Maybe it's fine now since we have different nest functions 🤔 As that seems to fix the issue, it's definitely worth considering.

The first thing should be to add a benchmark for it. It would also be good to see if we can improve nest performance in general.

@FSMaxB
Contributor

FSMaxB commented Dec 14, 2022

We've seen an even worse drop, from 1,500 req/s to 75 req/s (for an endpoint that only returns some static data). I will check if removing nesting fixes it.
Admittedly, this slowdown includes some other changes in how the router is built, because just switching from 0.5 to 0.6 somehow broke the merging of routes, showing conflicts that weren't there before. If it helps, I can also try to reduce the code to get a reproducer for that other issue; I just don't know when I'll find the time to do that.

@davidpdrsn
Member

from 1,500 req/s to 75 req/s

That sounds really bad 😬 I'm curious to hear if it is just because of nest, though that sounds quite extreme.

switching from 0.5 to 0.6 broke the merging of routes somehow, showing conflicts that weren't there before.

A repro of that would be great! Off the top of my head I don't think there were any intentional changes. merge is equivalent to just calling route yourself for each route, so it shouldn't change which routes overlap with each other.

@pythoneer

pythoneer commented Dec 14, 2022

We're running into the same problem: nesting heavily reduces performance. I made a little example that demonstrates the problem. Changing from 0.5 to 0.6 reduces the throughput from ~20,000 req/s to ~1,500 req/s, and further nesting makes it worse.

EDIT: forgot to mention that those numbers were in debug mode. I've now also tested in release mode; the numbers are different but the problem is the same: 27,000 req/s down to 2,500 req/s.

use axum::{
    routing::get,
    http::StatusCode,
    response::IntoResponse,
    Json, Router,
};
use serde::Serialize;
use std::net::SocketAddr;

#[tokio::main]
async fn main() {

    // ab -c 15 -n 100 http://localhost:3001/v1/v2/v3/v4/v5/users  -> 1500 req/s on 0.6 | 20'000 req/s on 0.5
    /******************************************************************/
    let users = Router::new().route("/users", get(get_user));
    let v5 = Router::new().nest("/v5", users);
    let v4 = Router::new().nest("/v4", v5);
    let v3 = Router::new().nest("/v3", v4);
    let v2 = Router::new().nest("/v2", v3);

    let app = Router::new().nest("/v1", v2);
    /******************************************************************/

    // ab -c 15 -n 100 http://localhost:3001/v1/v2/v3/v4/v5/users  -> 20'000 req/s for both 0.5 + 0.6
    /******************************************************************/
    // let app = Router::new().route("/v1/v2/v3/v4/v5/users", get(get_user));
    /******************************************************************/

    let addr = SocketAddr::from(([127, 0, 0, 1], 3001));
    tracing::debug!("listening on {}", addr); // requires the tracing crate as a dependency; silent without a subscriber
    axum::Server::bind(&addr)
        .serve(app.into_make_service())
        .await
        .unwrap();
}

async fn get_user() -> impl IntoResponse {
    let user = User {
        id: 1337,
        username: "payload.username".to_owned(),
    };

    (StatusCode::OK, Json(user))
}

#[derive(Serialize)]
struct User {
    id: u64,
    username: String,
}

@davidpdrsn davidpdrsn changed the title Question : why Axum 0.6 is (a lot) slower than axum 0.5 (in my use case) Router::nest has poor performance on 0.6 Dec 14, 2022
@davidpdrsn davidpdrsn added A-axum C-performance Category: Performance related issues and PRs labels Dec 14, 2022
@FSMaxB
Contributor

FSMaxB commented Dec 14, 2022

I'm curious to hear if it is just because of nest, though that sounds quite extreme.

Final numbers are in (release builds this time 😅):

  • 0.6, nesting removed: ~950 req/s
  • 0.6 with nesting: ~100 req/s (that was 3 or 4 levels of nesting)
  • 0.5 with nesting (and routes arranged slightly differently): ~950 req/s

Note that I got very inconsistent results, with outliers of ~30% in some test runs, so take the above with a very big grain of salt.

But I would say those results are consistent with this being only a nesting issue, without any other regression.

@davidpdrsn
Member

@FSMaxB thanks for providing those numbers! Definitely something going on that we should look into.

@romanstingler

romanstingler commented Dec 14, 2022

@davidpdrsn

With ab -c 15 -n 2000 http://localhost:3001/v1/v2/v3/v4/v5/users and the script from @pythoneer, I get more than an order of magnitude more calls on 0.6 than on 0.5.
I have profiled these two commits (callgrind & cachegrind output attached):

0.6 = 3279d64
0.5 = 2c00e61

0.6 on the left and 0.5 on the right side:
[profiler screenshot]

0.5 on the left and 0.6 on the right side (sorry):
[profiler screenshot]

cachegrind.out.zip

callgrind.out.zip

On a single call of the URL, with 0.6 we get:

[profiler screenshot]

and with 0.5:

[profiler screenshot]

@alexpyattaev
Contributor

A silly question here - given how nest is designed, would it not always be slower than merge? Since nested routers essentially need to be called after the "outer" router has worked out the path, would it not need to allocate a new String to hold the "tail" portion of the path? This seems to be the allocation call that happens, so calling a nested router would cause two allocations per request instead of one:

let path = req.uri().path().to_owned();

With that in mind, would it not be reasonable to have some better way to handle that? An extra allocation per request is not such a great idea for performance, AFAIK...
PS: apologies if I am talking nonsense here...

@davidpdrsn
Member

@alexpyattaev yes, it would. The plan is to flatten the routes like we did in 0.5. See #1711. That'll give the same performance as regular routes, because that's what they'll be internally.
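
Conceptually, flattening resolves the nesting once when the router is built instead of on every request. A rough sketch of the idea (illustrative only, not axum's actual internals):

fn flatten(prefix: &str, inner_paths: &[&str]) -> Vec<String> {
    // The prefix is concatenated into each inner path once, at
    // construction time, so request-time routing sees a single flat
    // route table and needs no extra allocation to strip the prefix.
    inner_paths
        .iter()
        .map(|path| format!("{prefix}{path}"))
        .collect()
}

fn main() {
    assert_eq!(
        flatten("/v1", &["/users", "/users/:id"]),
        vec!["/v1/users", "/v1/users/:id"]
    );
}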

@nicolaspernoud
Author

Not sure this is actually fixed.
With 0.6.12 I get ~25,000 req/s with merge and ~15,000 req/s with nest.
With 0.6.15 (changing only that) I get ~15,000 req/s with merge and ~15,000 req/s with nest.
So performance is now the same... but for the worse.

@davidpdrsn davidpdrsn reopened this Apr 12, 2023
@davidpdrsn
Member

davidpdrsn commented Apr 12, 2023

@nicolaspernoud I can't reproduce those numbers with the benchmarks we have in this repo. Can you provide a benchmark?

Edit: I've added the benchmarks I use in 377d73a

@nicolaspernoud
Author

nicolaspernoud commented Apr 12, 2023

@davidpdrsn, well, my use case, as noted before, is quite complicated, atrium being an axum-based reverse proxy. Still, the benchmark I use is here: https://github.com/nicolaspernoud/atrium/blob/development/scripts/benchmark/benchmark.sh. The file to alter to switch between nest/merge is here: https://github.com/nicolaspernoud/atrium/blob/development/backend/src/server.rs. When reverting to merge, the performance impact is clear just by altering Cargo.toml and pinning axum to 0.6.12 or 0.6.15.

Maybe the performance hit is not due to the change to nest and comes from somewhere else altogether…

@davidpdrsn
Member

That's quite a lot of setup indeed. Are you able to try and make a smaller reproduction script? I'm afraid I can't really debug that. No other changes in 0.6.15 come to mind.

@FSMaxB
Contributor

FSMaxB commented Apr 13, 2023

As for my case, there doesn't seem to be a regression in performance, or if so, only a small one.

The codebase has evolved a bit since then, but let's redo the numbers:

  • 0.6.12 no nesting: ~650 req/s
  • 0.6.14 no nesting: ~680 req/s
  • 0.6.15 no nesting: ~620 req/s

I also don't have the nested code anymore, so can't test with that.

Given that the values varied between 620 and 780 req/s, I would say the above is not statistically significant (I haven't done any statistical analysis on the data, but I've seen very large outliers in all cases, and they tended to settle differently between runs).

@nicolaspernoud
Author

@davidpdrsn I will try to set up a minimum test project when I have some time.

@davidpdrsn
Member

@FSMaxB thanks for testing! I'm a bit less worried now 😅

I also tried making a micro benchmark that doesn't use hyper (and therefore not the network) and just calls the router directly. Code for that is here: https://gist.github.com/davidpdrsn/4d60c2a7a16047f4eff76b6e71fe055d

To send 10k requests I get these numbers:

axum = "=0.6.15"
no-nesting: 6.53075ms
10 levels of nesting: 50.194459ms

axum = "=0.6.12"
no-nesting: 6.445834ms
10 levels of nesting: very slow, not gonna wait for it. More than 10 seconds

axum = "=0.5.17"
no-nesting: 7.3635ms
nesting: 45.244ms

So the performance does seem comparable between 0.6.15 and 0.5, which is good. I'm curious whether there is anything we can do to make nesting faster in general. It does additional processing, such as stripping the matched prefix, so it makes sense that it'd be slower.
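
For reference, a minimal sketch of that kind of in-process benchmark, assuming axum 0.6 and tower as dependencies (the route and handler are illustrative, not copied from the gist):

use axum::{body::Body, http::Request, routing::get, Router};
use tower::ServiceExt; // for `oneshot`

#[tokio::main]
async fn main() {
    // Call the router directly as a tower Service, bypassing hyper and
    // the network entirely.
    let app = Router::new().route("/users", get(|| async { "hello" }));

    let start = std::time::Instant::now();
    for _ in 0..10_000 {
        let request = Request::builder()
            .uri("/users")
            .body(Body::empty())
            .unwrap();
        // `oneshot` consumes the service, so clone the router each time.
        let response = app.clone().oneshot(request).await.unwrap();
        assert!(response.status().is_success());
    }
    println!("10k requests in {:?}", start.elapsed());
}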

@davidpdrsn I will try to set up a minimum test project when I have some time.

Thanks!

@pythoneer

pythoneer commented Apr 13, 2023

I ran the tests (in release mode) from my comment

0.6.12 nest:     4291 req/s
0.6.15 nest:    28097 req/s

0.6.12 no nest: 31113 req/s
0.6.15 no nest: 28296 req/s

The ~10% difference between 0.6.12 and 0.6.15 in the "no nest" case looks kinda stable on my machine.

@nicolaspernoud
Author

@davidpdrsn Hello, I tried to put together a simplified version of my app here: https://github.com/nicolaspernoud/axum_benchmark. It does not proxy anything and only answers hello world; the webdav handler is gone, and the complex TLS/Let's Encrypt configuration as well. I kept some of the custom extractors, middlewares, and global state to keep it realistic. It could be simplified some more if needed, but that might lose some of the specificities that cause the performance hit.

You normally just need to clone the repo and run benchmark.sh to see the results. There are some included in the reports directory. The switching between 0.6.12 and 0.6.15 is brutally done with a sed on Cargo.toml...

The nested performance between 0.6.12 and 0.6.15 is quite similar, but the merged performance is a lot better in 0.6.12.

Version  Type   Reqs/s
0.6.12   Nest   39469
0.6.15   Nest   32408
0.6.12   Merge  64265
0.6.15   Merge  37365

@davidpdrsn
Member

davidpdrsn commented Apr 15, 2023

There's gotta be something else going on with your setup if merge is that much slower. All merge does is move the routes from one router to another, as if you had called .route() yourself. It doesn't add any middleware or any additional processing. I also cannot reproduce the difference in a micro benchmark.

Are you able to reproduce a similar difference without doing any networking and instead just calling the axum router directly, like my micro benchmark does?

axum doesn't handle any networking, such as parsing the incoming stream or dealing with connections; hyper does all of that. So it's easier to evaluate axum's performance by bypassing it.

@nicolaspernoud
Author

My guess is that it is not a merge vs. nest problem anymore, but a 0.6.12 vs. 0.6.15 problem with something else, which is easier to see now that merge and nest have the same performance. It happens in an app that has some extractors, middlewares, etc., which could be why isolated or micro benchmarks do not show it. I will try (next week) to narrow down the version that causes the performance gap between 0.6.12 and 0.6.15, to see what was changed other than the nest improvement.

@davidpdrsn
Member

The overhead of middleware or extractors should be constant regardless of how you build your router.

I changed my benchmark to instead print requests per second and compare 0.5.17, 0.6.12, and 0.6.15:

0.5.17
 flat: 1759735 rps
merge: 1746821 rps
 nest:  253203 rps

0.6.12
 flat: 1969787 rps
merge: 2005635 rps
 nest:      66 rps

0.6.15
 flat: 1895037 rps
merge: 1909538 rps
 nest:  222305 rps

This shows that 0.6.15 is roughly in line with 0.5, so I do believe this issue has been fixed. I'll close this again, but please do comment if you find more issues!

@nicolaspernoud
Author

I ran several more tests, which you can find here: https://github.com/nicolaspernoud/axum_benchmark/tree/main/reports
There is some variability, but it is clear that performance decreased a lot between 0.6.12 and 0.6.13 (~60k req/s down to ~40k req/s).
It is not a nest issue anymore, so this issue can effectively be closed, but shouldn't another one be opened?

@davidpdrsn
Member

davidpdrsn commented Apr 17, 2023

0.6.13 contained a println that was committed by accident. 0.6.15 doesn't have that, so don't test 0.6.13 😊

I updated my numbers to include 12, 13, 14, and 15 (and a test with a middleware):

0.6.12: flat: 2003882 rps
0.6.13: flat:  708557 rps
0.6.14: flat: 1867643 rps
0.6.15: flat: 1873534 rps

0.6.12: with middleware: 1473827 rps
0.6.13: with middleware:  595125 rps
0.6.14: with middleware: 1376340 rps
0.6.15: with middleware: 1408086 rps

0.6.12: nest:     62 rps
0.6.13: nest: 156008 rps
0.6.14: nest: 215389 rps
0.6.15: nest: 217711 rps

0.6.12: nest with middleware:     58 rps
0.6.13: nest with middleware: 142710 rps
0.6.14: nest with middleware: 203133 rps
0.6.15: nest with middleware: 203579 rps

Doesn't seem like there is an issue to me.

@nicolaspernoud
Author

Well, these are the results I get:

Version  Reqs/s
0.6.12   ~60k
0.6.13   ~40k
0.6.14   ~40k
0.6.15   ~40k

So I am pretty positive that "something" makes it run slower. It may or may not be related to the merge/nest change; it could be a side effect of something else (?).
Could you just clone my repo https://github.com/nicolaspernoud/axum_benchmark.git and run benchmark.sh to see if it happens on your side?
If you look at benchmark.sh, you'll see that the only thing it changes is the axum version. I test 0.6.12 twice (at the beginning and at the end, to account for CPU warm-up). Every time, 0.6.13+ is significantly slower. I tried on two different computers as well.

@davidpdrsn
Member

I can't run it:

❯ ./benchmark.sh
mkdir: /Users/david.pedersen/Desktop/axum_benchmark/reports: File exists
sed: 1: "Cargo.toml": invalid command code C
Error: Changing Axum version in Cargo.toml failed !

This is after I comment out the apt install commands. I'm on a Mac.

This is why I made a script that you can just cargo run, with no additional dependencies needed.

@davidpdrsn
Member

@nicolaspernoud are you in the tokio discord? Might be quicker to chat there 😅 and we avoid spamming people with notifications while we figure stuff out.

@nicolaspernoud
Author

@davidpdrsn: yes, joining you there!

@davidpdrsn
Member

It turns out this was a decrease in the performance of Router::fallback. To support fallback inheritance, it needs to perform some additional processing, so I don't consider this a bug.

@nicolaspernoud used Router::new().fallback(...) a lot as a way to accept all requests, which meant every request would go through the fallback. It is faster to use Handler::into_service or axum::routing::any instead. The change can be found here.

I don't think there is anything we need to do in axum, but I documented that Router::new().fallback(...) isn't the optimal way to accept all requests in #1940.
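
A minimal sketch of the difference, assuming axum 0.6 (the catch_all handler is illustrative, not the code from atrium):

use axum::handler::HandlerWithoutStateExt;
use std::net::SocketAddr;

async fn catch_all() -> &'static str {
    "hello"
}

#[tokio::main]
async fn main() {
    // Slower: an empty router whose fallback accepts everything; every
    // request misses the route table and then pays the extra cost of
    // fallback-inheritance handling.
    // let app = axum::Router::new().fallback(catch_all);

    // Faster: serve the handler directly, with no router in between.
    let addr = SocketAddr::from(([127, 0, 0, 1], 3001));
    axum::Server::bind(&addr)
        .serve(catch_all.into_make_service())
        .await
        .unwrap();
}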
