feat(perf): continuously measure on single conn (iperf-style) #276
Conversation
Our current throughput tests open a connection, open a stream, up- or download 100 MB, and close the connection. 100 MB is not enough on the given path (60 ms, ~5 Gbit/s) to exit the congestion controller's slow-start phase. See #261 for details. Instead of downloading 100 MB multiple times, each on a new connection, establish a single connection and continuously measure the throughput for a fixed duration (60 s).
Should we do 3 iterations at 20 seconds? Slow start won't take more than 1–2 s, so this should give us plenty of time to converge, and it would show situations where the congestion controller runs into a state that takes a long time to recover from (sudden cross-traffic).
perf/impl/https/v0.1/main.go (Outdated)
```go
// TODO
jsonB, err := json.Marshal(Result{
	TimeSeconds: time.Since(r.LastReportTime).Seconds(),
	UploadBytes: uint(r.lastReportRead),
	Type:        "intermediary",
})
if err != nil {
	log.Fatalf("failed to marshal perf result: %s", err)
}
fmt.Println(string(jsonB))

r.LastReportTime = time.Now()
r.lastReportRead = 0
```
Only do a single call to time.Now(), so we don't lose any bytes sent between the two calls:
```go
now := time.Now()
// TODO
jsonB, err := json.Marshal(Result{
	TimeSeconds: now.Sub(r.LastReportTime).Seconds(),
	UploadBytes: uint(r.lastReportRead),
	Type:        "intermediary",
})
if err != nil {
	log.Fatalf("failed to marshal perf result: %s", err)
}
fmt.Println(string(jsonB))
r.LastReportTime = now
r.lastReportRead = 0
```
Are these results available anywhere?
Not yet. Still in a very work-in-progress state.
Reaching 4.5 Gbit/s with https and >5 Gbit/s with rust-libp2p. Still testing. Though this is looking promising.
Needed for dashboard
…rf-exit-slow-start
(Also removes `-b 25g`, which does not have an impact on throughput.)
The iperf throughput mismatch was due to Nagle's algorithm. It is disabled now; see the dashboard at https://observablehq.com/d/682dcea9fe2505c4?branch=perf-exit-slow-start#branch. I still need to investigate the other measurements (*-libp2p and quic-go) further, and will try with a fixed MTU of 1500 next. //CC @marten-seemann
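For context, a minimal sketch (not the actual perf implementation) of what disabling Nagle's algorithm means at the socket level: Nagle coalesces small writes into fewer segments, which can throttle a throughput benchmark; the fix is the TCP_NODELAY socket option, exposed in Go as (*net.TCPConn).SetNoDelay, which Go already enables by default for TCP connections. The address below is a placeholder.

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Placeholder address, for illustration only.
	conn, err := net.Dial("tcp", "example.com:443")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Disable Nagle's algorithm (TCP_NODELAY): segments are sent immediately
	// instead of being held back to coalesce small writes.
	if err := conn.(*net.TCPConn).SetNoDelay(true); err != nil {
		log.Fatal(err)
	}
}
```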
Interesting! It's great to see iperf and HTTPS achieving roughly similar results (at least in the limit). This means that our setup is getting more trustworthy!

Looking at the graphs, why are some measurements drawn as boxes and some as points? Why do some have error bars and others don't? The spread seems pretty high, do we need more iterations?

I wouldn't be surprised if quic-go maxed out somewhere around 2 Gbps. At some point, your transfer becomes CPU-limited, depending on the number of kernel offloads that your QUIC stack uses (and that's not the thing we want to benchmark here). That said, I just updated quic-go/perf to quic-go v0.38.1 (quic-go/perf#16, I'll merge the PR once GHA is not broken anymore...), which uses GSO by default. Might be worth rebasing your branch to see if this changes anything.

In go-libp2p, Yamux uses a 16 MB receive window, which should limit us to roughly 2 Gbps (minus some muxer overhead). It's interesting to see that we're achieving roughly half of that. Could be a coincidence, or point to a bug in our flow control autotuning. I'd be happy to debug this using the current setup (assuming I can still run it manually as I could with the version on master); please let me know.

QUIC uses a 10 MB window, which limits the bandwidth to 1.25 Gbps. That means we're not quite at the optimum, but pretty close. Would it be helpful for you if we prioritized resolving libp2p/go-libp2p#2290? Alternatively, we could also just have a go-libp2p branch that bumps that value, so we can see if that's actually the root of the problem.
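A quick back-of-the-envelope check on those window numbers (a sketch, assuming the ~65 ms RTT of the test path; flow control allows at most one receive window in flight per RTT, so the ceiling is window / RTT):

```go
package main

import "fmt"

// limitGbps returns the flow-control-limited throughput for a given
// receive window and round-trip time: at most one window per RTT.
func limitGbps(windowBytes, rttSeconds float64) float64 {
	return windowBytes * 8 / rttSeconds / 1e9
}

func main() {
	const rtt = 0.065                                                      // ~65 ms
	fmt.Printf("Yamux, 16 MB window: %.2f Gbit/s\n", limitGbps(16e6, rtt)) // ≈ 1.97 Gbit/s
	fmt.Printf("QUIC,  10 MB window: %.2f Gbit/s\n", limitGbps(10e6, rtt)) // ≈ 1.23 Gbit/s
}
```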
Does AWS allow larger MTUs on their backbone? That would indeed give TCP an unfair advantage over at least quic-go. Have you verified that using tcpdump / Wireshark?
Here are some interesting results from running the HTTPS test and analyzing the tcpdump. The congestion controller used for this test is Cubic.

First interesting result: importing and processing an 8 GB pcap into Wireshark takes a pretty long time, O(30 min) ;)

Here's the RTT distribution:

Here's the sequence plot (ignore the wrapping of the packet number, obviously), showing the time when packet loss occurred:

Obviously, we're very far from reaching a steady state. Here's some back-of-the-envelope math to calculate the recovery time (i.e. the time it takes to ramp the congestion window back up to its original size after a loss), assuming a BDP (at 5 Gbit/s and 65 ms RTT) of roughly 40 MB:
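A sketch of that estimate, assuming the standard CUBIC constants from RFC 8312 (C = 0.4, β = 0.7) and a 1500-byte MSS; the exact figures in the original back-of-the-envelope calculation may have differed:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		bdpBytes = 40e6   // ≈ BDP at 5 Gbit/s and 65 ms RTT
		mss      = 1500.0 // assumed MSS in bytes
		c        = 0.4    // CUBIC scaling constant (RFC 8312)
		beta     = 0.7    // CUBIC multiplicative decrease factor
	)

	wMax := bdpBytes / mss // congestion window before the loss, in packets
	// Time for CUBIC to grow the window back to wMax after a loss:
	// K = cbrt(wMax * (1 - beta) / C), in seconds.
	k := math.Cbrt(wMax * (1 - beta) / c)
	fmt.Printf("W_max ≈ %.0f packets, recovery time K ≈ %.0f s\n", wMax, k)
	// With these numbers, a loss event should only be needed roughly every ~27 s
	// at steady state.
}
```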
In the sequence plot above, we see packet loss happening 10x as frequently as this calculation suggests. This might be due to the shallower buffer, but I don't know precisely how the recovery time scales with the buffer size.

What does this mean for our perf setup? At the BDP that we chose for our test, we're running into limitations imposed by the congestion controllers.
Turns out, it does not.
The box visualizes Q1 to Q3, with Q2 (the median) denoted by a line within the box. The lines are the whiskers, representing Q0 (minimum) and Q4 (maximum). The dots represent outliers. https://en.wikipedia.org/wiki/Box_plot has a good explanation of each of these.
I am not sure what you are referring to with "error bars", @marten-seemann.
I decreased each measurement duration to 20 seconds and increased the iterations per implementation and transport to 10. I triggered a new CI run to update our results: https://github.com/libp2p/test-plans/actions/runs/6250828757/job/16970551159
👍 Note that I merged current
Indeed surprising. You can still run it manually. Please go ahead. Thank you, @marten-seemann.
I have updated the forked dashboard to the latest data format: https://observablehq.com/d/682dcea9fe2505c4?branch=27d07a6f47c2bc1a9c9d9a9f6626b536248284f5
@marten-seemann @sukunrt can either of you give the `perf/impl/https` and `perf/impl/go-libp2p` changes a review?
perf/runner/src/versions.ts (Outdated)
{ | ||
id: "v0.46", | ||
implementation: "js-libp2p", | ||
transportStacks: ["tcp"] | ||
} | ||
// { | ||
// id: "v0.46", | ||
// implementation: "js-libp2p", | ||
// transportStacks: ["tcp"] | ||
// } |
Addressed in libp2p/js-libp2p#2067.
I reviewed the go-libp2p implementation.
Unless there are any objections, I plan to merge here once libp2p/rust-libp2p#4382 is merged.
Now that libp2p/rust-libp2p#4382 is merged, the last commit removes js-libp2p. Once libp2p/js-libp2p#2067 is merged, we can re-introduce it here.
Our current throughput tests open a connection, open a stream, up- or download 100 MB, and close the connection. 100 MB is not enough on the given path (60 ms, ~5 Gbit/s) to exit the congestion controller's slow-start phase. See #261 for details.
Instead of downloading 100 MB multiple times, each on a new connection, establish a single connection and continuously measure the throughput for a fixed duration (20 s).
Closes #261
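For reference, a minimal sketch of the client-side loop this moves to, assuming the Result shape from the snippets discussed above; the address, chunk size, JSON field names, and the "final" result type are illustrative, not the actual implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net"
	"time"
)

// Result mirrors the JSON emitted in the snippets above (field names assumed).
type Result struct {
	TimeSeconds float64 `json:"timeSeconds"`
	UploadBytes uint    `json:"uploadBytes"`
	Type        string  `json:"type"`
}

func report(r Result) {
	jsonB, err := json.Marshal(r)
	if err != nil {
		log.Fatalf("failed to marshal perf result: %s", err)
	}
	fmt.Println(string(jsonB))
}

func main() {
	// Placeholder address; the real runner dials the perf server instead.
	conn, err := net.Dial("tcp", "127.0.0.1:4001")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 64<<10) // 64 KiB chunks, arbitrary
	start := time.Now()
	lastReport := start
	var sinceReport, total uint

	// Upload continuously on a single connection for the fixed duration,
	// emitting an intermediary result roughly once per second.
	for time.Since(start) < 20*time.Second {
		n, err := conn.Write(buf)
		if err != nil {
			log.Fatal(err)
		}
		sinceReport += uint(n)
		total += uint(n)

		if now := time.Now(); now.Sub(lastReport) >= time.Second {
			report(Result{TimeSeconds: now.Sub(lastReport).Seconds(), UploadBytes: sinceReport, Type: "intermediary"})
			lastReport, sinceReport = now, 0
		}
	}
	report(Result{TimeSeconds: time.Since(start).Seconds(), UploadBytes: total, Type: "final"})
}
```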