Reduce max staked streams count to avoid fragmentation #32771
Conversation
Codecov Report
```diff
@@           Coverage Diff           @@
##           master   #32771   +/-  ##
=======================================
  Coverage    82.0%    82.0%
=======================================
  Files         785      785
  Lines      212075   212075
=======================================
+ Hits       173945   173958    +13
+ Misses      38130    38117    -13
```
It appears this solves the issue I'd been seeing. Very unlikely we'd have 11+ runs in a row w/o seeing a bad run if the issue was not mitigated.
I wonder if decreasing the maximum number of streams will decrease our maximum throughput? Do we have any benches related to that?
Sorry for the late response -- I was sidetracked by looking at some other issues. I have some results from bench-tps using rpc-client, which triggers the send-transaction-service to send packets as a staked node, hence exercising the staked stream path.

Run configuration:

```
lijun@lijun-dev:~/sol2/solana$ ./cargo run --release --bin solana-bench-tps -- -u http://35.233.177.221:8899 --identity /home/lijun/.config/solana/id.json --tx_count 1000 --thread-batch-sleep-ms 0 -t 20 --duration 30 -n 35.233.177.221:8001 --read-client-keys /home/lijun/gce-keypairs.yaml --use-rpc-client
```

With change:

```
[2023-08-15T06:54:28.184476833Z INFO solana_bench_tps::bench] Average TPS: 11562.491
[2023-08-15T06:55:52.398316168Z INFO solana_bench_tps::bench] Average TPS: 11714.341
[2023-08-15T06:56:49.020849338Z INFO solana_bench_tps::bench] Average TPS: 10501.246
[2023-08-15T06:58:42.274724913Z INFO solana_bench_tps::bench] Average TPS: 10866.384
[2023-08-15T07:02:54.095735232Z INFO solana_bench_tps::bench] Average TPS: 11417.363
```

Without change (rpc-client):

```
[2023-08-15T07:11:16.036991107Z INFO solana_bench_tps::bench] http://35.233.177.221:8899 | 23715.32 | 323173
[2023-08-15T07:13:42.854387818Z INFO solana_bench_tps::bench] Average TPS: 10242.592
[2023-08-15T07:14:52.595901036Z INFO solana_bench_tps::bench] Average TPS: 11942.176
[2023-08-15T07:17:19.556543721Z INFO solana_bench_tps::bench] Average TPS: 11271.985
[2023-08-15T07:18:36.675589529Z INFO solana_bench_tps::bench] Average TPS: 10152.27
```
Force-pushed from e7298ea to dc4831e.
Problem
We are seeing sporadic performance degradation in the bare-metal local-cluster bench-tps test. Metrics indicate poor and uneven QUIC stream performance for forwarded packets: staked_received_chunks on the server side and num_packets on the client side are both low and erratic, and the server shows spikes of stream read timeouts for forwarded packets. This does not happen for a regular unstaked node using thin-client over QUIC. The likely reason is that too many concurrent streams contend for the limited receive_window bandwidth configured on the connection.
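To make the contention concrete, here is a minimal sketch of the two quinn transport knobs involved (Solana's QUIC server is built on quinn). The window and cap values below are illustrative assumptions, not the validator's actual settings:

```rust
// Minimal sketch, not the actual validator code: how the per-connection
// receive window and the concurrent-stream cap interact in a quinn-based
// QUIC server. Values are illustrative assumptions.
use std::sync::Arc;
use quinn::{ServerConfig, TransportConfig, VarInt};

fn configure_transport(server_config: &mut ServerConfig) {
    let mut transport = TransportConfig::default();
    // Total bytes the peer may have in flight across ALL streams of the
    // connection; this is the shared budget the streams contend for.
    transport.receive_window(VarInt::from_u32(1_000_000));
    // Cap on simultaneously open unidirectional streams. With, say, 2048
    // streams sharing a ~1 MB window, each stream averages under 512 bytes
    // of flow-control credit; at 512 streams each gets roughly 2 KB.
    transport.max_concurrent_uni_streams(VarInt::from_u32(512));
    server_config.transport = Arc::new(transport);
}
```

Because the receive window is shared connection-wide, raising the stream cap without raising the window just slices each stream's flow-control credit thinner, which is consistent with the stream read timeout spikes observed above.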
Experimenting with reducing the max stream count: with 1024 the issue is still reproducible, while 512 shows stable performance over many runs (11+). Results:
https://buildkite.com/solana-labs/solana-local-cluster/builds/738#_
Summary of Changes
Reduce the maximum number of concurrent streams allowed per staked connection.
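The change itself amounts to lowering one cap. A minimal sketch of its shape, assuming a single constant; the name and the previous value here are illustrative, see the diff for the actual code:

```rust
// Illustrative sketch only: the constant name and the old value are
// assumptions, not copied from the diff.
// Previously the staked-connection stream cap was high enough (>1024) that
// streams starved each other's share of the connection receive window.
// pub const QUIC_MAX_STAKED_CONCURRENT_STREAMS: usize = 2048; // before (assumed)
pub const QUIC_MAX_STAKED_CONCURRENT_STREAMS: usize = 512; // stable in the runs above
```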
Fixes #32179