Performance Regression with async-std > 1.6.2 #892
Quick question: what kind of CPU configuration are you running this on? On SMP or Ryzen systems, 1.6.5 suffers from cache invalidation when moving tasks between different core clusters.
@Licenser thanks very much for the prompt response. The flame graphs were made on an Intel Skylake machine running the latest Ubuntu Linux. Let me know if you need other info or want us to run some other tests.
One change that comes to mind is that we're no longer inlining […]. I'm not sure what the right solution here is, but perhaps switching to […].
@yoshuawuyts that could be the issue, as we are heavily using TcpStream. BTW, it also seems that there is some higher overhead in ConcurrentQueue::pop. In any case, the flame graphs reveal that the TcpStream sending-side performance has indeed degraded.
Hello everyone, any updates on this issue? We would be happy to help test async-std performance systematically before each release. We can do that by running zenoh on our 10 Gbps testbed.
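As a rough illustration (not taken from this thread) of what an isolated send-side test could look like, here is a minimal sketch of a loopback TcpStream throughput benchmark on async-std that could be pinned once against 1.6.2 and once against 1.6.5. The payload size, iteration count, and loopback address are arbitrary assumptions, and zenoh itself is not involved:

```rust
// Minimal sketch (not from the issue): a loopback send-side throughput test.
// Payload size and iteration count are arbitrary assumptions.
use async_std::net::{TcpListener, TcpStream};
use async_std::prelude::*;
use async_std::task;
use std::time::Instant;

fn main() -> std::io::Result<()> {
    task::block_on(async {
        let listener = TcpListener::bind("127.0.0.1:0").await?;
        let addr = listener.local_addr()?;

        // Sink task: accept one connection and drain everything written to it.
        task::spawn(async move {
            let (mut stream, _) = listener.accept().await.unwrap();
            let mut buf = vec![0u8; 64 * 1024];
            while stream.read(&mut buf).await.unwrap_or(0) > 0 {}
        });

        // Sender: write fixed-size payloads in a loop and report throughput.
        let mut stream = TcpStream::connect(addr).await?;
        let payload = vec![0u8; 8 * 1024];
        let iterations = 100_000usize;
        let start = Instant::now();
        for _ in 0..iterations {
            stream.write_all(&payload).await?;
        }
        let elapsed = start.elapsed();
        let mib = (iterations * payload.len()) as f64 / (1024.0 * 1024.0);
        println!("sent {:.0} MiB in {:?} ({:.1} MiB/s)", mib, elapsed, mib / elapsed.as_secs_f64());
        Ok(())
    })
}
```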
Looking at the flame graphs, it seems that with v1.6.2 everything runs inside the main thread, while with v1.6.5 half of the program is in the main thread and the other half is on executor threads. Am I reading that right? It would be worth seeing what happens if the benchmark is run with ASYNC_STD_THREAD_COUNT=1, and if the body of the main function is wrapped in a spawned task like this:

```rust
#[async_std::main]
async fn main() {
    async_std::task::spawn(async {
        // code goes here...
    })
    .await
}
```
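As a side note (not part of the original suggestion), a quick way to check where the work actually lands is to print the current thread's name: the body awaited directly by main is polled on the main thread, while spawned tasks are polled on async-std's worker threads. A minimal sketch, assuming async-std's `attributes` feature is enabled:

```rust
// Minimal diagnostic sketch (not from the thread): print where the code runs.
// The body of main is polled on the main thread; the spawned task is polled
// on one of async-std's worker threads, which the printed names make visible.
use async_std::task;

#[async_std::main]
async fn main() {
    println!("main body on thread: {:?}", std::thread::current().name());

    task::spawn(async {
        println!("spawned task on thread: {:?}", std::thread::current().name());
    })
    .await;
}
```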
Hello @stjepang, this is what we thought at first, but looking more carefully we spotted that the other thread is doing very little work. What it seems to us is that what used to be a marginal overhead in 1.6.2 now shows up as a much wider portion of the flame graph. In any case we'll try running with a single thread and will let you know what that gives. Thanks for the suggestion!
Hello Everyone,
First of all, thanks very much for the great work on async-std. We are making heavy use of this framework in zenoh and have noticed a major performance drop when upgrading from 1.6.2. When I say major, I mean that in some cases our throughput is halved.
We have identified that the performance issue is introduced on the publishing side. To highlight the huge difference in CPU time taken by async-std 1.6.5 vs. 1.6.2, we have made some flame graphs collecting perf data while running our throughput performance test.
The exact command used to collect perf data is included below and the code was compiled in release mode:
The resulting flame graphs are available here for 1.6.2 and here for 1.6.5.
The zenoh GitHub repository is https://github.com/eclipse-zenoh/zenoh/tree/rust-master
As you will see from the flame graphs, <core::future::from_generator::GenFuture as core::future::future::Future>::poll takes very little time on 1.6.2 and almost 50% of the time on 1.6.5.
I know that there have been changes in the scheduler, so maybe we need to change something on our side. In any case, any insight will be extremely welcome.
Thanks very much in advance!
Keep the Good Hacking!