
Yet another comparison between io_uring and epoll on network performance #536

Closed
beef9999 opened this issue Feb 22, 2022 · 29 comments

@beef9999

beef9999 commented Feb 22, 2022

Background: io_uring vs epoll

Nowadays there are many issues and projects focused on io_uring network performance, and the competitor is always epoll.

However, most of those tests are merely demos and lack verification in a production scenario. So I integrated io_uring sockets into our C++ coroutine library and did a full evaluation. Note that all the coroutines run in a single OS thread, which fits the io_uring event model quite well.

Network workloads

In my opinion, there are basically two types of network workloads. Although they are generated by two different kinds of clients, a typical echo server can handle both.

  • Ping-Pong mode client

This is what most echo clients look like. The client continuously sends and receives requests in a loop.

// client demo code
while (true) {
    send();
    recv();
}
  • Streaming mode client

Streaming clients are not rare. They multiplex multiple channels over a single connection, for instance RPC and HTTP/2. Usually there are not many clients, but the throughput can be high.
Below is one approach to simulating streaming workloads: a send coroutine and a recv coroutine run their loops separately.

// client demo code: coroutine 1
while (true) {
    send();
}

// client demo code: coroutine 2
while (true) {
    recv();
}

This example might be a little extreme, but it is simple. In a real scenario, multiple coroutines would each do ping-pong send/recv in their own loops. Because the coroutine execution contexts keep switching, if you observe either side of the full-duplex socket you will see the channel filled with packets, so the scenario is essentially the same as the example code above.

Implementations

  • In an epoll program, the typical pattern is non-blocking fd + epoll_wait + psync read/write
  • In an io_uring program, we can simply use its async APIs, as long as we already have an event engine driven by io_uring. This part is provided by the coroutine lib.

Quick conclusion

  • io_uring is faster than epoll if the workloads are ping-pong mode
  • io_uring is slower than epoll if the workloads are streaming mode

There are two ways to narrow the performance gap.

  • Increase buffer size
  • Increase connections number

And an alternative that bypasses the problem:

  • Use io_uring to poll, but not to process socket IO. Namely, non-blocking fd + io_uring poll + psync read/write
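
Below is a minimal sketch of that alternative, assuming a non-blocking socket fd and an already-created ring (handle_data-style processing and error handling are placeholders, not code from the library):

#include <errno.h>
#include <liburing.h>
#include <poll.h>
#include <unistd.h>

// Try a plain read first; if the socket is empty, arm an io_uring poll
// and retry once readiness is signalled.
ssize_t poll_then_read(struct io_uring *ring, int fd, char *buf, size_t len) {
    for (;;) {
        ssize_t n = read(fd, buf, len);            // psync read, usually succeeds
        if (n >= 0 || errno != EAGAIN)
            return n;                              // data read, or a real error

        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_poll_add(sqe, fd, POLLIN);   // poll only, no IO through io_uring
        io_uring_submit(ring);

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(ring, &cqe);             // in the real engine, the coroutine sleeps here
        io_uring_cqe_seen(ring, cqe);
    }
}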

Note that this article will NOT discuss the ping-pong mode, because io_uring always surpasses epoll in that situation. I just want to raise the question of why io_uring is sometimes slower in the streaming mode.

Environment

Two VMs in a cloud environment, Intel Xeon 8369B 2.70GHz, 96 cores 128GB, 40Gb network bandwidth.
CentOS 8, Kernel 6.0.7-1.el8, IORING_FEAT_FAST_POLL is enabled by default.

Test 1, Echo server performance (streaming client, single connection)

Note that I only set up one client, and there is only one connection within it.

The QPS is shown in the terminal. The throughput is observed by iftop.

server type | buf size | client num | server qps | server throughput
----------- | -------- | ---------- | ---------- | -----------------
epoll       | 64       | 1          | 1565K      | 780Mb/s
io_uring    | 64       | 1          | 506K       | 260Mb/s
epoll       | 512      | 1          | 1250K      | 4.79Gb/s
io_uring    | 512      | 1          | 447K       | 1.70Gb/s
epoll       | 4096     | 1          | 669K       | 20.4Gb/s
io_uring    | 4096     | 1          | 343K       | 10.3Gb/s
epoll       | 16384    | 1          | 224K       | 27.3Gb/s
io_uring    | 16384    | 1          | 183K       | 22.5Gb/s

Conclusions:

  1. io_uring is slower than epoll in the streaming mode
  2. As the buf size increases, the performance gap narrows

Test 2, Echo server performance (streaming client, multiple connections)

Note that I set up multiple client processes this time. One connection per client, as before.

outdated data

Conclusions

  • Increasing the number of connections helps io_uring

Test 3, io_uring IO vs psync IO (with memory backend, and IO depth = 1)

In this test, I just want to verify the idea that when the IO backend is in memory, the psync stack is more efficient than the io_uring stack.

I'm not providing source code here, but you can create a normal file under /dev/shm/ (tmpfs) and use io_uring to write it (with a concurrency of 1). Don't do reads, because I'm not sure whether the page cache would affect performance. You will find that psync is 3-4 times faster than io_uring.
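
A minimal sketch of such a comparison, assuming liburing is installed (the file path, buffer size, and iteration count below are arbitrary choices):

#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE 4096
#define ITERS    100000

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    char buf[BUF_SIZE];
    memset(buf, 'x', sizeof(buf));
    int fd = open("/dev/shm/uring_vs_psync.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    // psync path: plain pwrite() at IO depth 1
    double t0 = now_sec();
    for (int i = 0; i < ITERS; i++)
        pwrite(fd, buf, BUF_SIZE, 0);
    printf("psync:    %.3f s\n", now_sec() - t0);

    // io_uring path: one SQE submitted and reaped at a time (IO depth = 1)
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);
    t0 = now_sec();
    for (int i = 0; i < ITERS; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, buf, BUF_SIZE, 0);
        io_uring_submit(&ring);
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        io_uring_cqe_seen(&ring, cqe);
    }
    printf("io_uring: %.3f s\n", now_sec() - t0);

    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}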

The result is easy to understand. When your data is all in memory, the psync IO stack is almost like doing a memcpy. And with a concurrency of only 1 (IO depth = 1), io_uring's async event system cannot leverage its full power.

The network buffer is a similar situation, and for a specific fd/connection the IO depth is always 1. So perhaps when there is still free network buffer to write to, or data left to read, we should consider using the psync stack.

Final Conclusions

  • A socket is not like file IO. Reading/writing a network fd/connection is sequential (IO depth = 1).
  • When memory is the backend, e.g., the network buffer or tmpfs, the psync stack is more efficient than io_uring.
  • io_uring is an async event system. It performs better with more fds and larger buffers.

How to solve this problem?

From a user's perspective, my idea for solving this io_uring performance issue looks like the code below:

int fd = socket();
set_non_blocking(fd);
...
while (not_read_enough_size()) {
    int ret = read(fd, buf, size);
    if (ret < 0 && errno == EAGAIN) {
        new_io_uring_read(fd);
    }
}

new_io_uring_read means that the kernel would still execute a FAST_POLL for this non-blocking fd, and return a cqe after the next read finishes.

Since the network buffer is readable most of the time, this would leverage psync efficiency while still utilizing the io_uring FAST_POLL read.

Unfortunately no kernel provides this behavior so far. I'll ask some kernel folks for help and re-test it later.

Appendix

Architecture of coroutine based net server

Both the io_uring server and the epoll server have a frontend and a backend. The frontend is responsible for submitting async IO (or starting a poll) and then putting the current coroutine to sleep. The backend runs an event engine and wakes the sleeping coroutine when the IO finishes.

io_uring server

// io_uring frontend
io_uring_get_sqe();
io_uring_prep_send(); // or io_uring_prep_recv()
io_uring_submit();
coroutine_sleep();

// io_uring backend
while (should_wait_for_event()) {
    io_uring_wait_cqes();
    struct io_uring_cqe* cqe;
    unsigned head, i = 0;
    io_uring_for_each_cqe(m_ring, head, cqe) {
        ++i;
        coroutine_interrupt();  // wake the sleeping coroutine
    }
    io_uring_cq_advance(m_ring, i);
}

epoll server

// epoll frontend 
int fd = socket();
set_non_blocking(fd);
while (not_sent_enough_size()) {
    int ret = write(fd, buf, size);
    if (ret < 0 && errno == EAGAIN)
        wait_fd_writable();    // coroutine sleep here
}

// epoll backend
while (should_wait_for_event()) {
    epoll_wait();                  // fd is writable
    coroutine_interrupt();   // wake sleeping coroutine
}

How to reproduce

Test code

The full test code is here. You are welcome to run it in your own environment.

Build

# centos
dnf install gcc-c++ epel-release cmake
dnf install openssl-devel libcurl-devel libaio-devel
dnf config-manager --set-enabled powertools
dnf install gtest-devel gmock-devel gflags-devel fuse-devel libgsasl-devel

# ubuntu
apt install cmake
apt install libssl-dev libcurl4-openssl-dev libaio-dev
apt install libgtest-dev libgmock-dev libgflags-dev libfuse-dev libgsasl7-dev

git clone https://github.com/alibaba/PhotonLibOS.git
cd PhotonLibOS
git fetch && git pull origin main     # some test code was updated on Nov. 7
cmake -D BUILD_TESTING=1 -D ENABLE_SASL=1 -D ENABLE_FUSE=1 -D ENABLE_URING=1 -D CMAKE_BUILD_TYPE=Release -B build
cmake --build build -t net-perf -j

Run epoll server

./build/output/net-perf -buf_size 512 -port 9527

Run epoll client

./build/output/net-perf -client -buf_size 512 -client_mode streaming -ip <server_ip> -port 9527 

Run io_uring server

You will need to modify some code to switch to the io_uring server.

  1. Change photon::INIT_EVENT_EPOLL to photon::INIT_EVENT_IOURING

https://github.com/alibaba/PhotonLibOS/blob/f858f0a8d7e507c4d3667f0cc7da023600f46e8f/examples/perf/net-perf.cpp#L245

  2. Comment out new_tcp_socket_server and use new_iouring_tcp_server in the next line.

https://github.com/alibaba/PhotonLibOS/blob/f858f0a8d7e507c4d3667f0cc7da023600f46e8f/examples/perf/net-perf.cpp#L177-L178

Run io_uring client

You will need to modify some code to switch to the io_uring client. Of course, you may still use the epoll client to test against the io_uring server, in order to reduce variables.

  1. Change photon::INIT_EVENT_EPOLL to photon::INIT_EVENT_IOURING

https://github.com/alibaba/PhotonLibOS/blob/f858f0a8d7e507c4d3667f0cc7da023600f46e8f/examples/perf/net-perf.cpp#L245

  2. Change new_tcp_socket_client to new_iouring_tcp_client

https://github.com/alibaba/PhotonLibOS/blob/e07ce42648864528f0724b6c339d17317a4003c9/examples/perf/net-perf.cpp#L119

How to setup multiple clients

I just wrote a shell script to run them in the background.

for i in `seq 1 100`; do
    sleep 0.01
    ./build/output/net-perf -client -buf_size 512 -client_mode streaming -ip <server_ip> -port 9527  > /dev/null &
done
@v3ss0n

v3ss0n commented Oct 16, 2022

Great work, this test needs proper discussion. I think you should post it on HN.

@beef9999
Author

Great work, this test needs proper discussion. I think you should post it on HN.

What is HN?

@ammarfaizi2
Contributor

Great work, this test needs proper discussion. I think you should post it on HN.

What is HN?

https://news.ycombinator.com/

@GavinRay97

GavinRay97 commented Oct 27, 2022

Did this ever get posted there? I also agree someone should post it (ideally @beef9999 if they want)

I can also post it, but I don't want that to come off as "Sure, I'll take all those upvotes for your hard work." since it's like two seconds to submit a post.

@beef9999
Author

@GavinRay97 I don’t have an account on that forum. It’s OK if you post it for me. But please wait until this weekend so I can make some modifications to the performance data and upload the full test code as well.

@v3ss0n

v3ss0n commented Oct 27, 2022

I could post it for you, but since I don't want to take credit, you should do it @beef9999. It's easy to register there (user and pass only, no email needed), and it is arguably the best community of tech people. That forum is backed by the top startup accelerator Y Combinator, and a lot of tech people from Google, FAANG, unicorn startups, and other big companies are there. io_uring is of big interest there too.

@axboe
Owner

axboe commented Oct 27, 2022

Not sure why it's so interesting to post on HN, honestly most of the commentary there is vitriol and not very useful. What are we trying to accomplish?

For the performance side, try and set IORING_SETUP_DEFER_TASKRUN when the ring is created. That has shown nice results for this kind of workload recently.
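
For reference, a rough sketch of creating the ring with that flag (DEFER_TASKRUN requires SINGLE_ISSUER and a recent kernel/liburing; the fallback to a plain ring is just an illustrative choice):

#include <string.h>
#include <liburing.h>

// Create a ring with deferred task running, falling back if unsupported.
static int setup_ring(struct io_uring *ring, unsigned entries) {
    struct io_uring_params p;
    memset(&p, 0, sizeof(p));
    p.flags = IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
    int ret = io_uring_queue_init_params(entries, ring, &p);
    if (ret < 0)
        ret = io_uring_queue_init(entries, ring, 0);   // older kernel: plain ring
    return ret;
}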

@axboe
Owner

axboe commented Oct 27, 2022

Here's one from this week: https://lore.kernel.org/io-uring/[email protected]/

@GavinRay97

GavinRay97 commented Oct 27, 2022

Not sure why it's so interesting to post on HN, honestly most of the commentary there is vitriol and not very useful. What are we trying to accomplish?

(Personally) I like to share/evangelize stuff by people I think is interesting and deserves attention, or that other people might find interesting.

They seem to be pretty keen on performance stuff and io_uring in general, though there's a (rightfully so) certain rigor expected if you're going to post benchmarks.

Even if a particular topic doesn't trend well or some people post negative comments, it's nice for the folks browsing that are interested in that thing that otherwise wouldn't have known about it IMO.

Sometimes I find posts where I have a highly positive opinion of the thing/think it's neat and nobody else does. Oh well, their loss.

That's my $0.02 at least

@axboe
Owner

axboe commented Oct 27, 2022

I'm just not a fan, most of the commentary (to me) are from folks looking to look smart and not knowing a lot of the details. In many ways, not that different from reddit. Not useful imho, from the cases I've seen. Arguably, I haven't spent a lot of time on the site, this is just my experience from the couple of times when I have.

@v3ss0n

v3ss0n commented Oct 27, 2022

Not sure why it's so interesting to post on HN, honestly most of the commentary there is vitriol and not very useful. What are we trying to accomplish?

(Personally) I like to share/evangelize stuff by people I think is interesting and deserves attention, or that other people might find interesting.

They seem to be pretty keen on performance stuff and io_uring in general, though there's a (rightfully so) certain rigor expected if you're going to post benchmarks.

Even if a particular topic doesn't trend well or some people post negative comments, it's nice for the folks browsing that are interested in that thing that otherwise wouldn't have known about it IMO.

Sometimes I find posts where I have a highly positive opinion of the thing/think it's neat and nobody else does. Oh well, their loss.

That's my $0.02 at least

Yeah, same reason. That community is quite interested in io_uring, sharing and discussing @axboe's tweets almost every week, and that's how I found out about io_uring too.
Reddit used to be good and intellectual; now it is quite the opposite, with no real discussion going on there.

@beef9999
Author

beef9999 commented Nov 5, 2022

@GavinRay97 I have simplified the tests and rephrased some explanations. Please help post it if it is convenient for you.

@GavinRay97

GavinRay97 commented Nov 6, 2022

@beef9999 I have posted it at A performance review of io_uring vs. epoll for standard/streamed socket traffic 👍

Hopefully some people find it interesting

@ghost

ghost commented Nov 6, 2022

This is interesting. Thank you for this.
I wrote an epoll echo server which multiplexes multiple clients over each thread. The idea is that each core can scale the number of clients it serves.
I want to add io_uring; maybe I can learn it from this repository.

I wonder how the performance is when multiple cores are used.

It's kind of similar to libuv. I use IO threads to handle the IO. It's incomplete, but it is a proof of concept.

https://github.com/samsquire/epoll-server

It is based on a multiconsumer multiproducer RingBuffer by Alexander Krizhanovsky.

https://www.linuxjournal.com/content/lock-free-multi-produce...

I also wrote a userspace 1:M:N lightweight thread scheduler which should be integrated with the epoll server. This is an alternative to coroutines. I multiplex multiple lightweight threads on a kernel thread and switch between them fast. The scheduler thread preempts hot for and while loops by setting the looping variable to the limit. This allows preemption to occur when the code finishes the current iteration. This is why I call it userspace preemption.

https://github.com/samsquire/preemptible-thread

One idea I have for even higher performance is to split sending and receiving to their own threads and multiplex sending and receiving across threads. This means you can scale sending and receiving.

@axboe
Owner

axboe commented Nov 6, 2022

Tried to compile this as I'm pretty convinced something is amiss with the single thread performance, but it fails for me:

Consolidate compiler generated dependencies of target photon_obj
[  1%] Building CXX object CMakeFiles/photon_obj.dir/io/signal.cpp.o
/home/axboe/git/PhotonLibOS/io/signal.cpp:259:9: error: use of undeclared identifier 'pthread_atfork'
        pthread_atfork(nullptr, nullptr, &fork_hook_child);
        ^
1 error generated.
make[2]: *** [CMakeFiles/photon_obj.dir/build.make:440: CMakeFiles/photon_obj.dir/io/signal.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:104: CMakeFiles/photon_obj.dir/all] Error 2
make: *** [Makefile:111: all] Error 2

I'm on debian testing. Outside of that, I failed to find examples of how to run it? Maybe I'm just blind, but hints would be appreciated.

@axboe
Owner

axboe commented Nov 6, 2022

OK, got it going, and the examples built. signal.cpp is missing a pthread.h include.

@victorstewart

juxtaposing the performance variance between epoll and io_uring for 512 + 1 client in test 1... vs equivalent performance in test 2 with that usleep... my intuition is all the test 2 data are poisoned.

@axboe
Owner

axboe commented Nov 6, 2022

juxtaposing the performance variance between epoll and io_uring for 512 + 1 client in test 1... vs equivalent performance in test 2 with that usleep... my intuition is all the test 2 data are poisoned.

I agree, it all looks very odd to me.

@axboe
Owner

axboe commented Nov 6, 2022

Got it built and running, but there are no docs on how to run with the various backends on either the client or server side. The interval thing doesn't seem to work either, it always keeps running without dumping stats until the client is killed/interrupted.

Will be happy to take a look at the perf differences, but I don't want to spend ages figuring out how to run this thing. Please provide examples, I can't find any.

@GavinRay97

GavinRay97 commented Nov 6, 2022

The interval thing doesn't seem to work either, it always keeps running without dumping stats until the client is killed/interrupted.

It appears to be just an NGINX-like static server, which defaults to an epoll backend:

[user@MSI PhotonLibOS]$ ./build/output/server_perf
2022/11/07 05:54:32|INFO |th=0000000000B76050|/home/user/projects/PhotonLibOS/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/07 05:54:33|INFO |th=00007FBC8FFCEB00|/home/user/projects/PhotonLibOS/net/http/test/server_perf.cpp:44|show_qps_loop:qps: 0
2022/11/07 05:54:34|INFO |th=00007FBC8FFCEB00|/home/user/projects/PhotonLibOS/net/http/test/server_perf.cpp:44|show_qps_loop:qps: 0
2022/11/07 05:54:35|INFO |th=00007FBC8FFCEB00|/home/user/projects/PhotonLibOS/net/http/test/server_perf.cpp:44|show_qps_loop:qps: 0

I think you're meant to use something like k6/wrk2 to send an HTTP load test to the URL it's running at, which seems to be http://localhost:19876 by default. I thought it would have generated some load/throughput by itself.

It seems you are meant to run the client-perf.cpp binary alongside the server-perf.cpp one, and it will generate the HTTP requests.

I see in the docs that you can switch the epoll engine out for io_uring, but I don't seem to be able to do that.
What I've done is:

  // I think this tries to initialize some global event engine?
  int ret = photon::init(photon::INIT_EVENT_IOURING, photon::INIT_IO_LIBAIO);
  if (ret != 0) {
    LOG_ERRNO_RETURN(0, -1, "photon init failed");
  }

  // Replaced this with io_uring specific method
  auto tcpserv = net::new_iouring_tcp_server();

  // Specified `io_uring` engine for FS
  auto fs = fs::new_localfs_adaptor(".", photon::fs::ioengine_iouring);

This still logs as using the epoll engine though 🙁


I also had to modify a few things to get it to build:

  • Like Jens mentioned, there was an #include <pthread.h> needed in one of the headers
  • The CMake variable for including the GTest/GMock headers was incorrectly named; it was singular instead of plural

@beef9999
Author

beef9999 commented Nov 7, 2022

Yes, C++ programs are very sensitive to the environment and platform… We only tested compiling on CentOS and Ubuntu before, and didn't hit the pthread header problem.

I’ll add some instructions about how to run the program with appropriate parameters.

@beef9999
Author

beef9999 commented Nov 7, 2022

Hi everyone, I have updated this issue and added the how-to-reproduce instructions.

About test 2, I deleted this line

In order to ease the server's pressure (for it only enabled one core), I added a 10 μs sleep in the client's send/recv loop.

It's not a MUST DO; I just added it in my own code.

@beef9999
Author

beef9999 commented Nov 7, 2022

juxtaposing the performance variance between epoll and io_uring for 512 + 1 client in test 1... vs equivalent performance in test 2 with that usleep... my intuition is all the test 2 data are poisoned.

That's because the stress is high in streaming mode; a single client can almost fully occupy the server CPU (one core). So I used this method to reduce the server stress.

@GavinRay97

GavinRay97 commented Nov 7, 2022

Yes, C++ programs are very sensitive to the environment and platform… We only tested compiling on CentOS and Ubuntu before, and didn't hit the pthread header problem.

If it's any help, I am running on Fedora 37, compiling with Clang 15, and GCC 12 toolchain (/usr/include/c++/12/)

@beef9999
Author

beef9999 commented Nov 7, 2022

@GavinRay97 Are you able to reproduce my data for test 1 ?

@axboe
Owner

axboe commented Nov 7, 2022

With the actual instructions, I gave it a test spin. From a quick look, you're doing a lot more on the io_uring side than you are on the epoll side. I made the following 2 minute tweaks:

  • Don't arm a timer, use the appropriate io_uring_submit_and_wait_timeout()
  • Register the ring fd
  • Update liburing to something that isn't 1+ years old

and got a 50% increase from that alone. I'm sure there's a lot more that could be done, but I'm pretty skeptical that this is an apples-to-apples epoll vs io_uring test case as it is. Other notes:

  • Use fixed files?
  • Why read on a socket? recv would be more efficient, at least on the io_uring side
  • How are buffers managed? Is it the same on epoll vs io_uring?
  • What are the linked timeouts doing?
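
For illustration, a rough sketch of what the first two tweaks look like with liburing 2.2+ (the 10ms timeout and the coroutine wake-up are placeholders, not the project's actual code):

#include <errno.h>
#include <liburing.h>

void backend_loop(struct io_uring *ring) {
    io_uring_register_ring_fd(ring);   // avoids fdget/fdput on every syscall

    while (1) {
        struct __kernel_timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
        struct io_uring_cqe *cqe;
        // Submit pending SQEs and wait for one CQE or the timeout,
        // without arming a separate IORING_OP_TIMEOUT request.
        int ret = io_uring_submit_and_wait_timeout(ring, &cqe, 1, &ts, NULL);
        if (ret < 0 && ret != -ETIME)
            break;

        unsigned head, count = 0;
        io_uring_for_each_cqe(ring, head, cqe) {
            count++;
            // wake the coroutine associated with cqe->user_data here
        }
        io_uring_cq_advance(ring, count);
    }
}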

@axboe
Owner

axboe commented Nov 7, 2022

Another note - lots of receives will have cflags == 0x04 == IORING_CQE_F_SOCK_NONEMPTY, meaning that the socket still had more data after this receive? Is this really a ping-pong test, or is it just blasting data in both directions?
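
For reference, a small sketch of checking that flag on a completed receive (illustrative only, assuming a kernel that reports IORING_CQE_F_SOCK_NONEMPTY):

#include <liburing.h>
#include <stdio.h>

// After a recv CQE completes, cqe->flags tells whether the socket still has queued data.
void inspect_recv_cqe(const struct io_uring_cqe *cqe) {
    if (cqe->res > 0 && (cqe->flags & IORING_CQE_F_SOCK_NONEMPTY))
        printf("recv of %d bytes left more data queued on the socket\n", cqe->res);
}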

@axboe
Owner

axboe commented Nov 7, 2022

We're also spending a ton of time in __vdso_gettimeofday() when run with io_uring, and I see nothing if using epoll. This is about ~10% of the time spent! It's coming off resume_threads().

I'm not going to spend more time on this, there are vast differences between what is being run here and I think some debugging and checking+optimizing of the io_uring side would go a long way toward improving the single thread / single connection disparity.

@beef9999
Author

beef9999 commented Nov 8, 2022

@axboe Thanks for your time. Let me try to answer some of your questions:

  1. Why read on a socket?
    Because the Linux manual says read is identical to recv for sockets. I didn't know io_uring treats them differently (a recv sketch follows this list).

  2. How are buffers managed? Is it the same on epoll vs io_uring?
    They are the same. Both are allocated on the stack. They are not registered with io_uring (and not with epoll either).

  3. What are the linked timeouts doing?
    To replace io_uring_submit_and_wait_timeout, because the fix for the bug I reported before (Why does io_uring_wait_cqe_timeout always have a minimum overhead of 2 milliseconds? #531) was only merged into the latest kernel, and I hadn't had a chance to upgrade my kernel yet.

io_uring_submit_and_wait_timeout used to be invoked in the coroutine scheduling; I wrote this code to replace it:

struct __kernel_timespec ts = get_timeout_ts();
struct io_uring_sqe* sqe = io_uring_get_sqe(ring);
io_uring_prep_timeout(sqe, &ts, 1, 0);  // fires after ts, or after 1 completion
io_uring_submit_and_wait(ring, 1);

  Why is there a performance disparity between these two approaches?

  4. IORING_CQE_F_SOCK_NONEMPTY, meaning that the socket still had more data after this receive?
    Yes, this test is all about the streaming mode client. The socket is filled with continuously arriving data.
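
On the read-vs-recv point from answer 1, here is a minimal sketch of the socket-specific opcode on the io_uring side (buffer management and flags are placeholders):

#include <liburing.h>

// Queue a socket receive with IORING_OP_RECV instead of IORING_OP_READ.
void queue_socket_recv(struct io_uring *ring, int fd, void *buf, size_t len) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_recv(sqe, fd, buf, len, 0);
    io_uring_submit(ring);
}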

I'd like to say something about why this test exists at all. Unlike the traditional usage, if you need to pipeline socket IO (technically speaking, concurrent read/write), then an event engine is necessary. You can hardly find a mature async event engine driven by io_uring in the open source world today, except ours, and I think that's why people haven't hit the streaming client performance issue before.

I believe our old epoll event engine has been optimized quite well; otherwise it wouldn't be able to surpass other IO engines in performance. According to our tests, in streaming mode boost::asio only gets 50% of our throughput. What I mean is that the upper limit is high.

Another interesting thing to mention is that if I use non-blocking fd + io_uring poll + psync read/write, the performance rises to epoll's level as well. That shows my io_uring event engine is capable.

Anyway, I'll keep on optimizing the io_uring code based on your notes. Thank you.


Updated on Nov. 8: I upgraded my kernel to 6.0.7.

  • I can confirm that the 50% performance increase (current QPS is 660K) came from io_uring_submit_and_wait_timeout. The timer is slow, indeed. But there is still a huge gap from 660K to epoll's 1200K, and I don't think any trivial optimization would cover it.
  • Registering the ring fd didn't bring any benefit. This is a single-threaded program.
