-
Let's assume I'm using io_uring on a SOCK_STREAM (TCP) [or SOCK_SEQPACKET (SCTP)] socket, and I'm submitting multiple write/sendmsg/sendto operations for the same socket/fd on the same submission queue. Are the writes to the socket guaranteed to execute in the order in which they were submitted?

I could not find an answer to this in the documentation I read. It only states that the order of completions is not guaranteed. I don't care about the order of completions. What I need to know is whether, if I submit operations to write A, B and C to the same socket, the order in the TCP stream as sent on the wire will actually be A, B and C, and not C, A, B or any other order.

I'm aware there is a flag to link SQEs. However, this would only be practical if the write operations A, B and C for any given fd were submitted directly one after another. In reality, the application will submit writes on potentially thousands of sockets in arbitrary order. There are no ordering constraints between all those writes, only between the writes on any one given socket/fd.

Thanks in advance for any assistance. I think this is a question that any io_uring networking guide should answer, but I was unable to find it there.
Replies: 2 comments
-
In general, you should not have multiple send/sendmsg on the same socket. Most of the time this will work fine, as this is what will happen:
Everything is fine and dandy in this case. However, you could also potentially see:

5a) sendB is issued, since sendA socket space has freed up, hence sendB can complete
5b) sendA retry is triggered via poll callback.

5a and 5b can happen at the same time, so either sendA or sendB can go out first at that point, and the ordering is ruined. Obviously this is a rare occurrence, but it can happen, hence the suggestion never to have more than one send inflight against the same socket at the same time.

For 6.10, I've been working on provided buffer support for send. Here's the current commit: basically this would mean that you could just use provided buffers with send, and then it doesn't really matter whether sendA or sendB wins, as each of them will pick the next buffer to send. Even with that, it'd be wasteful compared to just having a single send inflight.

On top of the above, there are also send bundles, where you can simply issue a send and say "drain my outgoing queue". This is more efficient, as you only need a single send, and you only need a single notification CQE posted as well.

In terms of what's in the code base now, one thing I did in proxy [1], based on Pavel's suggestion, is simply to have two iovecs and use sendmsg. One iovec is always being prepared, and you just append a vec entry to it where you would otherwise have submitted a send. When it's full, or you are submitting anyway, you flip to the other vec, and that is where new elements get appended. The next sendmsg submitted will use the current vec, etc. This isn't quite as efficient as send bundles, but it'll work with any kernel. And it is more efficient than doing what you describe above, where you have multiple send SQEs and get multiple completions from them.

[1] https://git.kernel.dk/cgit/liburing/tree/examples/proxy.c
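The two-iovec flip described above could be sketched roughly as follows. This is only an illustration of the bookkeeping, not the code from proxy.c: `struct send_batcher`, `VEC_CAP` and the `batcher_*` helpers are hypothetical names, and the sketch assumes each sendmsg sends its whole batch (a real implementation has to handle short sends by resubmitting the remainder). The `msghdr` filled in by `batcher_flip()` is what you would hand to a single sendmsg SQE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define VEC_CAP 8 /* hypothetical capacity of each iovec array */

/* Hypothetical per-socket state: two iovec arrays used alternately. */
struct send_batcher {
    struct iovec vecs[2][VEC_CAP];
    int count[2];  /* entries queued in each array */
    int filling;   /* index of the array new entries are appended to */
    int inflight;  /* 1 while a sendmsg on the other array is outstanding */
};

/* Append a buffer where you would otherwise have submitted a send.
 * Returns 0 on success, -1 if the array being filled is full. */
static int batcher_append(struct send_batcher *b, void *data, size_t len)
{
    int f = b->filling;

    if (b->count[f] == VEC_CAP)
        return -1;
    b->vecs[f][b->count[f]].iov_base = data;
    b->vecs[f][b->count[f]].iov_len = len;
    b->count[f]++;
    return 0;
}

/* Flip: describe the filled array in *msg for ONE sendmsg and direct new
 * appends to the other array. Returns the number of iovecs to send, or 0
 * if there is nothing queued or a send is still in flight. */
static int batcher_flip(struct send_batcher *b, struct msghdr *msg)
{
    int f = b->filling;

    if (b->inflight || b->count[f] == 0)
        return 0;
    memset(msg, 0, sizeof(*msg));
    msg->msg_iov = b->vecs[f];
    msg->msg_iovlen = b->count[f];
    b->inflight = 1;
    b->filling = !f;
    assert(b->count[b->filling] == 0); /* other array was drained earlier */
    return b->count[f];
}

/* Call when the completion for the sendmsg arrives. Assumes the whole
 * batch was sent; short sends are not handled in this sketch. */
static void batcher_complete(struct send_batcher *b)
{
    b->count[!b->filling] = 0;
    b->inflight = 0;
}
```

With this layout there is never more than one sendmsg outstanding per socket, and data appended while it is in flight simply goes out with the next one, in append order.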
-
Hi Jens, thanks for taking the time to write such a thorough answer. I think it would be great to add this to the documentation somewhere. At least I was not able to find an explanation like this anywhere.

In our current implementation (https://gitea.osmocom.org/osmocom/libosmocore/src/branch/master/src/core/osmo_io_uring.c in case anyone is curious) we actually do have a userspace-side per-fd transmit queue, and hence only ever submit one write/sendto/sendmsg per socket to io_uring at a time, so we're safe. Based on your explanation this is the desired scenario, and submitting multiple writes at the same time would be both inefficient and prone to ordering errors.
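For readers who want to see the shape of such a per-fd transmit queue, here is a minimal sketch. It is not the libosmocore code; `struct tx_msg`, `struct tx_queue` and the `txq_*` helpers are hypothetical names chosen for illustration. The idea is simply a FIFO per socket plus an `inflight` flag, so that at most one write is handed to io_uring at a time and the next one is only submitted after the previous completion has been seen.

```c
#include <assert.h>
#include <stddef.h>

/* One queued message (hypothetical layout). */
struct tx_msg {
    struct tx_msg *next;
    const void *data;
    size_t len;
};

/* Per-fd FIFO with at most one message in flight. */
struct tx_queue {
    struct tx_msg *head;
    struct tx_msg *tail;
    int inflight; /* 1 while the head message is submitted to the ring */
};

/* Queue a message at the tail; nothing is submitted here. */
static void txq_enqueue(struct tx_queue *q, struct tx_msg *m)
{
    m->next = NULL;
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
}

/* Returns the message to submit now, or NULL if one is already in
 * flight or the queue is empty. The caller would prepare exactly one
 * write/send SQE for the returned message. */
static struct tx_msg *txq_next_to_submit(struct tx_queue *q)
{
    if (q->inflight || !q->head)
        return NULL;
    q->inflight = 1;
    return q->head;
}

/* Call from the completion handler: removes the finished head message
 * and allows the next one to be submitted. Returns it for freeing. */
static struct tx_msg *txq_on_completion(struct tx_queue *q)
{
    struct tx_msg *done = q->head;

    assert(q->inflight && done);
    q->head = done->next;
    if (!q->head)
        q->tail = NULL;
    q->inflight = 0;
    return done;
}
```

A typical flow would be: `txq_enqueue()` whenever the application wants to send, then `txq_next_to_submit()` both after enqueueing and after each `txq_on_completion()`. Partial writes are left out of the sketch; handling them would mean resubmitting the remainder of the head message before moving on.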