'skopeo copy' gets stuck uploading images to a Docker registry on Fedora >= 40 hosts #22575
Thanks for your report. This works fine for me with
If this is reproducible, can you run Skopeo with
This is what I got:
That looks very confusing. 17 different backtraces for “goroutine 0”, several different kinds of output interleaved. Is that a single copy?
is clearly from 2 different processes.
I don't think I fumbled the paste by pressing
This part about only older images getting stuck isn't true. I can now reproduce a
However, the part about only Fedora 40 and above hosts showing this problem is likely still true, because of the observed behaviour of the Toolbx CI.
No, I think it’s the process structure. Pressing a control character sends the signal to every process in the process group, and all of them output the traceback at once. Could you try
Okay! I did a
AFAICT, the layer has been uploaded to the server; all of that blob has been consumed, the request was sent, and now it is waiting for the server to respond. Now, if the destination is
With
Note that I am using a Fedora 40 Workstation virtual machine that I created just for this, and typed in the individual commands after lifting them from Toolbx's CI. I don't think I used
Good question. For some reason, I assumed that all current Fedoras have the same Podman version, but that's not the case. Fedora 40 onwards has 5.0.x and the older Fedoras have 4.9.4. I will try to downgrade the Podman stack and see if that helps.
Assuming you run podman rootless, pasta will forward the ports and has to handle many parts of the TCP stack, so it can easily be broken in some way that causes this. The easy way to test whether it is pasta's fault is to run podman with
Also, if slirp4netns works, try to get the latest pasta (passt RPM) and see if it changes anything. If it still does not work, feel free to report this against Podman; I can ping the pasta maintainers there.
In the meantime, here are the logs from the
These are produced at start-up:
These are produced by a stuck attempt to copy a
It's interesting that it refers to
The “blob info cache” maintains local records about which layers exist on which registries, so
Thanks for that hint. Using
I think I already have the latest Pasta build, and it reproduces the problem:
I can't move issues across repositories in the Containers organization. So, will you take care of moving the issue?
Did you try to push with podman push? Does that hang as well? Is the HTTPS setup required, or does it also hang for unencrypted traffic? Just looking to make the reproducer a bit simpler. In any case, you should run podman with something like
No, I didn't. I will try it now.
I don't think HTTPS is necessary, but I haven't actually tried plain text traffic.
This part of the test suite wasn't written by me, so I am also trying to remove the unnecessary bits as I go along. E.g., the original code created a Docker registry that requires authentication. I will try to get rid of the HTTPS part.
Okay!
Podman 5.0 switched to using pasta(1), instead of slirp4netns(1), by default for rootless containers. This change has led to a regression causing 'skopeo copy' to get stuck uploading an OCI image to the local temporary Docker registry run by the tests as a Podman container [1], which breaks the test suite on Fedora 40 onwards. Work around this by forcing the use of slirp4netns(1). Note that the slirp4netns package needs to be explicitly installed on Fedora 40 onwards, because the dependency in containers-common-extra changed from Recommends to Suggests [2]. [1] containers/podman#22575 [2] Fedora containers-common commit 17934d87b2686ab5 Fedora containers-common commit 13c232f064113860 https://src.fedoraproject.org/rpms/containers-common/c/17934d87b2686ab5 https://src.fedoraproject.org/rpms/containers-common/c/13c232f064113860 containers#1468
Could this have something to do with the
I'm not exactly sure how skopeo uses pasta here, and how local ports are remapped (is it something going into pasta's configuration as well?).
I don't think so. Skopeo doesn't use pasta; the setup contains a container registry run inside Podman with a TCP port forwarded into the container. The container registry is really just a normal HTTP(S) server with a special API. Skopeo just connects to it via the forwarded port and uploads the image data, which seems to hang at the end of the upload.
Also, the TCP connection worked well enough to set up HTTPS, i.e. the packet flow was bidirectional and working, at least for a while. That should rule out at least the long-term-persistent aspects of IP addressing.
I wonder if that's related to the use of plain text HTTP instead of HTTPS. The first HTTP request I have seen is:
... which prompts Go to throw a
I wish there were an easy way to trigger more detailed network logs from Skopeo instead of me doing things like hacking custom http.RoundTripper implementations into the code. :)
Thanks for explaining all that!
Okay, sure. In the meantime, here's my latest instrumented build of Skopeo that shows the HTTP request and response headers, in case anybody finds it useful: https://koji.fedoraproject.org/koji/taskinfo?taskID=117353488 ... and I would suggest trying a Fedora 40 virtual machine. That's what the Toolbx CI uses, which triggered this bug in the first place. I am able to reproduce at will using a F40 VM myself. There are times when the
That’s perfectly fine. The first connection attempt is always made over TLS; unencrypted HTTP is only tried afterwards. If HTTP works, all the following requests should use HTTP.
I'm wondering if this is an attempted TLS/SSL handshake which the server is refusing because it's configured for plain HTTP. Wireshark is interpreting it as something different, but I don't know enough about TLS at the bits-and-bytes level to say if it could just be making a bad guess at the protocol. In any case, I'm fairly skeptical that this is relevant to the overall problem, since this interaction seems to be over quickly and is followed by other, more successful looking transactions.

The big thing I notice is that between frame 2227 and frame 2228 there's an approximately 70 second gap for no obvious reason. There are a bunch of window updates just before that, so I wonder if something could be misinterpreting them as a zero window, even though they're clearly not. That would have to be in the kernel, though, since for a spliced connection like this pasta doesn't handle TCP at that level.

At this point what I kind of want is an strace of pasta, but that's a bit awkward to get, since it makes itself undumpable.
I'm trying to reproduce this on a Fedora 40 virtual machine now, but meanwhile:
...pasta can be straced without changes as root, too (
Yes, at this point if you manage to strace it just before and during the hang, that would probably be really helpful. |
I started the Docker registry container with:
Then attached
Then tried a
Once it got stuck, I pressed
Here's the strace log: https://rishi.fedorapeople.org/podman-22575-pasta.strace
It looks like there weren't any forks to follow after I attached
I haven't been able to reproduce this so far, I tried a Fedora 40 VM with:
and a Fedora Rawhide VM with:
and two different kernel versions here. @debarshiray, how much memory and how many CPU threads did you give to your Fedora 40 VM? Available and total memory affect some TCP parameters -- so far I tried with 4 GiB and 1 GiB. Other than the strace, there are two other bits of information (perhaps more convenient to fetch) that might help as well:
and see if you can still reproduce the issue? It might also be worth getting rid of the whole
Hah, thanks, I missed that you had already posted this while I was writing my previous comment. Could you please re-run strace with
Right, yes, that's pretty much expected: pasta clone()s periodically only if it needs to scan bound ports inside the container (for automatic port binding, which is not enabled by default with Podman), and occasionally to replenish pools of sockets and pipes (the ones that need to be created inside the container). But if you have just a couple of connections, you won't see any cloned process.
Yes, sure. Updated with
Here's an instrumented Podman build: https://koji.fedoraproject.org/koji/taskinfo?taskID=117380638
Here's an instrumented Pasta build with the inner
Using those two, here's the
This doesn't seem to be a public link, I get 403 Forbidden
Oops! Fixed with a
Here's an instrumented Pasta build with the outer
Here's the
Looking closely at the Docker registry logs, I see these two extra lines as compared to before:
Those two lines are emitted by the Docker registry container after I hit
So, the Docker registry container claims to have completed responding to the
Thanks for the additional traces, I'm looking into them now. And:
...it looks like this was the key to reproduce it. I assigned 12 CPU threads (instead of one) to my Fedora 40 virtual machine and I can finally reproduce the hang as well! For some reason, it happens only once per boot. If I
At least on my setup, I see this:
everything looks normal until this point. Twice in a row, the receiver (socket 115) is not ready to read what we're splicing (pipe ends: 151 to 150) from the originating socket (211). Then we wait until:
we get
Phew! :) I didn't quite restart the container. I was almost always running them with
Anyway, I am ecstatic that you managed to reproduce it. Thanks for sticking with this issue!
I just posted a patch at https://archives.passt.top/passt-dev/[email protected]/ that fixes the issue for me. Short summary: we fill up the buffer of the receiving socket (here, container-side), so we get
@debarshiray, it would be great if you could test this:
Yes, the patch appears to fix this problem for me too! I tried three times on my Fedora 40 VM, rebooted and tried three times again. It didn't get stuck. |
They get horribly confusing, especially here where we use "rc" for two completely different purposes: containers/podman#22575 (comment) Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
This is now fixed in passt version 2024_05_10.7288448 and in the corresponding updates for Fedora 39, Fedora 40, and Fedora Rawhide.
In tcp_splice_sock_handler(), if we get EAGAIN on the second splice(), from pipe to receiving socket, that doesn't necessarily mean that the pipe is empty: the receiver buffer might be full instead. Hence, we can't use the 'never_read' flag to decide that there's nothing to wait for: even if we didn't read anything from the sending side in a given iteration, we might still have data to send in the pipe. Use read/written counters, instead. This fixes an issue where large bulk transfers would occasionally hang. From a corresponding strace:

0.000061 epoll_wait(4, [{events=EPOLLOUT, data={u32=29442, u64=12884931330}}], 8, 1000) = 1
0.005003 epoll_ctl(4, EPOLL_CTL_MOD, 211, {events=EPOLLIN|EPOLLRDHUP, data={u32=54018, u64=8589988610}}) = 0
0.000089 epoll_ctl(4, EPOLL_CTL_MOD, 115, {events=EPOLLIN|EPOLLRDHUP, data={u32=29442, u64=12884931330}}) = 0
0.000081 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
0.000073 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 1048576
0.000087 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
0.000045 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 520415
0.000060 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
0.000044 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
0.000044 epoll_wait(4, [], 8, 1000) = 0

we're reading from socket 211 into the pipe end numbered 151, which connects to pipe end 150, and from there we're writing into socket 115. We initially drop EPOLLOUT from the set of monitored flags for socket 115, because it already signaled it's ready for output.
Then we read nothing from socket 211 (the sender had nothing to send), and we keep emptying the pipe into socket 115 (first 1048576 bytes, then 520415 bytes). This call of tcp_splice_sock_handler() ends with EAGAIN on the writing side, and we just exit this function without setting the OUT_WAIT_1 flag (and, in turn, EPOLLOUT for socket 115). However, it turns out, the pipe wasn't actually emptied, and while socket 211 had nothing more to send, we should have waited on socket 115 to be ready for output again. As a further step, we could consider not clearing EPOLLOUT at all, unless the read/written counters match, but I'm first trying to fix this ugly issue with a minimal patch. Link: containers/podman#22575 Link: containers/podman#22593 Signed-off-by: Stefano Brivio <[email protected]> Reviewed-by: David Gibson <[email protected]>
Closing.
Thanks for all the work to fix this. Much appreciated. |
This was observed when trying to add Fedora 40 to the list of host operating systems on which the upstream Toolbx CI is run. Eventually, it started happening on Fedora Rawhide hosts too.
The Toolbx test suite sets up a local temporary Docker registry to test some behaviour of the toolbox(1) binary.

First, it sets up a TLS key pair:
Second, it runs a Docker registry in a container:
Finally, it populates the Docker registry with an image:
That last step to upload an older Fedora image to populate the local temporary Docker registry always gets stuck on hosts running Fedora 40 and newer, but doesn't on hosts running older Fedora, such as 38 and 39.
For various odd requirements and quirks, the Toolbx test suite specifically uploads an older Fedora based image to simplify the implementation. That said, I reproduced it also with the fedora:35, fedora:36, fedora:37, fedora:38 and fedora:39 images on a Fedora 40 host. The fedora:40 image didn't get stuck on a Fedora 40 host.

All my tests were done using Skopeo 1.15.0 on Fedora 41, 40, 39 and 38 hosts. The corresponding RPMs were: