kill cpcontainer: could not be stopped... sending SIGKILL... container state improper #16142
Comments
That's interesting. The
To improve the error message reported in containers#16142 where the container is reported to be in the wrong state but we do not know which.

This is not a fix for containers#16142 but will hopefully aid in better understanding what's going on if it flakes again.

[NO NEW TESTS NEEDED] as hitting the condition is inherently racy.

Signed-off-by: Valentin Rothberg <[email protected]>
I opened #16151 to improve the error message. I really want to know which state the container is in.
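For readers following along, the idea behind that change is simply to report which state was actually observed instead of a bare "wrong state" error. A minimal, hypothetical sketch of that idea (the `ContainerState` type and `checkKillable` helper below are illustrative stand-ins, not the real libpod code):

```go
package main

import (
	"errors"
	"fmt"
)

// ContainerState is a hypothetical stand-in for libpod's container states.
type ContainerState int

const (
	StateCreated ContainerState = iota
	StateRunning
	StateStopped
	StateExited
)

func (s ContainerState) String() string {
	return [...]string{"created", "running", "stopped", "exited"}[s]
}

var errImproperState = errors.New("container state improper")

// checkKillable rejects states in which a signal cannot be delivered and,
// crucially, reports which state was actually observed so flakes like
// containers#16142 are easier to debug.
func checkKillable(state ContainerState) error {
	if state != StateRunning {
		return fmt.Errorf("%w: %s", errImproperState, state)
	}
	return nil
}

func main() {
	if err := checkKillable(StateStopped); err != nil {
		fmt.Println("Error:", err) // Error: container state improper: stopped
	}
}
```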
You got lucky!:
Wuaaaaaaa ... I expected everything but that O.O
Failed in nightly cron, f36 rootless. Again, state is
#16177 should fix the issue. I am surprised this didn't flake more.
Make sure to wait for the container to exit after kill. While the cleanup process will take care eventually of transitioning the state, we need to give a guarantee to the user to leave the container in the expected state once the (kill) command has finished.

The issue could be observed in a flaking test (containers#16142) where `podman rm -f -t0` failed because the preceding `podman kill` left the container in "running" state which ultimately confused the "stop" backend.

Note that we should only wait for the container to exit when SIGKILL is being used. Other signals have different semantics.

[NO NEW TESTS NEEDED] as I do not know how to reliably reproduce the issue. If containers#16142 stops flaking, we are good.

Fixes: containers#16142
Signed-off-by: Valentin Rothberg <[email protected]>
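A hedged sketch of the behavior that commit message describes: after delivering the signal, block until the container has exited, but only when the signal is SIGKILL. The `container` type and its helpers below are hypothetical stand-ins, not the actual libpod implementation:

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// container is a hypothetical stand-in for the real libpod container type.
type container struct {
	id string
}

// sendSignal pretends to deliver sig to the container's init process.
func (c *container) sendSignal(sig syscall.Signal) error {
	fmt.Printf("sending signal %d to %s\n", sig, c.id)
	return nil
}

// waitForExit pretends to poll the runtime until the container has exited
// or the timeout expires.
func (c *container) waitForExit(timeout time.Duration) error {
	time.Sleep(10 * time.Millisecond) // stand-in for real polling
	return nil
}

// kill delivers the signal and, for SIGKILL only, waits until the container
// has actually left the "running" state, so a follow-up `podman rm -f -t0`
// never observes a stale "running" container.
func (c *container) kill(sig syscall.Signal) error {
	if err := c.sendSignal(sig); err != nil {
		return err
	}
	if sig == syscall.SIGKILL {
		return c.waitForExit(5 * time.Second)
	}
	// Other signals have different semantics; do not wait for exit.
	return nil
}

func main() {
	c := &container{id: "cpcontainer"}
	if err := c.kill(syscall.SIGKILL); err != nil {
		fmt.Println("Error:", err)
	}
}
```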
Reopening, sorry. Just saw this on #16275, which was based on 589ff20, which is very recent and includes #16177. f36-aarch64 remote root:
Monsieur, I love your reports and bookkeeping. Some flakes are more stubborn than others. The error message strongly suggests another fart/race in the kill code. I will take a look today.
#16320 should fix the issue. It's slightly different but effectively a similar scenario as before. When sending signals to the container, Podman releases the lock to prevent
That was quick! Thank you!
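For context, here is a toy illustration (not actual libpod code) of the lock-release race the #16320 comment above refers to: the container lock is dropped while the signal is delivered, so the cleanup process can change the state underneath the kill path and trigger "container state improper". The `container` struct, its mutex, and the string states are all assumptions made for the sketch:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type container struct {
	mu    sync.Mutex
	state string
}

func (c *container) kill() error {
	c.mu.Lock()
	if c.state != "running" {
		c.mu.Unlock()
		return fmt.Errorf("container state improper: %s", c.state)
	}
	c.mu.Unlock() // lock released while the signal is being sent

	time.Sleep(20 * time.Millisecond) // stand-in for actually delivering the signal

	c.mu.Lock()
	defer c.mu.Unlock()
	// The cleanup process may have changed the state while we were unlocked.
	if c.state != "running" {
		return fmt.Errorf("container state improper: %s", c.state)
	}
	c.state = "stopped"
	return nil
}

func main() {
	c := &container{state: "running"}

	// Simulate the cleanup process racing with kill.
	go func() {
		time.Sleep(5 * time.Millisecond)
		c.mu.Lock()
		c.state = "exited"
		c.mu.Unlock()
	}()

	if err := c.kill(); err != nil {
		fmt.Println("Error:", err) // likely: container state improper: exited
	}
}
```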
Another "container state improper" flake, but I'm not sure if it's the same cause. This time in int remote f37 root but it's in # podman-remote [options] start 755ed1a8ac74984410bd7817cec045e56038e06a2133ab9592e70bc9da623c62
755ed1a8ac74984410bd7817cec045e56038e06a2133ab9592e70bc9da623c62
# podman-remote [options] restart 755ed1a8ac74984410bd7817cec045e56038e06a2133ab9592e70bc9da623c62
open pidfd: No such process
Error: sending SIGKILL to container 755ed1a8ac74984410bd7817cec045e56038e06a2133ab9592e70bc9da623c62: container state improper: stopped
This one looks different to me. Would you create a new issue for it?
A friendly reminder that this issue had no activity for 30 days. |
"container state improper" is one of the symptoms of the everything-hosed issue (#15367). Yesterday I assigned these three to that PR:
Is it possible that those are manifestations of this bug instead? If so, should I remove "cpcontainer" from the title and reassign those three?
Yes, it looks different to me.
Sorry, I'm having trouble interpreting that. Can you clarify whether I should
Apologies, I think they look more like #15367 but things are getting blurry a bit. Some symptoms tend to occur simultaneously at times.
The container lock is released before stopping/killing which implies certain race conditions with, for instance, the cleanup process changing the container state to stopped, exited or other states.

The (remaining) flakes seen in containers#16142 and containers#15367 strongly indicate a race in between the stopping/killing a container and the cleanup process. To fix the flake make sure to ignore invalid-state errors. An alternative fix would be to change `KillContainer` to not return such errors at all but commit c77691f indicates an explicit desire to have these errors being reported in the sig proxy.

[NO NEW TESTS NEEDED] as it's a race already covered by the system tests.

Fixes: containers#16142
Fixes: containers#15367
Signed-off-by: Valentin Rothberg <[email protected]>
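The gist of that fix, as a minimal hedged sketch: if the kill fails only because the container has already reached a stopped/exited state, swallow the invalid-state error, since the desired end state was reached anyway. The `errCtrStateInvalid` sentinel and the helper functions below are hypothetical stand-ins for the real libpod error types and call sites:

```go
package main

import (
	"errors"
	"fmt"
)

var errCtrStateInvalid = errors.New("container state improper")

// killContainer simulates a kill that lost the race against the cleanup
// process: the container already transitioned to "stopped".
func killContainer(id string) error {
	return fmt.Errorf("sending SIGKILL to container %s: %w: stopped", id, errCtrStateInvalid)
}

// stopAndRemove ignores invalid-state errors from the kill: the container is
// already stopped or gone, which is exactly the end state we wanted.
func stopAndRemove(id string) error {
	if err := killContainer(id); err != nil {
		if errors.Is(err, errCtrStateInvalid) {
			return nil
		}
		return err
	}
	return nil
}

func main() {
	if err := stopAndRemove("cpcontainer"); err != nil {
		fmt.Println("Error:", err)
	}
	fmt.Println("container removed despite losing the race")
}
```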
[sys] 127 podman cp file from host to container