Add ContainerStateRemoving #4493
WIP: Needs tests, needs integration with |
b1dc750 to 58040fe
LGTM
Fixed the race by moving lock removal to after the container is removed from the database. This could potentially leak locks if we get a SIGKILL precisely between the database removal and the attempt to free the lock, but it does solve issues where we could try to double-free a lock if |
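The ordering described in this comment can be illustrated with a minimal sketch. It is not Libpod's code: `lockManager`, `database`, and `removeContainer` are hypothetical stand-ins, assumed only for illustration. The point is that the lock is freed only after the database entry is gone, so a repeated removal stops at the database lookup and never reaches a second free.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errLockNotAllocated = errors.New("lock is not allocated")
	errNoSuchContainer  = errors.New("no such container")
)

// lockManager is a hypothetical stand-in for a lock allocator.
type lockManager struct {
	allocated map[uint32]bool
}

// free releases a lock and reports an error on a double free.
func (m *lockManager) free(id uint32) error {
	if !m.allocated[id] {
		return errLockNotAllocated
	}
	delete(m.allocated, id)
	return nil
}

// database is a hypothetical stand-in for the container state database.
// It maps a container ID to the ID of the lock that container owns.
type database struct {
	containers map[string]uint32
}

// remove deletes the container entry and returns its lock ID.
func (d *database) remove(ctrID string) (uint32, error) {
	lockID, ok := d.containers[ctrID]
	if !ok {
		return 0, errNoSuchContainer
	}
	delete(d.containers, ctrID)
	return lockID, nil
}

// removeContainer removes the database entry first and frees the lock last.
func removeContainer(db *database, locks *lockManager, ctrID string) error {
	lockID, err := db.remove(ctrID)
	if err != nil {
		// The container is already gone, so we never attempt a second free.
		return err
	}
	// A crash between the line above and the line below leaks lockID.
	return locks.free(lockID)
}

func main() {
	db := &database{containers: map[string]uint32{"ctr1": 7}}
	locks := &lockManager{allocated: map[uint32]bool{7: true}}

	fmt.Println("first removal:", removeContainer(db, locks, "ctr1"))
	fmt.Println("second removal:", removeContainer(db, locks, "ctr1"))
}
```

In this sketch the trade-off from the comment is visible directly: the only failure window is between the two calls in `removeContainer`, and its cost is a leaked lock rather than a double free.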
58040fe to 41b29c7
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mheon
Good news: this seems to work. Bad news: it potentially introduces four new database writes during removal to ensure atomicity of removal operations. It doesn't seem to have a significant effect on removal time on my laptop, but four disk writes can't be free.
When Libpod removes a container, removal may not fully succeed. The most notable failures are storage issues, where the container cannot be removed from c/storage.

When this occurs, we are faced with a choice. We can keep the container in the state, where it still appears in `podman ps` and is available for other API operations, but is likely unable to perform any of them because it has been partially removed. Or we can remove it from the state very early and clean up after it is already gone. Until now, we have used the second approach.

The problem with that approach is intermittent failures when removing storage. We remove the container, fail to remove its storage, and end up with a container permanently stuck in c/storage that we can't remove with the normal Podman CLI, whose name we can't reuse, and that we generally can't interact with. A notable cause is Podman being hit by a SIGKILL midway through removal, which can consistently cause `podman rm` to fail to remove storage.

We now add a new state for containers that are in the process of being removed, ContainerStateRemoving. We set this at the beginning of the removal process. It notifies Podman that the container cannot be used anymore, but preserves it in the DB until it is fully removed. This allows Remove to be run on these containers again, which should successfully remove the storage if the first attempt failed.

Fixes containers#3906

Signed-off-by: Matthew Heon <[email protected]>
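The behaviour this commit message describes can be sketched as a small state machine. This is a simplified illustration, not Libpod's implementation: the `store` type, its fields, and the `failStorage` switch are assumptions made up for the example, and only the idea of a "removing" state comes from the commit message. The container is marked as removing first, stays in the database if storage removal fails, and can then be removed again.

```go
package main

import (
	"errors"
	"fmt"
)

type containerState int

const (
	stateStopped containerState = iota
	stateRunning
	stateRemoving // removal has started; the container is no longer usable
)

var (
	errNoSuchContainer = errors.New("no such container")
	errRemoving        = errors.New("container is being removed")
	errStorage         = errors.New("removing storage failed")
)

type container struct {
	id    string
	state containerState
}

// store is a hypothetical stand-in for the state database plus c/storage.
type store struct {
	db          map[string]*container
	storage     map[string]bool // IDs of containers that still have storage
	failStorage bool            // simulates an intermittent storage failure
}

// start refuses to operate on a container that is being removed.
func (s *store) start(id string) error {
	c, ok := s.db[id]
	if !ok {
		return errNoSuchContainer
	}
	if c.state == stateRemoving {
		return errRemoving
	}
	c.state = stateRunning
	return nil
}

// remove marks the container as removing before touching storage, and
// deletes the database entry only once storage removal has succeeded.
func (s *store) remove(id string) error {
	c, ok := s.db[id]
	if !ok {
		return errNoSuchContainer
	}
	c.state = stateRemoving
	if s.failStorage {
		// The entry stays in the database, so remove can be retried.
		return errStorage
	}
	delete(s.storage, id)
	delete(s.db, id)
	return nil
}

func main() {
	s := &store{
		db:          map[string]*container{"c1": {id: "c1", state: stateStopped}},
		storage:     map[string]bool{"c1": true},
		failStorage: true,
	}
	fmt.Println("first remove:", s.remove("c1"))
	fmt.Println("start while removing:", s.start("c1"))
	s.failStorage = false
	fmt.Println("second remove:", s.remove("c1"))
}
```

The design choice shown here is that a failed removal leaves a visible, retryable entry instead of an orphan in storage, at the cost of an extra state write at the start of removal.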
If the container is running and we need to get its netns and can't, that is a serious bug deserving of errors. If it's not running, that's not really a big deal. Log an error and continue.

Signed-off-by: Matthew Heon <[email protected]>
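The rule in this second commit message can be sketched as follows. The names `ctr`, `netnsPath`, and the `lookup` callback are hypothetical and exist only for this example; the sketch assumes the network namespace lookup is a function that may fail. A failure is returned as an error for a running container and only logged otherwise.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var errNoNetNS = errors.New("could not retrieve network namespace")

// ctr is a hypothetical, minimal container representation.
type ctr struct {
	running bool
}

// netnsPath resolves the network namespace path through lookup.
// A lookup failure is fatal only when the container is running.
func netnsPath(c *ctr, lookup func() (string, error)) (string, error) {
	path, err := lookup()
	if err == nil {
		return path, nil
	}
	if c.running {
		return "", fmt.Errorf("getting netns of running container: %w", err)
	}
	// Not running: log the problem and carry on without a path.
	log.Printf("ignoring netns error for non-running container: %v", err)
	return "", nil
}

func main() {
	failing := func() (string, error) { return "", errNoNetNS }

	_, err := netnsPath(&ctr{running: true}, failing)
	fmt.Println("running:", err)

	_, err = netnsPath(&ctr{running: false}, failing)
	fmt.Println("not running:", err)
}
```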
625291f to 6c405b5
Code LGTM but tests are missing.
LGTM too; added tests are always welcome.
/lgtm