
[ci:docs] Clean up /var/tmp/ when creating containers from oci-archive tarballs #19201

Closed

Conversation

@jdoss (Contributor) commented Jul 11, 2023

When creating a container from an oci-archive tarball, podman unpacks everything to /var/tmp, but nothing removes these directories after their contents are loaded into container storage. If they are not removed, they pile up as containers are updated from oci-archive tarballs and can eventually cause the system to run out of disk space.

This PR adds rules to our existing tmpfiles.d configuration that remove all /var/tmp/oci* and /var/tmp/storage* directories on boot. See my testing below.

The changes from this PR:

# cat /etc/tmpfiles.d/oci-archive-podman.conf 
# Remove /var/tmp/oci* and /var/tmp/storage* podman temporary directories on each
# boot which are created when creating containers from oci-archive tarballs
R! /var/tmp/oci*
R! /var/tmp/storage*
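
(Side note for anyone testing this locally: the rule can be exercised without a reboot via systemd-tmpfiles. A minimal sketch, assuming root and that the file is installed at the path shown above; note that this really removes the matched directories rather than doing a dry run.)

# Process only this config file; --boot also executes the lines marked with "!"
systemd-tmpfiles --remove --boot /etc/tmpfiles.d/oci-archive-podman.conf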

Leftover directories:

# ls -lah /var/tmp/
total 8.0K
drwxrwxrwt. 21 root root 4.0K Jul 11 22:09 .
drwxr-xr-x. 24 root root 4.0K Jul 11 21:32 ..
drwx------.  3 root root   55 Jul 11 21:36 oci1333441593
drwx------.  3 root root   55 Jul 11 21:33 oci1627215173
drwx------.  3 root root   55 Jul 11 21:33 oci1838483529
drwx------.  3 root root   55 Jul 11 21:40 oci2343867263
drwx------.  3 root root   55 Jul 11 21:34 oci2628228094
drwx------.  3 root root   55 Jul 11 21:33 oci2971263250
drwx------.  3 root root   55 Jul 11 21:40 oci695730759
drwx------.  2 root root   60 Jul 11 21:45 storage1255676731
drwx------.  2 root root  157 Jul 11 21:34 storage1270338544
drwx------.  2 root root  177 Jul 11 21:34 storage2400693880
drwx------.  2 root root    6 Jul 11 21:36 storage2692925340
drwx------.  2 root root   60 Jul 11 21:45 storage2908037199
drwx------.  2 root root  147 Jul 11 21:34 storage3742651420
drwx------.  2 root root  137 Jul 11 21:35 storage3819223164
drwx------.  3 root root   17 Jul 11 21:59 systemd-private-fc2f46e6b2474492951b4654179bb076-chronyd.service-JVYOdD
drwx------.  3 root root   17 Jul 11 21:59 systemd-private-fc2f46e6b2474492951b4654179bb076-dbus-broker.service-4U7zse
drwx------.  3 root root   17 Jul 11 21:59 systemd-private-fc2f46e6b2474492951b4654179bb076-systemd-logind.service-xwQ2Xk
drwx------.  3 root root   17 Jul 11 21:59 systemd-private-fc2f46e6b2474492951b4654179bb076-systemd-resolved.service-X5AH0U

After a reboot you can see that they have been removed on boot by tmpfiles.d, as expected:

# ls -lah /var/tmp/
total 8.0K
drwxrwxrwt.  7 root root 4.0K Jul 11 22:41 .
drwxr-xr-x. 24 root root 4.0K Jul 11 21:32 ..
drwx------.  3 root root   17 Jul 11 22:40 systemd-private-a7a802622b8d4b3795379c4932c8222c-chronyd.service-AyXPYE
drwx------.  3 root root   17 Jul 11 22:40 systemd-private-a7a802622b8d4b3795379c4932c8222c-dbus-broker.service-F36xIa
drwx------.  3 root root   17 Jul 11 22:40 systemd-private-a7a802622b8d4b3795379c4932c8222c-systemd-logind.service-XQwTRO
drwx------.  3 root root   17 Jul 11 22:40 systemd-private-a7a802622b8d4b3795379c4932c8222c-systemd-resolved.service-mF0pqk

Does this PR introduce a user-facing change?

Remove the /var/tmp/oci* and /var/tmp/storage* temporary directories that podman leaves behind when creating containers from oci-archive tarballs, via tmpfiles.d on boot.

[NO NEW TESTS NEEDED]

@rhatdan changed the title from "Clean up /var/tmp/ when creating containers from oci-archive tarballs" to "[ci:docs] Clean up /var/tmp/ when creating containers from oci-archive tarballs" on Jul 12, 2023
@rhatdan (Member) commented Jul 12, 2023

/approve
LGTM

You need to sign your commits.

git commit -a --amend -s
git push --force

@openshift-ci bot (Contributor) commented Jul 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jdoss, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jul 12, 2023
Remove the /var/tmp/oci* and /var/tmp/storage* podman temporary directories on
each boot. These are created when creating containers from oci-archive
tarballs.

Signed-off-by: Joe Doss <[email protected]>
@jdoss force-pushed the jdoss/tmpfiles.d_oci_archive_cleanup branch from 5415f10 to a56d785 on July 12, 2023 00:39
# Remove /var/tmp/oci* and /var/tmp/storage* podman temporary directories on each
# boot which are created when creating containers from oci-archive tarballs
R! /var/tmp/oci*
R! /var/tmp/storage*
@edsantiago (Member) commented Jul 12, 2023

This is a bit alarming: storage is a generic word; it is easy to imagine regular users saving /var/tmp/storage-archive.tgz and being surprised to find it gone on reboot. (Let's leave aside discussion of the merits of expecting anything in a tmpdir to survive.)

Suggestion: instead of *, try [0-9]+$ (first making sure that those patterns are valid in this context)

[EDIT: this applies to both added patterns, both oci and storage]
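
(For reference, a rough sketch of what a tighter match could look like in tmpfiles.d syntax. tmpfiles.d paths take shell-style globs per glob(7), not regular expressions, so [0-9]+$ can only be approximated, and the edge cases discussed further down still apply.)

# Hypothetical tighter patterns; still globs, not regexes
R! /var/tmp/oci[0-9]*
R! /var/tmp/storage[0-9]*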

Member:

Even better than that, a better solution would be for @containers/podman-maintainers to use better namespaces (podman-unpack-*) and to find (and plug) the leaks.

@jdoss (Contributor, Author):

I thought of this too, which is why I went with removal on reboot rather than using https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html#e to have them removed when systemd-tmpfiles-clean.service fires on its timer. At least this way they stick around until reboot. I know you want to put this aside, but I also don't expect anything in tmpdirs to survive a reboot, which is why I went this direction. It's pretty well known that tmpdirs are temporary.

Looking into your suggestion

-rw-r--r--. 1 jdoss jdoss    0 Jul 11 20:04 oci1234.tar
drwxr-xr-x. 1 jdoss jdoss    0 Jul 11 20:03 oci1333441593
drwxr-xr-x. 1 jdoss jdoss    0 Jul 11 20:04 oci2343867263
drwxr-xr-x. 1 jdoss jdoss    0 Jul 11 20:04 storage1255676731
drwx------. 1 jdoss jdoss    2 Jun 23 14:56 storage162146237
drwx------. 1 jdoss jdoss    0 Jun 23 14:47 storage1628300548
-rw-r--r--. 1 jdoss jdoss    0 Jul 11 20:04 storage.tar

This shell-style globbing seems to work

$ ls storage*[0-9]
storage1255676731:

storage162146237:
1

storage1628300548:
$ ls oci*[0-9]
oci1333441593:

oci2343867263:

But I am hesitant to do anything more than what my PR provides because, again, if something is in /var/tmp I don't expect it to survive a reboot. Heck, someone could create a directory like this and it will get caught:

$ mkdir storage123
[jdoss@sw-0608 tmp]$ ls storage*[0-9]
storage123:

storage1255676731:

storage162146237:
1

storage1628300548:

So even the shell-style globbing doesn't catch this edge case.

It would be handy if these directories had a podman- prefix to make removing them with systemd-tmpfiles less error prone, but this PR is a quick fix to prevent systems from running out of disk space. Ideally we would switch to a podman- prefix on these directories and adjust this file later so they are cleaned up by systemd-tmpfiles-clean.service.
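
(A rough sketch of that follow-up, assuming podman switched to a podman- prefix for these directories; that prefix is hypothetical, not something podman creates today, and moving to age-based cleanup via systemd-tmpfiles-clean.service would additionally need one of the age-aware tmpfiles.d line types rather than R!.)

# Hypothetical rule for a future podman- prefixed temporary directory namespace
R! /var/tmp/podman-*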

@jdoss (Contributor, Author):

Even better than that, a better solution would be for @containers/podman-maintainers to use better namespaces (podman-unpack-*) and to find (and plug) the leaks.

I missed this reply before I sent mine. I agree wholeheartedly.

@jdoss (Contributor, Author) commented Jul 12, 2023

@rhatdan OK I signed them and force pushed.

@Luap99 (Member) commented Jul 12, 2023

How and why are we leaking these? IMO this is the wrong approach, as it just works around the actual issue.
AFAIK these directories are created by c/image or c/storage, which means our other tools skopeo/buildah can run into the same problem.

Whatever in c/image or c/storage creates these should be smart enough to clean them up on its own. And I agree that it should use a better namespace. And if we really think we should use the tmpfiles hack, then it should not live in podman: it should be at least in c/common so that it can be shipped via containers-common and also cover buildah, skopeo, and other c/storage and c/image users. This does not look like a podman-specific problem to me.

cc @vrothberg @mtrmac
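
(One untested way to check that point would be to run an equivalent copy through skopeo and look at /var/tmp afterwards; the archive path and image name below are placeholders.)

skopeo copy oci-archive:/tmp/example.tar containers-storage:localhost/example:latest
# Check for leftover temporary unpack directories
ls -d /var/tmp/oci* /var/tmp/storage* 2>/dev/null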

@vrothberg (Member):

A reproducer would be nice. Is it podman load or podman import?

@mtrmac (Collaborator) left a review comment:

I agree this is not desirable. Let’s fix the root cause.

Or at the very least, if we needed this for crash resiliency or something, we should switch to use very-unique file names and target those.


And yes, c/image is supposedly cleaning up these files, in https://github.com/containers/image/blob/9b3e4a40d5c314aebcfaa197ee2652118760e45f/storage/storage_dest.go#L153 and https://github.com/containers/image/blob/9b3e4a40d5c314aebcfaa197ee2652118760e45f/oci/archive/oci_dest.go#L59 / https://github.com/containers/image/blob/9b3e4a40d5c314aebcfaa197ee2652118760e45f/oci/archive/oci_src.go#L104, assuming users call .Close().

@jdoss (Contributor, Author) commented Jul 12, 2023

A reproducer would be nice. Is it podman load or podman import?

@vrothberg Sure thing.

I am saving my images like this:

podman pull $CONTAINER_IMAGE
podman save $CONTAINER_IMAGE --format oci-archive -o fcos/images/$(echo $CONTAINER_IMAGE | sed 's/[^a-zA-Z0-9]/-/g').tar
podman rmi $CONTAINER_IMAGE

Then I let podman, run via systemd, load them on demand:

[Unit]
Description=mycool Service
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers
BindsTo=mycool-pod.service
After=mycool-pod.service

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=always
TimeoutStopSec=150
TimeoutStartSec=600
ExecStartPre=/bin/rm -f %t/%n.ctr-id
ExecStart=/usr/bin/podman run -d --rm --replace \
	--cidfile=%t/%n.ctr-id \
	--cgroups=no-conmon \
	--pod-id-file %t/mycool-pod.pod-id \
	--sdnotify=conmon \
	oci-archive:/usr/src/mycool/images/myrepo-mycool-sha.tar 
ExecStop=/usr/bin/podman stop --ignore --cidfile=%t/%n.ctr-id
ExecStopPost=/usr/bin/podman rm	-f --ignore	--cidfile=%t/%n.ctr-id
Type=notify
NotifyAccess=all

[Install]
WantedBy=default.target
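
(For completeness, a minimal way to exercise a unit like this, assuming it is installed as mycool.service and that its companion mycool-pod.service exists; both unit names come from the snippet above. Each start unpacks the archive under /var/tmp, which is where the leftover directories shown earlier come from.)

systemctl daemon-reload
systemctl start mycool.service
# Check for leftover temporary unpack directories
ls -d /var/tmp/oci* /var/tmp/storage*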

@jdoss (Contributor, Author) commented Jul 12, 2023

@Luap99 @mtrmac I unfortunately don't have the emotional fortitude to provide more of a fix than what I have in this PR. I fixed this on my end with the same file in /etc/tmpfiles.d/ because I have FCOS nodes running out of disk space over time and it's impacting my customers. I wanted to at least try to help work around this issue for others with this PR. If you guys feel it needs to be more, we can close this PR and I'll let my favorite Podman Pros sort it out with a proper fix.

I do deeply appreciate you all for wanting to fix the root problem. I agree that podman (or whatever part responsible) should be cleaning up after itself. Maybe some proper namespacing of tmp dirs too wouldn't hurt.

@rhatdan (Member) commented Jul 12, 2023

I think we can fix this both ways: clean up after podman commands that left cruft behind because they were canceled, and have the commands clean up after themselves.

@rhatdan (Member) commented Jul 16, 2023

$ podman save ubi8 --format oci-archive -o /tmp/images/ubi8.tar
Copying blob b51194abfc91 [===========================>----------] 148.6MiB / 204.9MiB
^C
$ ls -ld /var/tmp/oci*
drwx------. 3 dwalsh dwalsh 49 Jul 16 08:48 /var/tmp/oci431979589

We should fix this to be a content-specific directory.

rhatdan added a commit to rhatdan/image that referenced this pull request Jul 16, 2023
If you hit Ctrl-C while pulling an image, files and directories get
left in /var/tmp. By adding a "containers_images" prefix, we can use
systemd tmpfiles handling to remove them on reboot safely.

Helps make containers/podman#19201 safer.

Signed-off-by: Daniel J Walsh <[email protected]>
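
(Based on that commit message, the matching tmpfiles.d rule would presumably look something like the following; the exact prefix and path are assumptions taken from the message above, not the final implementation.)

# Hypothetical rule matching the containers_images prefix mentioned in the commit message
R! /var/tmp/containers_images*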
rhatdan added further commits to rhatdan/image referencing this pull request on Jul 16 and Jul 17, 2023, with the same message.
@mtrmac (Collaborator) commented Jul 17, 2023

It is … a possibility… for Podman to listen for a SIGINT, and to turn that into a context cancellation of a global context object. That would allow cleanup actions to be processed throughout the codebase.

OTOH

  • it would be a fairly major design change
  • it could cause the reaction to Ctrl-C to be fairly slow
  • we need to handle abrupt terminations anyway; making them less likely to happen would somewhat increase the risk of them being broken without us noticing

@rhatdan (Member) commented Jul 17, 2023

@jdoss I have updated this PR with new containers/image names.
#19265

@rhatdan rhatdan closed this Jul 17, 2023
@jdoss (Contributor, Author) commented Jul 17, 2023

Thanks @rhatdan !!

@github-actions bot added the locked - please file new issue/PR label (Assist humans wanting to comment on an old issue or PR with locked comments.) on Oct 16, 2023
@github-actions bot locked as resolved and limited conversation to collaborators on Oct 16, 2023