
possible solution for podman pause process that keeps running after user logout #11560

Closed
rptaylor opened this issue Sep 14, 2021 · 10 comments · Fixed by #11606
Labels
kind/bug, locked - please file new issue/PR

Comments

@rptaylor

rptaylor commented Sep 14, 2021

Review of related issues
A number of issues and workarounds have been raised about the podman pause process, which is started when a user runs rootless podman containers and remains running even after the user logs out.

#7180
cockpit-project/cockpit-podman#473
#7192
#7133
#10640

systemd
systemd/systemd#16332
systemd/systemd#16318 is the only one that remains open; it is about ways of storing file descriptors.

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

The specific problem I am encountering is that I have made a systemd user service to set up SSH agents for users.
In /etc/systemd/user/ssh-agent.service:

[Unit]
Description=Automatic SSH key agent
[Service]
Type=forking
Environment=SSH_AUTH_SOCK=%t/ssh-agent.socket
ExecStart=/usr/bin/ssh-agent -a $SSH_AUTH_SOCK
[Install]
WantedBy=default.target

Then run sudo systemctl --global enable ssh-agent.service to enable it as a user-level service for all users. On EL8 this works perfectly: it runs at most one SSH agent per logged-in user (no matter how many login sessions they have) and stops it automatically once all of that user's sessions have ended. But after going to some length to do this "the right (systemd) way" instead of with bash scripting, the problem is that if users use podman, their systemd sessions never end when they log out, because the podman pause process is left running indefinitely. Their SSH agents are therefore left running forever, which is a security concern.

Steps to reproduce the issue:

  1. Run a container with podman, e.g. podman run --rm -it registry.hub.docker.com/library/fedora:34

  2. Exit the container

  3. Log out

Describe the results you received:

The user's systemd session is still running:

user        9605       1  0 22:26 ?        00:00:00 /usr/lib/systemd/systemd --user
user        9608    9605  0 22:26 ?        00:00:00 (sd-pam)
user        9616    9605  0 22:26 ?        00:00:00 /usr/bin/ssh-agent -a /run/user/1002/my-ssh-agent.socket
user        9660       1  0 22:27 ?        00:00:00 podman
user        9669    9605  0 22:27 ?        00:00:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

The podman pause process has to be manually killed in order to allow the session to be cleaned up properly.
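To confirm which leftover process is keeping the session alive, a sketch like the following scans /proc for processes owned by the current user whose command name is podman and whose parent PID is 1, matching the pause process in the listing above. It is a heuristic, not podman's own mechanism: it assumes a Linux /proc layout, Go 1.16 or later, and that the pause process reports the command name podman (as it does in this version). Any other podman process that happens to have been reparented to PID 1 would also match.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

// procStat holds the fields of /proc/<pid>/stat used here.
type procStat struct {
	PID  int
	Comm string
	PPID int
}

// parseStat parses the contents of /proc/<pid>/stat. The command name is
// wrapped in parentheses and may itself contain spaces or ')', so the
// fields after it are located from the LAST ')' in the line.
func parseStat(s string) (procStat, error) {
	open := strings.IndexByte(s, '(')
	closeIdx := strings.LastIndexByte(s, ')')
	if open < 0 || closeIdx < open {
		return procStat{}, fmt.Errorf("malformed stat line: %q", s)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(s[:open]))
	if err != nil {
		return procStat{}, err
	}
	// After ")" come: state, ppid, pgrp, ...
	rest := strings.Fields(s[closeIdx+1:])
	if len(rest) < 2 {
		return procStat{}, fmt.Errorf("too few fields: %q", s)
	}
	ppid, err := strconv.Atoi(rest[1])
	if err != nil {
		return procStat{}, err
	}
	return procStat{PID: pid, Comm: s[open+1 : closeIdx], PPID: ppid}, nil
}

func main() {
	uid := uint32(os.Getuid())
	// Glob only fails on a malformed pattern, so the error is ignored.
	dirs, _ := filepath.Glob("/proc/[0-9]*")
	for _, dir := range dirs {
		fi, err := os.Stat(dir)
		if err != nil {
			continue // the process exited in the meantime
		}
		// The /proc/<pid> directory is normally owned by the process's UID.
		if st, ok := fi.Sys().(*syscall.Stat_t); !ok || st.Uid != uid {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, "stat"))
		if err != nil {
			continue
		}
		ps, err := parseStat(string(data))
		if err != nil {
			continue
		}
		// Assumption: the pause process shows up as "podman" reparented to PID 1.
		if ps.Comm == "podman" && ps.PPID == 1 {
			fmt.Printf("possible lingering pause process: pid %d\n", ps.PID)
		}
	}
}
```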

Describe the results you expected:
The user's systemd session is able to finish normally because no process is left running.

Idea for a possible solution

As I understand it, the podman pause process always needs to be running for a user to keep a handle on some namespaces, so that rootless pods can use the same user namespace. But instead of being started when a user first invokes podman, could the pause process be a systemd user service, just like the SSH agent example? That way systemd would start it automatically when a user logs in (if one isn't already running) and stop it automatically when the user logs out, allowing the session to terminate normally. It could be opt-in for admins, via sudo systemctl --global enable podman-pause.service, and individual users could manage it themselves with systemctl --user stop podman-pause.service or opt out with systemctl --user mask podman-pause.service.
This way systemd could do the work for you and avoid a race condition with starting the pause process, and podman would play nicely with other systemd-managed user services.

Also, if I understand correctly, it would only require a way to start the podman pause process with a command (for ExecStart, maybe podman unshare something?), and probably no other changes to the podman code.

Output of podman version:

$ podman version
Version:      3.2.3
API Version:  3.2.3
Go Version:   go1.15.14
Built:        Tue Aug 10 20:55:16 2021
OS/Arch:      linux/amd64

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.2.3-0.10.module_el8.4.0+886+c9a8d9ad.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes

openshift-ci bot added the kind/bug label Sep 14, 2021
@rptaylor
Author

rptaylor commented Sep 14, 2021

Actually I tested this podman-pause.service unit file:

[Unit]
Description=podman pause

[Service]
Type=forking
ExecStart=/usr/bin/podman unshare echo

[Install]
WantedBy=default.target

systemctl --user enable podman-pause

Not sure if podman unshare echo is the simplest command that can start the pause process, but it seems to work.
When I log in, I have a pause process, however this time its parent is my systemd --user session, instead of PID 1:

user       11358       1  0 00:25 ?        00:00:00 /usr/lib/systemd/systemd --user
user       11361   11358  0 00:25 ?        00:00:00 (sd-pam)
user       11369   11354  0 00:25 ?        00:00:00 sshd: user@pts/0
user       11370   11358  0 00:25 ?        00:00:00 /usr/bin/ssh-agent -a /run/user/1002/my-ssh-agent.socket
user       11374   11369  0 00:25 pts/0    00:00:00 -bash
user       11392   11358  0 00:25 ?        00:00:00 /usr/bin/podman

Podman seems to work normally, and then when I log out, systemd is able to clean up the pause process as part of normal systemd session termination, instead of podman preventing the session termination from happening! It would be nice if this could be the standard behaviour of podman.

@Luap99
Member

Luap99 commented Sep 14, 2021

@giuseppe PTAL

@giuseppe
Member

podman moves the pause process to a scope, but (I am not sure why) we disable DefaultDependencies for it. Would something like this be enough for your case? Could you try it out and see whether it solves the problem you've seen?

diff --git a/utils/utils_supported.go b/utils/utils_supported.go
index ebc870d26..f3507b7d1 100644
--- a/utils/utils_supported.go
+++ b/utils/utils_supported.go
@@ -40,7 +40,7 @@ func RunUnderSystemdScope(pid int, slice string, unitName string) error {
        properties = append(properties, systemdDbus.PropSlice(slice))
        properties = append(properties, newProp("PIDs", []uint32{uint32(pid)}))
        properties = append(properties, newProp("Delegate", true))
-       properties = append(properties, newProp("DefaultDependencies", false))
+       properties = append(properties, newProp("DefaultDependencies", true))
        ch := make(chan string)
        _, err = conn.StartTransientUnit(unitName, "replace", properties, ch)
        if err != nil {
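For context on what the flag in this diff changes, the sketch below models the property list passed to StartTransientUnit together with the implicit dependencies that the systemd.scope(5) man page documents for scope units: unless DefaultDependencies=false is set, a scope gets Conflicts= and Before= on shutdown.target. The property type is a hypothetical stdlib-only stand-in for go-systemd's dbus.Property, the slice name and PID are placeholder values, and the sketch does not talk to systemd at all.

```go
package main

import "fmt"

// property is a hypothetical stand-in for a (name, value) pair passed to
// StartTransientUnit, kept stdlib-only for illustration.
type property struct {
	Name  string
	Value interface{}
}

// scopeProperties mirrors the property list built in the diff above.
func scopeProperties(slice string, pid int, defaultDeps bool) []property {
	return []property{
		{"Slice", slice},
		{"PIDs", []uint32{uint32(pid)}},
		{"Delegate", true},
		{"DefaultDependencies", defaultDeps},
	}
}

// implicitShutdownDeps returns the implicit dependencies that systemd.scope(5)
// lists for scope units; they are added unless DefaultDependencies=false.
// A missing DefaultDependencies property means the default (enabled).
func implicitShutdownDeps(props []property) []string {
	for _, p := range props {
		if p.Name == "DefaultDependencies" {
			if on, ok := p.Value.(bool); ok && !on {
				return nil
			}
		}
	}
	return []string{"Conflicts=shutdown.target", "Before=shutdown.target"}
}

func main() {
	for _, on := range []bool{false, true} {
		props := scopeProperties("user.slice", 12345, on)
		fmt.Printf("DefaultDependencies=%v -> %v\n", on, implicitShutdownDeps(props))
	}
}
```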

@rptaylor
Author

rptaylor commented Sep 14, 2021

@giuseppe I tried that, compiling from the v3.3.1-rhel branch on EL8:

$ git diff
diff --git a/utils/utils_supported.go b/utils/utils_supported.go
index ebc870d26..f3507b7d1 100644
--- a/utils/utils_supported.go
+++ b/utils/utils_supported.go
@@ -40,7 +40,7 @@ func RunUnderSystemdScope(pid int, slice string, unitName string) error {
        properties = append(properties, systemdDbus.PropSlice(slice))
        properties = append(properties, newProp("PIDs", []uint32{uint32(pid)}))
        properties = append(properties, newProp("Delegate", true))
-       properties = append(properties, newProp("DefaultDependencies", false))
+       properties = append(properties, newProp("DefaultDependencies", true))
        ch := make(chan string)
        _, err = conn.StartTransientUnit(unitName, "replace", properties, ch)
        if err != nil {

I ran make BUILDTAGS="selinux seccomp" and sudo make install PREFIX=/usr, following https://podman.io/getting-started/installation. But there was no change in behaviour: after running podman, exiting the container, and logging out, the podman pause process still kept the systemd session active, and its parent PID was 1:

user       21657       1  0 20:31 ?        00:00:00 /usr/lib/systemd/systemd --user
user       21660   21657  0 20:31 ?        00:00:00 (sd-pam)
user       21722       1  0 20:32 ?        00:00:00 podman
user       21753   21657  0 20:32 ?        00:00:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root       21871    1000  0 20:32 ?        00:00:00 sshd: user [priv]
user       21874   21871  0 20:32 ?        00:00:00 sshd: user@pts/1
user       21875   21874  0 20:32 pts/1    00:00:00 -bash

P.S. I thought your name seemed familiar - we both gave presentations at https://indico.cern.ch/event/757415/ and went on a tour of the CMS detector. :)

@giuseppe
Member

systemd waits for a delay before stopping the user session after a logout.

Even without the podman pause process running, I see that the user session is still kept around for 10 seconds. How long have you waited after the logout to see what processes were still running?

If you want the user session to terminate immediately after the last logout, you may need to tweak UserStopDelaySec= in /etc/systemd/logind.conf.

@rptaylor
Author

@giuseppe The default value of UserStopDelaySec seems to be ~ 5-10 seconds, but the problem persists indefinitely, e.g. even after waiting 20 minutes the systemd session is still kept alive by the podman pause process.
I have reproduced it with podman v3.2.3 on AlmaLinux and v3.3.1 on CentOS8 Stream.

@giuseppe
Member

Thanks for confirming it.

I've tested on RHEL 8 and I see different behavior, since UserStopDelaySec= seems not to be present in the systemd version there.

This behavior helped us find an error on our side: #11606

Would it be possible for you to check that PR? With it applied, I can no longer reproduce the issue on RHEL.

giuseppe added a commit to giuseppe/libpod that referenced this issue Sep 16, 2021
make sure the pause process is moved to its own scope as well as what
we do when we join an existing user+mount namespace.

Closes: containers#11560

[NO TESTS NEEDED]

Signed-off-by: Giuseppe Scrivano <[email protected]>
@rptaylor
Author

@giuseppe thanks for making the patch!
I compiled your branch:

$ podman version
Version:      4.0.0-dev
API Version:  4.0.0-dev
Go Version:   go1.15.14
Git Commit:   a2c8b5d9d6d6e46679fe9540619d4303d4b4601d
Built:        Thu Sep 16 21:10:52 2021
OS/Arch:      linux/amd64

Then I ran podman as before. Although the PPID of the pause process is still 1, it is properly terminated when I log out, and the systemd session is able to finish successfully. Thanks!
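systemd tracks the processes of a unit through its cgroup rather than through the parent PID, which is consistent with the pause process being cleaned up at logout even though its PPID is still 1. A minimal sketch for checking which unit a process (for example the pause process PID from ps) belongs to reads /proc/<pid>/cgroup and returns the innermost .scope or .service path component. It assumes either the cgroup v2 unified hierarchy or the cgroup v1 name=systemd hierarchy; the unit names used in the checks are illustrative, not the names podman actually generates.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// unitFromCgroup returns the innermost systemd unit (".scope" or ".service")
// named in /proc/<pid>/cgroup content, or "" if none is found.
// cgroup v2 line:  "0::/user.slice/user-1000.slice/..."
// cgroup v1 line:  "1:name=systemd:/user.slice/..."
func unitFromCgroup(content string) string {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), ":", 3)
		if len(parts) != 3 {
			continue
		}
		isV2 := parts[0] == "0" && parts[1] == ""
		isV1Systemd := parts[1] == "name=systemd"
		if !isV2 && !isV1Systemd {
			continue // other v1 controllers (pids, memory, ...) are skipped
		}
		// Walk the path from the deepest component upwards.
		segs := strings.Split(parts[2], "/")
		for i := len(segs) - 1; i >= 0; i-- {
			if strings.HasSuffix(segs[i], ".scope") || strings.HasSuffix(segs[i], ".service") {
				return segs[i]
			}
		}
	}
	return ""
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1] // e.g. the PID of the pause process from ps
	}
	data, err := os.ReadFile("/proc/" + pid + "/cgroup")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(unitFromCgroup(string(data)))
}
```

Usage: go run . <pid>, then compare the reported unit before and after the fix.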

@rptaylor
Author

Do you think this might get backported to v3.3 or v3.4?

giuseppe added a commit to giuseppe/libpod that referenced this issue Sep 17, 2021
make sure the pause process is moved to its own scope as well as what
we do when we join an existing user+mount namespace.

Closes: containers#11560

[NO TESTS NEEDED]

Signed-off-by: Giuseppe Scrivano <[email protected]>
(cherry picked from commit a2c8b5d)
@giuseppe
Member

> Do you think this might get backported to v3.3 or v3.4?

Sure, the backport for 3.4 is here: #11624

github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023