
Containers started using socket-activated APIv2 die from systemd activation timeout #7294

Closed
danopia opened this issue Aug 11, 2020 · 11 comments · Fixed by #7312
Labels: kind/bug, locked - please file new issue/PR

Comments

danopia commented Aug 11, 2020

/kind bug

Description

When containers are created and started specifically via the /run/podman/podman.sock endpoint, they are stopped when podman.service shuts down due to its socket-activation inactivity timeout. This manifests as containers stopping ~10s after the last APIv2 request is made. The shutdown is clean; podman ps -a shows exit code 0 for them.

Opening an APIv2 events stream, or issuing APIv2 requests every ~5 seconds, keeps the containers alive longer; but once those requests stop, the containers die shortly afterwards.

Steps to reproduce the issue:

Minimal case assuming you have podman's systemd units installed already:

  1. Check you have the socket unit ready: systemctl enable --now podman.socket

  2. Create a sleep container:

    1. curl --unix-socket /run/podman/podman.sock -XPOST http://d/v1.0.0/libpod/images/pull'?reference=k8s.gcr.io/pause:3.2'
    2. curl --unix-socket /run/podman/podman.sock -XPOST http://d/v1.0.0/libpod/containers/create --header 'content-type: application/json' --data '{"image":"k8s.gcr.io/pause:3.2","name":"bg-test"}'
  3. Start watching events: podman events &

  4. Start the container: curl --unix-socket /run/podman/podman.sock -XPOST http://d/v1.0.0/libpod/containers/bg-test/start
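A quick way to watch the socket-activation timeout as it happens (a sketch, assuming the stock podman.socket/podman.service unit names):

# the service should flip from "active" to "deactivating" roughly 10s
# after the last API request, right before the container dies
watch -n1 'systemctl is-active podman.socket podman.service'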

Describe the results you received:

2020-08-11 14:52:21.101145183 +0000 UTC container init 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
2020-08-11 14:52:21.114121047 +0000 UTC container start 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
[ ... 10 seconds pass ... ]
2020-08-11 14:52:31.215222381 +0000 UTC container died 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
2020-08-11 14:52:31.247591086 +0000 UTC container cleanup 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
# podman ps -a
CONTAINER ID  IMAGE                 COMMAND  CREATED         STATUS                    PORTS   NAMES
1d1018fa7c02  k8s.gcr.io/pause:3.2           31 seconds ago  Exited (0) 18 seconds ago         bg-test

Describe the results you expected:

2020-08-11 14:52:21.101145183 +0000 UTC container init 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
2020-08-11 14:52:21.114121047 +0000 UTC container start 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
[ ... nothing further ]
# podman ps -a
CONTAINER ID  IMAGE                 COMMAND  CREATED        STATUS            PORTS   NAMES
1d1018fa7c02  k8s.gcr.io/pause:3.2           2 minutes ago  Up 2 minutes ago          bg-test

Alternative repro within docker:

I also constructed a self-contained repro within Docker. It isn't minimal, but it should show the problem on any machine that has Docker available (I'm not sure how relevant the Linux distro is to this issue). You can copy the whole block as-is; it should show the container dying during the final sleep call.

# bring up a systemd container
docker run --name test -d --privileged --rm --tmpfs /tmp --tmpfs /run --tmpfs /run/lock -v /sys/fs/cgroup:/sys/fs/cgroup:ro jrei/systemd-debian:sid
# install podman (need to ask for runc explicitly because apt apparently prefers crun due to alphabetic order, while podman prefers runc)
docker exec test sh -euxc "apt update; apt install -y podman runc curl | grep podman; systemctl start podman.socket"
# create a sleep container
docker exec test curl --unix-socket /run/podman/podman.sock 'http://d/v1.0.0/libpod/images/pull?reference=k8s.gcr.io/pause:3.2' -XPOST
docker exec test curl --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/create -XPOST --header 'content-type: application/json' --data '{"image":"k8s.gcr.io/pause:3.2","name":"bg-test","netns":{"nsmode":"host"}}'
# watch event stream & start the container 
docker exec test podman events &
docker exec test curl --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/bg-test/start -XPOST
# ... observe the container stop after 10s
sleep 15s
# cleanup our repro
docker rm -f test
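To watch what systemd does to the unit during that final sleep (compare the status snapshots later in this thread), you can additionally run:

# inspect the unit state inside the repro container
docker exec test systemctl status podman.service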

I also tried to repro on the fedora:32 image, but Fedora's podman wouldn't run within Docker; it said something about overlay-on-overlay. I didn't debug further.

Additional information you deem important (e.g. issue happens only occasionally):

Pod infra containers don't seem affected when the /pods/.../start API is used.

I wasn't able to repro when podman system service is run manually from an interactive session. I've only seen this happen specifically when the podman daemon's lifecycle is managed by systemd.
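For reference, running the service manually looks roughly like this (a sketch; --time sets the inactivity timeout in seconds, and the flag may be spelled --timeout on this version):

# run the API service in the foreground, outside systemd's control;
# containers survived the service exiting when run this way
podman system service --time=10 unix:///run/podman/podman.sock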

Output of podman version:

Version:      2.0.4
API Version:  1
Go Version:   go1.14
Built:        Wed Dec 31 16:00:00 1969
OS/Arch:      linux/amd64
Output of `podman info --debug`:
host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18, commit: '
  cpus: 4
  distribution:
    distribution: debian
    version: "10"
  eventLogger: file
  hostname: redapu
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.19.0-9-amd64
  linkmode: dynamic
  memFree: 871215104
  memTotal: 4121145344
  ociRuntime:
    name: runc
    package: 'runc: /usr/sbin/runc'
    path: /usr/sbin/runc
    version: |-
      runc version 1.0.0~rc6+dfsg1
      commit: 1.0.0~rc6+dfsg1-3
      spec: 1.0.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 714h 51m 22.19s (Approximately 29.75 days)
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 6
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Wed Dec 31 16:00:00 1969
  GitCommit: ""
  GoVersion: go1.14
  OsArch: linux/amd64
  Version: 2.0.4

Package info (e.g. output of rpm -q podman or apt list podman):

podman/unknown,now 2.0.4~1 amd64 [installed]

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Discovered on Debian 10 on bare metal, rootful podman 2.0.4.
  • Repro'd further using a Debian sid image via docker[-in-containerd], rootful podman 2.0.4
openshift-ci-robot added the kind/bug label Aug 11, 2020
mheon (Member) commented Aug 11, 2020

@baude @jwhonce @vrothberg PTAL

I don't think we've seen this elsewhere in our testing, which makes me think this is Debian-specific. Does this happen as rootless, root, or both?

Suspicion: systemd is blowing away all containers when it throttles down due to KillMode on the unit file. I'd expect that moving the containers out of the unit's cgroup would fix this for root containers, but I could be wrong.
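One quick way to check that suspicion (a sketch; pidof and the /proc path assume a single conmon process):

# if conmon's cgroup is under podman.service, the default control-group
# KillMode will take it (and the container) down with the service
cat /proc/$(pidof conmon)/cgroup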

jwhonce (Member) commented Aug 11, 2020

@mheon I have not seen this on Fedora or RHEL using systemd.

danopia (Author) commented Aug 11, 2020

I'd believe that this is Debian-specific. I haven't had a chance to try rootless yet; will see if I can get a repro.

I just tried adding KillMode=process to the podman units but it didn't seem to change anything. Not sure if I applied it right. I'm not too handy with systemd.
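For anyone trying the same: the standard way to override a unit setting is a drop-in rather than editing the packaged file (a sketch, assuming the rootful system service):

# override KillMode for podman.service without touching /lib/systemd
mkdir -p /etc/systemd/system/podman.service.d
printf '[Service]\nKillMode=process\n' > /etc/systemd/system/podman.service.d/override.conf
systemctl daemon-reload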

Separately, I noticed that systemd is not too happy about the podman shutdown in general. Here are several systemctl status podman.service snapshots taken while running that docker script above:

While unix socket is active and in use
* podman.service - Podman API Service
     Loaded: loaded (/lib/systemd/system/podman.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2020-08-11 16:56:44 UTC; 8s ago
TriggeredBy: * podman.socket
       Docs: man:podman-system-service(1)
   Main PID: 3859 (podman)
      Tasks: 11 (limit: 8924)
     Memory: 10.9M
     CGroup: /kubepods/burstable/podb8a75f5d-8b75-4e4a-9604-e127f52c0ae5/31baac9f33dab30776c93283f418471af3b5ba3d8ddc1c6d14bd8654b657497f/kubepods/burstable/podb8a75f5d-8b75-4e4a-9604-e127f52c0ae5/31baac9f33dab30776c93283f418471af3b5ba3d8ddc1c6d14bd8654b657497f/docker/fa6977c78428cb8fd7c9a610f13730c143d651987c0a0e6ea95f314b23e2ebc4/system.slice/podman.service
             |-3859 /usr/bin/podman system service
             `-3886 /usr/libexec/podman/conmon --api-version 1 -c acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049 -u acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049 -r /usr/bin/runc -b /var/lib/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata -p /var/run/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata/pidfile -n bg-test --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket -s -l k8s-file:/var/lib/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata/ctr.log --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/var/run/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata/oci-log --conmon-pidfile /var/run/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049

Aug 11 16:56:44 fa6977c78428 systemd[1]: Started Podman API Service.
Just after the timeout, after containers already died; unit is 'deactivating' for some time
* podman.service - Podman API Service
     Loaded: loaded (/lib/systemd/system/podman.service; enabled; vendor preset: enabled)
     Active: deactivating (stop-sigterm) since Tue 2020-08-11 16:56:54 UTC; 15s ago
TriggeredBy: * podman.socket
       Docs: man:podman-system-service(1)
    Process: 3859 ExecStart=/usr/bin/podman system service (code=exited, status=0/SUCCESS)
   Main PID: 3859 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 8924)
     Memory: 1.1M
     CGroup: /kubepods/burstable/podb8a75f5d-8b75-4e4a-9604-e127f52c0ae5/31baac9f33dab30776c93283f418471af3b5ba3d8ddc1c6d14bd8654b657497f/kubepods/burstable/podb8a75f5d-8b75-4e4a-9604-e127f52c0ae5/31baac9f33dab30776c93283f418471af3b5ba3d8ddc1c6d14bd8654b657497f/docker/fa6977c78428cb8fd7c9a610f13730c143d651987c0a0e6ea95f314b23e2ebc4/system.slice/podman.service

Aug 11 16:56:44 fa6977c78428 systemd[1]: Started Podman API Service.
Minutes later: unit is failed
* podman.service - Podman API Service
     Loaded: loaded (/lib/systemd/system/podman.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Tue 2020-08-11 16:58:25 UTC; 1min 19s ago
TriggeredBy: * podman.socket
       Docs: man:podman-system-service(1)
    Process: 3859 ExecStart=/usr/bin/podman system service (code=exited, status=0/SUCCESS)
   Main PID: 3859 (code=exited, status=0/SUCCESS)

Aug 11 16:56:44 fa6977c78428 systemd[1]: Started Podman API Service.
Aug 11 16:58:25 fa6977c78428 systemd[1]: podman.service: State 'stop-sigterm' timed out. Killing.
Aug 11 16:58:25 fa6977c78428 systemd[1]: podman.service: Failed with result 'timeout'.

On any further request the unit recovers back to active and services the request. Not sure how relevant that is; in fact I see podman system service not obeying ^C when run interactively, so it's probably not directly relevant to this bug report.

So, possibly some missing config in the Debian packaging? I've repro'd using packages from both Kubic and debian-testing.

mheon (Member) commented Aug 11, 2020

Can you try KillMode=none instead?

danopia (Author) commented Aug 11, 2020

Ah crap, I patched the user unit instead of the system unit.

Adding KillMode=process to /lib/systemd/system/podman.service is a successful workaround in my docker repro.

Seems like nothing unintentional leaked?

# docker exec test ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.2  0.1  24016 10700 ?        Ss   17:30   0:00 /lib/systemd/systemd --system --deserialize 14
message+     129  0.0  0.0   6932  4252 ?        Ss   17:30   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         131  0.1  0.0  16580  6700 ?        Ss   17:30   0:00 /lib/systemd/systemd-logind
root         321  0.1  0.1  54380 11608 ?        Ss   17:30   0:00 /lib/systemd/systemd-journald
root        4144  0.0  0.0  81156  1960 ?        Ssl  17:33   0:00 /usr/libexec/podman/conmon --api-version 1 -c 1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331 -u 1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331 -r /usr/bin/runc -b /var/lib/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata -p /var/run/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata/pidfile -n bg-test --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket -s -l k8s-file:/var/lib/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata/ctr.log --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/var/run/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata/oci-log --conmon-pidfile /var/run/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331
root        4156  0.2  0.0    968     4 ?        Ss   17:33   0:00 /pause
root        4276  0.0  0.0   6624  2892 ?        Rs   17:35   0:00 ps aux

systemd noted what leaked, too:

Aug 11 17:33:31 8b373e600677 systemd[1]: podman.service: Unit process 4144 (conmon) remains running after unit stopped.

Do we have an explanation for why this would be Debian-specific? I don't see KillMode in podman.service.

mheon (Member) commented Aug 11, 2020

This was quite a recent change - @vrothberg PTAL, I think your KillMode change may have broken things.

danopia added a commit to danopia/kube-pet-node that referenced this issue Aug 11, 2020
Without this flag, systemd will tear down our containers when stopping
the podman API server due to inactivity.

Upstream ticket: containers/podman#7294
vrothberg (Member) commented

We changed to using the default KillMode (i.e., control-group) because of #7021 (Cc @martinpitt).

FWIW, I can reproduce on Fedora as well. The problem, I think, is that the conmon processes and containers are in a sub-cgroup and hence get killed when the service stops.

@mheon @baude, I guess that's something we need to fix before 2.0.5.

I suspect that #7021 happened in the user service because of Podman re-execing to join the user namespace? We don't really have a mechanism yet for Podman to either write its PID or send it via sd-notify.

What I think we should do, now that we have nice sd-notify support, is to check in podman system service whether we're running in a systemd service and then send READY and the PID.

@giuseppe WDYT?
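For context, the notify handshake itself is small; a sketch of what the service would send, using the stock systemd-notify helper (unsetting NOTIFY_SOCKET afterwards is what would keep conmon and the runtime from sending their own messages):

# inside the service process, once the API socket is ready
systemd-notify --ready --pid=$$   # READY=1 plus MAINPID over $NOTIFY_SOCKET
unset NOTIFY_SOCKET               # so child runtimes can't notify systemd too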

giuseppe (Member) commented

If we specify KillMode=process then conmon will be targeted, and it becomes a race: conmon gets KILLed if it doesn't terminate before the container does. With KillMode=cgroup, by contrast, all the processes inside the container get the signal (do we even honor the signal configuration in this case)?

IMO we should use KillMode=none and let podman manage how to kill the container.

mheon (Member) commented Aug 12, 2020

Implementing sd-notify for podman system service would seem to make sense, given that it's expected to run socket-activated under systemd and (as rootless) fork to join the userns.

rhatdan (Member) commented Aug 12, 2020

Let's attempt the sd-notify.

vrothberg added a commit to vrothberg/libpod that referenced this issue Aug 13, 2020
Commit 2b6dd3f set the KillMode of podman.service to the
systemd default, which ultimately led to the problem that systemd
kills *all* processes inside the unit's cgroup, and hence kills
all containers, whenever the service is stopped.

Fix it by setting the type to sdnotify and the KillMode to process.
`podman system service` will send the necessary notify messages
when NOTIFY_SOCKET is set, and unset it right after to prevent
the backend and container runtimes from jumping in between and
sending messages as well.

Fixes: containers#7294
Signed-off-by: Valentin Rothberg <[email protected]>
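In unit-file terms, the fix amounts to roughly this (a sketch, not the exact merged unit; note that the systemd directive is spelled Type=notify, even though the commit message says "sdnotify"):

[Service]
Type=notify
KillMode=process
ExecStart=/usr/bin/podman system service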
edsantiago (Member) commented

The killing of pause is one symptom of this bug; here is another reproducer I found with much worse symptoms:

# (podman-remote run -d --name foo busybox sleep 60;sleep 11;echo here we go;podman-remote ps -a)|ts
Aug 13 13:16:33 01da3fa8a9e8e7ce0ac470f8ff5856676d4a6906837eb87b8b55d8118b48b342
Aug 13 13:16:44 here we go
Aug 13 13:17:33 CONTAINER ID  IMAGE                             COMMAND   CREATED             STATUS                             PORTS   NAMES
Aug 13 13:17:33 01da3fa8a9e8  docker.io/library/busybox:latest  sleep 60  About a minute ago  Exited (0) Less than a second ago          foo

Translation:

  • podman system service does die, as reported
  • the podman container does not necessarily die - I think there is something special about pause, perhaps in its signal handling, that causes it to exit.
  • once podman system service dies, there's no more podman-remote: ps hangs until the container exits. In this case that's just a few seconds, because it was a sleep 60, but with a long-running container podman ps would hang as long as I let it (many minutes).

As best I can tell, #7312 solves this behavior, but I'm still looking into it further. I am reporting this because others may experience this different symptom and search for "podman remote ps hang".

mheon pushed a commit to mheon/libpod that referenced this issue Aug 17, 2020 (the same fix commit as above)

mheon pushed a commit to mheon/libpod that referenced this issue Aug 20, 2020 (the same fix commit as above)
jhvst added a commit to jhvst/stateless-fcos that referenced this issue Apr 27, 2021
more about flaky cache:
- containers/podman#7021
- containers/podman#7294

using mv does not work across boots, running rm -rf seems too adventurous
github-actions bot locked the conversation as resolved and limited it to collaborators Sep 23, 2023.