
Containers started using socket-activated APIv2 die from systemd activation timeout #7294

Closed
danopia opened this issue Aug 11, 2020 · 11 comments · Fixed by #7312
Labels: kind/bug, locked - please file new issue/PR

Comments

danopia commented Aug 11, 2020

/kind bug

Description

When containers are created and started specifically via the /run/podman/podman.sock endpoint, they are stopped when podman.service shuts down due to its socket-activation inactivity timeout. This manifests as containers stopping ~10s after the last APIv2 request is made. The shutdown is clean; podman ps -a shows exit code 0 for them.

Opening an APIv2 events stream, or issuing APIv2 requests every ~5 seconds, keeps the containers alive longer; but once those requests stop, the containers die shortly afterwards.

Steps to reproduce the issue:

Minimal case assuming you have podman's systemd units installed already:

  1. Check you have the socket unit ready: systemctl enable --now podman.socket

  2. Create a sleep container:

    1. curl --unix-socket /run/podman/podman.sock -XPOST http://d/v1.0.0/libpod/images/pull'?reference=k8s.gcr.io/pause:3.2'
    2. curl --unix-socket /run/podman/podman.sock -XPOST http://d/v1.0.0/libpod/containers/create --header 'content-type: application/json' --data '{"image":"k8s.gcr.io/pause:3.2","name":"bg-test"}'
  3. Start watching events: podman events &

  4. Start the container: curl --unix-socket /run/podman/podman.sock -XPOST http://d/v1.0.0/libpod/containers/bg-test/start
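A quick way to watch the socket-activation timeout as it happens (a sketch, assuming the stock podman.socket/podman.service unit names):

# the service should flip from "active" to "deactivating" roughly 10s
# after the last API request, right before the container dies
watch -n1 'systemctl is-active podman.socket podman.service'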

Describe the results you received:

2020-08-11 14:52:21.101145183 +0000 UTC container init 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
2020-08-11 14:52:21.114121047 +0000 UTC container start 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
[ ... 10 seconds pass ... ]
2020-08-11 14:52:31.215222381 +0000 UTC container died 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
2020-08-11 14:52:31.247591086 +0000 UTC container cleanup 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
# podman ps -a
CONTAINER ID  IMAGE                 COMMAND  CREATED         STATUS                    PORTS   NAMES
1d1018fa7c02  k8s.gcr.io/pause:3.2           31 seconds ago  Exited (0) 18 seconds ago         bg-test

Describe the results you expected:

2020-08-11 14:52:21.101145183 +0000 UTC container init 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
2020-08-11 14:52:21.114121047 +0000 UTC container start 1d1018fa7c028dd9341df4fd26d3b9d04a719ca4b4913da872f9c6d408a831e5 (image=k8s.gcr.io/pause:3.2, name=bg-test)
[ ... nothing further ]
# podman ps -a
CONTAINER ID  IMAGE                 COMMAND  CREATED        STATUS            PORTS   NAMES
1d1018fa7c02  k8s.gcr.io/pause:3.2           2 minutes ago  Up 2 minutes ago          bg-test

Alternative repro within docker:

I also constructed a self-contained repro within Docker. It isn't minimal, but it should show the problem on any machine that has Docker available (I'm not sure how relevant the Linux distro is to this issue). You can copy the whole block as-is; it should show the container dying during the final sleep call.

# bring up a systemd container
docker run --name test -d --privileged --rm --tmpfs /tmp --tmpfs /run --tmpfs /run/lock -v /sys/fs/cgroup:/sys/fs/cgroup:ro jrei/systemd-debian:sid
# install podman (need to ask for runc explicitly because apt apparently prefers crun due to alphabetic order, while podman prefers runc)
docker exec test sh -euxc "apt update; apt install -y podman runc curl | grep podman; systemctl start podman.socket"
# create a sleep container
docker exec test curl --unix-socket /run/podman/podman.sock 'http://d/v1.0.0/libpod/images/pull?reference=k8s.gcr.io/pause:3.2' -XPOST
docker exec test curl --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/create -XPOST --header 'content-type: application/json' --data '{"image":"k8s.gcr.io/pause:3.2","name":"bg-test","netns":{"nsmode":"host"}}'
# watch event stream & start the container 
docker exec test podman events &
docker exec test curl --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/bg-test/start -XPOST
# ... observe the container stop after 10s
sleep 15s
# cleanup our repro
docker rm -f test
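To watch what systemd does to the unit during that final sleep (compare the status snapshots later in this thread), you can additionally run:

# inspect the unit state inside the repro container
docker exec test systemctl status podman.service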

I also tried to repro on the fedora:32 image, but Fedora's podman wouldn't run within Docker; it said something about overlay-on-overlay. I didn't debug further.

Additional information you deem important (e.g. issue happens only occasionally):

Pod infra containers don't seem affected when the /pods/.../start API is used.

I wasn't able to repro when podman system service is run manually from an interactive session. I've only seen this happen specifically when the podman daemon's lifecycle is managed by systemd.
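For reference, running the service manually looks roughly like this (a sketch; --time sets the inactivity timeout in seconds, and the flag may be spelled --timeout on this version):

# run the API service in the foreground, outside systemd's control;
# containers survived the service exiting when run this way
podman system service --time=10 unix:///run/podman/podman.sock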

Output of podman version:

Version:      2.0.4
API Version:  1
Go Version:   go1.14
Built:        Wed Dec 31 16:00:00 1969
OS/Arch:      linux/amd64
Output of `podman info --debug`:
host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18, commit: '
  cpus: 4
  distribution:
    distribution: debian
    version: "10"
  eventLogger: file
  hostname: redapu
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.19.0-9-amd64
  linkmode: dynamic
  memFree: 871215104
  memTotal: 4121145344
  ociRuntime:
    name: runc
    package: 'runc: /usr/sbin/runc'
    path: /usr/sbin/runc
    version: |-
      runc version 1.0.0~rc6+dfsg1
      commit: 1.0.0~rc6+dfsg1-3
      spec: 1.0.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 714h 51m 22.19s (Approximately 29.75 days)
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 6
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Wed Dec 31 16:00:00 1969
  GitCommit: ""
  GoVersion: go1.14
  OsArch: linux/amd64
  Version: 2.0.4

Package info (e.g. output of rpm -q podman or apt list podman):

podman/unknown,now 2.0.4~1 amd64 [installed]

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Discovered on Debian 10 on bare metal, rootful podman 2.0.4.
  • Repro'd further using a Debian sid image via docker[-in-containerd], rootful podman 2.0.4
openshift-ci-robot added the kind/bug label Aug 11, 2020
mheon (Member) commented Aug 11, 2020

@baude @jwhonce @vrothberg PTAL

I don't think we've seen this elsewhere in our testing, which makes me think this is Debian-specific. Does this happen as rootless, root, or both?

Suspicion: systemd is blowing away all containers when it throttles down due to KillMode on the unit file. I'd expect that moving the containers out of the unit's cgroup would fix this for root containers, but I could be wrong.
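One quick way to check that suspicion (a sketch; pidof and the /proc path assume a single conmon process):

# if conmon's cgroup is under podman.service, the default control-group
# KillMode will take it (and the container) down with the service
cat /proc/$(pidof conmon)/cgroup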

jwhonce (Member) commented Aug 11, 2020

@mheon I have not seen this on Fedora or RHEL using systemd.

danopia (Author) commented Aug 11, 2020

I'd believe that this is Debian-specific. I haven't had a chance to try rootless yet; will see if I can get a repro.

I just tried adding KillMode=process to the podman units but it didn't seem to change anything. Not sure if I applied it right. I'm not too handy with systemd.
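For anyone trying the same: the standard way to override a unit setting is a drop-in rather than editing the packaged file (a sketch, assuming the rootful system service):

# override KillMode for podman.service without touching /lib/systemd
mkdir -p /etc/systemd/system/podman.service.d
printf '[Service]\nKillMode=process\n' > /etc/systemd/system/podman.service.d/override.conf
systemctl daemon-reload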

Separately, I noticed that systemd is not too happy about the podman shutdown in general. Here are several systemctl status podman.service snapshots taken while running that docker script above:

While unix socket is active and in use
* podman.service - Podman API Service
     Loaded: loaded (/lib/systemd/system/podman.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2020-08-11 16:56:44 UTC; 8s ago
TriggeredBy: * podman.socket
       Docs: man:podman-system-service(1)
   Main PID: 3859 (podman)
      Tasks: 11 (limit: 8924)
     Memory: 10.9M
     CGroup: /kubepods/burstable/podb8a75f5d-8b75-4e4a-9604-e127f52c0ae5/31baac9f33dab30776c93283f418471af3b5ba3d8ddc1c6d14bd8654b657497f/kubepods/burstable/podb8a75f5d-8b75-4e4a-9604-e127f52c0ae5/31baac9f33dab30776c93283f418471af3b5ba3d8ddc1c6d14bd8654b657497f/docker/fa6977c78428cb8fd7c9a610f13730c143d651987c0a0e6ea95f314b23e2ebc4/system.slice/podman.service
             |-3859 /usr/bin/podman system service
             `-3886 /usr/libexec/podman/conmon --api-version 1 -c acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049 -u acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049 -r /usr/bin/runc -b /var/lib/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata -p /var/run/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata/pidfile -n bg-test --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket -s -l k8s-file:/var/lib/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata/ctr.log --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/var/run/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata/oci-log --conmon-pidfile /var/run/containers/storage/vfs-containers/acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg acb61eb4a3d7b3f509a88bfcee7f3bea85d6f7f6d3b78c86a327f1cfce295049

Aug 11 16:56:44 fa6977c78428 systemd[1]: Started Podman API Service.
Just after the timeout, after containers already died; unit is 'deactivating' for some time
* podman.service - Podman API Service
     Loaded: loaded (/lib/systemd/system/podman.service; enabled; vendor preset: enabled)
     Active: deactivating (stop-sigterm) since Tue 2020-08-11 16:56:54 UTC; 15s ago
TriggeredBy: * podman.socket
       Docs: man:podman-system-service(1)
    Process: 3859 ExecStart=/usr/bin/podman system service (code=exited, status=0/SUCCESS)
   Main PID: 3859 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 8924)
     Memory: 1.1M
     CGroup: /kubepods/burstable/podb8a75f5d-8b75-4e4a-9604-e127f52c0ae5/31baac9f33dab30776c93283f418471af3b5ba3d8ddc1c6d14bd8654b657497f/kubepods/burstable/podb8a75f5d-8b75-4e4a-9604-e127f52c0ae5/31baac9f33dab30776c93283f418471af3b5ba3d8ddc1c6d14bd8654b657497f/docker/fa6977c78428cb8fd7c9a610f13730c143d651987c0a0e6ea95f314b23e2ebc4/system.slice/podman.service

Aug 11 16:56:44 fa6977c78428 systemd[1]: Started Podman API Service.
Minutes later: unit is failed
* podman.service - Podman API Service
     Loaded: loaded (/lib/systemd/system/podman.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Tue 2020-08-11 16:58:25 UTC; 1min 19s ago
TriggeredBy: * podman.socket
       Docs: man:podman-system-service(1)
    Process: 3859 ExecStart=/usr/bin/podman system service (code=exited, status=0/SUCCESS)
   Main PID: 3859 (code=exited, status=0/SUCCESS)

Aug 11 16:56:44 fa6977c78428 systemd[1]: Started Podman API Service.
Aug 11 16:58:25 fa6977c78428 systemd[1]: podman.service: State 'stop-sigterm' timed out. Killing.
Aug 11 16:58:25 fa6977c78428 systemd[1]: podman.service: Failed with result 'timeout'.

On any further request the unit recovers back to active and services the request. Not sure how relevant that is; in fact I see podman system service not obeying ^C when run interactively, so it's probably not directly relevant to this bug report.

So, possibly some missing config in the Debian packaging? I've repro'd using packages from both Kubic and debian-testing.

mheon (Member) commented Aug 11, 2020

Can you try KillMode=none instead?

danopia (Author) commented Aug 11, 2020

Ah crap, I patched the user unit instead of the system unit.

Adding KillMode=process to /lib/systemd/system/podman.service is a successful workaround in my docker repro.

Seems like nothing unintentional leaked?

# docker exec test ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.2  0.1  24016 10700 ?        Ss   17:30   0:00 /lib/systemd/systemd --system --deserialize 14
message+     129  0.0  0.0   6932  4252 ?        Ss   17:30   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         131  0.1  0.0  16580  6700 ?        Ss   17:30   0:00 /lib/systemd/systemd-logind
root         321  0.1  0.1  54380 11608 ?        Ss   17:30   0:00 /lib/systemd/systemd-journald
root        4144  0.0  0.0  81156  1960 ?        Ssl  17:33   0:00 /usr/libexec/podman/conmon --api-version 1 -c 1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331 -u 1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331 -r /usr/bin/runc -b /var/lib/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata -p /var/run/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata/pidfile -n bg-test --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket -s -l k8s-file:/var/lib/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata/ctr.log --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/var/run/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata/oci-log --conmon-pidfile /var/run/containers/storage/vfs-containers/1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 1dca36ce88bad40ff34ee36042be6ccd9185389142fe3e040f76459a0a2c3331
root        4156  0.2  0.0    968     4 ?        Ss   17:33   0:00 /pause
root        4276  0.0  0.0   6624  2892 ?        Rs   17:35   0:00 ps aux

systemd noted what leaked, too:

Aug 11 17:33:31 8b373e600677 systemd[1]: podman.service: Unit process 4144 (conmon) remains running after unit stopped.

Do we have an explanation for why this would be Debian-specific? I don't see KillMode in podman.service.

mheon (Member) commented Aug 11, 2020

This was quite a recent change - @vrothberg PTAL, I think your KillMode change may have broken things.

danopia added a commit to danopia/kube-pet-node that referenced this issue Aug 11, 2020
Without this flag, systemd will tear down our containers when stopping
the podman API server due to inactivity.

Upstream ticket: containers/podman#7294
vrothberg (Member) commented

We changed to using the default KillMode (i.e., control-group) because of #7021 (Cc @martinpitt).

FWIW, I can reproduce on Fedora as well. The problem, I think, is that the conmon processes and containers are in a sub-cgroup and hence get killed when the service stops.

@mheon @baude, I guess that's something we need to fix before 2.0.5.

I suspect that #7021 happened in the user service because of Podman re-execing to join the user namespace? We don't really have a mechanism yet for Podman to either write its PID or send it via sd-notify.

What I think we should do, now that we have nice sd-notify support, is to check in podman system service whether we're running in a systemd service and then send READY and the PID.

@giuseppe WDYT?
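For context, the notify handshake itself is small; a sketch of what the service would send, using the stock systemd-notify helper (unsetting NOTIFY_SOCKET afterwards is what would keep conmon and the runtime from sending their own messages):

# inside the service process, once the API socket is ready
systemd-notify --ready --pid=$$   # READY=1 plus MAINPID over $NOTIFY_SOCKET
unset NOTIFY_SOCKET               # so child runtimes can't notify systemd too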

giuseppe (Member) commented

If we specify KillMode=process then conmon will be targeted, and it becomes a race: conmon gets KILLed if it doesn't terminate before the container does. With KillMode=cgroup, by contrast, all the processes inside the container get the signal (do we even honor the signal configuration in this case)?

IMO we should use KillMode=none and let podman manage how to kill the container.

mheon (Member) commented Aug 12, 2020

Implementing sd-notify for podman system service would seem to make sense, given that it's expected to run socket-activated under systemd and (as rootless) fork to join the userns.

rhatdan (Member) commented Aug 12, 2020

Let's attempt the sd-notify.

vrothberg added a commit to vrothberg/libpod that referenced this issue Aug 13, 2020
Commit 2b6dd3f set the KillMode of podman.service to the
systemd default, which ultimately led to the problem that systemd
kills *all* processes inside the unit's cgroup, and hence kills
all containers, whenever the service is stopped.

Fix it by setting the type to sdnotify and the KillMode to process.
`podman system service` will send the necessary notify messages
when NOTIFY_SOCKET is set, and unset it right after to prevent
the backend and container runtimes from jumping in between and
sending messages as well.

Fixes: containers#7294
Signed-off-by: Valentin Rothberg <[email protected]>
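In unit-file terms, the fix amounts to roughly this (a sketch, not the exact merged unit; note that the systemd directive is spelled Type=notify, even though the commit message says "sdnotify"):

[Service]
Type=notify
KillMode=process
ExecStart=/usr/bin/podman system service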
edsantiago (Member) commented

The killing of pause is one symptom of this bug; here is another reproducer I found with much worse symptoms:

# (podman-remote run -d --name foo busybox sleep 60;sleep 11;echo here we go;podman-remote ps -a)|ts
Aug 13 13:16:33 01da3fa8a9e8e7ce0ac470f8ff5856676d4a6906837eb87b8b55d8118b48b342
Aug 13 13:16:44 here we go
Aug 13 13:17:33 CONTAINER ID  IMAGE                             COMMAND   CREATED             STATUS                             PORTS   NAMES
Aug 13 13:17:33 01da3fa8a9e8  docker.io/library/busybox:latest  sleep 60  About a minute ago  Exited (0) Less than a second ago          foo

Translation:

  • podman system service does die, as reported
  • the podman container does not necessarily die - I think there is something special about pause, perhaps in its signal handling, that causes it to exit.
  • once podman system service dies, there's no more podman-remote: ps hangs until the container exits. In this case that's just a few seconds, because it was a sleep 60, but with a long-running container podman ps would hang as long as I let it (many minutes).

As best I can tell, #7312 solves this behavior, but I'm still looking into it further. I am reporting this because others may experience this different symptom and search for "podman remote ps hang".

mheon pushed a commit to mheon/libpod that referenced this issue Aug 17, 2020 (the same fix commit as above)

mheon pushed a commit to mheon/libpod that referenced this issue Aug 20, 2020 (the same fix commit as above)
jhvst added a commit to jhvst/stateless-fcos that referenced this issue Apr 27, 2021
more about flaky cache:
- containers/podman#7021
- containers/podman#7294

using mv does not work across boots, running rm -rf seems too adventurous
github-actions bot locked the conversation as resolved and limited it to collaborators Sep 23, 2023.