
podman stats not working, returns cgroup deleted #9252

Closed
Kiritow opened this issue Feb 7, 2021 · 9 comments · Fixed by #9464

Kiritow commented Feb 7, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

podman stats is not working. Running as root, podman stats returns: Error: unable to load cgroup at /machine.slice/machine-libpod_pod_8cbc0cae539eb254c64e9fb5a613b151ed364c351083af55f5437fa4661e217f.slice/libpod-3b221f7e8b39f1e85111d88e7e8e864f7acbca8a6b880504cbd34793a72e8997.scope/init.scope: cgroup deleted

Steps to reproduce the issue:

  1. Reboot the machine and log in as root

  2. Start all pods

  3. Run podman stats; it fails (a one-shot reproduction sketch follows).
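
A one-shot reproduction sketch, assuming the pods listed under podman pod ls below already exist (--no-stream makes stats print a single sample instead of streaming):

sudo podman pod start --all      # start every pod after the reboot
sudo podman stats --no-stream    # fails with the cgroup-deleted error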

Describe the results you received:

podman stats returns Error: unable to load cgroup at /machine.slice/machine-libpod_pod_8cbc0cae539eb254c64e9fb5a613b151ed364c351083af55f5437fa4661e217f.slice/libpod-3b221f7e8b39f1e85111d88e7e8e864f7acbca8a6b880504cbd34793a72e8997.scope/init.scope: cgroup deleted

Describe the results you expected:

podman stats should display resource statistics for the running containers.

Additional information you deem important (e.g. issue happens only occasionally):

docker is not installed on this machine.

podman ps -a:

CONTAINER ID  IMAGE                          COMMAND               CREATED       STATUS                   PORTS                     NAMES               POD ID        PODNAME
8a182186806b  localhost/frpc:latest          lsp-factorio-mcsm...  11 hours ago  Up 11 hours ago          0.0.0.0:34197->34197/udp  vigilant_bhaskara   c18bba00e974  lspfactorio
ae57b8694457  localhost/frpc:latest          lsp-factorio-dire...  11 hours ago  Up 11 hours ago          0.0.0.0:34197->34197/udp  optimistic_cerf     c18bba00e974  lspfactorio
3dabaaf7933e  localhost/lsp-factorio:latest                        11 hours ago  Up 11 hours ago          0.0.0.0:34197->34197/udp  stoic_kapitsa       c18bba00e974  lspfactorio
3db5b8127aa9  k8s.gcr.io/pause:3.2                                 11 hours ago  Up 11 hours ago          0.0.0.0:34197->34197/udp  c18bba00e974-infra  c18bba00e974  lspfactorio
0fbd46cd35ba  localhost/frpc:latest          lsp-mcsm tcp 2333...  47 hours ago  Up 11 hours ago          0.0.0.0:25565->25565/tcp  zen_mendeleev       8cbc0cae539e  lspmc
3b221f7e8b39  localhost/lsp:latest                                 47 hours ago  Up 11 hours ago          0.0.0.0:25565->25565/tcp  lucid_albattani     8cbc0cae539e  lspmc
b430f6cf4644  localhost/frpclocal:latest     10.88.0.1 7000 ls...  3 weeks ago   Exited (1) 11 hours ago  0.0.0.0:25565->25565/tcp  competent_mccarthy  8cbc0cae539e  lspmc
587e702d6172  localhost/frpc:latest          lsp-mc-direct tcp...  3 weeks ago   Up 11 hours ago          0.0.0.0:25565->25565/tcp  quizzical_swanson   8cbc0cae539e  lspmc
a45de18ec2b4  k8s.gcr.io/pause:3.2                                 3 weeks ago   Up 11 hours ago          0.0.0.0:25565->25565/tcp  8cbc0cae539e-infra  8cbc0cae539e  lspmc

podman pod ls

POD ID        NAME         STATUS    CREATED       INFRA ID      # OF CONTAINERS
c18bba00e974  lspfactorio  Running   11 hours ago  3db5b8127aa9  4
8cbc0cae539e  lspmc        Degraded  3 weeks ago   a45de18ec2b4  5

uname -a

Linux liteserver-mc 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

Output of podman version:


Version:      2.2.1
API Version:  2.1.0
Go Version:   go1.15.2
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.18.0
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.24, commit: '
  cpus: 8
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: journald
  hostname: liteserver-mc
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.0-65-generic
  linkmode: dynamic
  memFree: 16043335680
  memTotal: 25193005056
  ociRuntime:
    name: runc
    package: 'cri-o-runc: /usr/lib/cri-o-runc/sbin/runc'
    path: /usr/lib/cri-o-runc/sbin/runc
    version: 'runc version spec: 1.0.2-dev'
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 11h 6m 35.57s (Approximately 0.46 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 9
    paused: 0
    running: 8
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 22
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 2.1.0
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.15.2
  OsArch: linux/amd64
  Version: 2.2.1

Package info (e.g. output of rpm -q podman or apt list podman):

Listing... Done
podman/unknown,now 2.2.1~4 amd64 [installed]
podman/unknown 2.2.1~4 arm64
podman/unknown 2.2.1~4 armhf
podman/unknown 2.2.1~4 s390x

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Virtualized with VMware Workstation 16.

openshift-ci-robot added the kind/bug label on Feb 7, 2021
rhatdan commented Feb 8, 2021

Could you try this on the main branch or Podman 3.0?

We have made some changes to allow stats to work on systems that do not enable all of the cgroup controllers that stats attempts to read.

rhatdan self-assigned this on Feb 8, 2021
rhatdan commented Feb 8, 2021

I believe this is fixed in the main branch and in 3.0. Reopen if I am mistaken.

rhatdan closed this as completed on Feb 8, 2021
Kiritow commented Feb 16, 2021

I just upgraded to Podman 3.0.0 on Ubuntu 20.04.2 LTS and ran into these two issues during the upgrade:

actions/runner-images#2703

#8227

After upgrading to Podman 3.0, I can still reproduce the problem with the steps above.

kiritow@ubuntu-podman:~$ sudo podman stats
Error: unable to load cgroup at /machine.slice/machine-libpod_pod_4f8e36a5b9cc577a1b81592496579396e503d8126205bfa233ed26d651dce944.slice/libpod-3a5d6b2a4ffcb29587ddfb5f5edf7fc7a9ba51e0755d796b81b589ceab858d83.scope/init.scope: cgroup deleted

(I upgraded and ran this command on another virtual machine, because I can't stop the original server right now.)

Sorry @rhatdan, but I can't reopen this issue; there's no 'reopen' button.

Output of podman version:

Version:      3.0.0
API Version:  3.0.0
Go Version:   go1.15.2
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.19.2
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.26, commit: '
  cpus: 8
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: journald
  hostname: ubuntu-podman
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.0-65-generic
  linkmode: dynamic
  memFree: 14954733568
  memTotal: 16762474496
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17.6-58ef-dirty
      commit: fd582c529489c0738e7039cbc036781d1d039014
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 19h 11m 19.74s (Approximately 0.79 days)
registries:
  192.168.0.72:7000:
    Blocked: false
    Insecure: true
    Location: 192.168.0.72:7000
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: 192.168.0.72:7000
  search:
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 2
    stopped: 6
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 38
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.15.2
  OsArch: linux/amd64
  Version: 3.0.0

Package info (e.g. output of rpm -q podman or apt list podman):

Listing... Done
podman/unknown,now 100:3.0.0-1 amd64 [installed]
podman/unknown 100:3.0.0-1 arm64
podman/unknown 100:3.0.0-1 armhf
podman/unknown 100:3.0.0-1 s390x

Additional environment details (AWS, VirtualBox, physical, etc.):
Virtualized with VMware Workstation 16.

mheon reopened this on Feb 16, 2021
Kiritow commented Feb 19, 2021

I've found a workaround. Ubuntu Server 20.04 LTS uses cgroup v1 by default; add systemd.unified_cgroup_hierarchy=1 to the kernel parameters and reboot, and cgroup v2 will be used instead. podman stats then works just fine, though I still don't know why systemd-enabled containers on cgroup v1 cause it to fail.

Switching to cgroups v2
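
For reference, a minimal sketch of the kernel-parameter change on a GRUB-based Ubuntu install (the sed pattern assumes the stock GRUB_CMDLINE_LINUX line; back up /etc/default/grub first):

sudo cp /etc/default/grub /etc/default/grub.bak
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&systemd.unified_cgroup_hierarchy=1 /' /etc/default/grub
sudo update-grub
sudo reboot
stat -fc %T /sys/fs/cgroup   # prints cgroup2fs once cgroup v2 is active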

rhatdan commented Feb 19, 2021

@giuseppe Ideas?

giuseppe commented:

Can you show me the output of the following command (where CONTAINER_ID is the container that fails)?

# cat /proc/$(podman inspect --format '{{.State.Pid}}' $CONTAINER_ID)/cgroup

Kiritow commented Feb 19, 2021

podman run -itd ubuntu-cn

root@ubuntu-podman:~# podman ps -ap
CONTAINER ID  IMAGE                       COMMAND    CREATED        STATUS            PORTS   NAMES                POD ID  PODNAME
cd035dbbbc9b  localhost/ubuntu-cn:latest  /bin/bash  2 minutes ago  Up 2 minutes ago          festive_stonebraker
root@ubuntu-podman:~# podman stats
ID            NAME                 CPU %   MEM USAGE / LIMIT  MEM %   NET IO          BLOCK IO           PIDS
cd035dbbbc9b  festive_stonebraker  1.26%   6.136MB / 16.76GB  0.04%   908B / 2.924kB  4.252MB / 4.096kB  1

podman run -d mcsm-mc-base

root@ubuntu-podman:~# podman ps -ap
CONTAINER ID  IMAGE                          COMMAND    CREATED             STATUS                 PORTS   NAMES                POD ID  PODNAME
cd035dbbbc9b  localhost/ubuntu-cn:latest     /bin/bash  About a minute ago  Up About a minute ago          festive_stonebraker
ab2f97265857  localhost/mcsm-mc-base:latest             54 seconds ago      Up 54 seconds ago              stoic_dubinsky
root@ubuntu-podman:~# podman stats
Error: unable to load cgroup at /machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope/init.scope: cgroup deleted
root@ubuntu-podman:~# cat /proc/$(podman inspect --format '{{.State.Pid}}' ab2f97265857)/cgroup
12:freezer:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
11:cpuset:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
10:pids:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
9:memory:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
8:perf_event:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
7:rdma:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
6:net_cls,net_prio:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
5:hugetlb:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
4:blkio:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
3:cpu,cpuacct:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
2:devices:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope
1:name=systemd:/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope/init.scope
0::/machine.slice/libpod-ab2f97265857230535ab5d1b79a33ead4c44ba773e0f1f874d0402dc89fae56d.scope

Kiritow commented Feb 20, 2021

After reading the source code (pkg/cgroups/cgroups.go, libpod/stats.go, libpod/container.go), I've found that pkg/cgroups/cgroups.go:334 stats the following directories:

/sys/fs/cgroup/cpu/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
/sys/fs/cgroup/cpuacct/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
/sys/fs/cgroup/cpuset/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
/sys/fs/cgroup/memory/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
/sys/fs/cgroup/pids/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
/sys/fs/cgroup/blkio/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope

I've checked them all manually; only the cpuset one is missing:

root@ubuntu-podman:/proc/1733# cd /sys/fs/cgroup/cpuset/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
bash: cd: /sys/fs/cgroup/cpuset/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope: No such file or directory

From libpod/container.go:957 and #8397 I can see that /proc/1733/cgroup is read; here is its content. It seems the cpuset controller is not enabled or not present for this process? (A shell sketch of the per-controller check follows the listing.)

12:freezer:/
11:blkio:/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
10:memory:/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
9:hugetlb:/
8:devices:/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
7:cpuset:/
6:rdma:/
5:perf_event:/
4:pids:/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
3:cpu,cpuacct:/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
2:net_cls,net_prio:/
1:name=systemd:/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
0::/machine.slice/libpod-conmon-8b63c6e2cb7b4039892a97704cf2cf0fc6f57d7cfd8a6ce8ad8330a39024f107.scope
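
To make the failure concrete, here is a rough shell equivalent of that per-controller check (CID is a placeholder; the controller list mirrors the directories above, and the path is taken from the name=systemd entry, which for systemd containers carries the extra init.scope suffix):

CID=8b63c6e2cb7b   # placeholder: use a failing container's ID
PID=$(podman inspect --format '{{.State.Pid}}' "$CID")
CGPATH=$(awk -F: '$2 == "name=systemd" {print $3; exit}' "/proc/$PID/cgroup")
for ctrl in cpu cpuacct cpuset memory pids blkio; do
    [ -d "/sys/fs/cgroup/$ctrl$CGPATH" ] && echo "$ctrl: present" || echo "$ctrl: MISSING"
done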

giuseppe commented:

Thanks! PR here: #9464

giuseppe added a commit to giuseppe/libpod that referenced this issue Feb 22, 2021
do not raise an error if the cgroup exists on at least one controller.

Previously it expected the cgroup to exist under all the controllers.

[NO TESTS NEEDED]

Closes: containers#9252

Signed-off-by: Giuseppe Scrivano <[email protected]>
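
In shell terms, the change amounts to going from "every controller directory must exist" to "at least one must exist"; a sketch of that logic (not the actual Go code; CGPATH as in the earlier check):

found=0
for ctrl in cpu cpuacct cpuset memory pids blkio; do
    [ -d "/sys/fs/cgroup/$ctrl$CGPATH" ] && found=1
done
[ "$found" -eq 1 ] || echo "Error: cgroup deleted"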
github-actions bot added the locked - please file new issue/PR label and locked the conversation as resolved on Sep 22, 2023