
Podman's vs Docker's approach to container id inspection in cgroup v2 environments #14236

Closed
skepticoitusInteruptus opened this issue May 13, 2022 · 12 comments · Fixed by #14308
Labels
kind/bug · locked - please file new issue/PR

Comments

@skepticoitusInteruptus

Is this a BUG REPORT or FEATURE REQUEST?

/kind bug

Description

My spike:

  • Compare the behavior of Docker vs Podman running in a cgroup v2-configured Alpine distro on top of a WSL2 kernel configured for cgroup v2 support.

My tests:

podman run -itd -m 6m docker.io/ibmcom/helloworld

podman stats --no-stream

Steps to reproduce the issue:

  1. Install MS' WSL2 in Windows 10
  2. Install @yuk7's Alpine 3.15
  3. Configure %USERPROFILE%\.wslconfig to support cgroup v2:1
[wsl2]
kernelCommandLine=cgroup_no_v1=all cgroup_memory=1 cgroup_enable=memory swapaccount=1
  4. In a PowerShell terminal, do:
wsl -s Alpine
wsl -u root 
  5. In an Alpine terminal, do:
sed -i 's/v3.15/edge/g' /etc/apk/repositories
apk update
apk --no-cache --upgrade -U add docker podman
  6. Switch the stock Alpine 3.15's default cgroup filesystem from its original v1 support to v2:
umount /sys/fs/cgroup
mount -t cgroup2 cgroup2 /sys/fs/cgroup -o rw,nosuid,nodev,noexec,relatime,nsdelegate
  7. Observe that the v2 controllers survived the switch to the cgroup2 sysfs:
cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
  8. Enable those same v2 controllers:
echo "+memory +io +pids +cpuset +hugetlb +rdma" > /sys/fs/cgroup/cgroup.subtree_control
  9. Observe that the cgroup2 controllers are enabled:
cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory hugetlb pids rdma
  10. Observe that podman runs successfully without the -m option:
podman run -itd docker.io/ibmcom/helloworld
8003581f61d95a3d112c54eb6baa888570935809833fa0c02178033f3eff25de
  11. Observe that podman chokes with the -m option:
podman run -itd -m 6m docker.io/ibmcom/helloworld
Error: could not find cgroup mount in "/proc/self/cgroup"
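
As a sanity check for step 3 (a sketch; run inside the distro after a wsl --shutdown and restart so the setting takes effect), confirm the flag reached the kernel command line:

grep -o cgroup_no_v1=all /proc/cmdline
cgroup_no_v1=all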

Describe the results you received:

Error: could not find cgroup mount in "/proc/self/cgroup"

Describe the results you expected:

CONTAINER ID   NAME               CPU %     MEM USAGE / LIMIT   MEM %     NET I/O     BLOCK I/O    PIDS
8003581f61d9   some_goofy_name   0.00%     3.086MiB / 6MiB     51.43%    876B / 0B   197kB / 0B   6

Additional information you deem important:

  • My Alpine distro is not initialized by systemd2
  • My spike is not for a rootless container use case
  • My issue's title is based on what I'm guessing might be a related(?) issue I discovered from a search on "/proc/self/cgroup"
  • These analogous Docker commands behave as expected:
docker info
...
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
...
 Security Options:
  ...
  cgroupns
...
WARNING: No swap limit support
...

docker run -itd -m 6m docker.io/ibmcom/helloworld

docker stats --no-stream
CONTAINER ID   NAME               CPU %     MEM USAGE / LIMIT   MEM %     NET I/O     BLOCK I/O    PIDS
8003581f61d9   some_goofy_name   0.00%     3.086MiB / 6MiB     51.43%    876B / 0B   197kB / 0B   6

Output of podman version:

podman version 4.1.0

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-r1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: ad24dda9f2b11fd974e510713e0923f810ea19c6'
  cpuUtilization:
    idlePercent: 99.87
    systemPercent: 0.1
    userPercent: 0.03
  cpus: 4
  distribution:
    distribution: alpine
    version: 3.15.0
  eventLogger: file
  hostname: ***************
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.102.1-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 12537532416
  memTotal: 12926758912
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.5-r0
    path: /usr/bin/crun
    version: |-
      crun version 1.4.5
      commit: c381048530aa750495cf502ddb7181f2ded5b400
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-r0
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.2
  swapFree: 4293881856
  swapTotal: 4294967296
  uptime: 16h 16m 48.5s (Approximately 0.67 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 0
    stopped: 8
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 269490393088
  graphRootUsed: 1399648256
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.0
  Built: 1651863909
  BuiltTime: Fri May  6 19:05:09 2022
  GitCommit: 9c738fd311e5ca96844c2efdd826ef673ca9cf14
  GoVersion: go1.18.1
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.0

Package info (e.g. output of rpm -q podman or apt list podman):

apk info podman
podman-4.1.0-r0 description:
Simple management tool for pods, containers and images

podman-4.1.0-r0 webpage:
https://podman.io/

podman-4.1.0-r0 installed size:
37 MiB
...
apk list podman
podman-4.1.0-r0 x86_64 {podman} (Apache-2.0) [installed]
...
apk manifest podman
sha1:4d28e54ab8ee91a695bab0db57a6cfe03f8eac22  usr/bin/podman
sha1:a3d437499a6ba6fa2f92d18786b627ece5747fc3  usr/libexec/podman/rootlessport

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes (v4.1.0)/Yes (none of the 35 common issues apply)

Additional environment details (AWS, VirtualBox, physical, etc.):

uname -a

Linux *************** 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 Linux
cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.0
PRETTY_NAME="Alpine Linux v3.15"
...





1 With WSL2 and Alpine configured for cgroup v2, based on an admixture of configurations suggested by @lightmelodies, configurations suggested by @nunix, and MS' own Advanced settings in WSL
2 My requirements emphatically exclude any dependency on systemd

@openshift-ci bot added the kind/bug label May 13, 2022
@rhatdan
Member

rhatdan commented May 13, 2022

@n1hility @giuseppe PTAL

@n1hility
Member

@skepticoitusInteruptus podman expects there to be an initial cgroup defined (normally done by systemd) so that it can propagate values from it. You should just be able to create a cgroup in /sys/fs/cgroup (e.g. mkdir /sys/fs/cgroup/mygroup) and then run podman under that cgroup.
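
In shell terms, that suggestion looks roughly like this (a sketch; mygroup is an arbitrary name):

mkdir /sys/fs/cgroup/mygroup
echo $$ > /sys/fs/cgroup/mygroup/cgroup.procs
podman run -itd -m 6m docker.io/ibmcom/helloworld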

@skepticoitusInteruptus
Author

skepticoitusInteruptus commented May 16, 2022

Mucho Thanky-o, @n1hility. I will try that later today.

In the meantime I will share where I'm at so far...

Observations

  • Alpine is not listed as a distribution that podman officially supports
  • The table in the Build Tags subsection of Build and Run Dependencies suggests that systemd is merely optional; i.e., to be included in the build only if you want to add the journald logging feature to podman
  • The docker run -m 6m... command results in a /sys/fs/cgroup/docker subgroup getting automatically created (i.e., no systemd required)
  • The podman run... command (without the -m switch) results in a /sys/fs/cgroup/libpod_parent subgroup getting automatically created (i.e., no systemd required)
  • The podman run... command (without the -m switch) results in a /sys/fs/cgroup/libpod_parent/libpod-8003581f61d9... subgroup getting automatically created (i.e., no systemd required)
  • On a freshly-installed Alpine distro with cgroup v1 activated by default (and unchanged), the absence of systemd is, presumably, the root cause of this error:
podman run -itd -m 6m --memory-swap -1 docker.io/ibmcom/helloworld
Error: crun: mount `proc` to `/proc`: Permission denied: OCI permission denied

Questions

  1. Given that podman info correctly detects and reports all the cgroup v2 controllers that are enabled, why is that available cgroup v2 controller info not sufficient to recover from choking on the above run -m command?
  2. Whatever podman process it is that's creating those libpod subgroups, what prevents it from taking it from there in cases where it detects no systemd present?
  3. Given that simple cgroup-related commands (v2 and v1) will always fail if systemd is not present, why isn't systemd explicitly listed as a hard requirement to run podman out of the box?
  4. Given that /proc is always mounted during boot regardless, why is crun trying to mount it afterwards?

@n1hility
Member

  1. Given that podman info correctly detects and reports all the cgroup v2 controllers that are enabled, why is that available cgroup v2 controller info not sufficient to recover from choking on the above run -m command?

It's a fair question. We could just skip the swap validation check in the case of a root v2 control group. WDYT @giuseppe?
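
For context, when the invoking process sits in the root cgroup, /proc/self/cgroup reports only the root path, so there is no subgroup whose memory interface files can be probed (the root cgroup exposes no memory.swap.max):

cat /proc/self/cgroup
0::/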

  • On a freshly-installed Alpine distro with cgroup v1 activated by default (and unchanged), the absence of systemd is, presumably, the root cause of this error:
podman run -itd -m 6m --memory-swap -1 docker.io/ibmcom/helloworld
Error: crun: mount `proc` to `/proc`: Permission denied: OCI permission denied

This actually works for me with your reproducer. Did you try removing your .wslconfig settings and doing a wsl --shutdown first, just to make sure it's booting correctly? Are you doing this as root? (Rootless doesn't work with v1.)

  3. Given that simple cgroup-related commands (v2 and v1) will always fail if systemd is not present, why isn't systemd explicitly listed as a hard requirement to run podman out of the box?

It's definitely not a hard requirement. In the past you might have had to set things like the cgroups manager to not use systemd.

@skepticoitusInteruptus
Author

Sincere thanks for your answers, @n1hility 👍

...create a cgroup in /sys/fs/cgroup (e.g. mkdir /sys/fs/cgroup/mygroup), and then run podman under that cgroup...

First, I manually added the shell I'm invoking podman in, to the root cgroup.procs interface file:

echo $$ > /sys/fs/cgroup/cgroup.procs
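
($$ expands to the PID of the current shell, so this moves the shell itself into the root cgroup.)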

Then, I figured out the pid belonging to the libpod cgroup subgroup that podman had already created automatically:

cat /proc/42/cgroup
0::/libpod_parent/libpod-8003581f61d95a3d112c54eb6baa888570935809833fa0c02178033f3eff25de

Then, after establishing that pid 42 was not in cgroup.procs, I manually added it:

echo 42 > /sys/fs/cgroup/cgroup.procs

Then, I manually set the memory limits for the automatically-created libpod_parent cgroup:

echo 7M > /sys/fs/cgroup/libpod_parent/memory.max
echo 7M > /sys/fs/cgroup/libpod_parent/memory.swap.max
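
Reading those files back is a quick way to confirm the limits took; the kernel reports them in bytes (7M = 7340032, i.e. the 7168kB seen in the dmesg output below):

cat /sys/fs/cgroup/libpod_parent/memory.max
7340032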

I still got the same error I originally reported though:

podman run -m 6m -itd docker.io/ibmcom/helloworld
Error: could not find cgroup mount in "/proc/self/cgroup"

podman run -m 6m --memory-swap 6m -itd docker.io/ibmcom/helloworld
Error: could not find cgroup mount in "/proc/self/cgroup"

Trying to circumvent that, I tried --memory-swap -1 instead, but got a different (I presume cgroup v2-triggered) error that time:

podman run -m 6m --memory-swap -1 -itd docker.io/ibmcom/helloworld
Error: exit status 1

I'm 97% confident that my manual cgroup config took. That's based on the cgroup-related stuff I'm seeing in all the dmesg logs.

After each of the three invocations above, the output of dmesg | tail -n 100 looks exactly the same.

The tell-tale cgroup v2 clues that are logged after every permutation of podman run are:

...
Memory cgroup stats for /libpod_parent:
...
podman invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=-1000
...
memory: usage 7168kB, limit 7168kB, failcnt 122537328
swap: usage 8556kB, limit 7168kB, failcnt 0
...

This actually works for me with your reproducer...

I forgot to mention that I did that particular cgroup v1 experiment in Docker Playground, not locally in my WSL2 Alpine. That error might be because Docker Playground's Alpine instances are on an old 4.4.0-210-generic kernel.

@n1hility
Member

n1hility commented May 17, 2022

Sincere thanks for your answers, @n1hility 👍

you're welcome!

...create a cgroup in /sys/fs/cgroup (e.g. mkdir /sys/fs/cgroup/mygroup), and then run podman under that cgroup...

First, I manually added the shell I'm invoking podman in, to the root cgroup.procs interface file:

echo $$ > /sys/fs/cgroup/cgroup.procs

Try this:

  1. Follow your instructions in the original issue description
  2. Double-check that cgroup_no_v1=all is in /proc/cmdline
  3. Execute the following:
# mkdir /sys/fs/cgroup/foo
# echo $$ > /sys/fs/cgroup/foo/cgroup.procs
# cat /proc/self/cgroup
0::/foo
# podman run -m 6m -itd docker.io/ibmcom/helloworld
<id>
# podman stats 

You should see the 6m value having an effect in the output.

@skepticoitusInteruptus
Author

...You should see the 6m value having an effect in the output...

Can confirm 👍

I'm guessing that you were able to reproduce the error I originally described(?)

So do your results establish anything regarding podman's behavior parity with Docker?

By behavior parity, I mean docker run -m... just works, both without systemd and without any manual configuration of the kernel's cgroup interface files by the user.

If podman run -m... just worked — without systemd, without manual cgroup configuration — then "podman is a drop-in replacement for docker" would be that much more accurate, in my opinion.

Thanks again for looking into this, @n1hility. Mucho helpfully-o!

@n1hility
Member

...You should see the 6m value having an effect in the output...

Can confirm 👍

I'm guessing that you were able to reproduce the error I originally described(?)

Yes, it’s an excellent reproducer. Thank you.

So do your results establish anything regarding podman's behavior parity with Docker?

By behavior parity, I mean docker run -m... just works, both without systemd and without any manual configuration of the kernel's cgroup interface files by the user.

Oh sure, I basically agreed on this point earlier that we could handle this case. I’ll try to throw up a PR tomorrow if I find some cycles.

If podman run -m... just worked — without systemd, without manual cgroup configuration — then "podman is a drop-in replacement for docker" would be that much more accurate, in my opinion.

Thanks again for looking into this, @n1hility. Mucho helpfully-o!

You’re welcome!

@skepticoitusInteruptus
Author

Related to containers/crun/issues/923

@skepticoitusInteruptus
Author

@openshift-merge-robot closed this as completed in #14308 21 minutes ago

Thanks fellahs 👍

Given the absence of a response from @giuseppe to my question in @n1hility's PR, I'm gonna go ahead and infer the unspoken answer is probably:

Yes. Podman v4.1.0 and earlier do, indeed, assume Podman is always launched in a child cgroup of a systemd-spawned parent cgroup.1

I will note the findings of my spike in the documentation that will be presented to my team.

Mucho Thanky-o 🙏





 1 Other observations of Podman's cgroup v2-related behavior also suggest this is a reasonable inference

@n1hility
Member

@openshift-merge-robot closed this as completed in #14308 21 minutes ago

Thanks fellahs 👍

Given the absence of a response from @giuseppe to my question in @n1hility's PR, I'm gonna go ahead and infer the unspoken answer is probably:

Yes. Podman v4.1.0 and earlier do, indeed, assume Podman is always launched in a child cgroup of a systemd-spawned parent cgroup.1

Sorry I missed your question in the PR. This is essentially what I meant in #14236 (comment)

I would word it a little differently than what you have above, though. I would say that the only known reliable swap-controller detection mechanism with cgroups v2 relied on a cgroup being present, and since a system using cgroups usually associates one with logins (either via systemd or something else), that didn't (yet) present as a problem.
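
A rough shell illustration of that detection idea (a sketch, not Podman's actual code path): resolve the caller's cgroup from /proc/self/cgroup and probe it for the swap controller's interface file. In the root cgroup the file does not exist, which is the case this issue hit:

rel=$(sed -n 's/^0:://p' /proc/self/cgroup)
if [ -f "/sys/fs/cgroup${rel}/memory.swap.max" ]; then
  echo "swap limits can be validated here"
else
  echo "no swap interface file (e.g. root cgroup)"
fi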

@skepticoitusInteruptus
Author

...Sorry I missed your question in the PR...

We're good, @n1hility. I appreciate it was neither your nor @giuseppe's highest priority.

If either of you ever do get one or two free cycles, I'm curious to get some insight into how unreasonable my expectations are in this crun issue I reported last week.

...a system using cgroups usually associates one with logins (either via systemd or something else)...

I appreciate why, in practice, that would be an implementation detail for something like Podman.

I haven't been able to find any mention of any explicit login-related constraints on cgroup's expected general usage in the official cgroup v2 documentation, though.

Muchas Thankias 🥇
