
Enabling kmem accounting can break applications on CentOS7 #1725

Closed
TvdW opened this issue Feb 20, 2018 · 15 comments

Comments

@TvdW

TvdW commented Feb 20, 2018

After upgrading from Docker (CE) 17.05.0 to 17.12.0, these kernel messages started showing up on my machines:

[43073.004575] SLUB: Unable to allocate memory on node -1 (gfp=0x8020)
[43073.022211]   cache: ip6_dst_cache(0:18479a49f76b66461d8fedd17a6eba407f08cc8960bd0246d39df8eb21f20b4f), object size: 448, buffer size: 448, default order: 2, min order: 0
[43073.061420]   node 0: slabs: 74, objs: 2610, free: 0
[43073.077875]   node 1: slabs: 88, objs: 3141, free: 0
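The `cache:` line names a per-cgroup copy of `ip6_dst_cache`, and the part in parentheses (a number followed by a long hex string) matches the container ID used with `docker update` in this comment. A minimal Go sketch for pulling those fields out of such log lines, assuming every line you feed it uses the `name(number:cgroup)` form shown above; `parseSlubCacheLine` and `describe` are hypothetical helpers, not part of any tool mentioned in this thread:

```go
package main

import (
	"fmt"
	"regexp"
)

// slubCacheRe matches the "cache:" line of a SLUB allocation-failure report,
// assuming the per-cgroup naming seen in the log: name(number:cgroup).
var slubCacheRe = regexp.MustCompile(`cache: ([A-Za-z0-9_-]+)\((\d+):([^)]+)\)`)

// parseSlubCacheLine returns the slab cache name and the cgroup name embedded
// in a per-cgroup cache name. ok is false for lines that do not match.
func parseSlubCacheLine(line string) (cache, cgroup string, ok bool) {
	m := slubCacheRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[3], true
}

// describe formats the parse result for a human reader.
func describe(line string) string {
	cache, cgroup, ok := parseSlubCacheLine(line)
	if !ok {
		return "not a per-cgroup cache line"
	}
	return fmt.Sprintf("cache %s belongs to cgroup %s", cache, cgroup)
}
```

Mapping the cgroup name back to a container is then a matter of comparing it with the IDs reported by `docker ps --no-trunc`.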

I was able to reproduce this on Docker 17.06.0, and eventually traced it to the commit introduced by #1350, which enables kmem accounting for all containers. But as Docker helpfully points out, kmemcg is experimental before Linux 4.0:

$ sudo docker update --kernel-memory 1000g 18479a49f76b
You specified a kernel memory limit on a kernel older than 4.0. Kernel memory limits are experimental on older kernels, it won't work as expected and can cause your system to be unstable.

After a few more tests I was able to reproduce the issue on 17.05.0, by passing --kernel-memory 1000g to docker run. The kernel log slowly fills up (10 messages per hour?) with SLUB warnings, and containers seem far less stable than normal (i.e. they crash).
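Docker's warning above is triggered by the kernel version alone. A minimal Go sketch of that kind of version gate, assuming the input is a release string as printed by `uname -r` (such as the CentOS kernel listed in the steps below); `kernelOlderThan` is a hypothetical helper, not Docker's actual implementation:

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

// kernelOlderThan reports whether a kernel release string such as
// "3.10.0-693.17.1.el7.x86_64" is older than major.minor.
// Hypothetical helper: it compares only the first two numeric components.
func kernelOlderThan(release string, major, minor int) (bool, error) {
	// Drop the distribution suffix after the first '-'.
	base := strings.SplitN(release, "-", 2)[0]
	parts := strings.Split(base, ".")
	if len(parts) < 2 {
		return false, errors.New("unrecognised kernel release: " + release)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	gotMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	if gotMajor != major {
		return gotMajor < major, nil
	}
	return gotMinor < minor, nil
}
```

A check like this cannot tell whether a vendor kernel has backported fixes, which is one reason version numbers alone are a weak signal for kmem stability.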

Steps to reproduce

  1. CentOS 7, latest kernel (3.10.0-693.17.1.el7.x86_64)
  2. Docker 17.06.0+ or runc with equivalent settings
  3. kmem limit unset, mem limit 40G, with 80G free memory left for page caches
  4. An application that very heavily uses the local disk to cause caches to build up (in my case Apache Cassandra)

Eventually, these messages will start popping up in the kernel logs, and in rare cases it leads to an application getting killed/crashed.


With all of the above said, I'm 99% sure this is a kernel bug related to running an ancient kernel, and runc's patch would at best be a workaround. If only I had a way to get Red Hat's attention so they can fix it 😃

@hqhq
Contributor

hqhq commented Apr 1, 2018

@TvdW The root cause is that there are kernel memory limit bugs in 3.10. If you don't want to use the kernel memory limit because it's not stable on your kernel, the best solution would be to disable the kernel memory limit in your kernel.

I can't think of a way to work around this on the runc side without causing issues like #1083 and #1347, unless we add some ugly logic that does different things for different kernel versions, and I'm afraid that won't be an option.

@cizixs

cizixs commented Apr 11, 2018

@hqhq Is there any document on how to disable kernel memory limit?

We're using CentOS 7 and ran into a similar problem. It seems CentOS enables kernel memory limits by default even though it's an experimental feature (not sure why), and runc uses this feature without checking.

@pmoust

pmoust commented Apr 11, 2018

@hqhq cgroup.memory=nokmem is not available on RHEL/CentOS 3.10 (kmem accounting is active in the legacy cgroup hierarchy)

@hqhq
Contributor

hqhq commented Apr 11, 2018

@cizixs You need to disable CONFIG_MEMCG_KMEM in the kernel config and recompile the kernel, or you can help add an option to runc (and to other tools on top of runc) to disable kmem accounting, as I suggested in kubernetes/kubernetes#61937 (comment).
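After a rebuild, the kernel config should no longer enable the option. A minimal Go sketch that checks kernel config text for it, assuming the config is read from a file such as `/boot/config-<release>` (a common distro convention, not something stated in this thread); `memcgKmemEnabled` is a hypothetical name:

```go
package main

import (
	"bufio"
	"strings"
)

// memcgKmemEnabled scans kernel config text and reports whether
// CONFIG_MEMCG_KMEM is built in. Disabled options appear in the file as
// "# CONFIG_MEMCG_KMEM is not set", so only an explicit "=y" counts.
func memcgKmemEnabled(config string) bool {
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == "CONFIG_MEMCG_KMEM=y" {
			return true
		}
	}
	return false
}
```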

@jieyu

jieyu commented Jul 2, 2018

I got a bit confused by this ticket. I tested with docker 17.09.0-ce and CentOS 7:

[jie@core-dev memory]$ docker info | grep Version
Server Version: 17.09.0-ce
Kernel Version: 3.10.0-693.5.2.el7.x86_64
[jie@core-dev memory]$ find /sys/fs/cgroup/memory -name memory.limit_in_bytes -type f -print -exec cat {} \; | grep -A 1 docker
/sys/fs/cgroup/memory/docker/c02dfa0697e198453e1f352a4f794dbcd8bda0dbc851fc27031b5395201a5b6e/memory.limit_in_bytes
1073741824
/sys/fs/cgroup/memory/docker/memory.limit_in_bytes
9223372036854771712
[jie@core-dev memory]$ find /sys/fs/cgroup/memory -name memory.kmem.limit_in_bytes -type f -print -exec cat {} \; | grep -A 1 docker
/sys/fs/cgroup/memory/docker/c02dfa0697e198453e1f352a4f794dbcd8bda0dbc851fc27031b5395201a5b6e/memory.kmem.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/docker/memory.kmem.limit_in_bytes
9223372036854771712
[jie@core-dev memory]$ docker inspect c02dfa0697e1 | grep Memory
            "Memory": 1073741824,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 2147483648,
            "MemorySwappiness": null,

I don't see kmem accounting being turned on by default in docker 17.09.0-ce. Am I missing something?

@jieyu

jieyu commented Jul 2, 2018

OK, I think I get it now: runc enables the accounting by first setting the kmem limit to 1 and then setting it to -1, which turns on the accounting feature without leaving a limit in place.
https://github.com/opencontainers/runc/pull/1350/files#diff-b56f02f0dc51a436b542c6d80bdef7e8R68

kolyshkin added a commit to kolyshkin/runc that referenced this issue Oct 31, 2018
Commit fe898e7 (PR opencontainers#1350) enables kernel memory accounting
for all cgroups created by libcontainer even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in RHEL7 kernels,
including the latest RHEL 7.5 kernel. It does not support reclaim
and can lead to kernel oopses while removing a cgroup (merging it
with its parent). Unconditionally enabling kmem acct on RHEL7
leads to bugs:

* opencontainers#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

I am not aware of any good way to figure out whether the kernel
memory accounting in the given kernel is working or broken.
For the lack of a better way, let's check if the running kernel
is RHEL7, and disable initial setting of kmem.

Signed-off-by: Kir Kolyshkin <[email protected]>
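The heuristic this commit message describes (detect a RHEL7 kernel and skip the initial kmem setting) might look roughly like the Go sketch below. It assumes RHEL7/CentOS 7 release strings contain `.el7`, as in `3.10.0-693.17.1.el7.x86_64`, and that an explicitly configured limit should still be applied; both function names are hypothetical:

```go
package main

import "strings"

// isRHEL7Kernel treats any release string containing ".el7" as a
// RHEL 7 / CentOS 7 kernel (an assumption, not a robust detection method).
func isRHEL7Kernel(release string) bool {
	return strings.Contains(release, ".el7")
}

// shouldEnableKmem decides whether to touch kmem for a new cgroup.
// An explicit limit (non-zero) is honoured; otherwise accounting is only
// switched on by default when the kernel is not RHEL7.
func shouldEnableKmem(release string, limit int64) bool {
	if limit != 0 {
		return true
	}
	return !isRHEL7Kernel(release)
}
```

The later revision of this commit dropped the kernel check in favour of a `nokmem` build tag.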
kolyshkin added a commit to kolyshkin/runc that referenced this issue Nov 1, 2018
Commit fe898e7 (PR opencontainers#1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* opencontainers#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <[email protected]>
@cyphar
Member

cyphar commented Nov 2, 2018

In theory this is fixed by #1921 -- though RHEL will have to rebuild runc with BUILDTAGS=nokmem...

@crosbymichael
Copy link
Member

@cyphar or fix their kernel....

thaJeztah pushed a commit to thaJeztah/runc that referenced this issue Nov 13, 2018
Commit fe898e7 (PR opencontainers#1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* opencontainers#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 6a2c155)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
clnperez pushed a commit to clnperez/runc that referenced this issue Nov 13, 2018
Commit fe898e7 (PR opencontainers#1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* opencontainers#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <[email protected]>
@cyphar
Member

cyphar commented Nov 21, 2018

This is fixed by #1921. But I just noticed there's actually a bug in it (it will ignore explicitly-set kernel memory limits).
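The intended behaviour of a build without kmem support, as opposed to the bug described above, can be sketched as follows: an unset limit is skipped silently, while an explicitly requested limit is rejected instead of ignored. This is a minimal Go illustration; the error text reuses the runc message quoted later in this thread, and `checkKmemLimit` plus its `kmemSupported` parameter (standing in for the build tag) are hypothetical:

```go
package main

import "errors"

// errKmemDisabled mirrors the runc error message quoted later in this thread.
var errKmemDisabled = errors.New("kernel memory accounting disabled in this runc build")

// checkKmemLimit validates a requested kmem limit for a given build.
// limit == 0 means "not configured".
func checkKmemLimit(limit int64, kmemSupported bool) error {
	if limit == 0 {
		return nil // nothing requested, leave kmem untouched
	}
	if !kmemSupported {
		return errKmemDisabled // an explicit limit must not be silently dropped
	}
	return nil // a supported build would go on to write the limit
}
```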

@cyphar cyphar closed this as completed Nov 21, 2018
@pmoust

pmoust commented Nov 21, 2018

So is there a follow-up on #1921, @cyphar?

@pmoust

pmoust commented Nov 21, 2018

I did some digging and the follow-up item is #1938, mentioning it here for posterity.

@cyphar
Member

cyphar commented Nov 21, 2018

Yes, sorry I tagged #1921 and not this issue.

@ethercflow

NicolasT added a commit to scality/centos-kernel that referenced this issue Oct 9, 2019
This causes kernel memory leaks when using versions of `runc` that
unconditionally enable per-cgroup kernel memory resource accounting,
leading to systems becoming unusable when many containers were created.

The links below mention actual leaks of cgroups as well. However, in
testing this appears to be fixed in more recent RedHat/CentOS kernel
versions.

We disable the feature in the kernel configuration, which however
changes its ABI.

See: https://docs.google.com/document/d/1892PZs2ZdV4_JsSoFwC6WfoOHqKVirFci9r_6NAJzUU/edit?usp=sharing
See: moby/moby#29638 (comment)
See: kubernetes/kubernetes#61937
See: opencontainers/runc#1725
See: https://bugzilla.redhat.com/show_bug.cgi?id=1507149
See: https://bugs.schedmd.com/show_bug.cgi?id=5082#c28
caruccio pushed a commit to caruccio/runc that referenced this issue Oct 10, 2019
Commit fe898e7 (PR opencontainers#1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* opencontainers#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <[email protected]>
tedli pushed a commit to tedli/kubernetes that referenced this issue Apr 22, 2020
        It seems that the kernel bug which causes this error is finally fixed now,
        and will be released in kernel-3.10.0-1075.el7, which is due in RHEL 7.8
        http://jira.tenxcloud.com/browse/LOT-1896
        http://jira.tenxcloud.com/browse/MAS-159
        kubernetes#61937
        opencontainers/runc#1725

Signed-off-by: weiwei <[email protected]>
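Deciding whether a host already runs a kernel that, per the commit message above, contains the fix means comparing the RHEL build number in the release string against 1075. A minimal Go sketch, assuming release strings of the form `3.10.0-<build>.<...>.el7.<arch>` and taking the 1075 threshold from the commit message without independent verification; `rhelBuild` and `hasKmemFix` are hypothetical names:

```go
package main

import (
	"strconv"
	"strings"
)

// rhelBuild extracts the build number from a release such as
// "3.10.0-1075.el7.x86_64" (the digits between "3.10.0-" and the next dot).
func rhelBuild(release string) (int, bool) {
	const prefix = "3.10.0-"
	if !strings.HasPrefix(release, prefix) {
		return 0, false
	}
	num := strings.SplitN(strings.TrimPrefix(release, prefix), ".", 2)[0]
	n, err := strconv.Atoi(num)
	if err != nil {
		return 0, false
	}
	return n, true
}

// hasKmemFix reports whether an el7 kernel is at least build 1075.
// Later components such as ".17.1" are ignored.
func hasKmemFix(release string) bool {
	n, ok := rhelBuild(release)
	return ok && strings.Contains(release, ".el7") && n >= 1075
}
```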
@zionwu

zionwu commented Oct 17, 2020

There are several pieces of wrong information in the article https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s :

  1. runc of Docker v18.09.1 does not disable kmem accounting.
    The way to verify this is to run the command docker run -d --name test --kernel-memory 100M nginx:1.14.2. If kmem accounting is enabled, the command succeeds and the container is created. If kmem accounting is disabled, the command fails with the following message:
docker run -d --name test --kernel-memory 100M nginx:1.14.2
eb8dfe53ea903a9207bd356999b14c7ef3b57d9d35d33635c0e8b700387f60ce
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:396: setting cgroup config for procHooks process caused \\\"kernel memory accounting disabled in this runc build\\\"\"": unknown.
  2. The build tag setting BUILDTAGS="nokmem" is useless. The correct setting is GOFLAGS="-tags=nokmem". The complete command to compile kubelet is
./build/run.sh make kubelet KUBE_BUILD_PLATFORMS=linux/amd64 GOFLAGS="-tags=nokmem"
  3. A better way to verify whether kmem accounting is disabled on the host after a reboot is to run the following command:
find /sys/fs/cgroup/memory/kubepods/ -name memory.kmem.slabinfo | xargs cat > /tmp/mem.txt

If mem.txt is empty, kmem accounting is disabled.

@cyphar
Member

cyphar commented Oct 17, 2020

We didn't write the article you linked (and this issue is long since closed) so I don't know why you've posted in this issue, but to your points:

  1. Yes, Docker doesn't disable kmem accounting in their builds of runc, but this issue tracker is not for the Docker project.
  2. BUILDTAGS=nokmem does work if you're building runc directly using our project's Makefile. If you're building runc as part of another project's build system (such as Kubernetes) then you should consult their documentation.
  3. You can also just check if /sys/fs/cgroup/memory contains any "memory.kmem" files (such as /sys/fs/cgroup/memory/memory.kmem.limit_in_bytes).

EDIT: Ah, someone else linked to the article above.

dims pushed a commit to dims/libcontainer that referenced this issue Oct 19, 2024
Commit e882dae (PR #1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* opencontainers/runc#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <[email protected]>
dims pushed a commit to dims/libcontainer that referenced this issue Oct 19, 2024
Commit c786d3e (PR #1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* opencontainers/runc#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <[email protected]>
dims pushed a commit to dims/libcontainer that referenced this issue Oct 19, 2024
Commit ce43d2d (PR #1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* opencontainers/runc#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/containerd-cgroups that referenced this issue Nov 6, 2024
Commit 3e0f215 (PR #1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* opencontainers/runc#1725
* kubernetes/kubernetes#61937
* moby/moby#29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <[email protected]>
9 participants