Enabling kmem accounting can break applications on CentOS7 #1725
Comments
@TvdW The root cause is that the kernel memory limit is buggy in 3.10. If you don't want to use the kernel memory limit because it's not stable on your kernel, the best solution is to disable it in your kernel. I can't think of a way to work around this on the runc side without causing issues like #1083 and #1347, unless we add some ugly logic that does different things for different kernel versions, and I'm afraid that won't be an option.
@hqhq Is there any documentation on how to disable the kernel memory limit? We're using CentOS 7 and ran into a similar problem. It seems CentOS enables the kernel memory limit by default even though it's an experimental feature (not sure why), and runc uses this feature without checking.
@cizixs You need to disable CONFIG_MEMCG_KMEM in the kernel config and recompile the kernel, or you can help add an option to runc and the tools on top of runc to disable kmem accounting, as I suggested in kubernetes/kubernetes#61937 (comment).
I got a bit confused by this ticket. I tested with docker 17.09.0-ce and CentOS 7:
[jie@core-dev memory]$ docker info | grep Version
Server Version: 17.09.0-ce
Kernel Version: 3.10.0-693.5.2.el7.x86_64
[jie@core-dev memory]$ find /sys/fs/cgroup/memory -name memory.limit_in_bytes -type f -print -exec cat {} \; | grep -A 1 docker
/sys/fs/cgroup/memory/docker/c02dfa0697e198453e1f352a4f794dbcd8bda0dbc851fc27031b5395201a5b6e/memory.limit_in_bytes
1073741824
/sys/fs/cgroup/memory/docker/memory.limit_in_bytes
9223372036854771712
[jie@core-dev memory]$ find /sys/fs/cgroup/memory -name memory.kmem.limit_in_bytes -type f -print -exec cat {} \; | grep -A 1 docker
/sys/fs/cgroup/memory/docker/c02dfa0697e198453e1f352a4f794dbcd8bda0dbc851fc27031b5395201a5b6e/memory.kmem.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/docker/memory.kmem.limit_in_bytes
9223372036854771712
[jie@core-dev memory]$ docker inspect c02dfa0697e1 | grep Memory
"Memory": 1073741824,
"KernelMemory": 0,
"MemoryReservation": 0,
"MemorySwap": 2147483648,
"MemorySwappiness": null,
I don't see kmem accounting being turned on by default in docker 17.09.0-ce. Am I missing something?
OK, I think I get it now: runc enables accounting by first setting the kmem limit to 1 and then setting it to -1, which turns on the accounting feature without leaving a limit in place.
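For readers following along, the sequence described in that comment can be sketched in shell. This is a minimal illustration under stated assumptions, not runc's actual code: the function name `enable_kmem_accounting` is hypothetical, the target is assumed to be a cgroup v1 memory cgroup directory, and writing to a real cgroup requires root.

```shell
# Sketch of the enable-without-limit sequence described in the comment above.
# $1 is assumed to be a cgroup v1 memory cgroup directory, for example a
# path under /sys/fs/cgroup/memory. The function name is hypothetical.
enable_kmem_accounting() {
    limit_file="$1/memory.kmem.limit_in_bytes"
    # Step 1: set a tiny limit; per the comment, this is what turns
    # kernel memory accounting on for the cgroup.
    printf '%s\n' "1" > "$limit_file" || return 1
    # Step 2: write -1 to lift the limit again while accounting stays on.
    printf '%s\n' "-1" > "$limit_file" || return 1
}
```

On a real cgroup the file reads back as unlimited after the second write (9223372036854771712 in the output quoted earlier), which is why inspecting memory.kmem.limit_in_bytes alone does not reveal that accounting was turned on.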
Commit fe898e7 (PR opencontainers#1350) enables kernel memory accounting for all cgroups created by libcontainer even if kmem limit is not configured.
Kernel memory accounting is known to be broken in RHEL7 kernels, including the latest RHEL 7.5 kernel. It does not support reclaim and can lead to kernel oopses while removing cgroup (merging it with its parent). Unconditionally enabling kmem acct on RHEL7 leads to bugs:
* opencontainers#1725
* kubernetes/kubernetes#61937
* moby/moby#29638
I am not aware of any good way to figure out whether the kernel memory accounting in the given kernel is working or broken. For the lack of a better way, let's check if the running kernel is RHEL7, and disable initial setting of kmem.
Signed-off-by: Kir Kolyshkin <[email protected]>
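The commit message above proposes checking whether the running kernel is RHEL7 but does not say how. One possible heuristic, sketched here purely as an assumption (it is not taken from the patch, and the function name `is_rhel7_kernel` is hypothetical), is to match the release string reported by `uname -r` against the `3.10.0-*.el7*` pattern seen in the docker info output earlier in this thread:

```shell
# Hypothetical heuristic: treat a kernel release string as RHEL7/CentOS 7
# when it is a 3.10.0 build carrying an ".el7" tag.
# This pattern is an assumption, not the logic used by the runc patch.
is_rhel7_kernel() {
    case "$1" in
        3.10.0-*.el7*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be called as `is_rhel7_kernel "$(uname -r)" && echo "RHEL7-style kernel"`. A string match like this cannot tell a fixed el7 kernel from a broken one, which is the weakness the commit message itself acknowledges.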
Commit fe898e7 (PR opencontainers#1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured.
Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as
* opencontainers#1725
* kubernetes/kubernetes#61937
* moby/moby#29638
This commit gives a way to compile runc without kernel memory setting support. To do so, use something like
    make BUILDTAGS="seccomp nokmem"
Signed-off-by: Kir Kolyshkin <[email protected]>
In theory this is fixed by #1921 -- though RHEL will have to rebuild runc with |
@cyphar or fix their kernel....
Commit fe898e7 (PR opencontainers#1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured.
Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as
* opencontainers#1725
* kubernetes/kubernetes#61937
* moby/moby#29638
This commit gives a way to compile runc without kernel memory setting support. To do so, use something like
    make BUILDTAGS="seccomp nokmem"
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 6a2c155)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
This is fixed by #1921. But I just noticed there's actually a bug in it (it will ignore explicitly-set kernel memory limits).
I did some digging and the follow-up item is #1938, mentioning it here for posterity.
Yes, sorry I tagged #1921 and not this issue.
This causes kernel memory leaks when using versions of `runc` that unconditionally enable per-cgroup kernel memory resource accounting, leading to systems becoming unusable when many containers were created. The links below mention actual leaks of cgroups as well. However, in testing this appears to be fixed in more recent RedHat/CentOS kernel versions. We disable the feature in the kernel configuration, which however changes its ABI.
See: https://docs.google.com/document/d/1892PZs2ZdV4_JsSoFwC6WfoOHqKVirFci9r_6NAJzUU/edit?usp=sharing
See: moby/moby#29638 (comment)
See: kubernetes/kubernetes#61937
See: opencontainers/runc#1725
See: https://bugzilla.redhat.com/show_bug.cgi?id=1507149
See: https://bugs.schedmd.com/show_bug.cgi?id=5082#c28
It seems that the kernel bug which causes this error is finally fixed now, and will be released in kernel-3.10.0-1075.el7, which is due in RHEL 7.8
http://jira.tenxcloud.com/browse/LOT-1896
http://jira.tenxcloud.com/browse/MAS-159
kubernetes#61937
opencontainers/runc#1725
Signed-off-by: weiwei <[email protected]>
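If the claim in the message above holds, an el7 kernel can be compared against build 1075 by parsing its release string. The sketch below is an illustration only: the threshold comes solely from that message, the function name `el7_kernel_has_kmem_fix` is hypothetical, and the parsing assumes release strings shaped like `3.10.0-693.5.2.el7.x86_64`.

```shell
# Hypothetical check: is an el7 kernel release string at or after
# 3.10.0-1075.el7, the build the message above says carries the fix?
# Returns 0 if yes, 1 if older, 2 if the string is not an el7 release.
el7_kernel_has_kmem_fix() {
    case "$1" in
        3.10.0-*.el7*) ;;
        *) return 2 ;;            # not an el7 kernel; check does not apply
    esac
    build=${1#3.10.0-}            # e.g. "693.5.2.el7.x86_64"
    build=${build%%.*}            # e.g. "693"
    case "$build" in
        ''|*[!0-9]*) return 2 ;;  # unexpected format
    esac
    [ "$build" -ge 1075 ]
}
```

For example, `el7_kernel_has_kmem_fix "$(uname -r)"` succeeds on a host whose release string names build 1075 or later. Vendors can backport fixes into earlier builds, so a version comparison like this is only a rough guide.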
There are several pieces of incorrect information in the article https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s :
docker run -d --name test --kernel-memory 100M nginx:1.14.2
eb8dfe53ea903a9207bd356999b14c7ef3b57d9d35d33635c0e8b700387f60ce
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:396: setting cgroup config for procHooks process caused \\\"kernel memory accounting disabled in this runc build\\\"\"": unknown.
./build/run.sh make kubelet KUBE_BUILD_PLATFORMS=linux/amd64 BUILDTAGS="nokmem"
find /sys/fs/cgroup/memory/kubepods/ -name memory.kmem.slabinfo | xargs cat > /tmp/mem.txt
If mem.txt is empty, kmem accounting is disabled.
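The same check can be wrapped in a small function that avoids the temporary file. This is a sketch that follows the rule stated in the comment above (no slabinfo content means accounting is disabled); the function name `kmem_accounting_active` is hypothetical, and the default path is the kubepods directory from the command above.

```shell
# Sketch: succeed if any memory.kmem.slabinfo file under a cgroup directory
# has content. $1 defaults to the kubepods path used in the comment above.
kmem_accounting_active() {
    root="${1:-/sys/fs/cgroup/memory/kubepods}"
    # Read errors are discarded; only the presence of output matters.
    out=$(find "$root" -name memory.kmem.slabinfo -exec cat {} + 2>/dev/null)
    [ -n "$out" ]
}
```

It could be used as `kmem_accounting_active && echo "kmem accounting appears active"`. Because errors are suppressed, a missing directory is reported the same way as disabled accounting.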
We didn't write the article you linked (and this issue is long since closed) so I don't know why you've posted in this issue, but to your points:
EDIT: Ah, someone else linked to the article above.
Commit e882dae (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured.
Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as
* opencontainers/runc#1725
* kubernetes/kubernetes#61937
* moby/moby#29638
This commit gives a way to compile runc without kernel memory setting support. To do so, use something like
    make BUILDTAGS="seccomp nokmem"
Signed-off-by: Kir Kolyshkin <[email protected]>
Commit c786d3e (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured.
Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as
* opencontainers/runc#1725
* kubernetes/kubernetes#61937
* moby/moby#29638
This commit gives a way to compile runc without kernel memory setting support. To do so, use something like
    make BUILDTAGS="seccomp nokmem"
Signed-off-by: Kir Kolyshkin <[email protected]>
Commit ce43d2d (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured.
Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as
* opencontainers/runc#1725
* kubernetes/kubernetes#61937
* moby/moby#29638
This commit gives a way to compile runc without kernel memory setting support. To do so, use something like
    make BUILDTAGS="seccomp nokmem"
Signed-off-by: Kir Kolyshkin <[email protected]>
Commit 3e0f215 (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured.
Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as
* opencontainers/runc#1725
* kubernetes/kubernetes#61937
* moby/moby#29638
This commit gives a way to compile runc without kernel memory setting support. To do so, use something like
    make BUILDTAGS="seccomp nokmem"
Signed-off-by: Kir Kolyshkin <[email protected]>
After upgrading from Docker (CE) 17.05.0 to 17.12.0, these kernel messages started showing up on my machines:
I was able to reproduce this on Docker 17.06.0, and eventually traced it to the commit introduced by #1350, which enables kmem accounting for all containers. But as Docker helpfully suggests, kmemcg is experimental before Linux 4.0:
After a few more tests I was able to reproduce the issue on 17.05.0, by passing `--kernel-memory 1000g` to `docker run`. The kernel log slowly fills up (10 messages per hour?) with SLUB warnings, and containers seem far less stable than normal (i.e. they crash).
Steps to reproduce
Eventually, these messages will start popping up in the kernel logs, and in rare cases it leads to an application getting killed/crashed.
With all of the above said, I'm 99% sure this is a kernel bug related to running an ancient kernel, and runc's patch would at best be a workaround. If only I had a way to get redhat's attention so they can fix it 😃