docker-runc init failed on centos 7.6 xfs XFS: runc:[1:CHILD](3580) possible memory allocation deadlock in kmem_zone_alloc (mode:0x82d0) #2039

Open
imatespl opened this issue Apr 12, 2019 · 12 comments

Comments

@imatespl

docker-runc init fails, and the console prints this message in a loop:
XFS: runc:[1:CHILD](3580) possible memory allocation deadlock in kmem_zone_alloc (mode:0x82d0)
cat /proc/3580/stack
[] congestion_wait+0x82/0x110
[] kmem_zone_alloc+0x8c/0x130 [xfs]
[] xfs_trans_alloc+0x6d/0x140 [xfs]
[] xfs_inactive_ifree+0x55/0x230 [xfs]
[] xfs_inactive+0x8b/0x130 [xfs]
[] xfs_fs_destroy_inode+0x95/0x190 [xfs]
[] destroy_inode+0x3b/0x60
[] evict+0x115/0x180
[] iput+0xfc/0x190
[] __dentry_kill+0x120/0x180
[] dput+0xb0/0x160
[] drop_mountpoint+0x16/0x30
[] pin_kill+0x7d/0x100
[] group_pin_kill+0x21/0x30
[] namespace_unlock+0x71/0x80
[] drop_collected_mounts+0x54/0x60
[] put_mnt_ns+0x24/0x30
[] create_new_namespaces+0x165/0x180
[] unshare_nsproxy_namespaces+0x5a/0xc0
[] SyS_unshare+0x173/0x2e0
[] system_call_fastpath+0x22/0x27
[] 0xffffffffffffffff
Memory usage is low:
Tasks: 240 total, 1 running, 239 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.1 us, 0.8 sy, 0.0 ni, 73.5 id, 24.5 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32780772 total, 12619844 free, 16482460 used, 3678468 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 10126064 avail Mem

ps -aux --forest
root 3558 0.0 0.0 7488 2804 ? Sl Apr02 0:21 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemo
root 3572 0.0 0.0 138832 7832 ? Sl Apr02 0:00 _ docker-runc --root /var/run/docker/runtime-runc/moby --log /run/docker/conta
root 3579 0.0 0.0 18388 4348 ? S Apr02 0:00 _ docker-runc init
root 3580 1.3 0.0 18388 2384 ? D Apr02 197:00 _ docker-runc init
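
For anyone trying to confirm the same symptom on another node, here is a minimal sketch of how the stuck task could be found. It assumes a procps-style ps and root access to /proc/<pid>/stack; the function names filter_d_state and dump_d_state_stacks are made up for illustration and are not part of runc or Docker.

```shell
# Keep only rows whose STAT column (field 2) starts with "D",
# i.e. tasks in uninterruptible sleep, like PID 3580 above.
filter_d_state() {
  awk '$2 ~ /^D/'
}

# Print the kernel stack of every D-state task.
# Reading /proc/<pid>/stack normally requires root.
dump_d_state_stacks() {
  ps -eo pid=,stat=,comm= | filter_d_state | while read -r pid stat comm; do
    echo "== $pid ($comm) state=$stat"
    cat "/proc/$pid/stack" 2>/dev/null
  done
}
```

Calling dump_d_state_stacks as root should print a stack for each blocked task; a trace that ends in kmem_zone_alloc and congestion_wait would match the one reported here.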

System:
3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@imatespl
Author

docker-runc -version
runc version 1.0.0-rc5+dev
commit: 69663f0
spec: 1.0.0

@cyphar
Member

cyphar commented Apr 26, 2019

That looks like an XFS bug to me, and I would suggest reporting it to CentOS. It happens when we create a new mount namespace with unshare(CLONE_NEWNS).
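
As a rough way to exercise the same system call outside of runc, the util-linux unshare tool can create a mount namespace directly. The sketch below is only an illustration under assumptions: it needs root (CAP_SYS_ADMIN), the helper names ns_inode and check_new_mnt_ns are hypothetical, and it is not how runc itself sets up namespaces.

```shell
# Extract the namespace inode number from a link target
# such as "mnt:[4026531840]".
ns_inode() {
  printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9]*\)\]$/\1/p'
}

# Compare the mount namespace of this shell with the one seen by a
# process started through `unshare --mount` (unshare(CLONE_NEWNS)).
# Needs root; it is defined here but not called automatically.
check_new_mnt_ns() {
  parent=$(ns_inode "$(readlink /proc/self/ns/mnt)")
  child=$(ns_inode "$(unshare --mount readlink /proc/self/ns/mnt)")
  echo "parent=$parent child=$child"
  [ -n "$child" ] && [ "$parent" != "$child" ]
}
```

On a healthy kernel check_new_mnt_ns should return quickly with two different inode numbers. Whether it reproduces the hang on an affected node is not something this thread confirms.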

@Aisuko

Aisuko commented Oct 8, 2019

I hit the same issue. The machine is a Kubernetes worker node running Red Hat Enterprise Linux Server 7.5 (Maipo), kernel 3.10.0-1062.el7.x86_64, docker://19.3.2. This issue can cause PLEG and the kubelet to stop working.

runc version 1.0.0-rc8
commit: 425e105d5a03fabd737a126ad93d62a9eeede87f
spec: 1.0.1-dev

@brian-arms

I've also run into this issue; similar to @Aisuko, it presented on my Kubernetes worker node, which also showed PLEG and Kubelet failures. Node is running RHEL 7.6, Docker 18.09.9.

@strgrb

strgrb commented Aug 11, 2020

Has anyone found the cause? I have the same issue with Kubernetes 1.16.3, Docker 19.03.3, containerd 1.2.10, nvidia 1.0.0-rc8+dev, and docker-init 0.18.0.

@ddl-rolandsugars

@strgrb I've run into this issue as well. It looks like it is fixed in newer kernel versions, and it may be related to #1725 and https://bugzilla.redhat.com/show_bug.cgi?id=1507149

What OS and OS version are you running?

@strgrb

strgrb commented Oct 22, 2020

@ddl-rolandsugars I use CentOS 7.6 with kernel version 3.10.0-957. I don't think my problem is related to #1725, because I don't see kernel messages like 'SLUB: Unable to allocate memory on node'.
I set vm.lowmem_reserve_ratio="1 256 32" to reserve more memory for the DMA zone, and I have not seen this error for several weeks. But I don't know whether this is a correct solution.
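
For reference, a sketch of how that sysctl could be inspected, applied, and persisted. The helper write_lowmem_conf and the file name 90-lowmem-reserve.conf are assumptions made up for this example, and the value "1 256 32" is just the one quoted above, not a recommendation.

```shell
# Write a sysctl.d-style fragment for vm.lowmem_reserve_ratio.
# $1: ratio string, $2: destination file (the name used below is hypothetical).
write_lowmem_conf() {
  printf 'vm.lowmem_reserve_ratio = %s\n' "$1" > "$2"
}

# Usage (as root):
#   sysctl vm.lowmem_reserve_ratio                  # show the current value
#   sysctl -w vm.lowmem_reserve_ratio="1 256 32"    # apply until the next reboot
#   write_lowmem_conf "1 256 32" /etc/sysctl.d/90-lowmem-reserve.conf
#   sysctl --system                                 # reload persisted settings
```

Recording the old value before changing it makes it easy to revert if the setting turns out not to help.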

@ddl-rolandsugars

@strgrb What is the storage device you're using?

@strgrb

strgrb commented Oct 22, 2020

@ddl-rolandsugars One SSD for /, and on some machines another SSD for /var.

@ddl-rolandsugars

@strgrb My bad, I meant storage driver. If you run docker info, it should tell you. I think you're probably using devicemapper?

Example output:

$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 5
 Server Version: 19.03.13
 Storage Driver: overlay2                          <= this.
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.76-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 1.944GiB
 Name: docker-desktop
 ID: RMQE:67ZV:WKCO:PNIS:FD2M:ON2P:HVYC:DSLI:5S7R:NEBG:RVDX:XTG7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: gateway.docker.internal:3128
 HTTPS Proxy: gateway.docker.internal:3129
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
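
To pull out just that one line instead of reading the whole report, something like the following sketch could work. The function name storage_driver is hypothetical, and it assumes the plain-text layout shown in the example output above.

```shell
# Extract the "Storage Driver" value from plain `docker info` text on stdin.
storage_driver() {
  sed -n 's/^ *Storage Driver: *\([^ ]*\).*$/\1/p'
}

# Usage:
#   docker info 2>/dev/null | storage_driver
#   docker info --format '{{.Driver}}'    # same field via a Go template
```

The --format variant avoids parsing text at all and is probably the better choice in scripts.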

@strgrb

strgrb commented Oct 22, 2020

@ddl-rolandsugars My storage driver is overlay2

@ThinkMo

ThinkMo commented Dec 8, 2020

Update the kernel to 3.10.0-1062.el7.x86_64 and disable kmem accounting by adding cgroup.memory=nokmem to the boot command line.
See also https://access.redhat.com/solutions/532663
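
A small sketch of how the boot parameter could be checked and added on a RHEL/CentOS 7 style system. The helper name has_nokmem is made up for this example, and the grubby invocation is shown as a comment because it needs root and a reboot.

```shell
# Succeed if the kernel command line on stdin contains cgroup.memory=nokmem
# as a separate, exact argument.
has_nokmem() {
  tr ' ' '\n' | grep -qxF 'cgroup.memory=nokmem'
}

# Usage:
#   has_nokmem < /proc/cmdline && echo "kmem accounting disabled at boot"
#   grubby --update-kernel=ALL --args="cgroup.memory=nokmem"   # as root, then reboot
```

Checking /proc/cmdline after the reboot confirms that the running kernel actually picked up the argument.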


7 participants