Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel Panic "BUG: unable to handle kernel NULL pointer dereference at 0000000000000010" because of "runc:[2:INIT]" #1857

Closed
s-r-engineer opened this issue Aug 6, 2018 · 8 comments

Comments

@s-r-engineer
Copy link

s-r-engineer commented Aug 6, 2018

Hi.

We worked on Centos 7.4-1708 (kernel 3.10.0-693.11.6) and Docker CE 17.12.0.ce-1.
After system crash we reinstalled OS to Centos 7.5-1804 (kernel 3.10.0-862.9.1) and Docker 18.06.0.ce-3.
Because of that system crash (human factor btw) we started to reindexing ElasticSearch shards (a lot of shards).
And in this circumstances we have had this bug:

SYS

KERNEL: /usr/lib/debug/lib/modules/3.10.0-862.9.1.el7.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 40
DATE: Thu Aug 2 20:42:21 2018
UPTIME: 13:00:49
LOAD AVERAGE: 5.42, 6.01, 6.93
TASKS: 1493
NODENAME: tech-node2-dcdp
RELEASE: 3.10.0-862.9.1.el7.x86_64
VERSION: #1 SMP Mon Jul 16 16:29:36 UTC 2018
MACHINE: x86_64 (2200 Mhz)
MEMORY: 126.6 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000010"
PID: 191240
COMMAND: "runc:[2:INIT]"
TASK: ffff9780a738af70 [THREAD_INFO: ffff97743cb5c000]
CPU: 16
STATE: TASK_RUNNING (PANIC)

Backtrace

PID: 191240  TASK: ffff9780a738af70  CPU: 16  COMMAND: "runc:[2:INIT]"
 #0 [ffff97743cb5f9f8] machine_kexec at ffffffff9666178a
 #1 [ffff97743cb5fa58] __crash_kexec at ffffffff96713bf2
 #2 [ffff97743cb5fb28] crash_kexec at ffffffff96713ce0
 #3 [ffff97743cb5fb40] oops_end at ffffffff96d18738
 #4 [ffff97743cb5fb68] no_context at ffffffff96d0807e
 #5 [ffff97743cb5fbb8] __bad_area_nosemaphore at ffffffff96d08115
 #6 [ffff97743cb5fc08] bad_area_nosemaphore at ffffffff96d08286
 #7 [ffff97743cb5fc18] __do_page_fault at ffffffff96d1b6f0
 #8 [ffff97743cb5fc80] do_page_fault at ffffffff96d1b8e5
 #9 [ffff97743cb5fcb0] page_fault at ffffffff96d17758
    [exception RIP: strchr]
    RIP: ffffffff96954850  RSP: ffff97743cb5fd60  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffff9774c5227000  RCX: 0000000000000000
    RDX: 0000000000000008  RSI: 000000000000002c  RDI: 0000000000000010
    RBP: ffff97743cb5fdb8   R8: 000038747d0249f0   R9: ffff9774c5227000
    R10: ffff977579eff800  R11: ffff97743cb5fc96  R12: ffff978514eea480
    R13: ffff9774c5227000  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff97743cb5fd60] selinux_sb_show_options at ffffffff968dbbb5
#11 [ffff97743cb5fdc0] security_sb_show_options at ffffffff968d0c2c
#12 [ffff97743cb5fdd0] show_sb_opts at ffffffff9686077c
#13 [ffff97743cb5fdf8] show_mountinfo at ffffffff96860d3b
#14 [ffff97743cb5fe58] m_show at ffffffff9683c219
#15 [ffff97743cb5fe68] seq_read at ffffffff96841b80
#16 [ffff97743cb5fed8] vfs_read at ffffffff9681b2ff
#17 [ffff97743cb5ff08] sys_read at ffffffff9681c1cf
#18 [ffff97743cb5ff50] system_call_fastpath at ffffffff96d20795
    RIP: 0000558110c73710  RSP: 000000c4200a3c58  RFLAGS: 00000216
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffffffffffffffff
    RDX: 0000000000001000  RSI: 000000c42011b000  RDI: 0000000000000004
    RBP: 000000c4200a3a68   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000202  R12: ffffffffffffffff
    R13: 0000000000000002  R14: 0000000000000001  R15: 0000000000000002
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

LOG

[46658.106388] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46658.106390]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46658.106391]   node 0: slabs: 164, objs: 10368, free: 0
[46658.106392]   node 1: slabs: 53, objs: 3392, free: 88
[46658.106414] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46658.106416]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46658.106417]   node 0: slabs: 164, objs: 10368, free: 0
[46658.106419]   node 1: slabs: 53, objs: 3392, free: 88
[46658.106453] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46658.106455]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46658.106456]   node 0: slabs: 164, objs: 10368, free: 0
[46658.106458]   node 1: slabs: 53, objs: 3392, free: 88
[46658.132214] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46658.132217]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46658.132219]   node 0: slabs: 164, objs: 10368, free: 0
[46658.132220]   node 1: slabs: 53, objs: 3392, free: 88
[46658.132611] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46658.132613]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46658.132614]   node 0: slabs: 164, objs: 10368, free: 0
[46658.132616]   node 1: slabs: 53, objs: 3392, free: 88
[46658.132635] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46658.132637]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46658.132638]   node 0: slabs: 164, objs: 10368, free: 0
[46658.132639]   node 1: slabs: 53, objs: 3392, free: 88
[46658.132657] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46658.132658]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46658.132660]   node 0: slabs: 164, objs: 10368, free: 0
[46658.132661]   node 1: slabs: 53, objs: 3392, free: 88
[46725.383816] ___slab_alloc: 1 callbacks suppressed
[46725.383820] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46725.383822]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46725.383824]   node 0: slabs: 169, objs: 10688, free: 0
[46725.383827]   node 1: slabs: 53, objs: 3392, free: 88
[46725.383867] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[46725.383869]   cache: kmalloc-128(4:9d0ace7f06cbe7f359d76e723f38758818620843970c59bb68e823c817fe1f35), object size: 128, buffer size: 128, default order: 1, min order: 0
[46725.383870]   node 0: slabs: 169, objs: 10688, free: 0
[46725.383872]   node 1: slabs: 53, objs: 3392, free: 88
[46739.898316] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[46739.898350] IP: [<ffffffff96954850>] strchr+0x0/0x40
[46739.898371] PGD 0 
[46739.898379] Oops: 0000 [#1] SMP 
[46739.898393] Modules linked in: xt_REDIRECT nf_nat_redirect ip_vs_rr xt_ipvs ip_vs xt_nat veth vxlan ip6_udp_tunnel udp_tunnel xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink xt_addrtype br_netfilter overlay(T) ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dcdbas iTCO_wdt vfat iTCO_vendor_support fat skx_edac intel_powerclamp coretemp intel_rapl xfs iosf_mbi kvm_intel kvm irqbypass libcrc32c crc32_pclmul
[46739.898664]  ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif pcspkr sg mei_me mei i2c_i801 lpc_ich shpchp ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_pad acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel ixgbe ahci drm libahci tg3 libata megaraid_sas mdio dca ptp i2c_core pps_core
[46739.898844] CPU: 16 PID: 191240 Comm: runc:[2:INIT] Kdump: loaded Tainted: G               ------------ T 3.10.0-862.9.1.el7.x86_64 #1
[46739.898878] Hardware name: Dell Inc. PowerEdge R440/0N28XX, BIOS 1.2.11 10/20/2017
[46739.898900] task: ffff9780a738af70 ti: ffff97743cb5c000 task.ti: ffff97743cb5c000
[46739.898922] RIP: 0010:[<ffffffff96954850>]  [<ffffffff96954850>] strchr+0x0/0x40
[46739.898948] RSP: 0018:ffff97743cb5fd60  EFLAGS: 00010202
[46739.898964] RAX: 0000000000000001 RBX: ffff9774c5227000 RCX: 0000000000000000
[46739.898984] RDX: 0000000000000008 RSI: 000000000000002c RDI: 0000000000000010
[46739.899004] RBP: ffff97743cb5fdb8 R08: 000038747d0249f0 R09: ffff9774c5227000
[46739.899025] R10: ffff977579eff800 R11: ffff97743cb5fc96 R12: ffff978514eea480
[46739.899044] R13: ffff9774c5227000 R14: 0000000000000001 R15: 0000000000000000
[46739.899065] FS:  00007fe7ad864740(0000) GS:ffff97757dc00000(0000) knlGS:0000000000000000
[46739.899088] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46739.899105] CR2: 0000000000000010 CR3: 000000045c318000 CR4: 00000000007607e0
[46739.899143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[46739.899165] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[46739.899186] PKRU: 55555554
[46739.899196] Call Trace:
[46739.899209]  [<ffffffff968dbbb5>] ? selinux_sb_show_options+0x135/0x470
[46739.899234]  [<ffffffff968d0c2c>] security_sb_show_options+0x1c/0x20
[46739.899267]  [<ffffffff9686077c>] show_sb_opts+0x5c/0x70
[46739.899284]  [<ffffffff96860d3b>] show_mountinfo+0x20b/0x2d0
[46739.899304]  [<ffffffff9683c219>] m_show+0x19/0x20
[46739.899319]  [<ffffffff96841b80>] seq_read+0x110/0x3f0
[46739.899337]  [<ffffffff9681b2ff>] vfs_read+0x9f/0x170
[46739.899353]  [<ffffffff9681c1cf>] SyS_read+0x7f/0xf0
[46739.899372]  [<ffffffff96d20795>] system_call_fastpath+0x1c/0x21
[46739.899390] Code: c1 75 18 48 83 c0 01 84 c9 74 05 48 39 d0 75 e3 31 c0 5d c3 0f 1f 80 00 00 00 00 44 38 c1 19 c0 83 c8 01 5d c3 66 0f 1f 44 00 00 <0fb6 07 55 48 89 e5 40 38 f0 74 1b 84 c0 89 f2 75 0a eb 1c 0f 
[46739.900319] RIP  [<ffffffff96954850>] strchr+0x0/0x40
[46739.901106]  RSP <ffff97743cb5fd60>
[46739.901846] CR2: 0000000000000010

Situation is very similar to this in case of reproducing the problem.
After all we just downgraded our kernel and docker to last known working configuration (kernel 3.10.0-693.11.6 and Docker CE 17.12.0.ce-1) and all works fine.
#1725

@cyphar
Copy link
Member

cyphar commented Aug 6, 2018

It looks like something in SELinux is triggering this -- in particular it looks like runc does a cat /proc/mountinfo which SELinux then fails to fill the mount options for. Looking into the code itself, it looks like SELinux is checking whether there is a comma in its internal mount options to decide whether it should quote it (from security/selinux/hooks.c):

static void selinux_write_opts(struct seq_file *m,
			       struct security_mnt_opts *opts)
{
	int i;
	char *prefix;

	for (i = 0; i < opts->num_mnt_opts; i++) {
		char *has_comma;

		if (opts->mnt_opts[i])
			has_comma = strchr(opts->mnt_opts[i], ','); /* <<< panic is here <<< */
		else
			has_comma = NULL;

		switch (opts->mnt_opts_flags[i]) {
		case CONTEXT_MNT:
			prefix = CONTEXT_STR;
			break;
		case FSCONTEXT_MNT:
			prefix = FSCONTEXT_STR;
			break;
		case ROOTCONTEXT_MNT:
			prefix = ROOTCONTEXT_STR;
			break;
		case DEFCONTEXT_MNT:
			prefix = DEFCONTEXT_STR;
			break;
		case SBLABEL_MNT:
			seq_putc(m, ',');
			seq_puts(m, LABELSUPP_STR);
			continue;
		default:
			BUG();
			return;
		};
		/* we need a comma before each option */
		seq_putc(m, ',');
		seq_puts(m, prefix);
		if (has_comma)
			seq_putc(m, '\"');
		seq_escape(m, opts->mnt_opts[i], "\"\n\\");
		if (has_comma)
			seq_putc(m, '\"');
	}
}

Where the line marked is where the panic appears to be.

/cc @rhatdan

@s-r-engineer
Copy link
Author

@cyphar
Yeap. We tried configuration with disabled SELinux. All the same.

@s-r-engineer
Copy link
Author

But to be honest that run (without SELinux) we couldn't catch. We have had bad dump and I not completely sure what exactly happens.

@cyphar
Copy link
Member

cyphar commented Aug 6, 2018

@PPerminov

We tried configuration with disabled SELinux. All the same.

How did you disable it? setenforce 0 or by actually switching LSMs and rebooting? I ask because it's possible that the same codepath would be exercised even if you disable SELinux with setenforce (as far as I know the LSM hooks aren't de-registered at runtime).

@s-r-engineer
Copy link
Author

@cyphar It was changed in config and rebooted.

@s-r-engineer
Copy link
Author

@cyphar
Hi
Centos Bug Report

@benamazing
Copy link

Hi, is the problem solved? I encountered the same bug as well, and the kernel version is also 3.10.0-862.9.1.el7.x86_64

@AkihiroSuda
Copy link
Member

This should be reported to the (distro) kernel maintainers.

If only RHEL 7 users are hitting this, this should be reported to Red Hat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants