Constant create/destroy of namespaces leads to bad labels #91

etsaur4 · 2019-03-15T21:47:58Z

This is not easy to reproduce, and it does not reproduce every time. If you constantly create and destroy namespaces, the labels tend to get into a bad state. I also believe that using smaller namespaces contributes to the problem.

Steps to reproduce

Create namespaces:
ndctl create-namespace --region=region1 --mode=fsdax --size=16M --verbose
ndctl create-namespace --region=region2 --mode=fsdax --size=16M --verbose
ndctl create-namespace --region=region3 --mode=fsdax --size=16M --verbose
ndctl create-namespace --region=region1 --mode=fsdax --size=16M --verbose
ndctl create-namespace --region=region2 --mode=fsdax --size=16M --verbose
ndctl create-namespace --region=region3 --mode=fsdax --size=16M --verbose
I personally randomize the sizes.
(A separate topic, but sizes smaller than 16M fail.)
Destroy namespaces:
ndctl disable-namespace all
ndctl destroy-namespace all
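The manual steps above can be sketched as a repeatable bash loop. This is a minimal sketch, not the reporter's actual script: it assumes the three regions from the example, randomizes sizes in 16M steps (since smaller sizes fail), and uses a hypothetical `NDCTL` variable, not an ndctl feature, so the loop can be dry-run with `NDCTL=echo`.

```shell
#!/usr/bin/env bash
# Sketch of the create/destroy stress loop from the steps above.
# Assumptions (not from the report): bash, regions region1..region3, and a
# hypothetical NDCTL override so the loop can be dry-run with NDCTL=echo.
set -euo pipefail

REGIONS=(region1 region2 region3)

# Random size in 16M steps, from 16M up to 1024M (sizes below 16M fail).
random_size() {
    echo "$(( (RANDOM % 64 + 1) * 16 ))M"
}

# Run N create/destroy cycles. When called as a plain command, set -e stops
# the loop at the first failing ndctl call, which is where errors like the
# ones below appear.
stress_loop() {
    local iterations=${1:-100}
    local ndctl=${NDCTL:-ndctl}
    local i region
    for ((i = 0; i < iterations; i++)); do
        # Two namespaces per region, matching the manual steps.
        for _ in 1 2; do
            for region in "${REGIONS[@]}"; do
                $ndctl create-namespace --region="$region" --mode=fsdax \
                    --size="$(random_size)" --verbose
            done
        done
        # Tear everything down before the next cycle.
        $ndctl disable-namespace all
        $ndctl destroy-namespace all
    done
}
```

For example, `stress_loop 500` runs 500 cycles against real hardware, while `NDCTL=echo stress_loop 1` only prints the commands one cycle would issue.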

Eventually you'll see error messages like the ones below:

[root@ban25uut138 ~]# ndctl create-namespace --region=region1 --mode=fsdax --size=12G --verbose
[40834.474424] nd namespace1.0: failed to track label: 1
[40834.480097] ------------[ cut here ]------------
[40834.485258] WARNING: CPU: 39 PID: 43500 at drivers/nvdimm/label.c:860 nd_pmem_namespace_label_update+0x6b0/0x6c0 [libnvdimm]
[40834.497778] Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw mptctl mptbase ebtable_filter ebtables ip6_tables iptable_filter nd_pmem dax_pmem nd_btt device_dax skx_edac intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul vfat crc32_pclmul fat ext4 ghash_clmulni_intel pcbc aesni_intel mbcache jbd2 crypto_simd glue_helper fscrypto cryptd iTCO_wdt iTCO_vendor_support pcspkr cdc_ether usbnet sg mii i2c_i801 lpc_ich ioatdma shpchp wmi ipmi_si ipmi_devintf nfit ipmi_msghandler libnvdimm pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc
[40834.577035] ip_tables xfs libcrc32c sd_mod crc32c_intel mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt megaraid_sas fb_sys_fops ttm igb ahci ptp bnxt_en drm libahci uas pps_core libata usb_storage dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
[40834.603045] CPU: 39 PID: 43500 Comm: ndctl Not tainted 4.14.35-1902.0.6.el7uek.x86_64 #2
[40834.612077] Hardware name: Oracle Corporation ORACLE SERVER X8-2/ASM, MB, X7-2, BIOS 51010301 03/08/2019
[40834.622659] task: ffff8b224e0c3d80 task.stack: ffffa1a2cd780000
[40834.629270] RIP: 0010:nd_pmem_namespace_label_update+0x6b0/0x6c0 [libnvdimm]
[40834.637136] RSP: 0018:ffffa1a2cd783c70 EFLAGS: 00010246
[40834.642966] RAX: 0000000000000029 RBX: ffff8a51441ce010 RCX: 0000000000000000
[40834.650931] RDX: 0000000000000000 RSI: ffff8a51e0fd6938 RDI: ffff8a51e0fd6938
[40834.658894] RBP: ffffa1a2cd783d30 R08: 00000000fffffffe R09: 0000000000000d08
[40834.666858] R10: 0000000000000005 R11: 0000000000000d07 R12: 0000000000000001
[40834.674821] R13: ffff8a5147890300 R14: ffff8a3b98682800 R15: ffff8a51458db400
[40834.682784] FS: 00007f64978c87c0(0000) GS:ffff8a51e0fc0000(0000) knlGS:0000000000000000
[40834.691814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40834.698226] CR2: 00007f3a68917140 CR3: 00000017459a0003 CR4: 00000000007606e0
[40834.706190] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[40834.714154] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[40834.722118] PKRU: 55555554
[40834.725136] Call Trace:
[40834.727870] nd_namespace_label_update+0xec/0x130 [libnvdimm]
[40834.734286] uuid_store+0x16d/0x190 [libnvdimm]
[40834.739346] dev_attr_store+0x1b/0x25
[40834.743436] sysfs_kf_write+0x3f/0x46
[40834.747523] kernfs_fop_write+0x124/0x1a3
[40834.752000] __vfs_write+0x3a/0x16d
[40834.755895] ? __fd_install+0x31/0xce
[40834.759984] ? entry_SYSCALL_64_after_hwframe+0x113/0x0
[40834.765816] vfs_write+0xb2/0x1a1
[40834.769518] ? syscall_trace_enter+0x1ce/0x2b8
[40834.774477] SyS_write+0x55/0xb9
[40834.778077] ? entry_SYSCALL_64_after_hwframe+0x95/0x0
[40834.783810] do_syscall_64+0x79/0x1ae
[40834.787895] entry_SYSCALL_64_after_hwframe+0x151/0x0
[40834.793532] RIP: 0033:0x7f64969c6cd0
[40834.797520] RSP: 002b:00007ffcaa677ab8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[40834.805969] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f64969c6cd0
[40834.813934] RDX: 0000000000000025 RSI: 00007ffcaa677b10 RDI: 0000000000000003
[40834.821896] RBP: 00007ffcaa677b10 R08: 0000000000000000 R09: 00007f649692416d
[40834.829861] R10: 00007ffcaa676ea0 R11: 0000000000000246 R12: 0000000000000025
[40834.837823] R13: 0000000000000000 R14: 00007ffcaa677c04 R15: 0000000000000000
[40834.845787] Code: 85 db 49 89 c4 75 04 49 8b 5f 18 49 8d 7f 08 e8 c7 c9 d7 d6 44 89 e1 48 89 c6 48 89 da 48 c7 c7 30 e4 7d c0 31 c0 e8 c3 4d 92 d6 <0f> 0b eb 94 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[40834.866855] ---[ end trace d6462b84b00844f3 ]---
setup_namespace:344: namespace1.0: set_uuid failed: No such device or address
failed to create namespace: No such device or address

[root@ban25uut138 nsbug]# ndctl create-namespace --region=region6 --mode=fsdax --size=16M --verbose
[95382.086706] nd_pmem pfn6.1: Conflicting mapping in same section
[95382.093315] ------------[ cut here ]------------
[95382.098465] WARNING: CPU: 27 PID: 31869 at kernel/memremap.c:188 devm_memremap_pages+0x3a2/0x401
[95382.108270] Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw mptctl mptbase ebtable_filter ebtables ip6_tables iptable_filter nd_pmem dax_pmem nd_btt device_dax skx_edac intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul vfat crc32_pclmul fat ext4 ghash_clmulni_intel pcbc aesni_intel mbcache jbd2 crypto_simd glue_helper fscrypto cryptd iTCO_wdt iTCO_vendor_support pcspkr cdc_ether usbnet sg mii i2c_i801 lpc_ich ioatdma shpchp wmi ipmi_si ipmi_devintf nfit ipmi_msghandler libnvdimm pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc
[95382.187531] ip_tables xfs libcrc32c sd_mod crc32c_intel mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt megaraid_sas fb_sys_fops ttm igb ahci ptp bnxt_en drm libahci uas pps_core libata usb_storage dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
[95382.213542] CPU: 27 PID: 31869 Comm: ndctl Tainted: G W 4.14.35-1902.0.6.el7uek.x86_64 #2
[95382.223930] Hardware name: Oracle Corporation ORACLE SERVER X8-2/ASM, MB, X7-2, BIOS 51010301 03/08/2019
[95382.234513] task: ffff8b2565550000 task.stack: ffffa1a2e3154000
[95382.241119] RIP: 0010:devm_memremap_pages+0x3a2/0x401
[95382.246755] RSP: 0018:ffffa1a2e3157bf0 EFLAGS: 00010246
[95382.252585] RAX: 0000000000000033 RBX: ffff8b26df5fb2f8 RCX: 0000000000000000
[95382.260548] RDX: 0000000000000000 RSI: ffff8b26df0d6938 RDI: ffff8b26df0d6938
[95382.268513] RBP: ffffa1a2e3157c48 R08: 00000000fffffffe R09: 000000000000243c
[95382.276477] R10: 0000000000000004 R11: 000000000000243b R12: ffff8b26c08382a0
[95382.284442] R13: 000000ed40000000 R14: ffff8b26c08382e0 R15: ffff8b26c0b5eca0
[95382.292407] FS: 00007fa5905957c0(0000) GS:ffff8b26df0c0000(0000) knlGS:0000000000000000
[95382.301438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[95382.307851] CR2: 00007fa5905cb000 CR3: 000000e7ef89a002 CR4: 00000000007606e0
[95382.315814] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[95382.323780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[95382.331742] PKRU: 55555554
[95382.334762] Call Trace:
[95382.337494] pmem_attach_disk+0x1b0/0x6f0 [nd_pmem]
[95382.342937] ? devm_memremap+0x6d/0x95
[95382.347120] nd_pmem_probe+0x7e/0xa0 [nd_pmem]
[95382.352087] nvdimm_bus_probe+0x71/0x180 [libnvdimm]
[95382.357632] driver_probe_device+0x2a7/0x460
[95382.362397] bind_store+0xd7/0x10f
[95382.366192] drv_attr_store+0x27/0x31
[95382.370280] sysfs_kf_write+0x3f/0x46
[95382.374358] kernfs_fop_write+0x124/0x1a3
[95382.378834] __vfs_write+0x3a/0x16d
[95382.382727] ? __fd_install+0x31/0xce
[95382.386815] ? entry_SYSCALL_64_after_hwframe+0x113/0x0
[95382.392639] vfs_write+0xb2/0x1a1
[95382.396338] ? syscall_trace_enter+0x1ce/0x2b8
[95382.401299] SyS_write+0x55/0xb9
[95382.404899] ? entry_SYSCALL_64_after_hwframe+0x95/0x0
[95382.410633] do_syscall_64+0x79/0x1ae
[95382.414719] entry_SYSCALL_64_after_hwframe+0x151/0x0
[95382.420358] RIP: 0033:0x7fa58f693cd0
[95382.424348] RSP: 002b:00007fff84758a88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[95382.432796] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007fa58f693cd0
[95382.440759] RDX: 0000000000000007 RSI: 00000000019aec20 RDI: 0000000000000010
[95382.448724] RBP: 00000000019aec20 R08: 0000000000000000 R09: 00000000019af1e0
[95382.456687] R10: 646e69622f6d656d R11: 0000000000000246 R12: 0000000000000007
[95382.464652] R13: 0000000000000001 R14: 00000000019aec20 R15: 00007fff84758ae8
[95382.472616] Code: ff 48 8b 45 b8 48 8b 58 50 48 85 db 74 64 48 8b 7d b8 e8 32 47 38 00 48 89 da 48 89 c6 48 c7 c7 e0 9d 1d 98 31 c0 e8 31 cb f2 ff <0f> 0b 49 8b bf 80 00 00 00 e8 20 f9 ff ff 48 c7 c0 f4 ff ff ff
[95382.493684] ---[ end trace d6462b84b00844f4 ]---
[95382.498858] nd_pmem: probe of pfn6.1 failed with error -12
libndctl: ndctl_pfn_enable: pfn6.1: failed to enable
Error: namespace6.2: failed to enable

failed to create namespace: No such device or address

etsaur4 · 2019-03-15T21:50:43Z

Label dumps of the two NVDIMMs, in hex and JSON, are attached.

nsbug.zip

etsaur4 · 2019-03-15T21:52:45Z

To recover:
ndctl zero-labels nmem
ndctl init-labels nmem

ndctl check-labels reports the labels as fine.
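The recovery above can be wrapped into one sequence. This is a sketch under stated assumptions: bash, all regions can be taken offline, and ndctl refuses label writes while the DIMM backs an active region, so regions are disabled first and re-enabled at the end. The DIMM name (nmem0 in the example) is a placeholder, and `NDCTL` is a hypothetical dry-run hook, not an ndctl feature. Zeroing the labels discards every namespace defined on that DIMM.

```shell
#!/usr/bin/env bash
# Sketch of the label reset described above. DESTRUCTIVE: zeroing the
# labels discards every namespace defined on the DIMM.
# Assumptions: bash, all regions can go offline, and a hypothetical NDCTL
# override so the sequence can be dry-run with NDCTL=echo.
set -euo pipefail

reset_labels() {
    local dimm=$1                # placeholder DIMM name, e.g. nmem0
    local ndctl=${NDCTL:-ndctl}
    # Assumed: label writes are refused while a region on the DIMM is active.
    $ndctl disable-region all
    $ndctl zero-labels "$dimm"
    $ndctl init-labels "$dimm"
    # Validate the fresh label area before bringing regions back.
    $ndctl check-labels "$dimm"
    $ndctl enable-region all
}
```

`NDCTL=echo reset_labels nmem0` prints the five commands in order without touching hardware.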

etsaur4 · 2019-03-15T22:01:45Z

Another symptom is that after a reboot, the DIMMs with the unhappy labels suddenly show up with a "state:disabled" status.

djbw · 2019-03-16T01:33:28Z

I think there are two issues in the above backtraces:

[95382.086706] nd_pmem pfn6.1: Conflicting mapping in same section
[95382.098465] WARNING: CPU: 27 PID: 31869 at kernel/memremap.c:188

This is the sub-section alignment problem. This is being fixed by removing the alignment constraint, but that won't be done until v5.2 at the earliest. For the gory details on why we didn't fix this for v5.1 you can read the saga here:

 https://patchwork.kernel.org/patch/10808711/
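To illustrate why small namespaces make this collision likely, here is a simplified arithmetic model. It assumes the x86_64 memory-hotplug section size of 128 MiB (SECTION_SIZE_BITS = 27) used by kernels of this era, where each devm_memremap_pages() mapping effectively claims whole sections; two ranges then collide if they touch a common section. Eight 16 MiB namespaces fit in one such section. The helper names and example addresses are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of the "Conflicting mapping in same section" collision.
# Assumption: x86_64 memory-hotplug sections of 128 MiB (SECTION_SIZE_BITS=27).
set -euo pipefail

SECTION_SIZE=$(( 1 << 27 ))   # 128 MiB

# Print the first and last section index touched by [start, start + size).
section_span() {
    local start=$1 size=$2
    echo "$(( start / SECTION_SIZE )) $(( (start + size - 1) / SECTION_SIZE ))"
}

# Succeed if the two ranges touch at least one common section.
# Hypothetical helper, not part of ndctl or the kernel.
share_section() {
    local a_first a_last b_first b_last
    read -r a_first a_last <<<"$(section_span "$1" "$2")"
    read -r b_first b_last <<<"$(section_span "$3" "$4")"
    (( a_first <= b_last && b_first <= a_last ))
}
```

For example, two back-to-back 16 MiB namespaces at 0x2000000000 and 0x2001000000 both land in section 1024, so `share_section` succeeds for them; moving the second one to 0x2008000000 puts it in section 1025 and the check fails.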

This other issue is concerning:

ndctl create-namespace --region=region1 --mode=fsdax --size=12G --verbose
[40834.474424] nd namespace1.0: failed to track label: 1
[40834.485258] WARNING: CPU: 39 PID: 43500 at drivers/nvdimm/label.c:860

...but I'd like to see if it's reproducible without the alignment padding collision issue being in the mix.

commit c4703ce11c23423d4b46e3d59aef7979814fd608 upstream.

Users have reported intermittent occurrences of DIMM initialization failures due to duplicate allocations of address capacity detected in the labels, or errors of the form below, both have the same root cause.

    nd namespace1.4: failed to track label: 0
    WARNING: CPU: 17 PID: 1381 at drivers/nvdimm/label.c:863
    RIP: 0010:__pmem_label_update+0x56c/0x590 [libnvdimm]
    Call Trace:
     ? nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
     nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
     uuid_store+0x17e/0x190 [libnvdimm]
     kernfs_fop_write+0xf0/0x1a0
     vfs_write+0xb7/0x1b0
     ksys_write+0x57/0xd0
     do_syscall_64+0x60/0x210

Unfortunately those reports were typically with a busy parallel namespace creation / destruction loop making it difficult to see the components of the bug. However, Jane provided a simple reproducer using the work-in-progress sub-section implementation.

When ndctl is reconfiguring a namespace it may take an existing defunct / disabled namespace and reconfigure it with a new uuid and other parameters. Critically namespace_update_uuid() takes existing address resources and renames them for the new namespace to use / reconfigure as it sees fit. The bug is that this rename only happens in the resource tracking tree. Existing labels with the old uuid are not reaped leading to a scenario where multiple active labels reference the same span of address range.

Teach namespace_update_uuid() to flag any references to the old uuid for reaping at the next label update attempt.

Cc: <stable@vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Link: pmem/ndctl#91
Reported-by: Jane Chu
Reported-by: Jeff Moyer
Reported-by: Erwin Tsaur
Cc: Johannes Thumshirn
Signed-off-by: Dan Williams
Signed-off-by: Greg Kroah-Hartman

djbw · 2019-06-03T18:09:25Z

Please re-open if the above issue is not fixed by the latest kernel.
