Numerous errors and NULL pointer references in 0.6.4 #3335

Closed
seletskiy opened this issue Apr 23, 2015 · 6 comments

Comments

@seletskiy
Contributor

ZFS 0.6.4

Some zfs get commands are stuck in uninterruptible sleep:

root@db5 ~ # ps axfu | awk '$8~/D/'    
root     28608  0.0  0.0  35812  3156 pts/0    D+   15:12   0:00  |           |       \_ zfs get -Hp -o value mountpoint zroot/instances/dbfarm-mongodb-comments.mongo-comments-1406
root     24631  0.0  0.0  35812  3192 pts/0    D+   15:18   0:00  |                   \_ zfs get -Hp -o value mountpoint zroot/instances/dbfarm-mongodb-turizm.mongo-turizm-1630
root     10686  0.0  0.0  35812  3152 ?        D    15:40   0:00  |       \_ zfs get -Hp -o value origin zroot/instances/mongodb-messages.mongo-messages-1891
VERIFY3(0 == nvlist_add_nvlist(config, "feature_stats", features)) failed (0 == 0)
PANIC at spa.c:3280:spa_add_feature_stats()
Showing stack for process 10686
CPU: 14 PID: 10686 Comm: zfs Tainted: P      D    O  3.16.4-1-apparmor #1
Hardware name: Cisco Systems Inc R200-1120402W/R200-1120402W, BIOS C200.1.4.3l.0.071820140350 07/18/2014
 0000000000000000 00000000154d5614 ffff88036856bbd8 ffffffff8154be7a
 ffffffffa0a7fddb ffff88036856bbe8 ffffffffa0152094 ffff88036856bd70
 ffffffffa015215f ffff88036856bc08 ffff880300000030 ffff88036856bd80
Call Trace:
 [<ffffffff8154be7a>] dump_stack+0x4d/0x6f
 [<ffffffffa0152094>] spl_dumpstack+0x44/0x50 [spl]
 [<ffffffffa015215f>] spl_panic+0xbf/0xf0 [spl]
 [<ffffffffa019b834>] ? nvlist_copy_pairs.isra.54+0x84/0xa0 [znvpair]
 [<ffffffffa019b1c7>] ? nvlist_remove_all+0x67/0xc0 [znvpair]
 [<ffffffffa019b697>] ? nvlist_add_common.part.53+0x317/0x430 [znvpair]
 [<ffffffffa09fff99>] spa_get_stats+0x389/0x530 [zfs]
 [<ffffffffa0a33ac9>] zfs_ioc_pool_stats+0x39/0x90 [zfs]
 [<ffffffffa0a37045>] zfsdev_ioctl+0x485/0x520 [zfs]
 [<ffffffff811db67b>] ? getname_flags+0x4b/0x180
 [<ffffffff811e3550>] do_vfs_ioctl+0x2d0/0x4b0
 [<ffffffff811db602>] ? final_putname+0x22/0x50
 [<ffffffff811e37b1>] SyS_ioctl+0x81/0xa0
 [<ffffffff81551be9>] system_call_fastpath+0x16/0x1b
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffa0199510>] nvp_buf_unlink.isra.3+0x20/0x70 [znvpair]
PGD 2e7a29067 PUD 2f86ea067 PMD 0 
Oops: 0002 [#2] PREEMPT SMP 
Modules linked in: veth bridge stp llc 8021q mrp ext4 crc16 mbcache iTCO_wdt coretemp iTCO_vendor_support jbd2 gpio_ich intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel mgag200 aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ttm microcode drm_kms_helper pcspkr joydev igb evdev drm mousedev mac_hid syscopyarea ptp sysfillrect pps_core sysimgblt i2c_i801 i2c_algo_bit i2c_core lpc_ich ioatdma i7core_edac tpm_tis edac_core ipmi_si tpm dca acpi_power_meter ipmi_msghandler hwmon ac wmi button shpchp processor sch_fq_codel hid_generic usbhid hid uas usb_storage zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) sd_mod crc_t10dif crct10dif_common sr_mod cdrom ata_generic pata_acpi ehci_pci uhci_hcd mpt2sas ata_piix raid_class
 ehci_hcd libata scsi_transport_sas usbcore usb_common scsi_mod
CPU: 14 PID: 31642 Comm: zfs Tainted: P      D    O  3.16.4-1-apparmor #1
Hardware name: Cisco Systems Inc R200-1120402W/R200-1120402W, BIOS C200.1.4.3l.0.071820140350 07/18/2014
task: ffff88080a1d7010 ti: ffff8802e675c000 task.ti: ffff8802e675c000
RIP: 0010:[<ffffffffa0199510>]  [<ffffffffa0199510>] nvp_buf_unlink.isra.3+0x20/0x70 [znvpair]
RSP: 0018:ffff8802e675fcb0  EFLAGS: 00010287
RAX: ffff8808182f0b40 RBX: ffff8808182f0b40 RCX: ffff8808182f0420
RDX: 0000000000000000 RSI: ffff8808182f0b50 RDI: ffff880c4b475940
RBP: ffff8802e675fcb0 R08: 0000000000000000 R09: ffffea002060bc00
R10: ffffffffa014cd32 R11: 00000000000000de R12: ffff8808182f0420
R13: ffffffffa0a85ff8 R14: ffff880c4c029ca0 R15: ffff8808182f0b50
FS:  00007f2089ad1b80(0000) GS:ffff880657ce0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000003f2558000 CR4: 00000000000007e0
Stack:
 ffff8802e675fce8 ffffffffa019b1db ffff880c3dcba001 0000000000000001
 ffff8802e675fd68 ffffffffa0a85ff8 0000000000000008 ffff8802e675fd58
 ffffffffa019b697 ffff880830c1e6d0 ffff880800000000 0000001f00000008
Call Trace:
 [<ffffffffa019b1db>] nvlist_remove_all+0x7b/0xc0 [znvpair]
 [<ffffffffa019b697>] nvlist_add_common.part.53+0x317/0x430 [znvpair]
 [<ffffffffa019bbeb>] nvlist_add_uint64+0x3b/0x40 [znvpair]
 [<ffffffffa09ffdcc>] spa_get_stats+0x1bc/0x530 [zfs]
 [<ffffffffa0a33ac9>] zfs_ioc_pool_stats+0x39/0x90 [zfs]
 [<ffffffffa0a37045>] zfsdev_ioctl+0x485/0x520 [zfs]
 [<ffffffff811db67b>] ? getname_flags+0x4b/0x180
 [<ffffffff811e3550>] do_vfs_ioctl+0x2d0/0x4b0
 [<ffffffff811db602>] ? final_putname+0x22/0x50
 [<ffffffff811e37b1>] SyS_ioctl+0x81/0xa0
 [<ffffffff81551be9>] system_call_fastpath+0x16/0x1b
Code: eb c8 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8d 46 f0 48 3b 47 10 48 89 e5 74 2d 48 3b 07 74 38 48 8b 56 f8 48 8b 4e f0 <48> 89 0a 48 3b 47 08 74 37 48 8b 46 f0 48 8b 56 f8 48 89 50 08 
RIP  [<ffffffffa0199510>] nvp_buf_unlink.isra.3+0x20/0x70 [znvpair]
 RSP <ffff8802e675fcb0>
CR2: 0000000000000000
---[ end trace 81afea2de310f037 ]---
@behlendorf
Contributor

@nedbass this is almost certainly related to 417104b. There appears to be a race in the nvlist manipulation, so it needs to be protected by a lock. Thoughts?

@nedbass
Contributor

nedbass commented Apr 23, 2015

Yes, the manipulation of spa->spa_feat_stats is racy. We can add a mutex to serialize access to it.
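
To illustrate the idea of serializing access to a shared cached list, here is a minimal userspace sketch. It is not the actual patch: `pool_t`, `pool_get_stats()`, and `run_concurrent_readers()` are hypothetical names, a fixed-size array stands in for the nvlist, and it uses POSIX threads rather than the kernel mutex primitives ZFS uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NSTATS      8
#define MAX_THREADS 16

/* Hypothetical stand-in for a pool object that caches a shared stats list. */
typedef struct pool {
    pthread_mutex_t lock;          /* serializes access to the fields below */
    unsigned long   stats[NSTATS]; /* shared cache, rebuilt on every query */
    unsigned long   generation;    /* bumped on every rebuild */
} pool_t;

void pool_init(pool_t *p)
{
    pthread_mutex_init(&p->lock, NULL);
    for (size_t i = 0; i < NSTATS; i++)
        p->stats[i] = 0;
    p->generation = 0;
}

void pool_destroy(pool_t *p)
{
    pthread_mutex_destroy(&p->lock);
}

/* Rebuild the shared cache and copy it to the caller under one lock. */
void pool_get_stats(pool_t *p, unsigned long out[NSTATS])
{
    pthread_mutex_lock(&p->lock);
    p->generation++;
    for (size_t i = 0; i < NSTATS; i++) {
        p->stats[i] = p->generation;
        out[i] = p->stats[i];
    }
    pthread_mutex_unlock(&p->lock);
}

typedef struct worker_arg {
    pool_t *pool;
    int     iters;
    int     torn; /* snapshots whose entries disagreed with each other */
} worker_arg_t;

static void *worker(void *argp)
{
    worker_arg_t *arg = argp;
    unsigned long snap[NSTATS];

    for (int n = 0; n < arg->iters; n++) {
        pool_get_stats(arg->pool, snap);
        for (size_t i = 1; i < NSTATS; i++)
            if (snap[i] != snap[0])
                arg->torn++;
    }
    return NULL;
}

/* Query the pool from several threads; return the number of torn snapshots. */
int run_concurrent_readers(pool_t *p, int nthreads, int iters)
{
    pthread_t    tid[MAX_THREADS];
    worker_arg_t args[MAX_THREADS];
    int          torn = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int t = 0; t < nthreads; t++) {
        args[t] = (worker_arg_t){ p, iters, 0 };
        int rc = pthread_create(&tid[t], NULL, worker, &args[t]);
        assert(rc == 0);
    }
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);
        torn += args[t].torn;
    }
    return torn;
}
```

In this sketch each caller receives its own copy made while the lock is held, so no thread ever reads the shared array while another is rebuilding it. Whether the real fix copies the list or only locks around the update is not stated in this thread.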

nedbass added a commit to nedbass/zfs that referenced this issue Apr 23, 2015
The function spa_add_feature_stats() manipulates the shared nvlist
spa->spa_feat_stats in an unsafe concurrent manner. Add a mutex to
protect the list.

Issue openzfs#3335
Signed-off-by: Ned Bass <[email protected]>
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Apr 23, 2015
The function spa_add_feature_stats() manipulates the shared nvlist
spa->spa_feat_stats in an unsafe concurrent manner. Add a mutex to
protect the list.

Issue openzfs#3335
Signed-off-by: Ned Bass <[email protected]>
@nedbass
Contributor

nedbass commented Apr 23, 2015

@seletskiy if you have the ability and time to test this patch, please do so. Thanks.

@kernelOfTruth
Contributor

Adding the related pull request: #3339

@seletskiy
Contributor Author

@nedbass: it looks like the issue no longer reproduces with your patch. Thanks!

@behlendorf
Contributor

@seletskiy thanks for independently verifying the patch.

nedbass added a commit that referenced this issue Jun 24, 2015
The function spa_add_feature_stats() manipulates the shared nvlist
spa->spa_feat_stats in an unsafe concurrent manner. Add a mutex to
protect the list.

Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3335
MorpheusTeam pushed a commit to Xyratex/lustre-stable that referenced this issue Aug 10, 2015
Updates ZFS and SPL to latest maintenance version.  Includes the
following:

Bug Fixes:
* Fix panic due to corrupt nvlist when running utilities
(openzfs/zfs#3335)
* Fix hard lockup due to infinite loop in zfs_zget()
(openzfs/zfs#3349)
* Fix panic on unmount due to iput taskq (openzfs/zfs#3281)
* Improve metadata shrinker performance on pre-3.1 kernels
(openzfs/zfs#3501)
* Linux 4.1 compat: use read_iter() / write_iter()
* Linux 3.12 compat: NUMA-aware per-superblock shrinker
* Fix spurious hung task watchdog stack traces (openzfs/zfs#3402)
* Fix module loading in zfs import systemd service
(openzfs/zfs#3440)
* Fix intermittent libzfs_init() failure to open /dev/zfs
(openzfs/zfs#2556)

Signed-off-by: Nathaniel Clark <[email protected]>
Change-Id: I053087317ff9e5bedc1671bb46062e96bfe6f074
Reviewed-on: http://review.whamcloud.com/15481
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>