sa_find_idx_tab() ASSERTION failed, SPL panic, causing ls to hang #2801
Comments
@inevity Have you got either or both of
I set xattr=sa, and the corrupted file was created on ZFS 0.6.0-RC14. What should I do next? Fortunately I have a replica of the file on another machine; the two identical files were written by GlusterFS AFR.
The inode of interest is 3793886 as shown in your dump of the directory above. While you're at it, could you also do
@dweeezil This sounds a lot like the variable length SA issue which impacted symlinks and was fixed in 0.6.3. It sounds like this file was created with 0.6.0-rc14 so that may explain how this happened.
[root@CNC-LQ-o-9ED zfs-ce58fc178bd5c6e8d462c21f1b8952685d2f852d]# ./cmd/zdb/zdb -ddddddd zpool/zfs 3793886
3793886 1 16K 512 14.0K 512 0.00 ZFS plain file (K=inherit) (Z=inherit)
lt-zdb: ../../cmd/zdb/zdb.c:1539: Assertion `zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES, 8, 1, &fuid_obj) == 0' failed.
[root@CNC-LQ-o-9ED zfs-ce58fc178bd5c6e8d462c21f1b8952685d2f852d]# zdb -dddd zpool/zfs 5 6
ECC memory: how do we check whether we have it? Some other info for your reference:
crash> struct nameidata ffff8808712cdd88
crash> struct dentry ffffff9c // is this address or command correct?
What I need to know is not only how this happened, but also how to avoid the crash on this filesystem without removing the file or recreating the filesystem.
@inevity Oops, @behlendorf's comment made me realize I overlooked the fact that your files may have been created pre 0.6.3. I was concerned that you may have been seeing the current elusive problem in which the SA layout is incorrect. In your case, however, the layout is correct. The cause of your corruption should have been fixed by the trio of 83021b4, 5d862cb and 472e7c6. You're likely going to have to recreate the filesystem(s) with this corruption.
I know; those three SA-related issues are what forced me to upgrade ZFS from 0.6.0-RC14 to 0.6.3. But on 0.6.3 I ran into issue #2597, and I think we should apply the patch 'Add object type checking to zap_lockdir()', so I ended up using ZFS master. How can I recreate the filesystem while keeping the original files? Or is there a patch we can apply to work around the corrupted file, similar to the approach used in issue #2597? In #2597, after I applied the patch, ls no longer hung; it only returned "invalid argument" when listing a corrupted file.
Matching commits and issue tracker entries in Illumos: https://illumos.org/issues/6434 sa_find_sizes() may compute wrong SA header size
Closing, all of these issues have been addressed in ZoL and we've pushed the fix upstream to illumos.
Using ZoL master: zfs at git e82cdc3, spl at git de2a22f.
LOAD AVERAGE: 1.11, 1.47, 1.37
TASKS: 623
NODENAME: CNC-LQ-o-9ED
RELEASE: 2.6.32-358.el6.x86_64
VERSION: #1 SMP Fri Feb 22 00:31:26 UTC 2013
MACHINE: x86_64 (2000 Mhz)
MEMORY: 64 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
PID: 257
COMMAND: "khungtaskd"
TASK: ffff880873ac8aa0 [THREAD_INFO: ffff8808713de000]
CPU: 19
STATE: TASK_RUNNING (PANIC)
ZFS: Unloaded module v0.6.3-1
SPL: Unloaded module v0.6.3-1
SPL: Loaded module v0.6.3-12_gde2a22f (DEBUG mode) //
ZFS: Loaded module v0.6.3-113_ge82cdc3 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
SPL: using hostid 0x00000000
SPLError: 190622:0:(sa.c:1538:sa_find_idx_tab()) ASSERTION((IS_SA_BONUSTYPE(bonustype) && SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb)) || !IS_SA_BONUSTYPE(bonustype) || (IS_SA_BONUSTYPE(bonustype) && hdr->sa_layout_info == 0)) failed
SPLError: 190622:0:(sa.c:1538:sa_find_idx_tab()) SPL PANIC
SPL: Showing stack for process 190622
Pid: 190622, comm: ls Tainted: P --------------- 2.6.32-358.el6.x86_64 #1
Call Trace:
[] ? spl_debug_dumpstack+0x46/0x60 [spl]
[] ? spl_debug_bug+0x81/0xd0 [spl]
[] ? spl_PANIC+0xba/0xf0 [spl]
[] ? submit_bio+0x8d/0x120
[] ? avl_find+0x65/0x100 [zavl]
[] ? sa_find_idx_tab+0x227/0x2e0 [zfs]
[] ? __cv_init+0x89/0x1f0 [spl]
[] ? zio_cons+0x47/0x120 [zfs]
[] ? sa_build_index+0x93/0x1b0 [zfs]
[] ? sa_handle_get_from_db+0x11c/0x160 [zfs]
[] ? zfs_znode_sa_init+0x144/0x200 [zfs]
[] ? zfs_znode_alloc+0x177/0x6c0 [zfs]
[] ? zio_wait+0x22b/0x3d0 [zfs]
[] ? dbuf_read+0x640/0xcd0 [zfs]
[] ? mutex_lock+0x1e/0x50
[] ? refcount_remove+0x16/0x20 [zfs]
[] ? mutex_lock+0x1e/0x50
[] ? dmu_object_info_from_dnode+0x129/0x200 [zfs]
[] ? zfs_zget+0x260/0x300 [zfs]
[] ? zfs_dirent_lock+0x560/0x670 [zfs]
[] ? zfs_dirlook+0x93/0x2c0 [zfs]
[] ? zfs_zaccess+0xa0/0x4b0 [zfs]
[] ? zfs_lookup+0x2ee/0x340 [zfs]
[] ? zpl_lookup+0x78/0x130 [zfs]
[] ? do_lookup+0x1a5/0x230
[] ? __link_path_walk+0x734/0x1030
[] ? path_walk+0x6a/0xe0
[] ? do_path_lookup+0x5b/0xa0
[] ? user_path_at+0x57/0xa0
[] ? putname+0x35/0x50
[] ? user_path_at+0x62/0xa0
[] ? vfs_fstatat+0x3c/0x80
[] ? _atomic_dec_and_lock+0x55/0x80
[] ? vfs_lstat+0x1e/0x20
[] ? sys_newlstat+0x24/0x50
[] ? path_put+0x31/0x40
[] ? sys_lgetxattr+0x61/0x80
[] ? system_call_fastpath+0x16/0x1b
INFO: task ls:190622 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000007 0 190622 190452 0x00000000
ffff880fa71473f8 0000000000000082 ffff880fa71473c0 ffff880fa71473bc
ffff880fa7147388 ffff88087fe82c00 ffff88089c4f6700 0000000000000400
ffff880faf66faf8 ffff880fa7147fd8 000000000000fb88 ffff880faf66faf8
Call Trace:
[] spl_debug_bug+0xa5/0xd0 [spl]
[] spl_PANIC+0xba/0xf0 [spl]
[] ? submit_bio+0x8d/0x120
[] ? avl_find+0x65/0x100 [zavl]
[] sa_find_idx_tab+0x227/0x2e0 [zfs]
[] ? __cv_init+0x89/0x1f0 [spl]
[] ? zio_cons+0x47/0x120 [zfs]
[] sa_build_index+0x93/0x1b0 [zfs]
[] sa_handle_get_from_db+0x11c/0x160 [zfs]
[] zfs_znode_sa_init+0x144/0x200 [zfs]
[] zfs_znode_alloc+0x177/0x6c0 [zfs]
[] ? zio_wait+0x22b/0x3d0 [zfs]
[] ? dbuf_read+0x640/0xcd0 [zfs]
[] ? mutex_lock+0x1e/0x50
[] ? refcount_remove+0x16/0x20 [zfs]
[] ? mutex_lock+0x1e/0x50
[] ? dmu_object_info_from_dnode+0x129/0x200 [zfs]
[] zfs_zget+0x260/0x300 [zfs]
[] zfs_dirent_lock+0x560/0x670 [zfs]
[] zfs_dirlook+0x93/0x2c0 [zfs]
[] ? zfs_zaccess+0xa0/0x4b0 [zfs]
[] zfs_lookup+0x2ee/0x340 [zfs]
[] zpl_lookup+0x78/0x130 [zfs]
[] do_lookup+0x1a5/0x230
[] __link_path_walk+0x734/0x1030
[] path_walk+0x6a/0xe0
[] do_path_lookup+0x5b/0xa0
[] user_path_at+0x57/0xa0
[] ? putname+0x35/0x50
[] ? user_path_at+0x62/0xa0
[] vfs_fstatat+0x3c/0x80
[] ? _atomic_dec_and_lock+0x55/0x80
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x24/0x50
[] ? path_put+0x31/0x40
[] ? sys_lgetxattr+0x61/0x80
[] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 257, comm: khungtaskd Tainted: P --------------- 2.6.32-358.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? watchdog+0x217/0x220
[] ? watchdog+0x0/0x220
[] ? kthread+0x96/0xa0
[] ? child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
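The sequence in the log above is: `ls` blocks inside `spl_PANIC`, the hung-task watchdog reports it after 120 seconds, and `khungtaskd` then panics the whole machine. A full kernel panic at that point would normally mean the `hung_task_panic` sysctl is set to a nonzero value; that is an inference from the panic message, not something the log states directly. The following is a minimal C sketch for inspecting both settings on a host. The helper names `read_long_from_file` and `report_hung_task_settings` are made up for illustration, and the `/proc/sys/kernel/hung_task_*` files are only present on kernels built with hung-task detection.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer from a sysctl-style text file.
 * Returns 0 on success, -1 if the file is missing or not numeric.
 * (Hypothetical helper, not part of SPL or ZFS.)
 */
int read_long_from_file(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int matched = fscanf(f, "%ld", out);
    fclose(f);
    return matched == 1 ? 0 : -1;
}

/* Print the hung-task watchdog settings, or note that they are unavailable. */
void report_hung_task_settings(void)
{
    static const char *paths[] = {
        "/proc/sys/kernel/hung_task_timeout_secs", /* seconds before the warning */
        "/proc/sys/kernel/hung_task_panic",        /* nonzero: panic on a hung task */
    };

    for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        long value = 0;
        if (read_long_from_file(paths[i], &value) == 0)
            printf("%s = %ld\n", paths[i], value);
        else
            printf("%s: not available\n", paths[i]);
    }
}
```

Calling `report_hung_task_settings()` from a small `main` prints both values. If `hung_task_panic` is nonzero, setting it to 0 would presumably leave the stuck `ls` in the D state without taking the machine down, which does not fix the underlying SA corruption.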
crash> ps |grep UN
190622 190452 7 ffff880faf66f540 UN 0.0 141460 24804 ls
crash> bt 190622
PID: 190622 TASK: ffff880faf66f540 CPU: 7 COMMAND: "ls"
#0 [ffff880fa7147338] schedule at ffffffff8150d692
#1 [ffff880fa7147400] spl_debug_bug at ffffffffa0565df5 [spl]
#2 [ffff880fa7147430] spl_PANIC at ffffffffa057553a [spl]
#3 [ffff880fa71475d0] sa_find_idx_tab at ffffffffa0669b67 [zfs]
#4 [ffff880fa71476a0] sa_build_index at ffffffffa066a8c3 [zfs]
#5 [ffff880fa71476e0] sa_handle_get_from_db at ffffffffa066cedc [zfs]
#6 [ffff880fa7147760] zfs_znode_sa_init at ffffffffa06d7284 [zfs]
#7 [ffff880fa71477b0] zfs_znode_alloc at ffffffffa06d8dd7 [zfs]
#8 [ffff880fa7147990] zfs_zget at ffffffffa06d9580 [zfs]
#9 [ffff880fa7147a50] zfs_dirent_lock at ffffffffa06b4d00 [zfs]
#10 [ffff880fa7147b00] zfs_dirlook at ffffffffa06b5023 [zfs]
#11 [ffff880fa7147b80] zfs_lookup at ffffffffa06d263e [zfs]
#12 [ffff880fa7147bf0] zpl_lookup at ffffffffa06f40a8 [zfs]
#13 [ffff880fa7147c40] do_lookup at ffffffff81190405
#14 [ffff880fa7147ca0] __link_path_walk at ffffffff81190bc4
#15 [ffff880fa7147d60] path_walk at ffffffff8119174a
#16 [ffff880fa7147da0] do_path_lookup at ffffffff8119191b
#17 [ffff880fa7147dd0] user_path_at at ffffffff811925a7
#18 [ffff880fa7147ea0] vfs_fstatat at ffffffff811869bc
#19 [ffff880fa7147ee0] vfs_lstat at ffffffff81186a6e
#20 [ffff880fa7147ef0] sys_newlstat at ffffffff81186a94
#21 [ffff880fa7147f80] system_call_fastpath at ffffffff8100b072
crash> files 190622
PID: 190622 TASK: ffff880faf66f540 CPU: 7 COMMAND: "ls"
ROOT: / CWD: /root/src/zfs
FD FILE DENTRY INODE TYPE PATH
0 ffff88106f80e5c0 ffff8808467c8d80 ffff88086ff0a838 CHR /dev/pts/5
1 ffff88106f80e5c0 ffff8808467c8d80 ffff88086ff0a838 CHR /dev/pts/5
2 ffff8807934232c0 ffff880844e22780 ffff880844c82a78 REG /root/src/zfs/err1.txt
3 ffff881070bfa0c0 ffff880f8df69740 ffff880663c83c98 DIR /mnt/zpool/zfs/.glusterfs/40/92
[root@CNC-LQ-o-9ED ~]# stat /mnt/zpool/zfs/.glusterfs/40/92
File: `/mnt/zpool/zfs/.glusterfs/40/92'
Size: 16 Blocks: 29 IO Block: 1024 directory
Device: 13h/19d Inode: 63662 Links: 2
Access: (0700/drwx------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-06-20 02:13:36.032128000 +0800
Modify: 2014-10-01 22:38:31.143065000 +0800
Change: 2014-10-01 22:38:31.143065000 +0800
[root@CNC-LQ-o-9ED ~]# zdb -vvvv zpool/zfs 63662
Dataset zpool/zfs [ZPL], ID 41, cr_txg 6, 34.0T, 1714220 objects, rootbp DVA[0]=<0:20c049484000:2000> DVA[1]=<0:27c124424000:2000> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=1959830L/1959830P fill=1714220 cksum=19eaa47e24:89d712eb261:18ef85bb41fe9:33776644b0976d
Indirect blocks:
0 L0 0:27c140742000:2000 400L/400P F=1 B=1691223/1691223
After identifying the affected directory, I found the following:
ls -lahR /mnt/zpool/zfs/.glusterfs/40/92
/mnt/zpool/zfs/.glusterfs/40/92:
Message from syslogd@、 at Oct 15 14:08:06 ...
kernel:SPLError: 99053:0:(sa.c:1538:sa_find_idx_tab()) ASSERTION((IS_SA_BONUSTYPE(bonustype) && SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb)) || !IS_SA_BONUSTYPE(bonustype) || (IS_SA_BONUSTYPE(bonustype) && hdr->sa_layout_info == 0)) failed
Message from syslogd@Oct 15 14:08:06 ...
kernel:SPLError: 99053:0:(sa.c:1538:sa_find_idx_tab()) SPL PANIC
ls hung!
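The three-clause assertion printed in the SPLError lines above is easier to read once reduced. Below is a small C sketch that models the three macro results as plain booleans (it does not reproduce the real `IS_SA_BONUSTYPE` or `SA_HDR_SIZE_MATCH_LAYOUT` macros, and the function names are hypothetical) and checks exhaustively that the printed condition is equivalent to `!is_sa_bonus || hdr_matches_layout || layout_info_zero`.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition exactly as it appears in the SPLError message. */
bool sa_assert_literal(bool is_sa_bonus, bool hdr_matches_layout,
                       bool layout_info_zero)
{
    return (is_sa_bonus && hdr_matches_layout) ||
           !is_sa_bonus ||
           (is_sa_bonus && layout_info_zero);
}

/* Reduced form: the same truth table with the redundant terms dropped. */
bool sa_assert_simplified(bool is_sa_bonus, bool hdr_matches_layout,
                          bool layout_info_zero)
{
    return !is_sa_bonus || hdr_matches_layout || layout_info_zero;
}

/*
 * Walk all 8 input combinations, check that both forms agree, and
 * return how many combinations make the assertion fail.
 */
int check_assert_forms(void)
{
    int failing = 0;

    for (int bits = 0; bits < 8; bits++) {
        bool a = (bits & 1) != 0; /* bonus buffer is an SA bonus type */
        bool b = (bits & 2) != 0; /* header size matches the layout */
        bool c = (bits & 4) != 0; /* hdr->sa_layout_info == 0 */

        bool literal = sa_assert_literal(a, b, c);
        assert(literal == sa_assert_simplified(a, b, c));
        if (!literal)
            failing++;
    }
    return failing;
}
```

Only one combination fails: an SA bonus type whose header size does not match its layout and whose `sa_layout_info` is nonzero. So under this model the panic says the on-disk SA header size disagrees with what the layout table expects for this object.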