zfs command hangs after corrupting storage #45

Closed
nedbass opened this issue Aug 5, 2010 · 1 comment

Comments

@nedbass
Contributor

nedbass commented Aug 5, 2010

I created a raidz zpool with 16 disks and a 100G volume within it, then
trashed the storage by dd'ing from /dev/zero over the first 1GB of
each disk. I scrubbed the zpool to make ZFS aware of the damage, then
attempted to access the volume with the zfs command. This triggered an
SPLError and the zfs command never returned. The console shows stack traces
for the hung process. I can reproduce this with both 'zfs list' and
'zfs destroy tank/fish'.

Note that if I don't first scrub the zpool, so that 'zpool status'
doesn't report any errors, then 'zfs list' and 'zfs destroy tank/fish'
both return normally.

Here are the commands used with output trimmed.

> cat disks

> cat disks | xargs zpool create -f tank

> zfs create -V 100G tank/fish
> for x in `cat disks` ; do \
        dd  if=/dev/zero of=$x bs=32k count=327680 & done

> zpool scrub tank

> zpool status -v

> zfs list

With output.

> cat disks
/dev/disk/zpool/disk1
/dev/disk/zpool/disk2
/dev/disk/zpool/disk3
/dev/disk/zpool/disk4
/dev/disk/zpool/disk5
/dev/disk/zpool/disk6
/dev/disk/zpool/disk7
/dev/disk/zpool/disk8
/dev/disk/zpool/disk9
/dev/disk/zpool/disk10
/dev/disk/zpool/disk11
/dev/disk/zpool/disk12
/dev/disk/zpool/disk13
/dev/disk/zpool/disk14
/dev/disk/zpool/disk15
/dev/disk/zpool/disk16

> cat disks | xargs zpool create -f tank

> zfs create -V 100G tank/fish
> for x in `cat disks` ; do \
        sudo dd  if=/dev/zero of=$x bs=32k count=327680 & done
...

> zpool scrub tank

> zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scan: scrub repaired 0 in 0h0m with 27 errors on Thu Aug  5 10:55:47 2010
config:

        NAME            STATE     READ WRITE CKSUM
        tank            ONLINE       0     0    27
          disk1-part1   ONLINE       0     0    54
          disk2-part1   ONLINE       0     0    46
          disk3-part1   ONLINE       0     0     0
          disk4-part1   ONLINE       0     0     0
          disk5-part1   ONLINE       0     0     0
          disk6-part1   ONLINE       0     0     0
          disk7-part1   ONLINE       0     0     0
          disk8-part1   ONLINE       0     0     0
          disk9-part1   ONLINE       0     0     0
          disk10-part1  ONLINE       0     0     0
          disk11-part1  ONLINE       0     0     0
          disk12-part1  ONLINE       0     0     0
          disk13-part1  ONLINE       0     0     0
          disk14-part1  ONLINE       0     0     0
          disk15-part1  ONLINE       0     0     0
          disk16-part1  ONLINE       0     0    54

errors: Permanent errors have been detected in the following files:
        <metadata>:<0x3>
        <metadata>:<0x4>
        <metadata>:<0x6>
        <metadata>:<0x7>
        <metadata>:<0x9>
        <metadata>:<0xa>
        <metadata>:<0xd>
        <metadata>:<0xe>
        <metadata>:<0x10>
        <metadata>:<0x11>
        <metadata>:<0x13>
        <metadata>:<0x16>
        <metadata>:<0x17>
        <metadata>:<0x19>
        <metadata>:<0x1a>
        <metadata>:<0x1b>
        <metadata>:<0x1e>
        <metadata>:<0x2d>
        <metadata>:<0x2f>
        <metadata>:<0x32>
        <metadata>:<0x33>
        <metadata>:<0x35>
        <metadata>:<0x36>
        tank:<0x0>
        tank/fish:<0x0>

> zfs list

Message from syslogd@mrtwig at Aug  5 10:56:30 ...
 kernel:VERIFY(zvol_get_stats(os, nv) == 0) failed

Message from syslogd@mrtwig at Aug  5 10:56:30 ...
 kernel:SPLError: 11472:0:(zfs_ioctl.c:1642:zfs_ioc_objset_stats()) SPL PANIC

Console log:

SPL: Showing stack for process 11175
Pid: 11175, comm: txg_sync Tainted: P        W  2.6.32-11chaos.16k #1
Call Trace:
 [] spl_debug_dumpstack+0x27/0x40 [spl]
 [] kmem_alloc_debug+0x11d/0x130 [spl]
 [] dsl_scan_setup_sync+0x1e1/0x210 [zfs]
 [] ? ftrace_call+0x5/0x2b
 [] dsl_sync_task_group_sync+0x12b/0x210 [zfs]
 [] dsl_pool_sync+0x1eb/0x460 [zfs]
 [] spa_sync+0x387/0x960 [zfs]
 [] ? ftrace_call+0x5/0x2b
 [] ? ftrace_call+0x5/0x2b
 [] txg_sync_thread+0x1c7/0x3d0 [zfs]
 [] ? txg_sync_thread+0x0/0x3d0 [zfs]
 [] ? txg_sync_thread+0x0/0x3d0 [zfs]
 [] ? thread_generic_wrapper+0x0/0x80 [spl]
 [] thread_generic_wrapper+0x68/0x80 [spl]
 [] kthread+0x96/0xa0
 [] ? early_idt_handler+0x0/0x71
 [] child_rip+0xa/0x20
 [] ? early_idt_handler+0x0/0x71
 [] ? kthread+0x0/0xa0
 [] ? child_rip+0x0/0x20
VERIFY(zvol_get_stats(os, nv) == 0) failed
SPLError: 11472:0:(zfs_ioctl.c:1642:zfs_ioc_objset_stats()) SPL PANIC
SPL: Showing stack for process 11472
Pid: 11472, comm: zfs Tainted: P        W  2.6.32-11chaos.16k #1
Call Trace:
 [] spl_debug_dumpstack+0x27/0x40 [spl]
 [] spl_debug_bug+0x81/0xd0 [spl]
 [] zfs_ioc_objset_stats+0x11b/0x120 [zfs]
 [] zfs_ioc_dataset_list_next+0x15b/0x1d0 [zfs]
 [] zfs_ioctl+0xf8/0x1f0 [zfs]
 [] vfs_ioctl+0x36/0xa0
 [] do_vfs_ioctl+0xab/0x550
 [] sys_ioctl+0x87/0xa0
 [] system_call_fastpath+0x16/0x1b
SPL: Dumping log to /tmp/spl-log.1281030990.11472
INFO: task zfs:11472 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zfs           D ffff880230c23680     0 11472   2764 0x00000080
 ffff8802637c7d58 0000000000000086 0000000000000000 0000000000016840
 ffff88041ffe0670 ffff88041ffe0180 ffff88041ffe00c0 ffff88022831aa60
 ffff88041ffe0670 00000001000eea63 00000000637c7d10 0000000000000000
Call Trace:
 [] spl_debug_bug+0xad/0xd0 [spl]
 [] zfs_ioc_objset_stats+0x11b/0x120 [zfs]
 [] zfs_ioc_dataset_list_next+0x15b/0x1d0 [zfs]
 [] zfs_ioctl+0xf8/0x1f0 [zfs]
 [] vfs_ioctl+0x36/0xa0
 [] do_vfs_ioctl+0xab/0x550
 [] sys_ioctl+0x87/0xa0
 [] system_call_fastpath+0x16/0x1b
@behlendorf
Contributor

Add missing zfs_ioc_objset_stats() error handling

Interestingly, this looks like an upstream bug as well. If for some
reason we are unable to get a zvol's statistics, perhaps because the
zpool is hopelessly corrupt, we would trigger the VERIFY. This
commit adds the proper error handling to propagate the error
back to user space. The user space tools must still handle this
properly, but in the worst case the tool will crash or perhaps have
some missing output. That's far, far better than crashing the host.
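
For readers unfamiliar with the pattern, here is a minimal C sketch of the difference between asserting on a return value and propagating it. This is not the actual change from the commit: the `sketch_*` functions and the `pool_is_corrupt` flag are hypothetical stand-ins for `zvol_get_stats()` and `zfs_ioc_objset_stats()`, the real signatures differ, and `assert()` only approximates what VERIFY does in the kernel.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for zvol_get_stats(). The real function takes an
 * objset and an nvlist; this one simply fails with EIO when the
 * (made-up) pool_is_corrupt flag is set.
 */
static int sketch_zvol_get_stats(int pool_is_corrupt)
{
    return pool_is_corrupt ? EIO : 0;
}

/*
 * "Before" shape: the return value is asserted, so any failure aborts
 * the caller. In the kernel the analogous VERIFY leads to the SPL PANIC
 * shown in the console log.
 */
static int sketch_objset_stats_verify(int pool_is_corrupt)
{
    int error = sketch_zvol_get_stats(pool_is_corrupt);

    assert(error == 0);
    return error;
}

/*
 * "After" shape: the error is handed back to the caller, which can
 * report it to user space instead of taking the host down.
 */
static int sketch_objset_stats_propagate(int pool_is_corrupt)
{
    int error = sketch_zvol_get_stats(pool_is_corrupt);

    if (error != 0)
        return error; /* propagate instead of asserting */

    return 0;
}
```

With the propagating version, a caller on a corrupt pool receives a nonzero errno value (EIO in this sketch) and can decide how to report it, which is the responsibility the comment above leaves with the user space tools.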

Closed by 89f0abf

This issue was closed.