
Kernel crash during heavy activity #475

Closed
tomharvey opened this issue Nov 30, 2011 · 21 comments

@tomharvey

Under heavy load while SMB sharing, we've started to see this kind of kernel dump, and the load average just climbs and climbs while the CPU is idle.

Strangely, the system is responsive in all ways, but we're currently seeing a load average of 151.49, 144.46, 126.38!

Ubuntu 10.04
SPL commit ecc3981
ZFS commit 5cbf6db
samba version 3.4.7

Nov 30 18:42:08 server kernel: [430251.364381] [] txg_sync_thread+0x225/0x3a0 [zfs]
Nov 30 18:42:08 server kernel: [430251.364439] [] ? txg_sync_thread+0x0/0x3a0 [zfs]
Nov 30 18:42:08 server kernel: [430251.364456] [] thread_generic_wrapper+0x68/0x80 [spl]
Nov 30 18:42:08 server kernel: [430251.364473] [] ? thread_generic_wrapper+0x0/0x80 [spl]
Nov 30 18:42:08 server kernel: [430251.364482] [] kthread+0x96/0xa0
Nov 30 18:42:08 server kernel: [430251.364491] [] child_rip+0xa/0x20
Nov 30 18:42:08 server kernel: [430251.364500] [] ? kthread+0x0/0xa0
Nov 30 18:42:08 server kernel: [430251.364508] [] ? child_rip+0x0/0x20
Nov 30 18:42:08 server kernel: [430251.364517] INFO: task smbd:22102 blocked for more than 120 seconds.
Nov 30 18:42:08 server kernel: [430251.364531] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 18:42:08 server kernel: [430251.364949] smbd D 0000000000000000 0 22102 22095 0x00000004
Nov 30 18:42:08 server kernel: [430251.364961] ffff880169c2db78 0000000000000082 0000000000015bc0 0000000000015bc0
Nov 30 18:42:08 server kernel: [430251.364974] ffff8802346783b8 ffff880169c2dfd8 0000000000015bc0 ffff880234678000
Nov 30 18:42:08 server kernel: [430251.364988] 0000000000015bc0 ffff880169c2dfd8 0000000000015bc0 ffff8802346783b8
Nov 30 18:42:08 server kernel: [430251.365002] Call Trace:
Nov 30 18:42:08 server kernel: [430251.365018] [] cv_wait_common+0x78/0xe0 [spl]
Nov 30 18:42:08 server kernel: [430251.365027] [] ? autoremove_wake_function+0x0/0x40
Nov 30 18:42:08 server kernel: [430251.365045] [] __cv_wait+0x13/0x20 [spl]
Nov 30 18:42:08 server kernel: [430251.365100] [] txg_wait_open+0x7b/0xa0 [zfs]
Nov 30 18:42:08 server kernel: [430251.365146] [] dmu_tx_wait+0xed/0xf0 [zfs]
Nov 30 18:42:08 server kernel: [430251.365202] [] zfs_write+0x3be/0xc90 [zfs]
Nov 30 18:42:08 server kernel: [430251.365213] [] ? locks_free_lock+0x43/0x60
Nov 30 18:42:08 server kernel: [430251.365223] [] ? vfs_test_lock+0x35/0x40
Nov 30 18:42:08 server kernel: [430251.365231] [] ? fcntl_getlk+0xfb/0x110
Nov 30 18:42:08 server kernel: [430251.365286] [] zpl_write_common+0x52/0x70 [zfs]
Nov 30 18:42:08 server kernel: [430251.365341] [] zpl_write+0x68/0xa0 [zfs]
Nov 30 18:42:08 server kernel: [430251.365352] [] ? security_file_permission+0x16/0x20
Nov 30 18:42:08 server kernel: [430251.365363] [] vfs_write+0xb8/0x1a0
Nov 30 18:42:08 server kernel: [430251.365371] [] ? do_fcntl+0x29b/0x3d0
Nov 30 18:42:08 server kernel: [430251.365381] [] sys_pwrite64+0x82/0xa0
Nov 30 18:42:08 server kernel: [430251.365391] [] system_call_fastpath+0x16/0x1b
Nov 30 18:42:08 server kernel: [430251.365400] INFO: task smbd:19603 blocked for more than 120 seconds.
Nov 30 18:42:08 server kernel: [430251.365819] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 18:42:08 server kernel: [430251.366236] smbd D ffff88021d756ac0 0 19603 22095 0x00000004
Nov 30 18:42:08 server kernel: [430251.366242] ffff88019fec3948 0000000000000082 0000000000015bc0 0000000000015bc0
Nov 30 18:42:08 server kernel: [430251.366248] ffff88022b331aa8 ffff88019fec3fd8 0000000000015bc0 ffff88022b3316f0
Nov 30 18:42:08 server kernel: [430251.366253] 0000000000015bc0 ffff88019fec3fd8 0000000000015bc0 ffff88022b331aa8
Nov 30 18:42:08 server kernel: [430251.366259] Call Trace:
Nov 30 18:42:08 server kernel: [430251.366273] [] cv_wait_common+0x78/0xe0 [spl]
Nov 30 18:42:08 server kernel: [430251.366279] [] ? autoremove_wake_function+0x0/0x40
Nov 30 18:42:08 server kernel: [430251.366292] [] __cv_wait+0x13/0x20 [spl]
Nov 30 18:42:08 server kernel: [430251.366343] [] txg_wait_open+0x7b/0xa0 [zfs]
Nov 30 18:42:08 server kernel: [430251.366384] [] dmu_tx_wait+0xed/0xf0 [zfs]
Nov 30 18:42:08 server kernel: [430251.366430] [] dmu_tx_assign+0x6a/0x410 [zfs]
Nov 30 18:42:08 server kernel: [430251.366487] [] zfs_inactive+0xef/0x1e0 [zfs]
Nov 30 18:42:08 server kernel: [430251.366542] [] zpl_clear_inode+0xe/0x10 [zfs]
Nov 30 18:42:08 server kernel: [430251.366551] [] clear_inode+0x7e/0x100
Nov 30 18:42:08 server kernel: [430251.366559] [] dispose_list+0x40/0x150
Nov 30 18:42:08 server kernel: [430251.366568] [] prune_icache+0x199/0x2b0
Nov 30 18:42:08 server kernel: [430251.366578] [] shrink_icache_memory+0x3f/0x50
Nov 30 18:42:08 server kernel: [430251.366587] [] shrink_slab+0x125/0x190
Nov 30 18:42:08 server kernel: [430251.366596] [] do_try_to_free_pages+0x19f/0x370
Nov 30 18:42:08 server kernel: [430251.366606] [] try_to_free_pages+0x6f/0x80
Nov 30 18:42:08 server kernel: [430251.366615] [] ? isolate_pages_global+0x0/0x50
Nov 30 18:42:08 server kernel: [430251.366625] [] __alloc_pages_slowpath+0x2d8/0x590
Nov 30 18:42:08 server kernel: [430251.366636] [] __alloc_pages_nodemask+0x171/0x180
Nov 30 18:42:08 server kernel: [430251.366647] [] alloc_pages_current+0x87/0xd0
Nov 30 18:42:08 server kernel: [430251.366656] [] __get_free_pages+0xe/0x50
Nov 30 18:42:08 server kernel: [430251.366665] [] sys_getcwd+0x34/0x1b0
Nov 30 18:42:08 server kernel: [430251.366674] [] system_call_fastpath+0x16/0x1b

@behlendorf
Contributor

I've occasionally seen this myself under a heavy write load; we'll be digging into it.

@MarkRidley123

Hi,

I am seeing this issue too.

I am testing RC6 on a Dell PowerEdge 2950 with 16 GB RAM and a separately attached Dell storage array with 12 TB.

I have created 1 large pool.

I have tried dedup=off and on, compression=on and off.
The performance starts off at around 240 MB/sec writing, but then slows down to less than 1 MB/sec, and then I get this:

Jan 24 05:43:48 bk577 kernel: INFO: task smbd:2775 blocked for more than 120 seconds.
Jan 24 05:43:48 bk577 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 24 05:43:48 bk577 kernel: smbd D 0000000000000007 0 2775 2344 0x00000084
Jan 24 05:43:48 bk577 kernel: ffff880267503b78 0000000000000082 0000000000000000 0000000000000001
Jan 24 05:43:48 bk577 kernel: ffff88003d512c00 0000000000000086 ffff880267503af8 ffff880421f4dda0
Jan 24 05:43:48 bk577 kernel: ffff8804257f0678 ffff880267503fd8 000000000000f4e8 ffff8804257f0678
Jan 24 05:43:48 bk577 kernel: Call Trace:
Jan 24 05:43:48 bk577 kernel: [] cv_wait_common+0x78/0xe0 [spl]
Jan 24 05:43:48 bk577 kernel: [] ? autoremove_wake_function+0x0/0x40
Jan 24 05:43:48 bk577 kernel: [] __cv_wait+0x13/0x20 [spl]
Jan 24 05:43:48 bk577 kernel: [] txg_wait_open+0x7b/0xa0 [zfs]
Jan 24 05:43:48 bk577 kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jan 24 05:43:48 bk577 kernel: [] zfs_write+0x3be/0xca0 [zfs]
Jan 24 05:43:48 bk577 kernel: [] ? autoremove_wake_function+0x0/0x40
Jan 24 05:43:48 bk577 kernel: [] ? posix_test_lock+0xdb/0xe0
Jan 24 05:43:48 bk577 kernel: [] zpl_write_common+0x52/0x70 [zfs]
Jan 24 05:43:48 bk577 kernel: [] zpl_write+0x68/0xa0 [zfs]
Jan 24 05:43:48 bk577 kernel: [] ? security_file_permission+0x16/0x20
Jan 24 05:43:48 bk577 kernel: [] vfs_write+0xb8/0x1a0
Jan 24 05:43:48 bk577 kernel: [] ? audit_syscall_entry+0x272/0x2a0
Jan 24 05:43:48 bk577 kernel: [] sys_pwrite64+0x82/0xa0
Jan 24 05:43:48 bk577 kernel: [] system_call_fastpath+0x16/0x1b

It is not just Samba that hangs and gets disconnected; the rsync daemon blocks too.

I have managed to get about 4 TB of data into the pool, but the constant slowdowns and disconnects now make it unusable. I appreciate this is a release candidate; great job so far.

What help do you need from me to get zfsonlinux working?

@behlendorf
Contributor

If you're able to consistently reproduce this, it would be helpful to get the full stack for the txg_sync_thread. Basically, the smbd process is simply waiting for there to be space in the next transaction group. The txg_sync_thread is responsible for moving these txgs forward, so we need to see why it's unable to do this. I also suspect that many other threads are blocked waiting on the same thing, which is why you see such a high load average.
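Once a blocked-task dump has been captured, the txg_sync stack can be isolated from the log with a short filter. A minimal sketch, assuming the dmesg output was saved to a file; the sample below is abridged from the traces in this thread, and the paths and awk pattern are illustrative:

```shell
# Create an abridged sample of a blocked-task dump (illustrative only).
cat > /tmp/dmesg.sample <<'EOF'
txg_sync      D 0000000000000006     0  1358      2 0x00000000
Call Trace:
 [] cv_wait_common+0x78/0xe0 [spl]
 [] zio_wait+0xeb/0x160 [zfs]
rsync         D 0000000000000005     0  5708   5707 0x00000080
EOF
# Print from the txg_sync task header up to (not including) the next
# task header line ("<name> D <addr> ..."), i.e. just the txg_sync stack.
awk '/^txg_sync /{show=1; print; next}
     show && /^[a-zA-Z0-9_.\/-]+ +D /{exit}
     show' /tmp/dmesg.sample > /tmp/txg_stack.txt
cat /tmp/txg_stack.txt
```

Run against a full dump, this prints only the sync thread's frames, which is what's needed to see where the open txg is stuck.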

@MarkRidley123

Yes. It hangs every couple of hours.

Stack trace: will do. How do I do that? I am running CentOS 6.2.

@maxximino
Contributor

If that task is blocked:
echo w >/proc/sysrq-trigger
(result appears in dmesg)
Otherwise:
echo t >/proc/sysrq-trigger
(this dumps stack traces of every task in your system, so it's a lot of data!)
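A guarded sketch of the capture step (assumes root and sysrq enabled; the guard and output path are illustrative, so the snippet degrades gracefully when run unprivileged):

```shell
# Trigger the blocked-task dump if we can, then save the kernel log.
if [ -w /proc/sysrq-trigger ] && echo w > /proc/sysrq-trigger 2>/dev/null; then
    dmesg > /tmp/blocked-tasks.txt 2>/dev/null || true
fi
# Fall back to a note when sysrq/dmesg aren't available (e.g. no root).
[ -s /tmp/blocked-tasks.txt ] || echo "sysrq capture unavailable here" > /tmp/blocked-tasks.txt
```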

@behlendorf
Contributor

Getting a look at all the stacks would be ideal: echo t >/proc/sysrq-trigger.

@MarkRidley123

Hi,
ZFS has been blocked for 30 minutes now.

Hope this helps. I will send a full process listing too.

Thanks.

SysRq : Show Blocked State
task PC stack pid father
zfs_iput_task D 0000000000000000 0 1356 2 0x00000000
ffff880423e17880 0000000000000046 0000000400000000 0000000000000001
0000000000000000 0000000000000086 0000000000000000 ffff88041062bda0
ffff8804257ddb38 ffff880423e17fd8 000000000000f4e8 ffff8804257ddb38
Call Trace:
[] ? prepare_to_wait_exclusive+0x4e/0x80
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] dmu_tx_assign+0x6a/0x410 [zfs]
[] ? dsl_dataset_block_freeable+0x43/0x60 [zfs]
[] zfs_purgedir+0x144/0x230 [zfs]
[] ? dmu_object_info_from_dnode+0x115/0x190 [zfs]
[] ? zfs_zget+0x165/0x1e0 [zfs]
[] zfs_unlinked_drain+0x11e/0x130 [zfs]
[] ? perf_event_task_sched_out+0x33/0x80
[] ? thread_return+0x4e/0x79d
[] taskq_thread+0x1d2/0x330 [spl]
[] ? default_wake_function+0x0/0x20
[] ? taskq_thread+0x0/0x330 [spl]
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
txg_sync D 0000000000000006 0 1358 2 0x00000000
ffff88041062f8b0 0000000000000046 0000000000000000 0000000000000000
ffff88041062f820 ffffffff8105e782 ffff88041062f870 ffffffff8104c969
ffff880425090638 ffff88041062ffd8 000000000000f4e8 ffff880425090638
Call Trace:
[] ? default_wake_function+0x12/0x20
[] ? __wake_up_common+0x59/0x90
[] ? prepare_to_wait_exclusive+0x4e/0x80
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] zio_wait+0xeb/0x160 [zfs]
[] dbuf_read+0x409/0x700 [zfs]
[] dmu_buf_hold+0xf0/0x1a0 [zfs]
[] zap_idx_to_blk+0xab/0x140 [zfs]
[] ? dmu_buf_hold+0x107/0x1a0 [zfs]
[] zap_deref_leaf+0x51/0x80 [zfs]
[] fzap_remove+0x37/0xb0 [zfs]
[] ? zap_name_alloc_uint64+0x76/0xa0 [zfs]
[] zap_remove_uint64+0x7a/0xc0 [zfs]
[] ddt_zap_remove+0x16/0x20 [zfs]
[] ddt_sync+0x2aa/0x8b0 [zfs]
[] ? __wake_up_common+0x59/0x90
[] ? kmem_free_debug+0x16/0x20 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] ? zio_destroy+0x56/0xa0 [zfs]
[] spa_sync+0x3ee/0x9a0 [zfs]
[] ? try_to_wake_up+0x24c/0x3e0
[] ? wake_up_process+0x15/0x20
[] txg_sync_thread+0x225/0x3b0 [zfs]
[] ? txg_sync_thread+0x0/0x3b0 [zfs]
[] ? txg_sync_thread+0x0/0x3b0 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
rsync D 0000000000000005 0 5708 5707 0x00000080
ffff88041190bb78 0000000000000082 ffff88041190bb40 ffff88041190bb3c
ffff88021854fcc0 ffff88043fc24b00 ffff880028255f80 0000000000000203
ffff8803fe9c0638 ffff88041190bfd8 000000000000f4e8 ffff8803fe9c0638
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_write+0x51/0x90
[] system_call_fastpath+0x16/0x1b
rsync D 0000000000000003 0 32079 23761 0x00000080
ffff88028c5abb78 0000000000000086 0000000405668ca0 0000000000000001
ffff88028ef45800 0000000000000086 ffff88028c5abaf8 ffff88041062bda0
ffff880323d9daf8 ffff88028c5abfd8 000000000000f4e8 ffff880323d9daf8
Call Trace:
[] ? prepare_to_wait_exclusive+0x4e/0x80
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_write+0x51/0x90
[] system_call_fastpath+0x16/0x1b
rsync D 0000000000000002 0 18292 2701 0x00000080
ffff88020c72fca8 0000000000000086 ffff88020c72fc70 ffff88020c72fc6c
ffff880300000000 ffff88043fc24500 ffff880028255f80 0000000000000207
ffff8804251c7038 ffff88020c72ffd8 000000000000f4e8 ffff8804251c7038
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_remove+0x101/0x430 [zfs]
[] ? mntput_no_expire+0x30/0x110
[] zpl_unlink+0x46/0x70 [zfs]
[] vfs_unlink+0x9f/0xe0
[] ? lookup_hash+0x3a/0x50
[] do_unlinkat+0x183/0x1c0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_unlink+0x16/0x20
[] system_call_fastpath+0x16/0x1b
rsync D 0000000000000000 0 18909 18292 0x00000080
ffff880415147918 0000000000000082 0000000000000000 0000000000000001
ffff8803ed3f0e00 0000000000000086 ffff880415147898 ffff88041062bda0
ffff8803c648da78 ffff880415147fd8 000000000000f4e8 ffff8803c648da78
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_make_xattrdir+0xe8/0x290 [zfs]
[] ? txg_list_add+0x5d/0x70 [zfs]
[] zfs_get_xattrdir+0x105/0x180 [zfs]
[] ? down_read+0x16/0x30
[] ? string+0x40/0x100
[] ? vsnprintf+0x310/0x5f0
[] zfs_lookup+0x225/0x350 [zfs]
[] zpl_xattr_set+0x7a/0x270 [zfs]
[] ? kmem_asprintf+0x56/0x70 [spl]
[] zpl_xattr_user_set+0x95/0xb0 [zfs]
[] generic_setxattr+0xa2/0xb0
[] __vfs_setxattr_noperm+0x4e/0x160
[] ? inode_permission+0xaf/0xd0
[] vfs_setxattr+0xbc/0xc0
[] setxattr+0xd0/0x150
[] ? putname+0x35/0x50
[] ? user_path_at+0x62/0xa0
[] ? _atomic_dec_and_lock+0x55/0x80
[] ? mntput_no_expire+0x30/0x110
[] sys_lsetxattr+0xa5/0xc0
[] system_call_fastpath+0x16/0x1b
smbd D 0000000000000005 0 19108 2328 0x00000084
ffff880102cb9b78 0000000000000082 ffff880102cb9b40 ffff880102cb9b3c
ffff8803dc9fb5c0 ffff88043fc24b00 ffff880028295f80 0000000000000207
ffff880425127038 ffff880102cb9fd8 000000000000f4e8 ffff880425127038
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] ? dmu_object_size_from_db+0x69/0xa0 [zfs]
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_pwrite64+0x82/0xa0
[] system_call_fastpath+0x16/0x1b
smbd D 0000000000000007 0 19117 2328 0x00000084
ffff88027b1adb78 0000000000000082 0000000000000000 ffff88027b1adb3c
ffff880300000000 ffff88043fc24f00 ffff880028395f80 000000000000010b
ffff88028fe49ab8 ffff88027b1adfd8 000000000000f4e8 ffff88028fe49ab8
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] ? dmu_object_size_from_db+0x69/0xa0 [zfs]
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_pwrite64+0x82/0xa0
[] system_call_fastpath+0x16/0x1b
smbd D 0000000000000006 0 19445 2328 0x00000084
ffff88024bf8bb78 0000000000000082 ffff88024bf8bb40 ffff88024bf8bb3c
ffff880221f45c80 ffff88043fc24d00 ffff880028295f80 0000000000000210
ffff8801b1e406b8 ffff88024bf8bfd8 000000000000f4e8 ffff8801b1e406b8
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_pwrite64+0x82/0xa0
[] system_call_fastpath+0x16/0x1b
smbd D 0000000000000000 0 19562 2328 0x00000084
ffff88011a577b78 0000000000000082 ffff88011a577b40 ffff88011a577b3c
ffff8801a2c33bc0 ffff88043fc24100 ffff880028315f80 000000000000af5c
ffff880425019078 ffff88011a577fd8 000000000000f4e8 ffff880425019078
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_pwrite64+0x82/0xa0
[] system_call_fastpath+0x16/0x1b
backup.pl D 0000000000000000 0 19673 1 0x00000080
ffff88023a405ca8 0000000000000082 0000000000000000 ffff88023a405c6c
ffff880200000000 ffff88043fc24100 ffff8800283d5f80 00000000000056d0
ffff88043253faf8 ffff88023a405fd8 000000000000f4e8 ffff88043253faf8
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] zfs_range_lock+0x477/0x5d0 [zfs]
[] zfs_read+0x112/0x4c0 [zfs]
[] ? thread_return+0x4e/0x79d
[] ? __hrtimer_start_range_ns+0x1a3/0x460
[] zpl_read_common+0x52/0x70 [zfs]
[] ? cap_file_permission+0x0/0x10
[] zpl_read+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_read+0xb5/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_read+0x51/0x90
[] system_call_fastpath+0x16/0x1b
smbd D 0000000000000006 0 19730 2328 0x00000084
ffff8801bc391b78 0000000000000086 ffff8801bc391b40 ffff8801bc391b3c
ffff8803bbc38440 ffff88043fc24d00 ffff880028255f80 0000000000000207
ffff88028fe49078 ffff8801bc391fd8 000000000000f4e8 ffff88028fe49078
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] ? posix_test_lock+0xdb/0xe0
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_pwrite64+0x82/0xa0
[] system_call_fastpath+0x16/0x1b
rsync D 0000000000000006 0 19843 19586 0x00000080
ffff8803a6ec1b78 0000000000000082 0000000452822bc0 0000000000000001
ffff880221f45ec0 0000000000000086 ffff8803a6ec1af8 ffff88041062bda0
ffff8803fe9c1ab8 ffff8803a6ec1fd8 000000000000f4e8 ffff8803fe9c1ab8
Call Trace:
[] ? prepare_to_wait_exclusive+0x4e/0x80
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_write+0x51/0x90
[] system_call_fastpath+0x16/0x1b
rsync D 0000000000000004 0 19851 19848 0x00000080
ffff8801a26e7b78 0000000000000082 0000000000000000 0000000000000001
ffff8803b69d7ec0 0000000000000086 ffff8801a26e7af8 ffff88041062bda0
ffff880430923078 ffff8801a26e7fd8 000000000000f4e8 ffff880430923078
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_write+0x51/0x90
[] system_call_fastpath+0x16/0x1b
chmod D 0000000000000007 0 20728 17046 0x00000080
ffff88034bb4db48 0000000000000086 00000004b5bbe8e0 0000000000000001
ffff8801cbd26540 0000000000000082 ffff88034bb4dac8 ffff88041062bda0
ffff880425195b38 ffff88034bb4dfd8 000000000000f4e8 ffff880425195b38
Call Trace:
[] ? prepare_to_wait_exclusive+0x4e/0x80
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_setattr+0x15bc/0x17f0 [zfs]
[] ? mntput_no_expire+0x30/0x110
[] ? __kmalloc+0x20c/0x220
[] zpl_setattr+0xdc/0x110 [zfs]
[] notify_change+0x168/0x340
[] sys_fchmodat+0xc3/0x100
[] ? sys_newstat+0x36/0x50
[] ? audit_syscall_entry+0x272/0x2a0
[] system_call_fastpath+0x16/0x1b
touch D 0000000000000006 0 20983 3460 0x00000080
ffff880254bd5b58 0000000000000086 ffff880254bd5b20 ffff880254bd5b1c
ffff8802dd4e7840 ffff88043fc24d00 ffff880028215f80 00000000000057c4
ffff8804250e0638 ffff880254bd5fd8 000000000000f4e8 ffff8804250e0638
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[

@MarkRidley123

Hi Brian,

Here is a listing of all processes during the 30 min block.

A Samba process died just before I got this list. I think it gave up.

Thanks for your help.

Dedup=on
Primarycache=all
Compression=on
Secondarycache=none

SysRq : Show State
task PC stack pid father
init S 0000000000000002 0 1 0 0x00000000
ffff8804361c9908 0000000000000082 0000000000000000 ffff8804361c98cc
0000000000000000 ffff88043fc24500 ffff8800283d5f80 00000000000000ff
ffff8804361c7a78 ffff8804361c9fd8 000000000000f4e8 ffff8804361c7a78
Call Trace:
[] schedule_hrtimeout_range+0x13d/0x160
[] ? add_wait_queue+0x46/0x60
[] ? __pollwait+0x75/0xf0
[] ? __pollwait+0x75/0xf0
[] poll_schedule_timeout+0x39/0x60
[] do_select+0x578/0x6b0
[] ? common_interrupt+0xe/0x13
[] ? __pollwait+0x0/0xf0
[] ? pollwake+0x0/0x60
[] ? pollwake+0x0/0x60
[] ? pollwake+0x0/0x60
[] ? pollwake+0x0/0x60
[] ? pollwake+0x0/0x60
[] ? d_lookup+0x3c/0x60
[] ? d_hash_and_lookup+0x83/0xb0
[] core_sys_select+0x18a/0x2c0
[] ? security_task_wait+0x16/0x20
[] ? wait_consider_task+0x9d/0xb20
[] ? remove_wait_queue+0x3c/0x50
[] ? do_wait+0x17f/0x240
[] sys_select+0x47/0x110
[] ? child_wait_callback+0x0/0x70
[] system_call_fastpath+0x16/0x1b
kthreadd S 0000000000000002 0 2 0 0x00000000
ffff8804361cded0 0000000000000046 0000000000000000 ffff8804361cde94
ffff880400000000 ffff88043fc24500 ffff880028255f80 00000000000001fe
ffff8804361c7038 ffff8804361cdfd8 000000000000f4e8 ffff8804361c7038
Call Trace:
[] ? kthread+0x0/0xa0
[] kthreadd+0x1b5/0x1c0
[] child_rip+0xa/0x20
[] ? kthreadd+0x0/0x1c0
[] ? child_rip+0x0/0x20
migration/0 S 0000000000000000 0 3 2 0x00000000
ffff8804361d1e70 0000000000000046 ffff8804361d1dd0 ffff880425194b40
ffff8804361d1e00 ffffffff8105dd9e ffff880028355f80 ffff880425194b40
ffff8804361c65f8 ffff8804361d1fd8 000000000000f4e8 ffff8804361c65f8
Call Trace:
[] ? pull_task+0x4e/0x60
[] migration_thread+0x265/0x2e0
[] ? migration_thread+0x0/0x2e0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
ksoftirqd/0 S 0000000000000000 0 4 2 0x00000000
ffff8804361d5ea0 0000000000000046 0000000000000000 ffff8804361d5e64
ffff880400000000 ffff88043fc24100 ffff8800283d5f80 000000000000010b
ffff8804361d3ab8 ffff8804361d5fd8 000000000000f4e8 ffff8804361d3ab8
Call Trace:
[] ksoftirqd+0xd5/0x110
[] ? ksoftirqd+0x0/0x110
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/0 S 0000000000000000 0 5 2 0x00000000
ffff8804361d7df0 0000000000000046 0000000000000000 ffff8804361d7da0
ffff8804361d7d60 ffffffff812821c9 ffff8804361d7df0 0000000000000086
ffff8804361d3078 ffff8804361d7fd8 000000000000f4e8 ffff8804361d3078
Call Trace:
[] ? pci_bus_write_config_byte+0x69/0x90
[] ? stop_machine_cpu_stop+0x0/0xe0
[] cpu_stopper_thread+0x125/0x1b0
[] ? thread_return+0x4e/0x79d
[] ? default_wake_function+0x12/0x20
[] ? cpu_stopper_thread+0x0/0x1b0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
watchdog/0 S 0000000000000000 0 6 2 0x00000000
ffff8804361dbea0 0000000000000046 0000000000000000 ffffffff81013563
ffff8804361dbe10 ffffffff81012b09 ffff8804361dbe40 ffffffff81097955
ffff8804361d2638 ffff8804361dbfd8 000000000000f4e8 ffff8804361d2638
Call Trace:
[] ? native_sched_clock+0x13/0x60
[] ? sched_clock+0x9/0x10
[] ? sched_clock_local+0x25/0x90
[] watchdog+0x9a/0xd0
[] ? watchdog+0x0/0xd0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/1 S 0000000000000001 0 7 2 0x00000000
ffff8804361dfe70 0000000000000046 ffff8804361dfdd0 ffff8802a01e4040
ffff8804361dfe00 ffffffff8105dd9e ffff880028395f80 ffff8802a01e4040
ffff8804361ddaf8 ffff8804361dffd8 000000000000f4e8 ffff8804361ddaf8
Call Trace:
[] ? pull_task+0x4e/0x60
[] migration_thread+0x265/0x2e0
[] ? migration_thread+0x0/0x2e0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/1 S 0000000000000001 0 8 2 0x00000000
ffff880436251df0 0000000000000046 0000000000000000 ffff880436251db4
ffff880400000000 ffff88043fc24300 ffff880028395f80 0000000000000316
ffff8804361dd0b8 ffff880436251fd8 000000000000f4e8 ffff8804361dd0b8
Call Trace:
[] ? stop_machine_cpu_stop+0x0/0xe0
[] cpu_stopper_thread+0x125/0x1b0
[] ? thread_return+0x4e/0x79d
[] ? default_wake_function+0x12/0x20
[] ? cpu_stopper_thread+0x0/0x1b0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
ksoftirqd/1 S 0000000000000001 0 9 2 0x00000000
ffff880436277ea0 0000000000000046 0000000000000000 ffff880436277e64
0000000000000000 ffff88043fc24300 ffff8800283d5f80 0000000000000105
ffff8804361dc678 ffff880436277fd8 000000000000f4e8 ffff8804361dc678
Call Trace:
[] ksoftirqd+0xd5/0x110
[] ? ksoftirqd+0x0/0x110
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
watchdog/1 S 0000000000000001 0 10 2 0x00000000
ffff88043627dea0 0000000000000046 0000000000000000 ffff88043627de64
ffff880400000000 ffff88043fc24300 ffff8800283d5f80 0000000000000103
ffff8804362790f8 ffff88043627dfd8 000000000000f4e8 ffff8804362790f8
Call Trace:
[] watchdog+0x9a/0xd0
[] ? watchdog+0x0/0xd0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/2 S 0000000000000002 0 11 2 0x00000000
ffff880436281e70 0000000000000046 ffff880436281dd0 ffff8802a016b540
ffff880436281e00 ffffffff8105dd9e ffff880028215f80 ffff8802a016b540
ffff8804362786b8 ffff880436281fd8 000000000000f4e8 ffff8804362786b8
Call Trace:
[] ? pull_task+0x4e/0x60
[] migration_thread+0x265/0x2e0
[] ? migration_thread+0x0/0x2e0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/2 S 0000000000000002 0 12 2 0x00000000
ffff880436285df0 0000000000000046 0000000000000000 ffffffff8102728c
ffff880436285d80 ffffffff8100bc0e ffff880436285df0 0000000000000001
ffff880436283a78 ffff880436285fd8 000000000000f4e8 ffff880436283a78
Call Trace:
[] ? mtrr_wrmsr+0x2c/0x70
[] ? apic_timer_interrupt+0xe/0x20
[] ? stop_machine_cpu_stop+0x0/0xe0
[] cpu_stopper_thread+0x125/0x1b0
[] ? thread_return+0x4e/0x79d
[] ? default_wake_function+0x12/0x20
[] ? cpu_stopper_thread+0x0/0x1b0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
ksoftirqd/2 S 0000000000000002 0 13 2 0x00000000
ffff880436287ea0 0000000000000046 ffff880436287e68 ffff880436287e64
0000000000000000 ffff88043fc24500 ffff8800283d5f80 0000000000000201
ffff880436283038 ffff880436287fd8 000000000000f4e8 ffff880436283038
Call Trace:
[] ksoftirqd+0xd5/0x110
[] ? ksoftirqd+0x0/0x110
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
watchdog/2 S 0000000000000002 0 14 2 0x00000000
ffff8804362b3ea0 0000000000000046 0000000000000000 ffff8804362b3e64
ffff880400000000 ffff88043fc24500 ffff8800283d5f80 0000000000000101
ffff8804362b1ab8 ffff8804362b3fd8 000000000000f4e8 ffff8804362b1ab8
Call Trace:
[] watchdog+0x9a/0xd0
[] ? watchdog+0x0/0xd0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/3 S 0000000000000003 0 15 2 0x00000000
ffff8804362b5e70 0000000000000046 ffff8804362b5dd0 ffff880425194b40
ffff8804362b5e00 ffffffff8105dd9e ffff8800283d5f80 ffff880425194b40
ffff8804362b1078 ffff8804362b5fd8 000000000000f4e8 ffff8804362b1078
Call Trace:
[] ? pull_task+0x4e/0x60
[] migration_thread+0x265/0x2e0
[] ? migration_thread+0x0/0x2e0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/3 S 0000000000000003 0 16 2 0x00000000
ffff8804362b9df0 0000000000000046 0000000000000000 ffffffff8102728c
ffff8804362b9d80 ffffffff8100bc0e ffff8804362b9df0 0000000000000001
ffff8804362b0638 ffff8804362b9fd8 000000000000f4e8 ffff8804362b0638
Call Trace:
[] ? mtrr_wrmsr+0x2c/0x70
[] ? apic_timer_interrupt+0xe/0x20
[] ? stop_machine_cpu_stop+0x0/0xe0
[] cpu_stopper_thread+0x125/0x1b0
[] ? thread_return+0x4e/0x79d
[] ? default_wake_function+0x12/0x20
[] ? cpu_stopper_thread+0x0/0x1b0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
ksoftirqd/3 S 0000000000000003 0 17 2 0x00000000
ffff8804362c1ea0 0000000000000046 0000000000000000 ffff8804362c1e64
0000000000000000 ffff88043fc24700 ffff880028315f80 000000000000010a
ffff8804362bbaf8 ffff8804362c1fd8 000000000000f4e8 ffff8804362bbaf8
Call Trace:
[] ksoftirqd+0xd5/0x110
[] ? ksoftirqd+0x0/0x110
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
watchdog/3 S 0000000000000003 0 18 2 0x00000000
ffff8804362c7ea0 0000000000000046 0000000000000000 ffffffff81013563
ffff8804362c7e10 ffffffff81012b09 ffff8804362c7e40 ffffffff81097955
ffff8804362ba678 ffff8804362c7fd8 000000000000f4e8 ffff8804362ba678
Call Trace:
[] ? native_sched_clock+0x13/0x60
[] ? sched_clock+0x9/0x10
[] ? sched_clock_local+0x25/0x90
[] watchdog+0x9a/0xd0
[] ? watchdog+0x0/0xd0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/4 S 0000000000000004 0 19 2 0x00000000
ffff8804362cbe70 0000000000000046 ffff8804362cbdd0 ffff880425194b40
ffff8804362cbe00 ffffffff8105dd9e ffff8800283d5f80 ffff880425194b40
ffff8804362c9b38 ffff8804362cbfd8 000000000000f4e8 ffff8804362c9b38
Call Trace:
[] ? pull_task+0x4e/0x60
[] migration_thread+0x265/0x2e0
[] ? migration_thread+0x0/0x2e0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
migration/4 S 0000000000000004 0 20 2 0x00000000
ffff8804362cddf0 0000000000000046 0000000000000000 ffff880401a23e28
0000000000000000 0000000000000000 ffff8804362cdd70 ffffffff8105e782
ffff8804362c90f8 ffff8804362cdfd8 000000000000f4e8 ffff8804362c90f8
Call Trace:
[] ? default_wake_function+0x12/0x20
[] ? stop_machine_cpu_stop+0x0/0xe0
[] cpu_stopper_thread+0x125/0x1b0
[] ? thread_return+0x4e/0x79d
[] ? default_wake_function+0x12/0x20
[] ? cpu_stopper_thread+0x0/0x1b0
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
ksoftirqd/4 S 0000000000000004 0 21 2 0x00000000
ffff8804362f3ea0 0000000000000046 ffff8804362f3e68 ffff8804362f3e64
0000000000000000 ffff88043fc24900 ffff8800282d5f80 0000000000000205
ffff8804362c86b8 ffff8804362f3fd8 000000000000f4e8 ffff8804362c86b8
Call Trace:
[] ksoftirqd+0xd5/0x110
[] ? ksoftirqd+0x0/0x110
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
watchdog/4 S 0000000000000004 0 22 2 0x00000000
ffff8804362f9ea0 0000000000000046 0000000000000000 ffff8804362f9e64
ffff880400000000 ffff88043fc24900 ffff8800283d5f80 000000000000

-----Original Message-----

From: Mark Ridley
Sent: 30 Jan 2012 23:48:02 GMT
To: [email protected]
Subject: All processes

@ryao
Copy link
Contributor

ryao commented Jan 31, 2012

Would you try reproducing this with the latest GIT code? Commit a7b125e fixed a race condition that was in the code that these reports used.

If you are using Ubuntu, I think that you should be able to get it by doing something like 'apt-add-repository --yes ppa:zfs-native/daily' and then updating your software.

@MarkRidley123
Copy link

Hi Richard,

I am using centos 6.2.

How do I get this code?

Thanks


@ryao
Copy link
Contributor

ryao commented Jan 31, 2012

I am not familiar with CentOS, but I know that you can do git checkouts of spl and zfs by using the following commands:

git clone git://github.com/zfsonlinux/spl.git
git clone git://github.com/zfsonlinux/zfs.git

The README files in the created spl and zfs directories should describe how to do the installation, although if you are familiar with IRC, I suggest that you ask for help in your distribution's IRC channel. There might be particular values for --with-linux= and --with-linux-obj= that you should set for CentOS.
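To illustrate the --with-linux hint above, here is a minimal sketch of deriving those configure flags from a kernel version string. The /usr/src/kernels/<version> layout is an assumption (it is typical of CentOS/RHEL kernel-devel packages, but is not confirmed anywhere in this thread):

```shell
# Sketch only: build the ./configure flags ryao mentions from a kernel
# version string. The /usr/src/kernels/<version> path is an assumption
# (typical of CentOS/RHEL kernel-devel packages).
configure_flags() {
    kver="$1"
    printf -- '--with-linux=/usr/src/kernels/%s --with-linux-obj=/usr/src/kernels/%s\n' \
        "$kver" "$kver"
}

# e.g. use the running kernel:
configure_flags "$(uname -r)"
```

The printed flags would then be appended to the ./configure invocation in each of the spl and zfs trees.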

@MarkRidley123
Copy link

I have done a git clone git://github.com/zfsonlinux/spl.git and git clone git://github.com/zfsonlinux/zfs.git
as git checkout was complaining about it not being a valid git repository??
anyway -
Do I need to uninstall the existing zfsonlinux rc6?
How do i do this anyway?
Is there an uninstall script?
I will let you know tonight how the testing goes.
Thanks.


@MarkRidley123
Copy link

SPL configures and makes with no problem, but zfs will not make. I did a ./configure and make pkg. I also tried make rpm like it says on the zfsonlinux web. Am I building incorrectly?
I get:

make[5]: Entering directory `/usr/src/kernels/2.6.32-220.2.1.el6.x86_64'
  CC [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/avl/../../module/avl/avl.o
  LD [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/avl/zavl.o
  CC [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/nvpair/../../module/nvpair/nvpair.o
  CC [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/nvpair/../../module/nvpair/nvpair_alloc_spl.o
  CC [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/nvpair/../../module/nvpair/nvpair_alloc_fixed.o
  LD [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/nvpair/znvpair.o
  CC [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/unicode/../../module/unicode/u8_textprep.o
  CC [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/unicode/../../module/unicode/uconv.o
  LD [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/unicode/zunicode.o
  CC [M]  /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/zcommon/../../module/zcommon/zfs_deleg.o
In file included from /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/include/sys/dsl_pool.h:32,
                 from /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/include/sys/dsl_deleg.h:29,
                 from /tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/zcommon/../../module/zcommon/zfs_deleg.c:38:
/tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/include/sys/zio.h:431: error: expected specifier-qualifier-list before 'taskq_ent_t'
make[7]: *** [/tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/zcommon/../../module/zcommon/zfs_deleg.o] Error 1
make[6]: *** [/tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module/zcommon] Error 2
make[5]: *** [_module_/tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module] Error 2
make[5]: Leaving directory `/usr/src/kernels/2.6.32-220.2.1.el6.x86_64'
make[4]: *** [modules] Error 2
make[4]: Leaving directory `/tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0/module'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/tmp/zfs-build-root-JW4KiVON/BUILD/zfs-0.6.0'
error: Bad exit status from /tmp/zfs-build-root-JW4KiVON/TMP/rpm-tmp.5LED2M (%build)

RPM build errors: Bad exit status from /tmp/zfs-build-root-JW4KiVON/TMP/rpm-tmp.5LED2M (%build)
make[1]: *** [rpm-common] Error 1
make[1]: Leaving directory `/root/zfsbuild/zfs'
make: *** [rpm-modules] Error 2


@behlendorf
Copy link
Contributor

I think this was probably caused by openzfs/spl@ec2b410, which was fixed in the latest spl source by commit openzfs/spl@3c6ed54. The issue was a small race introduced in the task queue handling which could cause a wakeup to be missed, resulting in the txg_sync thread getting stalled, which is what the above stacks seem to indicate.

Anyway, the fix is still just to update to the latest master source from the spl and zfs repositories. Please make sure you uninstall all previous spl and zfs packages and then attempt the build again. Your build failure looks like it found and used an old version of the spl.

@MarkRidley123
Copy link

Hi Brian,
How do I uninstall all previous spl and zfs packages?
I did the rpm -Uvh as mentioned in the zfsonlinux page to install.
Thanks,
Mark


@MarkRidley123
Copy link

Hi Brian,
You are right about it picking up an old version (I tried it on another system and it worked, but I need to get it to install on this broken system)... but how do I remove it?
I have done rpm -qa | grep spl and rpm -qa | grep zfs, and done rpm -e on everything,
but the new code will still not build.
How can I remove whatever is being referenced in the make?
Thanks,
Mark


@behlendorf
Copy link
Contributor

Removing the packages should be enough unless you've built locally and done a make install. If so simply remove any spl/zfs directories under /usr/src/ on your system.
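A small sketch of that check (the /usr/src layout and directory names are the typical ones a local make install leaves behind; they are assumptions, so adjust to what is actually on the system):

```shell
# Sketch only: report leftover spl/zfs source trees under a prefix.
# A prior "make install" typically leaves /usr/src/spl-* and /usr/src/zfs-*.
list_stale_trees() {
    prefix="$1"
    found=0
    for d in "$prefix"/spl-* "$prefix"/zfs-*; do
        if [ -d "$d" ]; then
            echo "stale tree: $d"
            found=1
        fi
    done
    [ "$found" -eq 0 ] && echo "no stale spl/zfs trees under $prefix"
    return 0
}

list_stale_trees "${STALE_PREFIX:-/usr/src}"
```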

@MarkRidley123
Copy link

Hi,

That new version was better...but eventually samba was blocked again:

INFO: task smbd:5366 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
smbd D 0000000000000003 0 5366 2210 0x00000080
ffff88024825db78 0000000000000086 ffff88024825db40 ffff88024825db3c
ffff880260e0e180 ffff88043fc24700 ffff8800283d5f80 000000000000020e
ffff88025cd906b8 ffff88024825dfd8 000000000000f4e8 ffff88025cd906b8
Call Trace:
[] cv_wait_common+0x78/0xe0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x13/0x20 [spl]
[] txg_wait_open+0x7b/0xa0 [zfs]
[] dmu_tx_wait+0xed/0xf0 [zfs]
[] zfs_write+0x3be/0xca0 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] ? tsd_hash_search+0x7b/0xe0 [spl]
[] ? tsd_exit+0x41/0x1a0 [spl]
[] zpl_write_common+0x52/0x70 [zfs]
[] zpl_write+0x68/0xa0 [zfs]
[] ? security_file_permission+0x16/0x20
[] vfs_write+0xb8/0x1a0
[] ? audit_syscall_entry+0x272/0x2a0
[] sys_pwrite64+0x82/0xa0
[] system_call_fastpath+0x16/0x1b


@pyavdr
Copy link
Contributor

pyavdr commented Feb 3, 2012

In my case (ubuntu 10.04, ubuntu 11.10 and suse 12.1 with zfs/spl 0.48) it helps to make sure that samba is on version 3.6.1. There are lots of issues with the lower samba versions.
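For anyone checking where they stand relative to that, here is a sketch of the comparison (the version_ge helper is mine, not from the thread; the installed version comes from smbd -V):

```shell
# Sketch only: compare a dotted Samba version against 3.6.1, the version
# pyavdr reports as trouble-free. Feed it the version from "smbd -V".
version_ge() {
    # true (0) if $1 >= $2, comparing up to three numeric components
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$2" ]
}

samba_ver="3.4.7"   # e.g. the reporter's version from the top of this issue
if version_ge "$samba_ver" "3.6.1"; then
    echo "samba $samba_ver is new enough"
else
    echo "consider upgrading samba $samba_ver to 3.6.1 or later"
fi
```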

@behlendorf
Copy link
Contributor

Once again the key bit would be to see the txg_sync_thread stack by dumping all the thread stacks.

@behlendorf
Copy link
Contributor

Closing issue since it's quite stale and I know many people are using Samba successfully. If there are still issues you're seeing, let's open a new issue to track them.

ryao added a commit to ryao/zfs that referenced this issue Oct 9, 2014
The below excerpt of a backtrace is from a ztest failure when running
ZoL's ztest.

#453 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
#454 0x00007f03c806106e in vdev_queue_io (zio=zio@entry=0x7f0350003de0) at ../../module/zfs/vdev_queue.c:747
#455 0x00007f03c80818c1 in zio_vdev_io_start (zio=0x7f0350003de0) at ../../module/zfs/zio.c:2659
#456 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003de0) at ../../module/zfs/zio.c:1399
#457 zio_nowait (zio=0x7f0350003de0) at ../../module/zfs/zio.c:1456
#458 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350003a10) at ../../module/zfs/vdev_mirror.c:374
#459 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003a10) at ../../module/zfs/zio.c:1399
#460 zio_nowait (zio=0x7f0350003a10) at ../../module/zfs/zio.c:1456
#461 0x00007f03c806464c in vdev_raidz_io_start (zio=0x7f0350003380) at ../../module/zfs/vdev_raidz.c:1607
#462 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003380) at ../../module/zfs/zio.c:1399
#463 zio_nowait (zio=0x7f0350003380) at ../../module/zfs/zio.c:1456
#464 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350002fb0) at ../../module/zfs/vdev_mirror.c:374
#465 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002fb0) at ../../module/zfs/zio.c:1399
#466 zio_nowait (zio=0x7f0350002fb0) at ../../module/zfs/zio.c:1456
#467 0x00007f03c805ed43 in vdev_mirror_io_done (zio=0x7f033957ebf0) at ../../module/zfs/vdev_mirror.c:499
#468 0x00007f03c807a0c0 in zio_vdev_io_done (zio=0x7f033957ebf0) at ../../module/zfs/zio.c:2707
#469 0x00007f03c808285b in __zio_execute (zio=0x7f033957ebf0) at ../../module/zfs/zio.c:1399
#470 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f0390001330, pio=0x7f033957ebf0) at ../../module/zfs/zio.c:547
#471 zio_done (zio=0x7f0390001330) at ../../module/zfs/zio.c:3278
#472 0x00007f03c808285b in __zio_execute (zio=0x7f0390001330) at ../../module/zfs/zio.c:1399
#473 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4013a00, pio=0x7f0390001330) at ../../module/zfs/zio.c:547
#474 zio_done (zio=0x7f03b4013a00) at ../../module/zfs/zio.c:3278
#475 0x00007f03c808285b in __zio_execute (zio=0x7f03b4013a00) at ../../module/zfs/zio.c:1399
#476 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4014210, pio=0x7f03b4013a00) at ../../module/zfs/zio.c:547
#477 zio_done (zio=0x7f03b4014210) at ../../module/zfs/zio.c:3278
#478 0x00007f03c808285b in __zio_execute (zio=0x7f03b4014210) at ../../module/zfs/zio.c:1399
#479 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4014620, pio=0x7f03b4014210) at ../../module/zfs/zio.c:547
#480 zio_done (zio=0x7f03b4014620) at ../../module/zfs/zio.c:3278
#481 0x00007f03c807a6d3 in __zio_execute (zio=0x7f03b4014620) at ../../module/zfs/zio.c:1399
#482 zio_execute (zio=zio@entry=0x7f03b4014620) at ../../module/zfs/zio.c:1337
#483 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
#484 0x00007f03c806106e in vdev_queue_io (zio=zio@entry=0x7f0350002be0) at ../../module/zfs/vdev_queue.c:747
#485 0x00007f03c80818c1 in zio_vdev_io_start (zio=0x7f0350002be0) at ../../module/zfs/zio.c:2659
#486 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002be0) at ../../module/zfs/zio.c:1399
#487 zio_nowait (zio=0x7f0350002be0) at ../../module/zfs/zio.c:1456
#488 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350002810) at ../../module/zfs/vdev_mirror.c:374
#489 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002810) at ../../module/zfs/zio.c:1399
#490 zio_nowait (zio=0x7f0350002810) at ../../module/zfs/zio.c:1456
#491 0x00007f03c8064593 in vdev_raidz_io_start (zio=0x7f0350001270) at ../../module/zfs/vdev_raidz.c:1591
#492 0x00007f03c807f243 in __zio_execute (zio=0x7f0350001270) at ../../module/zfs/zio.c:1399
#493 zio_nowait (zio=0x7f0350001270) at ../../module/zfs/zio.c:1456
#494 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350001e60) at ../../module/zfs/vdev_mirror.c:374
#495 0x00007f03c807f243 in __zio_execute (zio=0x7f0350001e60) at ../../module/zfs/zio.c:1399
#496 zio_nowait (zio=0x7f0350001e60) at ../../module/zfs/zio.c:1456
#497 0x00007f03c805ed43 in vdev_mirror_io_done (zio=0x7f033a0c39c0) at ../../module/zfs/vdev_mirror.c:499
#498 0x00007f03c807a0c0 in zio_vdev_io_done (zio=0x7f033a0c39c0) at ../../module/zfs/zio.c:2707
#499 0x00007f03c808285b in __zio_execute (zio=0x7f033a0c39c0) at ../../module/zfs/zio.c:1399
#500 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03a8003c00, pio=0x7f033a0c39c0) at ../../module/zfs/zio.c:547
#501 zio_done (zio=0x7f03a8003c00) at ../../module/zfs/zio.c:3278
#502 0x00007f03c808285b in __zio_execute (zio=0x7f03a8003c00) at ../../module/zfs/zio.c:1399
#503 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800c400, pio=0x7f03a8003c00) at ../../module/zfs/zio.c:547
#504 zio_done (zio=0x7f038800c400) at ../../module/zfs/zio.c:3278
#505 0x00007f03c808285b in __zio_execute (zio=0x7f038800c400) at ../../module/zfs/zio.c:1399
#506 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800da00, pio=0x7f038800c400) at ../../module/zfs/zio.c:547
#507 zio_done (zio=0x7f038800da00) at ../../module/zfs/zio.c:3278
#508 0x00007f03c808285b in __zio_execute (zio=0x7f038800da00) at ../../module/zfs/zio.c:1399
#509 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800fd80, pio=0x7f038800da00) at ../../module/zfs/zio.c:547
#510 zio_done (zio=0x7f038800fd80) at ../../module/zfs/zio.c:3278
#511 0x00007f03c807a6d3 in __zio_execute (zio=0x7f038800fd80) at ../../module/zfs/zio.c:1399
#512 zio_execute (zio=zio@entry=0x7f038800fd80) at ../../module/zfs/zio.c:1337
#513 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
#514 0x00007f03c806119d in vdev_queue_io_done (zio=zio@entry=0x7f03a0010950) at ../../module/zfs/vdev_queue.c:775
#515 0x00007f03c807a0e8 in zio_vdev_io_done (zio=0x7f03a0010950) at ../../module/zfs/zio.c:2686
#516 0x00007f03c807a6d3 in __zio_execute (zio=0x7f03a0010950) at ../../module/zfs/zio.c:1399
#517 zio_execute (zio=0x7f03a0010950) at ../../module/zfs/zio.c:1337
#518 0x00007f03c7fcd0c4 in taskq_thread (arg=0x966d50) at ../../lib/libzpool/taskq.c:215
#519 0x00007f03c7fc7937 in zk_thread_helper (arg=0x967e90) at ../../lib/libzpool/kernel.c:135
#520 0x00007f03c78890a3 in start_thread (arg=0x7f03c2703700) at pthread_create.c:309
#521 0x00007f03c75c50fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The backtrace was an infinite loop of `vdev_queue_io_to_issue()` invoking
`zio_execute()` until it overran the stack. `vdev_queue_io_to_issue()` will only
invoke `zio_execute()` on raidz vdevs when aggregation I/Os are generated to
improve aggregation continuity. These I/Os do not trigger any writes. However,
it appears that they can be generated in such a way that they recurse
infinitely upon return to `vdev_queue_io_to_issue()`. As a consequence, we see
the number of parents grow by 1 each time the recursion returns to
`vdev_raidz_io_start()`.

Signed-off-by: Richard Yao <[email protected]>
ryao added a commit to ryao/zfs that referenced this issue Oct 9, 2014
ryao added a commit to ryao/zfs that referenced this issue Oct 9, 2014
/openzfs#514 0x00007f03c806119d in vdev_queue_io_done (zio=zio@entry=0x7f03a0010950) at ../../module/zfs/vdev_queue.c:775
/openzfs#515 0x00007f03c807a0e8 in zio_vdev_io_done (zio=0x7f03a0010950) at ../../module/zfs/zio.c:2686
/openzfs#516 0x00007f03c807a6d3 in __zio_execute (zio=0x7f03a0010950) at ../../module/zfs/zio.c:1399
/openzfs#517 zio_execute (zio=0x7f03a0010950) at ../../module/zfs/zio.c:1337
/openzfs#518 0x00007f03c7fcd0c4 in taskq_thread (arg=0x966d50) at ../../lib/libzpool/taskq.c:215
/openzfs#519 0x00007f03c7fc7937 in zk_thread_helper (arg=0x967e90) at ../../lib/libzpool/kernel.c:135
/openzfs#520 0x00007f03c78890a3 in start_thread (arg=0x7f03c2703700) at pthread_create.c:309
/openzfs#521 0x00007f03c75c50fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The backtrace shows an infinite loop in which `vdev_queue_io_to_issue()` invokes
`zio_execute()` until it overran the stack. `vdev_queue_io_to_issue()` will only
invoke `zio_execute()` on raidz vdevs when aggregation I/Os are generated to
improve aggregation continuity. These I/Os do not trigger any writes. However,
it appears that they can be generated in such a way that they recurse
infinitely upon return to `vdev_queue_io_to_issue()`.

Signed-off-by: Richard Yao <[email protected]>
ryao added a commit to ryao/zfs that referenced this issue Oct 10, 2014
The below excerpt of a backtrace is from a ztest failure when running ZoL's
ztest:

/openzfs#453 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
/openzfs#454 0x00007f03c806106e in vdev_queue_io (zio=zio@entry=0x7f0350003de0) at ../../module/zfs/vdev_queue.c:747
/openzfs#455 0x00007f03c80818c1 in zio_vdev_io_start (zio=0x7f0350003de0) at ../../module/zfs/zio.c:2659
/openzfs#456 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003de0) at ../../module/zfs/zio.c:1399
/openzfs#457 zio_nowait (zio=0x7f0350003de0) at ../../module/zfs/zio.c:1456
/openzfs#458 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350003a10) at ../../module/zfs/vdev_mirror.c:374
/openzfs#459 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003a10) at ../../module/zfs/zio.c:1399
/openzfs#460 zio_nowait (zio=0x7f0350003a10) at ../../module/zfs/zio.c:1456
/openzfs#461 0x00007f03c806464c in vdev_raidz_io_start (zio=0x7f0350003380) at ../../module/zfs/vdev_raidz.c:1607
/openzfs#462 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003380) at ../../module/zfs/zio.c:1399
/openzfs#463 zio_nowait (zio=0x7f0350003380) at ../../module/zfs/zio.c:1456
/openzfs#464 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350002fb0) at ../../module/zfs/vdev_mirror.c:374
/openzfs#465 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002fb0) at ../../module/zfs/zio.c:1399
/openzfs#466 zio_nowait (zio=0x7f0350002fb0) at ../../module/zfs/zio.c:1456
/openzfs#467 0x00007f03c805ed43 in vdev_mirror_io_done (zio=0x7f033957ebf0) at ../../module/zfs/vdev_mirror.c:499
/openzfs#468 0x00007f03c807a0c0 in zio_vdev_io_done (zio=0x7f033957ebf0) at ../../module/zfs/zio.c:2707
/openzfs#469 0x00007f03c808285b in __zio_execute (zio=0x7f033957ebf0) at ../../module/zfs/zio.c:1399
/openzfs#470 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f0390001330, pio=0x7f033957ebf0) at ../../module/zfs/zio.c:547
/openzfs#471 zio_done (zio=0x7f0390001330) at ../../module/zfs/zio.c:3278
/openzfs#472 0x00007f03c808285b in __zio_execute (zio=0x7f0390001330) at ../../module/zfs/zio.c:1399
/openzfs#473 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4013a00, pio=0x7f0390001330) at ../../module/zfs/zio.c:547
/openzfs#474 zio_done (zio=0x7f03b4013a00) at ../../module/zfs/zio.c:3278
/openzfs#475 0x00007f03c808285b in __zio_execute (zio=0x7f03b4013a00) at ../../module/zfs/zio.c:1399
/openzfs#476 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4014210, pio=0x7f03b4013a00) at ../../module/zfs/zio.c:547
/openzfs#477 zio_done (zio=0x7f03b4014210) at ../../module/zfs/zio.c:3278
/openzfs#478 0x00007f03c808285b in __zio_execute (zio=0x7f03b4014210) at ../../module/zfs/zio.c:1399
/openzfs#479 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4014620, pio=0x7f03b4014210) at ../../module/zfs/zio.c:547
/openzfs#480 zio_done (zio=0x7f03b4014620) at ../../module/zfs/zio.c:3278
/openzfs#481 0x00007f03c807a6d3 in __zio_execute (zio=0x7f03b4014620) at ../../module/zfs/zio.c:1399
/openzfs#482 zio_execute (zio=zio@entry=0x7f03b4014620) at ../../module/zfs/zio.c:1337
/openzfs#483 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
/openzfs#484 0x00007f03c806106e in vdev_queue_io (zio=zio@entry=0x7f0350002be0) at ../../module/zfs/vdev_queue.c:747
/openzfs#485 0x00007f03c80818c1 in zio_vdev_io_start (zio=0x7f0350002be0) at ../../module/zfs/zio.c:2659
/openzfs#486 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002be0) at ../../module/zfs/zio.c:1399
/openzfs#487 zio_nowait (zio=0x7f0350002be0) at ../../module/zfs/zio.c:1456
/openzfs#488 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350002810) at ../../module/zfs/vdev_mirror.c:374
/openzfs#489 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002810) at ../../module/zfs/zio.c:1399
/openzfs#490 zio_nowait (zio=0x7f0350002810) at ../../module/zfs/zio.c:1456
/openzfs#491 0x00007f03c8064593 in vdev_raidz_io_start (zio=0x7f0350001270) at ../../module/zfs/vdev_raidz.c:1591
/openzfs#492 0x00007f03c807f243 in __zio_execute (zio=0x7f0350001270) at ../../module/zfs/zio.c:1399
/openzfs#493 zio_nowait (zio=0x7f0350001270) at ../../module/zfs/zio.c:1456
/openzfs#494 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350001e60) at ../../module/zfs/vdev_mirror.c:374
/openzfs#495 0x00007f03c807f243 in __zio_execute (zio=0x7f0350001e60) at ../../module/zfs/zio.c:1399
/openzfs#496 zio_nowait (zio=0x7f0350001e60) at ../../module/zfs/zio.c:1456
/openzfs#497 0x00007f03c805ed43 in vdev_mirror_io_done (zio=0x7f033a0c39c0) at ../../module/zfs/vdev_mirror.c:499
/openzfs#498 0x00007f03c807a0c0 in zio_vdev_io_done (zio=0x7f033a0c39c0) at ../../module/zfs/zio.c:2707
/openzfs#499 0x00007f03c808285b in __zio_execute (zio=0x7f033a0c39c0) at ../../module/zfs/zio.c:1399
/openzfs#500 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03a8003c00, pio=0x7f033a0c39c0) at ../../module/zfs/zio.c:547
/openzfs#501 zio_done (zio=0x7f03a8003c00) at ../../module/zfs/zio.c:3278
/openzfs#502 0x00007f03c808285b in __zio_execute (zio=0x7f03a8003c00) at ../../module/zfs/zio.c:1399
/openzfs#503 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800c400, pio=0x7f03a8003c00) at ../../module/zfs/zio.c:547
/openzfs#504 zio_done (zio=0x7f038800c400) at ../../module/zfs/zio.c:3278
/openzfs#505 0x00007f03c808285b in __zio_execute (zio=0x7f038800c400) at ../../module/zfs/zio.c:1399
/openzfs#506 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800da00, pio=0x7f038800c400) at ../../module/zfs/zio.c:547
/openzfs#507 zio_done (zio=0x7f038800da00) at ../../module/zfs/zio.c:3278
/openzfs#508 0x00007f03c808285b in __zio_execute (zio=0x7f038800da00) at ../../module/zfs/zio.c:1399
/openzfs#509 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800fd80, pio=0x7f038800da00) at ../../module/zfs/zio.c:547
/openzfs#510 zio_done (zio=0x7f038800fd80) at ../../module/zfs/zio.c:3278
/openzfs#511 0x00007f03c807a6d3 in __zio_execute (zio=0x7f038800fd80) at ../../module/zfs/zio.c:1399
/openzfs#512 zio_execute (zio=zio@entry=0x7f038800fd80) at ../../module/zfs/zio.c:1337
/openzfs#513 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
/openzfs#514 0x00007f03c806119d in vdev_queue_io_done (zio=zio@entry=0x7f03a0010950) at ../../module/zfs/vdev_queue.c:775
/openzfs#515 0x00007f03c807a0e8 in zio_vdev_io_done (zio=0x7f03a0010950) at ../../module/zfs/zio.c:2686
/openzfs#516 0x00007f03c807a6d3 in __zio_execute (zio=0x7f03a0010950) at ../../module/zfs/zio.c:1399
/openzfs#517 zio_execute (zio=0x7f03a0010950) at ../../module/zfs/zio.c:1337
/openzfs#518 0x00007f03c7fcd0c4 in taskq_thread (arg=0x966d50) at ../../lib/libzpool/taskq.c:215
/openzfs#519 0x00007f03c7fc7937 in zk_thread_helper (arg=0x967e90) at ../../lib/libzpool/kernel.c:135
/openzfs#520 0x00007f03c78890a3 in start_thread (arg=0x7f03c2703700) at pthread_create.c:309
/openzfs#521 0x00007f03c75c50fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

This occurred when ztest was simulating a scrub under heavy I/O load. Under
those circumstances, it was possible for a mix of no-op I/Os for aggregation
continuity and the I/O elevator to generate arbitrarily deep recursion.

This patch modifies ZFS to propagate a recursion counter inside the zio_t
objects so that I/Os are redispatched upon reaching a given recursion
depth. We can detect long call chains and dispatch to another ZIO taskq. We
cut in-line when we do this to minimize the potential for taskq exhaustion that
could prevent a zio from notifying its parent.

Signed-off-by: Richard Yao <[email protected]>
ryao added a commit to ryao/zfs that referenced this issue Oct 10, 2014
The below excerpt of a backtrace is from a ztest failure when running ZoL's
ztest:

/openzfs#453 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
/openzfs#454 0x00007f03c806106e in vdev_queue_io (zio=zio@entry=0x7f0350003de0) at ../../module/zfs/vdev_queue.c:747
/openzfs#455 0x00007f03c80818c1 in zio_vdev_io_start (zio=0x7f0350003de0) at ../../module/zfs/zio.c:2659
/openzfs#456 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003de0) at ../../module/zfs/zio.c:1399
/openzfs#457 zio_nowait (zio=0x7f0350003de0) at ../../module/zfs/zio.c:1456
/openzfs#458 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350003a10) at ../../module/zfs/vdev_mirror.c:374
/openzfs#459 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003a10) at ../../module/zfs/zio.c:1399
/openzfs#460 zio_nowait (zio=0x7f0350003a10) at ../../module/zfs/zio.c:1456
/openzfs#461 0x00007f03c806464c in vdev_raidz_io_start (zio=0x7f0350003380) at ../../module/zfs/vdev_raidz.c:1607
/openzfs#462 0x00007f03c807f243 in __zio_execute (zio=0x7f0350003380) at ../../module/zfs/zio.c:1399
/openzfs#463 zio_nowait (zio=0x7f0350003380) at ../../module/zfs/zio.c:1456
/openzfs#464 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350002fb0) at ../../module/zfs/vdev_mirror.c:374
/openzfs#465 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002fb0) at ../../module/zfs/zio.c:1399
/openzfs#466 zio_nowait (zio=0x7f0350002fb0) at ../../module/zfs/zio.c:1456
/openzfs#467 0x00007f03c805ed43 in vdev_mirror_io_done (zio=0x7f033957ebf0) at ../../module/zfs/vdev_mirror.c:499
/openzfs#468 0x00007f03c807a0c0 in zio_vdev_io_done (zio=0x7f033957ebf0) at ../../module/zfs/zio.c:2707
/openzfs#469 0x00007f03c808285b in __zio_execute (zio=0x7f033957ebf0) at ../../module/zfs/zio.c:1399
/openzfs#470 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f0390001330, pio=0x7f033957ebf0) at ../../module/zfs/zio.c:547
/openzfs#471 zio_done (zio=0x7f0390001330) at ../../module/zfs/zio.c:3278
/openzfs#472 0x00007f03c808285b in __zio_execute (zio=0x7f0390001330) at ../../module/zfs/zio.c:1399
/openzfs#473 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4013a00, pio=0x7f0390001330) at ../../module/zfs/zio.c:547
/openzfs#474 zio_done (zio=0x7f03b4013a00) at ../../module/zfs/zio.c:3278
/openzfs#475 0x00007f03c808285b in __zio_execute (zio=0x7f03b4013a00) at ../../module/zfs/zio.c:1399
/openzfs#476 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4014210, pio=0x7f03b4013a00) at ../../module/zfs/zio.c:547
/openzfs#477 zio_done (zio=0x7f03b4014210) at ../../module/zfs/zio.c:3278
/openzfs#478 0x00007f03c808285b in __zio_execute (zio=0x7f03b4014210) at ../../module/zfs/zio.c:1399
/openzfs#479 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03b4014620, pio=0x7f03b4014210) at ../../module/zfs/zio.c:547
/openzfs#480 zio_done (zio=0x7f03b4014620) at ../../module/zfs/zio.c:3278
/openzfs#481 0x00007f03c807a6d3 in __zio_execute (zio=0x7f03b4014620) at ../../module/zfs/zio.c:1399
/openzfs#482 zio_execute (zio=zio@entry=0x7f03b4014620) at ../../module/zfs/zio.c:1337
/openzfs#483 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
/openzfs#484 0x00007f03c806106e in vdev_queue_io (zio=zio@entry=0x7f0350002be0) at ../../module/zfs/vdev_queue.c:747
/openzfs#485 0x00007f03c80818c1 in zio_vdev_io_start (zio=0x7f0350002be0) at ../../module/zfs/zio.c:2659
/openzfs#486 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002be0) at ../../module/zfs/zio.c:1399
/openzfs#487 zio_nowait (zio=0x7f0350002be0) at ../../module/zfs/zio.c:1456
#488 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350002810) at ../../module/zfs/vdev_mirror.c:374
#489 0x00007f03c807f243 in __zio_execute (zio=0x7f0350002810) at ../../module/zfs/zio.c:1399
#490 zio_nowait (zio=0x7f0350002810) at ../../module/zfs/zio.c:1456
#491 0x00007f03c8064593 in vdev_raidz_io_start (zio=0x7f0350001270) at ../../module/zfs/vdev_raidz.c:1591
#492 0x00007f03c807f243 in __zio_execute (zio=0x7f0350001270) at ../../module/zfs/zio.c:1399
#493 zio_nowait (zio=0x7f0350001270) at ../../module/zfs/zio.c:1456
#494 0x00007f03c805f71b in vdev_mirror_io_start (zio=0x7f0350001e60) at ../../module/zfs/vdev_mirror.c:374
#495 0x00007f03c807f243 in __zio_execute (zio=0x7f0350001e60) at ../../module/zfs/zio.c:1399
#496 zio_nowait (zio=0x7f0350001e60) at ../../module/zfs/zio.c:1456
#497 0x00007f03c805ed43 in vdev_mirror_io_done (zio=0x7f033a0c39c0) at ../../module/zfs/vdev_mirror.c:499
#498 0x00007f03c807a0c0 in zio_vdev_io_done (zio=0x7f033a0c39c0) at ../../module/zfs/zio.c:2707
#499 0x00007f03c808285b in __zio_execute (zio=0x7f033a0c39c0) at ../../module/zfs/zio.c:1399
#500 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f03a8003c00, pio=0x7f033a0c39c0) at ../../module/zfs/zio.c:547
#501 zio_done (zio=0x7f03a8003c00) at ../../module/zfs/zio.c:3278
#502 0x00007f03c808285b in __zio_execute (zio=0x7f03a8003c00) at ../../module/zfs/zio.c:1399
#503 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800c400, pio=0x7f03a8003c00) at ../../module/zfs/zio.c:547
#504 zio_done (zio=0x7f038800c400) at ../../module/zfs/zio.c:3278
#505 0x00007f03c808285b in __zio_execute (zio=0x7f038800c400) at ../../module/zfs/zio.c:1399
#506 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800da00, pio=0x7f038800c400) at ../../module/zfs/zio.c:547
#507 zio_done (zio=0x7f038800da00) at ../../module/zfs/zio.c:3278
#508 0x00007f03c808285b in __zio_execute (zio=0x7f038800da00) at ../../module/zfs/zio.c:1399
#509 zio_notify_parent (wait=ZIO_WAIT_DONE, zio=0x7f038800fd80, pio=0x7f038800da00) at ../../module/zfs/zio.c:547
#510 zio_done (zio=0x7f038800fd80) at ../../module/zfs/zio.c:3278
#511 0x00007f03c807a6d3 in __zio_execute (zio=0x7f038800fd80) at ../../module/zfs/zio.c:1399
#512 zio_execute (zio=zio@entry=0x7f038800fd80) at ../../module/zfs/zio.c:1337
#513 0x00007f03c8060b35 in vdev_queue_io_to_issue (vq=vq@entry=0x99f8a8) at ../../module/zfs/vdev_queue.c:706
#514 0x00007f03c806119d in vdev_queue_io_done (zio=zio@entry=0x7f03a0010950) at ../../module/zfs/vdev_queue.c:775
#515 0x00007f03c807a0e8 in zio_vdev_io_done (zio=0x7f03a0010950) at ../../module/zfs/zio.c:2686
#516 0x00007f03c807a6d3 in __zio_execute (zio=0x7f03a0010950) at ../../module/zfs/zio.c:1399
#517 zio_execute (zio=0x7f03a0010950) at ../../module/zfs/zio.c:1337
#518 0x00007f03c7fcd0c4 in taskq_thread (arg=0x966d50) at ../../lib/libzpool/taskq.c:215
#519 0x00007f03c7fc7937 in zk_thread_helper (arg=0x967e90) at ../../lib/libzpool/kernel.c:135
#520 0x00007f03c78890a3 in start_thread (arg=0x7f03c2703700) at pthread_create.c:309
#521 0x00007f03c75c50fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

This occurred when ztest was simulating a scrub under heavy I/O load. Under
those circumstances, it was possible for a mix of noop I/Os for aggregation
continuity and the I/O elevator to generate arbitrarily deep recursion.

This patch modifies ZFS to propagate a recursion counter inside the zio_t
objects so that I/Os are redispatched upon reaching a given recursion
depth. We can detect long call chains and dispatch to another ZIO taskq. We
cut in line when we do this to minimize the potential for taskq exhaustion,
which can prevent a zio from notifying its parent.

Signed-off-by: Richard Yao <[email protected]>
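The recursion-counter idea in the commit message can be sketched in a few lines of C. This is a hypothetical, self-contained illustration, not the actual patch: the field name `io_recursion_depth`, the threshold `ZIO_MAX_REISSUE_DEPTH`, and the stub `zio_taskq_dispatch()` are all invented for the sketch. Each child zio inherits its parent's depth; once the depth crosses the threshold, the zio is handed off to a taskq (and its counter reset) instead of being executed inline, which bounds the stack depth of chains like the backtrace above.

```c
#include <assert.h>
#include <stdio.h>

#define ZIO_MAX_REISSUE_DEPTH 16   /* hypothetical threshold */

/* Minimal stand-in for zio_t: only the recursion counter matters here. */
typedef struct zio {
    int io_recursion_depth;
} zio_t;

static int dispatched_count = 0;   /* how many times we cut the chain */

/* In ZFS this would queue the zio on another taskq thread; here we just
 * record the handoff and reset the counter, since the new thread would
 * start with a fresh stack. */
static void
zio_taskq_dispatch(zio_t *zio)
{
    dispatched_count++;
    zio->io_recursion_depth = 0;
}

/* Execute a child zio inline unless the inherited recursion depth has
 * reached the limit, in which case redispatch instead of recursing. */
static void
zio_execute_sketch(zio_t *pio, zio_t *zio)
{
    zio->io_recursion_depth = pio ? pio->io_recursion_depth + 1 : 0;
    if (zio->io_recursion_depth >= ZIO_MAX_REISSUE_DEPTH) {
        zio_taskq_dispatch(zio);   /* cut the call chain in line */
        return;
    }
    /* ... otherwise run the pipeline stages inline as before ... */
}
```

Driving a chain of 20 parent/child zios through `zio_execute_sketch()` trips the threshold exactly once at depth 16, after which the reset counter keeps the remaining frames inline.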
ryao added a commit to ryao/zfs that referenced this issue Oct 10, 2014
ryao added a commit to ryao/zfs that referenced this issue Oct 10, 2014
ryao added a commit to ryao/zfs that referenced this issue Oct 11, 2014
ryao added a commit to ryao/zfs that referenced this issue Oct 11, 2014
sdimitro pushed a commit to sdimitro/zfs that referenced this issue Sep 30, 2021
DOSE-408 Zfs_unload-key
DOSE-407 Zfs_sysfs
DOSE-406 Zfs_snapshot
DOSE-404 Zfs_set
DOSE-403 Zfs_send
DOSE-337 Mv_files
pcd1193182 added a commit to pcd1193182/zfs that referenced this issue Jun 27, 2022