Reduce the stack usage of dsl_dataset_remove_clones_key #1730
Conversation
This looks good, we'll get it in. One question though: do you know what the new stack frame size is, and does it resolve your test case?
IOW, it should bring down the stack frame size from 392 to 72.
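(For scale: the trace in the commit message below shows 22 recursive dsl_dataset_remove_clones_key frames at 392 bytes each, i.e. 22 × 392 = 8624 bytes, which on its own already exceeds the default 8 KB kernel stack; at 72 bytes per frame the same recursion depth takes 22 × 72 = 1584 bytes.)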
The actual observed result from
I noticed that I'm seeing lots of kernel stack traces after this change:
But I'm puzzled as to why. The "SPL: Showing stack for process" output comes from
The stack must come from
Due to the allocation flags used, it's warning that a deadlock is possible because this was called in the sync context. Use KM_PUSHPAGE instead of KM_SLEEP.
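For anyone reading this thread later, a minimal sketch of the distinction, assuming the SPL kmem interface (kmem_alloc/kmem_free and the KM_* flags from include/sys/kmem.h); the helper names below are made up for illustration and are not part of this patch:

```c
#include <sys/kmem.h>

/*
 * Illustrative only.
 *
 * KM_SLEEP lets the allocator block and, under memory pressure,
 * re-enter the filesystem to reclaim pages.  If the caller is already
 * running in the txg sync context (spa_sync -> dsl_pool_sync -> ...),
 * that re-entry can deadlock on the very transaction group being
 * synced, which is what the SPL stack-dump warning above is flagging.
 *
 * KM_PUSHPAGE also sleeps if needed, but tells the allocator not to
 * re-enter the filesystem, so it is the flag to use on sync paths.
 */
static void *
sync_path_alloc(size_t size)
{
	return (kmem_alloc(size, KM_PUSHPAGE));	/* safe in sync context */
}

static void
sync_path_free(void *buf, size_t size)
{
	kmem_free(buf, size);
}
```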
dsl_dataset_remove_clones_key recurses, so if the recursion goes deep it can overrun the 8KB Linux kernel stack. I have seen this happen in an actual deployment, and subsequently confirmed it by running a test workload on a custom-built kernel that uses a 32KB stack. See the following stack trace as an example of a case that would have overrun an 8KB stack:

  Depth    Size   Location    (42 entries)
  -----    ----   --------
    0)    11192      72   __kmalloc+0x2e/0x240
    1)    11120     144   kmem_alloc_debug+0x20e/0x500 [spl]
    2)    10976      72   dbuf_hold_impl+0x4a/0xa0 [zfs]
    3)    10904     120   dbuf_prefetch+0xd3/0x280 [zfs]
    4)    10784      80   dmu_zfetch_dofetch.isra.5+0x10f/0x180 [zfs]
    5)    10704     240   dmu_zfetch+0x5f7/0x10e0 [zfs]
    6)    10464     168   dbuf_read+0x71e/0x8f0 [zfs]
    7)    10296     104   dnode_hold_impl+0x1ee/0x620 [zfs]
    8)    10192      16   dnode_hold+0x19/0x20 [zfs]
    9)    10176      88   dmu_buf_hold+0x42/0x1b0 [zfs]
   10)    10088     144   zap_lockdir+0x48/0x730 [zfs]
   11)     9944     128   zap_cursor_retrieve+0x1c4/0x2f0 [zfs]
   12)     9816     392   dsl_dataset_remove_clones_key.isra.14+0xab/0x190 [zfs]
   13)     9424     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   14)     9032     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   15)     8640     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   16)     8248     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   17)     7856     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   18)     7464     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   19)     7072     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   20)     6680     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   21)     6288     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   22)     5896     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   23)     5504     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   24)     5112     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   25)     4720     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   26)     4328     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   27)     3936     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   28)     3544     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   29)     3152     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   30)     2760     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   31)     2368     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   32)     1976     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   33)     1584     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190 [zfs]
   34)     1192     232   dsl_dataset_destroy_sync+0x311/0xf60 [zfs]
   35)      960      72   dsl_sync_task_group_sync+0x12f/0x230 [zfs]
   36)      888     168   dsl_pool_sync+0x48b/0x5c0 [zfs]
   37)      720     184   spa_sync+0x417/0xb00 [zfs]
   38)      536     184   txg_sync_thread+0x325/0x5b0 [zfs]
   39)      352      48   thread_generic_wrapper+0x7a/0x90 [spl]
   40)      304     128   kthread+0xc0/0xd0
   41)      176     176   ret_from_fork+0x7c/0xb0

As suggested in openzfs#1726, this change is a quick fix that reduces the stack usage of dsl_dataset_remove_clones_key by allocating structures on the heap rather than on the stack. This is not a fundamental fix, as one can create an arbitrarily large data set that overruns any fixed-size stack, but it makes the problem far less likely.

Reference: openzfs#1726
Signed-off-by: Kohsuke Kawaguchi <[email protected]>
Closes openzfs#1726
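To make the shape of the fix concrete, here is a hedged sketch of the general pattern (simplified, not the literal diff): the large locals that used to sit in each recursive stack frame are allocated from the heap instead, so each level of recursion carries only pointers. The function name below is invented for illustration.

```c
#include <sys/kmem.h>
#include <sys/zap.h>
#include <sys/dmu_tx.h>
#include <sys/dsl_dataset.h>

/*
 * Sketch only: simplified from dsl_dataset_remove_clones_key() to show
 * the stack-to-heap pattern; see the actual diff for the real change.
 */
static void
remove_clones_key_sketch(dsl_dataset_t *ds, uint64_t mintxg, dmu_tx_t *tx)
{
	zap_cursor_t *zc;
	zap_attribute_t *za;

	/*
	 * Before: "zap_cursor_t zc; zap_attribute_t za;" lived on the
	 * stack, costing a few hundred bytes per recursion level.
	 * After: heap allocations, using KM_PUSHPAGE because this runs
	 * in the txg sync context (see the discussion above).
	 */
	zc = kmem_alloc(sizeof (zap_cursor_t), KM_PUSHPAGE);
	za = kmem_alloc(sizeof (zap_attribute_t), KM_PUSHPAGE);

	/* ... iterate the ZAP of clones and recurse into each child ... */

	kmem_free(za, sizeof (zap_attribute_t));
	kmem_free(zc, sizeof (zap_cursor_t));
}
```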
Patch revised based on the feedback. For my own education, where can I learn the difference between KM_PUSHPAGE and KM_SLEEP, and what "the sync context" is?
@kohsuke briefly: These flags are defined and briefly described in include/sys/kmem.h in the SPL. You can find many other discussions on the topic if you search the SPL and ZFS repositories and issue trackers. For example, see 8630650a, openzfs/spl@eb0f407a2, and #924 (comment).
Thanks for the details. I really appreciate that!
I made a comment on the diff about using KM_SLEEP instead of KM_PUSHPAGE before reading the rest of the discussion. Please disregard it.
This fixes issue #1726