-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unzip tarball fail on symlink error: File name too long #34
Comments
I suspect this is a github issue rather than anything we are able to address. Closing, but do re-open if you can provide evidence to the contrary. I recommend either using git to clone the repo or trying to download the .tar.gz. |
yup it worked well with the tar.gz |
When we destroy a cm_id, we must purge associated events from the event queue. If the cm_id is for a listen request, we also purge corresponding pending connect requests. This requires destroying the cm_id's associated with the connect requests by calling rdma_destroy_id(). rdma_destroy_id() blocks until all outstanding callbacks have completed. The issue is that we hold file->mut while purging events from the event queue. We also acquire file->mut in our event handler. Calling rdma_destroy_id() while holding file->mut can lead to a deadlock, since the event handler callback cannot acquire file->mut, which prevents rdma_destroy_id() from completing. Fix this by moving events to purge from the event queue to a temporary list. We can then release file->mut and call rdma_destroy_id() outside of holding any locks. Bug report by Or Gerlitz <[email protected]>: [ INFO: possible circular locking dependency detected ] 3.3.0-rc5-00008-g79f1e43-dirty #34 Tainted: G I tgtd/9018 is trying to acquire lock: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm] but task is already holding lock: (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&file->mut){+.+.+.}: [<ffffffff810682f3>] lock_acquire+0xf0/0x116 [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6 [<ffffffffa0247636>] ucma_event_handler+0x148/0x1dc [rdma_ucm] [<ffffffffa035a79a>] cma_ib_handler+0x1a7/0x1f7 [rdma_cm] [<ffffffffa0333e88>] cm_process_work+0x32/0x119 [ib_cm] [<ffffffffa03362ab>] cm_work_handler+0xfb8/0xfe5 [ib_cm] [<ffffffff810423e2>] process_one_work+0x2bd/0x4a6 [<ffffffff810429e2>] worker_thread+0x1d6/0x350 [<ffffffff810462a6>] kthread+0x84/0x8c [<ffffffff81369624>] kernel_thread_helper+0x4/0x10 -> #0 (&id_priv->handler_mutex){+.+.+.}: [<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752 [<ffffffff810682f3>] lock_acquire+0xf0/0x116 [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6 [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm] [<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm] [<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm] [<ffffffff810df6ef>] fput+0x117/0x1cf [<ffffffff810dc76e>] filp_close+0x6d/0x78 [<ffffffff8102b667>] put_files_struct+0xbd/0x17d [<ffffffff8102b76d>] exit_files+0x46/0x4e [<ffffffff8102d057>] do_exit+0x299/0x75d [<ffffffff8102d599>] do_group_exit+0x7e/0xa9 [<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555 [<ffffffff81001717>] do_signal+0x39/0x634 [<ffffffff81001d39>] do_notify_resume+0x27/0x69 [<ffffffff81361c03>] retint_signal+0x46/0x83 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&file->mut); lock(&id_priv->handler_mutex); lock(&file->mut); lock(&id_priv->handler_mutex); *** DEADLOCK *** 1 lock held by tgtd/9018: #0: (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm] stack backtrace: Pid: 9018, comm: tgtd Tainted: G I 3.3.0-rc5-00008-g79f1e43-dirty #34 Call Trace: [<ffffffff81029e9c>] ? console_unlock+0x18e/0x207 [<ffffffff81066433>] print_circular_bug+0x28e/0x29f [<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752 [<ffffffff810682f3>] lock_acquire+0xf0/0x116 [<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm] [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6 [<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm] [<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155 [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm] [<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm] [<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm] [<ffffffff810df6ef>] fput+0x117/0x1cf [<ffffffff810dc76e>] filp_close+0x6d/0x78 [<ffffffff8102b667>] put_files_struct+0xbd/0x17d [<ffffffff8102b5cc>] ? put_files_struct+0x22/0x17d [<ffffffff8102b76d>] exit_files+0x46/0x4e [<ffffffff8102d057>] do_exit+0x299/0x75d [<ffffffff8102d599>] do_group_exit+0x7e/0xa9 [<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555 [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf [<ffffffff81001717>] do_signal+0x39/0x634 [<ffffffff8135e037>] ? printk+0x3c/0x45 [<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155 [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf [<ffffffff81361803>] ? _raw_spin_unlock_irq+0x2b/0x40 [<ffffffff81039011>] ? set_current_blocked+0x44/0x49 [<ffffffff81361bce>] ? retint_signal+0x11/0x83 [<ffffffff81001d39>] do_notify_resume+0x27/0x69 [<ffffffff8118a1fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81361c03>] retint_signal+0x46/0x83 Signed-off-by: Sean Hefty <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
Since XRC support was added, the uverbs code has locked SRQ, CQ and PD objects needed during QP and SRQ creation in different orders depending on the the code path. This leads to the (at least theoretical) possibility of deadlock, and triggers the lockdep splat below. Fix this by making sure we always lock the SRQ first, then CQs and finally the PD. ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc5+ #34 Not tainted ------------------------------------------------------- ibv_srq_pingpon/2484 is trying to acquire lock: (SRQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] but task is already holding lock: (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (CQ-uobj){+++++.}: [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe [<ffffffff81384f28>] down_read+0x34/0x43 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs] [<ffffffffa00b16c3>] ib_uverbs_create_qp+0x180/0x684 [ib_uverbs] [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs] [<ffffffff810fe47f>] vfs_write+0xa7/0xee [<ffffffff810fe65f>] sys_write+0x45/0x69 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b -> #1 (PD-uobj){++++++}: [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe [<ffffffff81384f28>] down_read+0x34/0x43 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs] [<ffffffffa00af8ad>] __uverbs_create_xsrq+0x96/0x386 [ib_uverbs] [<ffffffffa00b31b9>] ib_uverbs_detach_mcast+0x1cd/0x1e6 [ib_uverbs] [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs] [<ffffffff810fe47f>] vfs_write+0xa7/0xee [<ffffffff810fe65f>] sys_write+0x45/0x69 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b -> #0 (SRQ-uobj){+++++.}: [<ffffffff81070898>] __lock_acquire+0xa29/0xd06 [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe [<ffffffff81384f28>] down_read+0x34/0x43 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs] [<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs] [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs] [<ffffffff810fe47f>] vfs_write+0xa7/0xee [<ffffffff810fe65f>] sys_write+0x45/0x69 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Chain exists of: SRQ-uobj --> PD-uobj --> CQ-uobj Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(CQ-uobj); lock(PD-uobj); lock(CQ-uobj); lock(SRQ-uobj); *** DEADLOCK *** 3 locks held by ibv_srq_pingpon/2484: #0: (QP-uobj){+.+...}, at: [<ffffffffa00b162c>] ib_uverbs_create_qp+0xe9/0x684 [ib_uverbs] #1: (PD-uobj){++++++}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] #2: (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] stack backtrace: Pid: 2484, comm: ibv_srq_pingpon Not tainted 3.4.0-rc5+ #34 Call Trace: [<ffffffff8137eff0>] print_circular_bug+0x1f8/0x209 [<ffffffff81070898>] __lock_acquire+0xa29/0xd06 [<ffffffffa00af37c>] ? __idr_get_uobj+0x20/0x5e [ib_uverbs] [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffff81070eee>] ? lock_release+0x166/0x189 [<ffffffff81384f28>] down_read+0x34/0x43 [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs] [<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs] [<ffffffff81070fec>] ? lock_acquire+0xdb/0xfe [<ffffffff81070c09>] ? lock_release_non_nested+0x94/0x213 [<ffffffff810d470f>] ? might_fault+0x40/0x90 [<ffffffff810d470f>] ? might_fault+0x40/0x90 [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs] [<ffffffff810fe47f>] vfs_write+0xa7/0xee [<ffffffff810ff736>] ? fget_light+0x3b/0x99 [<ffffffff810fe65f>] sys_write+0x45/0x69 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b Reported-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
Above is a issue of unzip utility for the linux The original src of unzip is available at http://sourceforge.net/projects/infozip/?source=dlp |
commit 293ba3b upstream. Now that clk_unregister() frees the struct clk we're unregistering we'll free memory twice: first we'll call kfree() in __clk_release() with an address kmalloc doesn't know about and second we'll call kfree() in the devres layer. Remove the allocation of struct clk in devm_clk_register() and let clk_release() handle it. This fixes slab errors like: ============================================================================= BUG kmalloc-128 (Not tainted): Invalid object pointer 0xed08e8d0 ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Slab 0xeec503f8 objects=25 used=15 fp=0xed08ea00 flags=0x4081 CPU: 2 PID: 73 Comm: rmmod Tainted: G B 3.14.0-11032-g526e9c764381 #34 [<c0014be0>] (unwind_backtrace) from [<c0012240>] (show_stack+0x10/0x14) [<c0012240>] (show_stack) from [<c04b74dc>] (dump_stack+0x70/0xbc) [<c04b74dc>] (dump_stack) from [<c00f6778>] (slab_err+0x74/0x84) [<c00f6778>] (slab_err) from [<c04b6278>] (free_debug_processing+0x2cc/0x31c) [<c04b6278>] (free_debug_processing) from [<c04b6300>] (__slab_free+0x38/0x41c) [<c04b6300>] (__slab_free) from [<c03931bc>] (clk_unregister+0xd4/0x140) [<c03931bc>] (clk_unregister) from [<c02fb774>] (release_nodes+0x164/0x1d8) [<c02fb774>] (release_nodes) from [<c02f8698>] (__device_release_driver+0x60/0xb0) [<c02f8698>] (__device_release_driver) from [<c02f9080>] (driver_detach+0xb4/0xb8) [<c02f9080>] (driver_detach) from [<c02f8480>] (bus_remove_driver+0x5c/0xc4) [<c02f8480>] (bus_remove_driver) from [<c008c9b8>] (SyS_delete_module+0x148/0x1d8) [<c008c9b8>] (SyS_delete_module) from [<c000ef80>] (ret_fast_syscall+0x0/0x48) FIX kmalloc-128: Object at 0xed08e8d0 not freed Fixes: fcb0ee6 (clk: Implement clk_unregister) Cc: Jiada Wang <[email protected]> Cc: Sylwester Nawrocki <[email protected]> Cc: Kyungmin Park <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Mike Turquette <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Now that clk_unregister() frees the struct clk we're unregistering we'll free memory twice: first we'll call kfree() in __clk_release() with an address kmalloc doesn't know about and second we'll call kfree() in the devres layer. Remove the allocation of struct clk in devm_clk_register() and let clk_release() handle it. This fixes slab errors like: ============================================================================= BUG kmalloc-128 (Not tainted): Invalid object pointer 0xed08e8d0 ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Slab 0xeec503f8 objects=25 used=15 fp=0xed08ea00 flags=0x4081 CPU: 2 PID: 73 Comm: rmmod Tainted: G B 3.14.0-11032-g526e9c764381 #34 [<c0014be0>] (unwind_backtrace) from [<c0012240>] (show_stack+0x10/0x14) [<c0012240>] (show_stack) from [<c04b74dc>] (dump_stack+0x70/0xbc) [<c04b74dc>] (dump_stack) from [<c00f6778>] (slab_err+0x74/0x84) [<c00f6778>] (slab_err) from [<c04b6278>] (free_debug_processing+0x2cc/0x31c) [<c04b6278>] (free_debug_processing) from [<c04b6300>] (__slab_free+0x38/0x41c) [<c04b6300>] (__slab_free) from [<c03931bc>] (clk_unregister+0xd4/0x140) [<c03931bc>] (clk_unregister) from [<c02fb774>] (release_nodes+0x164/0x1d8) [<c02fb774>] (release_nodes) from [<c02f8698>] (__device_release_driver+0x60/0xb0) [<c02f8698>] (__device_release_driver) from [<c02f9080>] (driver_detach+0xb4/0xb8) [<c02f9080>] (driver_detach) from [<c02f8480>] (bus_remove_driver+0x5c/0xc4) [<c02f8480>] (bus_remove_driver) from [<c008c9b8>] (SyS_delete_module+0x148/0x1d8) [<c008c9b8>] (SyS_delete_module) from [<c000ef80>] (ret_fast_syscall+0x0/0x48) FIX kmalloc-128: Object at 0xed08e8d0 not freed Fixes: fcb0ee6 (clk: Implement clk_unregister) Cc: Jiada Wang <[email protected]> Cc: Sylwester Nawrocki <[email protected]> Cc: Kyungmin Park <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Mike Turquette <[email protected]> Cc: [email protected]
…kup detector commit bde92cf upstream. Peter Wu noticed the following splat on his machine when updating /proc/sys/kernel/watchdog_thresh: BUG: sleeping function called from invalid context at mm/slub.c:965 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (sb_writers#3){.+.+.+}, at: [<ffffffff8117b663>] vfs_write+0x143/0x180 #1: (watchdog_proc_mutex){+.+.+.}, at: [<ffffffff810e02d3>] proc_dowatchdog+0x33/0x110 #2: (cpu_hotplug.lock){.+.+.+}, at: [<ffffffff810589c2>] get_online_cpus+0x32/0x80 Preemption disabled at:[<ffffffff810e0384>] proc_dowatchdog+0xe4/0x110 CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x4e/0x7a __might_sleep+0x11d/0x190 kmem_cache_alloc_trace+0x4e/0x1e0 perf_event_alloc+0x55/0x440 perf_event_create_kernel_counter+0x26/0xe0 watchdog_nmi_enable+0x75/0x140 update_timers_all_cpus+0x53/0xa0 proc_dowatchdog+0xe4/0x110 proc_sys_call_handler+0xb3/0xc0 proc_sys_write+0x14/0x20 vfs_write+0xad/0x180 SyS_write+0x49/0xb0 system_call_fastpath+0x16/0x1b NMI watchdog: disabled (cpu0): hardware events not enabled What happened is after updating the watchdog_thresh, the lockup detector is restarted to utilize the new value. Part of this process involved disabling preemption. Once preemption was disabled, perf tried to allocate a new event (as part of the restart). This caused the above BUG_ON as you can't sleep with preemption disabled. The preemption restriction seemed agressive as we are not doing anything on that particular cpu, but with all the online cpus (which are protected by the get_online_cpus lock). Remove the restriction and the BUG_ON goes away. Signed-off-by: Don Zickus <[email protected]> Acked-by: Michal Hocko <[email protected]> Reported-by: Peter Wu <[email protected]> Tested-by: Peter Wu <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…kup detector Peter Wu noticed the following splat on his machine when updating /proc/sys/kernel/watchdog_thresh: BUG: sleeping function called from invalid context at mm/slub.c:965 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (sb_writers#3){.+.+.+}, at: [<ffffffff8117b663>] vfs_write+0x143/0x180 #1: (watchdog_proc_mutex){+.+.+.}, at: [<ffffffff810e02d3>] proc_dowatchdog+0x33/0x110 #2: (cpu_hotplug.lock){.+.+.+}, at: [<ffffffff810589c2>] get_online_cpus+0x32/0x80 Preemption disabled at:[<ffffffff810e0384>] proc_dowatchdog+0xe4/0x110 CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x4e/0x7a __might_sleep+0x11d/0x190 kmem_cache_alloc_trace+0x4e/0x1e0 perf_event_alloc+0x55/0x440 perf_event_create_kernel_counter+0x26/0xe0 watchdog_nmi_enable+0x75/0x140 update_timers_all_cpus+0x53/0xa0 proc_dowatchdog+0xe4/0x110 proc_sys_call_handler+0xb3/0xc0 proc_sys_write+0x14/0x20 vfs_write+0xad/0x180 SyS_write+0x49/0xb0 system_call_fastpath+0x16/0x1b NMI watchdog: disabled (cpu0): hardware events not enabled What happened is after updating the watchdog_thresh, the lockup detector is restarted to utilize the new value. Part of this process involved disabling preemption. Once preemption was disabled, perf tried to allocate a new event (as part of the restart). This caused the above BUG_ON as you can't sleep with preemption disabled. The preemption restriction seemed agressive as we are not doing anything on that particular cpu, but with all the online cpus (which are protected by the get_online_cpus lock). Remove the restriction and the BUG_ON goes away. Signed-off-by: Don Zickus <[email protected]> Acked-by: Michal Hocko <[email protected]> Reported-by: Peter Wu <[email protected]> Tested-by: Peter Wu <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: <[email protected]> [3.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
The local nohz kick is currently used by perf which needs it to be NMI-safe. Recent commit though (7d1311b) changed its implementation to fire the local kick using the remote kick API. It was convenient to make the code more generic but the remote kick isn't NMI-safe. As a result: WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140() CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34 0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37 0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180 0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000 Call Trace: <NMI> [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a [<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0 [<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20 [<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140 [<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90 [<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350 [<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0 [<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150 [<ffffffff9a187934>] perf_event_overflow+0x14/0x20 [<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410 [<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130 [<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50 [<ffffffff9a007b72>] nmi_handle+0xd2/0x390 [<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390 [<ffffffff9a0d131b>] ? lock_release+0xab/0x330 [<ffffffff9a008062>] default_do_nmi+0x72/0x1c0 [<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200 [<ffffffff9a008268>] do_nmi+0xb8/0x100 Lets fix this by restoring the use of local irq work for the nohz local kick. Reported-by: Catalin Iacob <[email protected]> Reported-and-tested-by: Dave Jones <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
commit 76a3306 upstream. The nohz full kick, which restarts the tick when any resource depend on it, can't be executed anywhere given the operation it does on timers. If it is called from the scheduler or timers code, chances are that we run into a deadlock. This is why we run the nohz full kick from an irq work. That way we make sure that the kick runs on a virgin context. However if that's the case when irq work runs in its own dedicated self-ipi, things are different for the big bunch of archs that don't support the self triggered way. In order to support them, irq works are also handled by the timer interrupt as fallback. Now when irq works run on the timer interrupt, the context isn't blank. More precisely, they can run in the context of the hrtimer that runs the tick. But the nohz kick cancels and restarts this hrtimer and cancelling an hrtimer from itself isn't allowed. This is why we run in an endless loop: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2 CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34 Workqueue: btrfs-endio-write normal_work_helper [btrfs] ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37 ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010 ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000 Call Trace: <NMI[<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a [<ffffffff8a7ef928>] panic+0xd4/0x207 [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120 [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350 [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0 [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150 [<ffffffff8a187934>] perf_event_overflow+0x14/0x20 [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410 [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50 [<ffffffff8a007b72>] nmi_handle+0xd2/0x390 [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0 [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0 [<ffffffff8a008268>] do_nmi+0xb8/0x100 [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0 <<EOE><IRQ[<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50 [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50 [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50 [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0 [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30 [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90 [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100 [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10 [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70 [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50 [<ffffffff8a0f700b>] update_process_times+0x5b/0x70 [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60 [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60 [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470 [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0 [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270 [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60 [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50 [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80 To fix this we force non-lazy irq works to run on irq work self-IPIs when available. That ability of the arch to trigger irq work self IPIs is available with arch_irq_work_has_interrupt(). Reported-by: Catalin Iacob <[email protected]> Reported-by: Dave Jones <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…kup detector commit bde92cf upstream. Peter Wu noticed the following splat on his machine when updating /proc/sys/kernel/watchdog_thresh: BUG: sleeping function called from invalid context at mm/slub.c:965 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (sb_writers#3){.+.+.+}, at: [<ffffffff8117b663>] vfs_write+0x143/0x180 #1: (watchdog_proc_mutex){+.+.+.}, at: [<ffffffff810e02d3>] proc_dowatchdog+0x33/0x110 #2: (cpu_hotplug.lock){.+.+.+}, at: [<ffffffff810589c2>] get_online_cpus+0x32/0x80 Preemption disabled at:[<ffffffff810e0384>] proc_dowatchdog+0xe4/0x110 CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x4e/0x7a __might_sleep+0x11d/0x190 kmem_cache_alloc_trace+0x4e/0x1e0 perf_event_alloc+0x55/0x440 perf_event_create_kernel_counter+0x26/0xe0 watchdog_nmi_enable+0x75/0x140 update_timers_all_cpus+0x53/0xa0 proc_dowatchdog+0xe4/0x110 proc_sys_call_handler+0xb3/0xc0 proc_sys_write+0x14/0x20 vfs_write+0xad/0x180 SyS_write+0x49/0xb0 system_call_fastpath+0x16/0x1b NMI watchdog: disabled (cpu0): hardware events not enabled What happened is after updating the watchdog_thresh, the lockup detector is restarted to utilize the new value. Part of this process involved disabling preemption. Once preemption was disabled, perf tried to allocate a new event (as part of the restart). This caused the above BUG_ON as you can't sleep with preemption disabled. The preemption restriction seemed agressive as we are not doing anything on that particular cpu, but with all the online cpus (which are protected by the get_online_cpus lock). Remove the restriction and the BUG_ON goes away. Signed-off-by: Don Zickus <[email protected]> Acked-by: Michal Hocko <[email protected]> Reported-by: Peter Wu <[email protected]> Tested-by: Peter Wu <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…mit fadbe0c In commit fadbe0c entitled "staging: rtl8188eu:Remove rtw_zmalloc(), wrapper for kzalloc()", the author failed to note that the original code in the wrapper tested whether the caller could sleep, and set the flags argument to kzalloc() appropriately. After the patch, GFP_KERNEL is used unconditionally. Unfortunately, several of the routines may be entered from an interrupt routine and generate a BUG splat for every such call. Routine rtw_sitesurvey_cmd() is used in the example below: BUG: sleeping function called from invalid context at mm/slub.c:1240 in_atomic(): 1, irqs_disabled(): 0, pid: 756, name: wpa_supplicant INFO: lockdep is turned off. CPU: 2 PID: 756 Comm: wpa_supplicant Tainted: G WC O 3.18.0-rc4+ raspberrypi#34 Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20 04/17/2014 ffffc90005557000 ffff880216fafaa8 ffffffff816b0bbf 0000000000000000 ffff8800c3b58000 ffff880216fafac8 ffffffff8107af77 0000000000000001 0000000000000010 ffff880216fafb18 ffffffff811b06ce 0000000000000000 Call Trace: [<ffffffff816b0bbf>] dump_stack+0x4e/0x71 [<ffffffff8107af77>] __might_sleep+0xf7/0x120 [<ffffffff811b06ce>] kmem_cache_alloc_trace+0x4e/0x1f0 [<ffffffffa0888226>] ? rtw_sitesurvey_cmd+0x56/0x2a0 [r8188eu] [<ffffffffa0888226>] rtw_sitesurvey_cmd+0x56/0x2a0 [r8188eu] [<ffffffffa088f00d>] rtw_do_join+0x22d/0x370 [r8188eu] [<ffffffffa088f6e8>] rtw_set_802_11_ssid+0x218/0x3d0 [r8188eu] [<ffffffffa08c3ca5>] rtw_wx_set_essid+0x1e5/0x410 [r8188eu] [<ffffffffa08c3ac0>] ? rtw_wx_get_rate+0x50/0x50 [r8188eu] [<ffffffff816938f1>] ioctl_standard_iw_point+0x151/0x3f0 [<ffffffff81693d52>] ioctl_standard_call+0xb2/0xe0 [<ffffffff81597df7>] ? rtnl_lock+0x17/0x20 [<ffffffff816945a0>] ? iw_handler_get_private+0x70/0x70 [<ffffffff81693ca0>] ? call_commit_handler+0x40/0x40 [<ffffffff81693256>] wireless_process_ioctl+0x176/0x1c0 [<ffffffff81693e79>] wext_handle_ioctl+0x69/0xc0 [<ffffffff8159fe79>] dev_ioctl+0x309/0x5e0 [<ffffffff810be9c7>] ? call_rcu+0x17/0x20 [<ffffffff8156a472>] sock_ioctl+0x142/0x2e0 [<ffffffff811e0c70>] do_vfs_ioctl+0x300/0x520 [<ffffffff81101514>] ? __audit_syscall_entry+0xb4/0x110 [<ffffffff81101514>] ? __audit_syscall_entry+0xb4/0x110 [<ffffffff810102bc>] ? do_audit_syscall_entry+0x6c/0x70 [<ffffffff811e0f11>] SyS_ioctl+0x81/0xa0 [<ffffffff816ba1d2>] system_call_fastpath+0x12/0x17 Additional routines that generate this BUG are rtw_joinbss_cmd(), rtw_dynamic_chk_wk_cmd(), rtw_lps_ctrl_wk_cmd(), rtw_rpt_timer_cfg_cmd(), rtw_ps_cmd(), report_survey_event(), report_join_res(), survey_timer_hdl(), and rtw_check_bcn_info(). Signed-off-by: Larry Finger <[email protected]> Cc: navin patidar <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit ecf5fc6 upstream. Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 raspberrypi#1 schedule at ffffffff815ab76e raspberrypi#2 schedule_timeout at ffffffff815ae5e5 raspberrypi#3 io_schedule_timeout at ffffffff815aad6a raspberrypi#4 bit_wait_io at ffffffff815abfc6 raspberrypi#5 __wait_on_bit at ffffffff815abda5 raspberrypi#6 wait_on_page_bit at ffffffff8111fd4f raspberrypi#7 shrink_page_list at ffffffff81135445 raspberrypi#8 shrink_inactive_list at ffffffff81135845 raspberrypi#9 shrink_lruvec at ffffffff81135ead raspberrypi#10 shrink_zone at ffffffff811360c3 raspberrypi#11 shrink_zones at ffffffff81136eff raspberrypi#12 do_try_to_free_pages at ffffffff8113712f raspberrypi#13 try_to_free_mem_cgroup_pages at ffffffff811372be raspberrypi#14 try_charge at ffffffff81189423 raspberrypi#15 mem_cgroup_try_charge at ffffffff8118c6f5 raspberrypi#16 __add_to_page_cache_locked at ffffffff8112137d raspberrypi#17 add_to_page_cache_lru at ffffffff81121618 raspberrypi#18 pagecache_get_page at ffffffff8112170b raspberrypi#19 grow_dev_page at ffffffff811c8297 raspberrypi#20 __getblk_slow at ffffffff811c91d6 raspberrypi#21 __getblk_gfp at ffffffff811c92c1 raspberrypi#22 ext4_ext_grow_indepth at ffffffff8124565c raspberrypi#23 ext4_ext_create_new_leaf at ffffffff81246ca8 raspberrypi#24 ext4_ext_insert_extent at ffffffff81246f09 raspberrypi#25 ext4_ext_map_blocks at ffffffff8124a848 raspberrypi#26 ext4_map_blocks at ffffffff8121a5b7 raspberrypi#27 mpage_map_one_extent at ffffffff8121b1fa raspberrypi#28 mpage_map_and_submit_extent at ffffffff8121f07b raspberrypi#29 ext4_writepages at ffffffff8121f6d5 raspberrypi#30 do_writepages at ffffffff8112c490 raspberrypi#31 __filemap_fdatawrite_range at ffffffff81120199 raspberrypi#32 filemap_flush at ffffffff8112041c raspberrypi#33 ext4_alloc_da_blocks at ffffffff81219da1 raspberrypi#34 ext4_rename at ffffffff81229b91 raspberrypi#35 ext4_rename2 at ffffffff81229e32 raspberrypi#36 vfs_rename at ffffffff811a08a5 raspberrypi#37 SYSC_renameat2 at ffffffff811a3ffc raspberrypi#38 sys_renameat2 at ffffffff811a408e raspberrypi#39 sys_rename at ffffffff8119e51e raspberrypi#40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. Cc: [email protected] # 3.9+ [[email protected]: corrected the control flow] Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 #1 schedule at ffffffff815ab76e #2 schedule_timeout at ffffffff815ae5e5 #3 io_schedule_timeout at ffffffff815aad6a #4 bit_wait_io at ffffffff815abfc6 #5 __wait_on_bit at ffffffff815abda5 #6 wait_on_page_bit at ffffffff8111fd4f #7 shrink_page_list at ffffffff81135445 #8 shrink_inactive_list at ffffffff81135845 #9 shrink_lruvec at ffffffff81135ead #10 shrink_zone at ffffffff811360c3 #11 shrink_zones at ffffffff81136eff #12 do_try_to_free_pages at ffffffff8113712f #13 try_to_free_mem_cgroup_pages at ffffffff811372be #14 try_charge at ffffffff81189423 #15 mem_cgroup_try_charge at ffffffff8118c6f5 #16 __add_to_page_cache_locked at ffffffff8112137d #17 add_to_page_cache_lru at ffffffff81121618 #18 pagecache_get_page at ffffffff8112170b #19 grow_dev_page at ffffffff811c8297 #20 __getblk_slow at ffffffff811c91d6 #21 __getblk_gfp at ffffffff811c92c1 #22 ext4_ext_grow_indepth at ffffffff8124565c #23 ext4_ext_create_new_leaf at ffffffff81246ca8 #24 ext4_ext_insert_extent at ffffffff81246f09 #25 ext4_ext_map_blocks at ffffffff8124a848 #26 ext4_map_blocks at ffffffff8121a5b7 #27 mpage_map_one_extent at ffffffff8121b1fa #28 mpage_map_and_submit_extent at ffffffff8121f07b #29 ext4_writepages at ffffffff8121f6d5 #30 do_writepages at ffffffff8112c490 #31 __filemap_fdatawrite_range at ffffffff81120199 #32 filemap_flush at ffffffff8112041c #33 ext4_alloc_da_blocks at ffffffff81219da1 #34 ext4_rename at ffffffff81229b91 #35 ext4_rename2 at ffffffff81229e32 #36 vfs_rename at ffffffff811a08a5 #37 SYSC_renameat2 at ffffffff811a3ffc #38 sys_renameat2 at ffffffff811a408e #39 sys_rename at ffffffff8119e51e #40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. Cc: [email protected] # 3.9+ [[email protected]: corrected the control flow] Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
The tracepoint infrastructure uses RCU sched protection to enable and disable tracepoints safely. There are some instances where tracepoints are used in infrastructure code (like kfree()) that get called after a CPU is going offline, and perhaps when it is coming back online but hasn't been registered yet. This can probuce the following warning: [ INFO: suspicious RCU usage. ] 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S ------------------------------- include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34 Call Trace: [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable) [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170 [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440 [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100 [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150 [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140 [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310 [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60 [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40 [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560 [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360 [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14 This warning is not a false positive either. RCU is not protecting code that is being executed while the CPU is offline. Instead of playing "whack-a-mole(TM)" and adding conditional statements to the tracepoints we find that are used in this instance, simply add a cpu_online() test to the tracepoint code where the tracepoint will be ignored if the CPU is offline. Use of raw_smp_processor_id() is fine, as there should never be a case where the tracepoint code goes from running on a CPU that is online and suddenly gets migrated to a CPU that is offline. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Denis Kirjanov <[email protected]> Fixes: 97e1c18 ("tracing: Kernel Tracepoints") Cc: [email protected] # v2.6.28+ Signed-off-by: Steven Rostedt <[email protected]>
commit f377554 upstream. The tracepoint infrastructure uses RCU sched protection to enable and disable tracepoints safely. There are some instances where tracepoints are used in infrastructure code (like kfree()) that get called after a CPU is going offline, and perhaps when it is coming back online but hasn't been registered yet. This can probuce the following warning: [ INFO: suspicious RCU usage. ] 4.4.0-00006-g0fe53e8-dirty raspberrypi#34 Tainted: G S ------------------------------- include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty raspberrypi#34 Call Trace: [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable) [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170 [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440 [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100 [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150 [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140 [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310 [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60 [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40 [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560 [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360 [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14 This warning is not a false positive either. RCU is not protecting code that is being executed while the CPU is offline. Instead of playing "whack-a-mole(TM)" and adding conditional statements to the tracepoints we find that are used in this instance, simply add a cpu_online() test to the tracepoint code where the tracepoint will be ignored if the CPU is offline. Use of raw_smp_processor_id() is fine, as there should never be a case where the tracepoint code goes from running on a CPU that is online and suddenly gets migrated to a CPU that is offline. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Denis Kirjanov <[email protected]> Fixes: 97e1c18 ("tracing: Kernel Tracepoints") Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit f377554 ] The tracepoint infrastructure uses RCU sched protection to enable and disable tracepoints safely. There are some instances where tracepoints are used in infrastructure code (like kfree()) that get called after a CPU is going offline, and perhaps when it is coming back online but hasn't been registered yet. This can probuce the following warning: [ INFO: suspicious RCU usage. ] 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S ------------------------------- include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34 Call Trace: [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable) [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170 [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440 [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100 [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150 [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140 [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310 [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60 [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40 [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560 [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360 [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14 This warning is not a false positive either. RCU is not protecting code that is being executed while the CPU is offline. Instead of playing "whack-a-mole(TM)" and adding conditional statements to the tracepoints we find that are used in this instance, simply add a cpu_online() test to the tracepoint code where the tracepoint will be ignored if the CPU is offline. Use of raw_smp_processor_id() is fine, as there should never be a case where the tracepoint code goes from running on a CPU that is online and suddenly gets migrated to a CPU that is offline. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Denis Kirjanov <[email protected]> Fixes: 97e1c18 ("tracing: Kernel Tracepoints") Cc: [email protected] # v2.6.28+ Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Correct "Above is a issue of unzip utility for the linux" try the above way OR yiu ca try.. Try to install p7zip and extract |
We must handle data access exception as well as memory address unaligned exceptions from return from trap window fill faults, not just normal TLB misses. Otherwise we can get an OOPS that looks like this: ld-linux.so.2(36808): Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: <do_sparc64_fault+0x5c4/0x700> g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: <do_sparc64_fault+0x5bc/0x700> l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: <user_rtt_fill_fixup+0x6c/0x7c> Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c The window trap handlers are slightly clever, the trap table entries for them are composed of two pieces of code. First comes the code that actually performs the window fill or spill trap handling, and then there are three instructions at the end which are for exception processing. The userland register window fill handler is: add %sp, STACK_BIAS + 0x00, %g1; \ ldxa [%g1 + %g0] ASI, %l0; \ mov 0x08, %g2; \ mov 0x10, %g3; \ ldxa [%g1 + %g2] ASI, %l1; \ mov 0x18, %g5; \ ldxa [%g1 + %g3] ASI, %l2; \ ldxa [%g1 + %g5] ASI, %l3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %l4; \ ldxa [%g1 + %g2] ASI, %l5; \ ldxa [%g1 + %g3] ASI, %l6; \ ldxa [%g1 + %g5] ASI, %l7; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i0; \ ldxa [%g1 + %g2] ASI, %i1; \ ldxa [%g1 + %g3] ASI, %i2; \ ldxa [%g1 + %g5] ASI, %i3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i4; \ ldxa [%g1 + %g2] ASI, %i5; \ ldxa [%g1 + %g3] ASI, %i6; \ ldxa [%g1 + %g5] ASI, %i7; \ restored; \ retry; nop; nop; nop; nop; \ b,a,pt %xcc, fill_fixup_dax; \ b,a,pt %xcc, fill_fixup_mna; \ b,a,pt %xcc, fill_fixup; And the way this works is that if any of those memory accesses generate an exception, the exception handler can revector to one of those final three branch instructions depending upon which kind of exception the memory access took. In this way, the fault handler doesn't have to know if it was a spill or a fill that it's handling the fault for. It just always branches to the last instruction in the parent trap's handler. For example, for a regular fault, the code goes: winfix_trampoline: rdpr %tpc, %g3 or %g3, 0x7c, %g3 wrpr %g3, %tnpc done All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the trap time program counter, we'll get that final instruction in the trap handler. On return from trap, we have to pull the register window in but we do this by hand instead of just executing a "restore" instruction for several reasons. The largest being that from Niagara and onward we simply don't have enough levels in the trap stack to fully resolve all possible exception cases of a window fault when we are already at trap level 1 (which we enter to get ready to return from the original trap). This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's code branches directly to these to do the window fill by hand if necessary. Now if you look at them, we'll see at the end: ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; And oops, all three cases are handled like a fault. This doesn't work because each of these trap types (data access exception, memory address unaligned, and faults) store their auxiliary info in different registers to pass on to the C handler which does the real work. So in the case where the stack was unaligned, the unaligned trap handler sets up the arg registers one way, and then we branched to the fault handler which expects them setup another way. So the FAULT_TYPE_* value ends up basically being garbage, and randomly would generate the backtrace seen above. Reported-by: Nick Alcock <[email protected]> Signed-off-by: David S. Miller <[email protected]>
[ Upstream commit 7cafc0b ] We must handle data access exception as well as memory address unaligned exceptions from return from trap window fill faults, not just normal TLB misses. Otherwise we can get an OOPS that looks like this: ld-linux.so.2(36808): Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: <do_sparc64_fault+0x5c4/0x700> g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: <do_sparc64_fault+0x5bc/0x700> l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: <user_rtt_fill_fixup+0x6c/0x7c> Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c The window trap handlers are slightly clever, the trap table entries for them are composed of two pieces of code. First comes the code that actually performs the window fill or spill trap handling, and then there are three instructions at the end which are for exception processing. The userland register window fill handler is: add %sp, STACK_BIAS + 0x00, %g1; \ ldxa [%g1 + %g0] ASI, %l0; \ mov 0x08, %g2; \ mov 0x10, %g3; \ ldxa [%g1 + %g2] ASI, %l1; \ mov 0x18, %g5; \ ldxa [%g1 + %g3] ASI, %l2; \ ldxa [%g1 + %g5] ASI, %l3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %l4; \ ldxa [%g1 + %g2] ASI, %l5; \ ldxa [%g1 + %g3] ASI, %l6; \ ldxa [%g1 + %g5] ASI, %l7; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i0; \ ldxa [%g1 + %g2] ASI, %i1; \ ldxa [%g1 + %g3] ASI, %i2; \ ldxa [%g1 + %g5] ASI, %i3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i4; \ ldxa [%g1 + %g2] ASI, %i5; \ ldxa [%g1 + %g3] ASI, %i6; \ ldxa [%g1 + %g5] ASI, %i7; \ restored; \ retry; nop; nop; nop; nop; \ b,a,pt %xcc, fill_fixup_dax; \ b,a,pt %xcc, fill_fixup_mna; \ b,a,pt %xcc, fill_fixup; And the way this works is that if any of those memory accesses generate an exception, the exception handler can revector to one of those final three branch instructions depending upon which kind of exception the memory access took. In this way, the fault handler doesn't have to know if it was a spill or a fill that it's handling the fault for. It just always branches to the last instruction in the parent trap's handler. For example, for a regular fault, the code goes: winfix_trampoline: rdpr %tpc, %g3 or %g3, 0x7c, %g3 wrpr %g3, %tnpc done All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the trap time program counter, we'll get that final instruction in the trap handler. On return from trap, we have to pull the register window in but we do this by hand instead of just executing a "restore" instruction for several reasons. The largest being that from Niagara and onward we simply don't have enough levels in the trap stack to fully resolve all possible exception cases of a window fault when we are already at trap level 1 (which we enter to get ready to return from the original trap). This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's code branches directly to these to do the window fill by hand if necessary. Now if you look at them, we'll see at the end: ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; And oops, all three cases are handled like a fault. This doesn't work because each of these trap types (data access exception, memory address unaligned, and faults) store their auxiliary info in different registers to pass on to the C handler which does the real work. So in the case where the stack was unaligned, the unaligned trap handler sets up the arg registers one way, and then we branched to the fault handler which expects them setup another way. So the FAULT_TYPE_* value ends up basically being garbage, and randomly would generate the backtrace seen above. Reported-by: Nick Alcock <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 7cafc0b ] We must handle data access exception as well as memory address unaligned exceptions from return from trap window fill faults, not just normal TLB misses. Otherwise we can get an OOPS that looks like this: ld-linux.so.2(36808): Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: <do_sparc64_fault+0x5c4/0x700> g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: <do_sparc64_fault+0x5bc/0x700> l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: <user_rtt_fill_fixup+0x6c/0x7c> Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c The window trap handlers are slightly clever, the trap table entries for them are composed of two pieces of code. First comes the code that actually performs the window fill or spill trap handling, and then there are three instructions at the end which are for exception processing. The userland register window fill handler is: add %sp, STACK_BIAS + 0x00, %g1; \ ldxa [%g1 + %g0] ASI, %l0; \ mov 0x08, %g2; \ mov 0x10, %g3; \ ldxa [%g1 + %g2] ASI, %l1; \ mov 0x18, %g5; \ ldxa [%g1 + %g3] ASI, %l2; \ ldxa [%g1 + %g5] ASI, %l3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %l4; \ ldxa [%g1 + %g2] ASI, %l5; \ ldxa [%g1 + %g3] ASI, %l6; \ ldxa [%g1 + %g5] ASI, %l7; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i0; \ ldxa [%g1 + %g2] ASI, %i1; \ ldxa [%g1 + %g3] ASI, %i2; \ ldxa [%g1 + %g5] ASI, %i3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i4; \ ldxa [%g1 + %g2] ASI, %i5; \ ldxa [%g1 + %g3] ASI, %i6; \ ldxa [%g1 + %g5] ASI, %i7; \ restored; \ retry; nop; nop; nop; nop; \ b,a,pt %xcc, fill_fixup_dax; \ b,a,pt %xcc, fill_fixup_mna; \ b,a,pt %xcc, fill_fixup; And the way this works is that if any of those memory accesses generate an exception, the exception handler can revector to one of those final three branch instructions depending upon which kind of exception the memory access took. In this way, the fault handler doesn't have to know if it was a spill or a fill that it's handling the fault for. It just always branches to the last instruction in the parent trap's handler. For example, for a regular fault, the code goes: winfix_trampoline: rdpr %tpc, %g3 or %g3, 0x7c, %g3 wrpr %g3, %tnpc done All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the trap time program counter, we'll get that final instruction in the trap handler. On return from trap, we have to pull the register window in but we do this by hand instead of just executing a "restore" instruction for several reasons. The largest being that from Niagara and onward we simply don't have enough levels in the trap stack to fully resolve all possible exception cases of a window fault when we are already at trap level 1 (which we enter to get ready to return from the original trap). This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's code branches directly to these to do the window fill by hand if necessary. Now if you look at them, we'll see at the end: ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; And oops, all three cases are handled like a fault. This doesn't work because each of these trap types (data access exception, memory address unaligned, and faults) store their auxiliary info in different registers to pass on to the C handler which does the real work. So in the case where the stack was unaligned, the unaligned trap handler sets up the arg registers one way, and then we branched to the fault handler which expects them setup another way. So the FAULT_TYPE_* value ends up basically being garbage, and randomly would generate the backtrace seen above. Reported-by: Nick Alcock <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Userspace can begin and suspend a transaction within the signal handler which means they might enter sys_rt_sigreturn() with the processor in suspended state. sys_rt_sigreturn() wants to restore process context (which may have been in a transaction before signal delivery). To do this it must restore TM SPRS. To achieve this, any transaction initiated within the signal frame must be discarded in order to be able to restore TM SPRs as TM SPRs can only be manipulated non-transactionally.. >From the PowerPC ISA: TM Bad Thing Exception [Category: Transactional Memory] An attempt is made to execute a mtspr targeting a TM register in other than Non-transactional state. Not doing so results in a TM Bad Thing: [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable] [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033) [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1] [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure scsi_transport_sas bnx2x ipr mdio libcrc32c [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34 [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000 [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000 [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not tainted (4.7.0) [12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280 XER: 20000000 [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033 GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0 GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000 GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000 GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0 [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0 [12045.223630] Call Trace: [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0 [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108 [12045.223806] Instruction dump: [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8 [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020 [12045.224074] ---[ end trace cb8002ee240bae76 ]--- It isn't clear exactly if there is really a use case for userspace returning with a suspended transaction, however, doing so doesn't (on its own) constitute a bad frame. As such, this patch simply discards the transactional state of the context calling the sigreturn and continues. Reported-by: Laurent Dufour <[email protected]> Signed-off-by: Cyril Bur <[email protected]> Tested-by: Laurent Dufour <[email protected]> Reviewed-by: Laurent Dufour <[email protected]> Acked-by: Simon Guo <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
If we hit the error path, we have never called drm_encoder_init() and so have nothing to cleanup. Doing so hits a null dereference: [ 10.066261] BUG: unable to handle kernel NULL pointer dereference at 00000104 [ 10.066273] IP: [<c16054b4>] mutex_lock+0xa/0x15 [ 10.066287] *pde = 00000000 [ 10.066295] Oops: 0002 [#1] [ 10.066302] Modules linked in: i915(+) video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm iTCO_wdt iTCO_vendor_support ppdev evdev snd_intel8x0 snd_ac97_codec ac97_bus psmouse snd_pcm snd_timer snd pcspkr uhci_hcd ehci_pci soundcore sr_mod ehci_hcd serio_raw i2c_i801 usbcore i2c_smbus cdrom lpc_ich mfd_core rng_core e100 mii floppy parport_pc parport acpi_cpufreq button processor usb_common eeprom lm85 hwmon_vid autofs4 [ 10.066378] CPU: 0 PID: 132 Comm: systemd-udevd Not tainted 4.8.0-rc3-00013-gef0e1ea #34 [ 10.066389] Hardware name: MicroLink /D865GLC , BIOS BF86510A.86A.0077.P25.0508040031 08/04/2005 [ 10.066401] task: f62db800 task.stack: f5970000 [ 10.066409] EIP: 0060:[<c16054b4>] EFLAGS: 00010286 CPU: 0 [ 10.066417] EIP is at mutex_lock+0xa/0x15 [ 10.066424] EAX: 00000104 EBX: 00000104 ECX: 00000000 EDX: 80000000 [ 10.066432] ESI: 00000000 EDI: 00000104 EBP: f5be8000 ESP: f5971b58 [ 10.066439] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 [ 10.066446] CR0: 80050033 CR2: 00000104 CR3: 35945000 CR4: 000006d0 [ 10.066453] Stack: [ 10.066459] f503d740 f824dddf 00000000 f61170c0 f61170c0 f82371ae f850f40e 00000001 [ 10.066476] f61170c0 f5971bcc f5be8000 f9c2d401 00000001 f8236fcc 00000001 00000000 [ 10.066491] f5144014 f5be8104 00000008 f9c5267c 00000007 f61170c0 f5144400 f9c4ff00 [ 10.066507] Call Trace: [ 10.066526] [<f824dddf>] ? drm_modeset_lock_all+0x27/0xb3 [drm] [ 10.066545] [<f82371ae>] ? drm_encoder_cleanup+0x1a/0x132 [drm] [ 10.066559] [<f850f40e>] ? drm_atomic_helper_connector_reset+0x3f/0x5c [drm_kms_helper] [ 10.066644] [<f9c2d401>] ? intel_dvo_init+0x569/0x788 [i915] [ 10.066663] [<f8236fcc>] ? drm_encoder_init+0x43/0x20b [drm] [ 10.066734] [<f9bf1fce>] ? intel_modeset_init+0x1436/0x17dd [i915] [ 10.066791] [<f9b37636>] ? i915_driver_load+0x85a/0x15d3 [i915] [ 10.066846] [<f9b3603d>] ? i915_driver_open+0x5/0x5 [i915] [ 10.066857] [<c14af4d0>] ? firmware_map_add_entry.part.2+0xc/0xc [ 10.066868] [<c1343daf>] ? pci_device_probe+0x8e/0x11c [ 10.066878] [<c140cec8>] ? driver_probe_device+0x1db/0x62e [ 10.066888] [<c120c010>] ? kernfs_new_node+0x29/0x9c [ 10.066897] [<c13438e0>] ? pci_match_device+0xd9/0x161 [ 10.066905] [<c120c48b>] ? kernfs_create_dir_ns+0x42/0x88 [ 10.066914] [<c140d401>] ? __driver_attach+0xe6/0x11b [ 10.066924] [<c1303b13>] ? kobject_add_internal+0x1bb/0x44f [ 10.066933] [<c140d31b>] ? driver_probe_device+0x62e/0x62e [ 10.066941] [<c140a2d2>] ? bus_for_each_dev+0x46/0x7f [ 10.066950] [<c140c502>] ? driver_attach+0x1a/0x34 [ 10.066958] [<c140d31b>] ? driver_probe_device+0x62e/0x62e [ 10.066966] [<c140b758>] ? bus_add_driver+0x217/0x32a [ 10.066975] [<f8403000>] ? 0xf8403000 [ 10.066982] [<c140de27>] ? driver_register+0x5f/0x108 [ 10.066991] [<c1000493>] ? do_one_initcall+0x49/0x1f6 [ 10.067000] [<c1082299>] ? pick_next_task_fair+0x14b/0x2a3 [ 10.067008] [<c1603c8d>] ? __schedule+0x15c/0x4fe [ 10.067016] [<c1604104>] ? preempt_schedule_common+0x19/0x3c [ 10.067027] [<c11051de>] ? do_init_module+0x17/0x230 [ 10.067035] [<c1604139>] ? _cond_resched+0x12/0x1a [ 10.067044] [<c116f9aa>] ? kmem_cache_alloc+0x8f/0x11f [ 10.067052] [<c11051de>] ? do_init_module+0x17/0x230 [ 10.067060] [<c11703dd>] ? kfree+0x137/0x203 [ 10.067068] [<c110523d>] ? do_init_module+0x76/0x230 [ 10.067078] [<c10cadf3>] ? load_module+0x2a39/0x333f [ 10.067087] [<c10cb8b2>] ? SyS_finit_module+0x96/0xd5 [ 10.067096] [<c1132231>] ? vm_mmap_pgoff+0x79/0xa0 [ 10.067105] [<c1001e96>] ? do_fast_syscall_32+0xb5/0x1b0 [ 10.067114] [<c16086a6>] ? sysenter_past_esp+0x47/0x75 [ 10.067121] Code: c8 f7 76 c1 e8 8e cc d2 ff e9 45 fe ff ff 66 90 66 90 66 90 66 90 90 ff 00 7f 05 e8 4e 0c 00 00 c3 53 89 c3 e8 75 ec ff ff 89 d8 <ff> 08 79 05 e8 fa 0a 00 00 5b c3 53 89 c3 85 c0 74 1b 8b 03 83 [ 10.067180] EIP: [<c16054b4>] mutex_lock+0xa/0x15 SS:ESP 0068:f5971b58 [ 10.067190] CR2: 0000000000000104 [ 10.067222] ---[ end trace 049f1f09da45a856 ]--- Reported-by: Meelis Roos <[email protected]> Fixes: 580d8ed ("drm/i915: Give encoders useful names") Reviewed-by: David Weinehall <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: [email protected] Signed-off-by: Jani Nikula <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 8f76aa0)
When checking a new device's descriptors, the USB core does not check for duplicate endpoint addresses. This can cause a problem when the sysfs files for those endpoints are created; trying to create multiple files with the same name will provoke a WARNING: WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0 sysfs: cannot create duplicate filename '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05' Kernel panic - not syncing: panic_on_warn set ... CPU: 2 PID: 865 Comm: kworker/2:1 Not tainted 4.9.0-rc7+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: usb_hub_wq hub_event ffff88006bee64c8 ffffffff81f96b8a ffffffff00000001 1ffff1000d7dcc2c ffffed000d7dcc24 0000000000000001 0000000041b58ab3 ffffffff8598b510 ffffffff81f968f8 ffffffff850fee20 ffffffff85cff020 dffffc0000000000 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51 [<ffffffff8168c88e>] panic+0x1cb/0x3a9 kernel/panic.c:179 [<ffffffff812b80b4>] __warn+0x1c4/0x1e0 kernel/panic.c:542 [<ffffffff812b8195>] warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:565 [<ffffffff819e70ca>] sysfs_warn_dup+0x8a/0xa0 fs/sysfs/dir.c:30 [<ffffffff819e7308>] sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:59 [< inline >] create_dir lib/kobject.c:71 [<ffffffff81fa1b07>] kobject_add_internal+0x227/0xa60 lib/kobject.c:229 [< inline >] kobject_add_varg lib/kobject.c:366 [<ffffffff81fa2479>] kobject_add+0x139/0x220 lib/kobject.c:411 [<ffffffff82737a63>] device_add+0x353/0x1660 drivers/base/core.c:1088 [<ffffffff82738d8d>] device_register+0x1d/0x20 drivers/base/core.c:1206 [<ffffffff82cb77d3>] usb_create_ep_devs+0x163/0x260 drivers/usb/core/endpoint.c:195 [<ffffffff82c9f27b>] create_intf_ep_devs+0x13b/0x200 drivers/usb/core/message.c:1030 [<ffffffff82ca39d3>] usb_set_configuration+0x1083/0x18d0 drivers/usb/core/message.c:1937 [<ffffffff82cc9e2e>] generic_probe+0x6e/0xe0 drivers/usb/core/generic.c:172 [<ffffffff82caa7fa>] usb_probe_device+0xaa/0xe0 drivers/usb/core/driver.c:263 This patch prevents the problem by checking for duplicate endpoint addresses during enumeration and skipping any duplicates. Signed-off-by: Alan Stern <[email protected]> Reported-by: Andrey Konovalov <[email protected]> Tested-by: Andrey Konovalov <[email protected]> CC: <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 0a8fd13 upstream. When checking a new device's descriptors, the USB core does not check for duplicate endpoint addresses. This can cause a problem when the sysfs files for those endpoints are created; trying to create multiple files with the same name will provoke a WARNING: WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0 sysfs: cannot create duplicate filename '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05' Kernel panic - not syncing: panic_on_warn set ... CPU: 2 PID: 865 Comm: kworker/2:1 Not tainted 4.9.0-rc7+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: usb_hub_wq hub_event ffff88006bee64c8 ffffffff81f96b8a ffffffff00000001 1ffff1000d7dcc2c ffffed000d7dcc24 0000000000000001 0000000041b58ab3 ffffffff8598b510 ffffffff81f968f8 ffffffff850fee20 ffffffff85cff020 dffffc0000000000 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51 [<ffffffff8168c88e>] panic+0x1cb/0x3a9 kernel/panic.c:179 [<ffffffff812b80b4>] __warn+0x1c4/0x1e0 kernel/panic.c:542 [<ffffffff812b8195>] warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:565 [<ffffffff819e70ca>] sysfs_warn_dup+0x8a/0xa0 fs/sysfs/dir.c:30 [<ffffffff819e7308>] sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:59 [< inline >] create_dir lib/kobject.c:71 [<ffffffff81fa1b07>] kobject_add_internal+0x227/0xa60 lib/kobject.c:229 [< inline >] kobject_add_varg lib/kobject.c:366 [<ffffffff81fa2479>] kobject_add+0x139/0x220 lib/kobject.c:411 [<ffffffff82737a63>] device_add+0x353/0x1660 drivers/base/core.c:1088 [<ffffffff82738d8d>] device_register+0x1d/0x20 drivers/base/core.c:1206 [<ffffffff82cb77d3>] usb_create_ep_devs+0x163/0x260 drivers/usb/core/endpoint.c:195 [<ffffffff82c9f27b>] create_intf_ep_devs+0x13b/0x200 drivers/usb/core/message.c:1030 [<ffffffff82ca39d3>] usb_set_configuration+0x1083/0x18d0 drivers/usb/core/message.c:1937 [<ffffffff82cc9e2e>] generic_probe+0x6e/0xe0 drivers/usb/core/generic.c:172 [<ffffffff82caa7fa>] usb_probe_device+0xaa/0xe0 drivers/usb/core/driver.c:263 This patch prevents the problem by checking for duplicate endpoint addresses during enumeration and skipping any duplicates. Signed-off-by: Alan Stern <[email protected]> Reported-by: Andrey Konovalov <[email protected]> Tested-by: Andrey Konovalov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 5f50153 ] The function obtain the next buffer without boundary check. We should return with I/O error code. The bug is found by fuzzing and the crash report is attached. It is an OOB bug although reported as use-after-free. [ 4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9 [ 4.806505] [ 4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G W 5.6.0 #34 [ 4.809030] Call Trace: [ 4.809343] dump_stack+0x76/0xa0 [ 4.809755] print_address_description.constprop.0+0x16/0x200 [ 4.810455] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.811234] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.813183] __kasan_report.cold+0x37/0x7c [ 4.813715] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.814393] kasan_report+0xe/0x20 [ 4.814837] aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.815499] ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic] [ 4.816290] aq_vec_poll+0x179/0x5d0 [atlantic] [ 4.816870] ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic] [ 4.817746] ? __next_timer_interrupt+0xba/0xf0 [ 4.818322] net_rx_action+0x363/0xbd0 [ 4.818803] ? call_timer_fn+0x240/0x240 [ 4.819302] ? __switch_to_asm+0x40/0x70 [ 4.819809] ? napi_busy_loop+0x520/0x520 [ 4.820324] __do_softirq+0x18c/0x634 [ 4.820797] ? takeover_tasklets+0x5f0/0x5f0 [ 4.821343] run_ksoftirqd+0x15/0x20 [ 4.821804] smpboot_thread_fn+0x2f1/0x6b0 [ 4.822331] ? smpboot_unregister_percpu_thread+0x160/0x160 [ 4.823041] ? __kthread_parkme+0x80/0x100 [ 4.823571] ? smpboot_unregister_percpu_thread+0x160/0x160 [ 4.824301] kthread+0x2b5/0x3b0 [ 4.824723] ? kthread_create_on_node+0xd0/0xd0 [ 4.825304] ret_from_fork+0x35/0x40 Signed-off-by: Zekun Shen <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit 07edfec upstream. At CPU-hotplug time, unbind_worker() may preempt a worker while it is waking up. In that case the following scenario can happen: unbind_workers() wq_worker_running() -------------- ------------------- if (!(worker->flags & WORKER_NOT_RUNNING)) //PREEMPTED by unbind_workers worker->flags |= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_inc(&worker->pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of 1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == 1", further queued work may end up not being served, because no worker is awaken at work insert time. This raises rcutorture writer stalls for example. Fix this with disabling preemption in the right place in wq_worker_running(). It's worth noting that if the worker migrates and runs concurrently with unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update due to set_cpus_allowed_ptr() acquiring/releasing rq->lock. Fixes: 6d25be5 ("sched/core, workqueues: Distangle worker accounting from rq lock") Reviewed-by: Lai Jiangshan <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 5f50153 ] The function obtain the next buffer without boundary check. We should return with I/O error code. The bug is found by fuzzing and the crash report is attached. It is an OOB bug although reported as use-after-free. [ 4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9 [ 4.806505] [ 4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G W 5.6.0 #34 [ 4.809030] Call Trace: [ 4.809343] dump_stack+0x76/0xa0 [ 4.809755] print_address_description.constprop.0+0x16/0x200 [ 4.810455] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.811234] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.813183] __kasan_report.cold+0x37/0x7c [ 4.813715] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.814393] kasan_report+0xe/0x20 [ 4.814837] aq_ring_rx_clean+0x1e88/0x2730 [atlantic] [ 4.815499] ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic] [ 4.816290] aq_vec_poll+0x179/0x5d0 [atlantic] [ 4.816870] ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic] [ 4.817746] ? __next_timer_interrupt+0xba/0xf0 [ 4.818322] net_rx_action+0x363/0xbd0 [ 4.818803] ? call_timer_fn+0x240/0x240 [ 4.819302] ? __switch_to_asm+0x40/0x70 [ 4.819809] ? napi_busy_loop+0x520/0x520 [ 4.820324] __do_softirq+0x18c/0x634 [ 4.820797] ? takeover_tasklets+0x5f0/0x5f0 [ 4.821343] run_ksoftirqd+0x15/0x20 [ 4.821804] smpboot_thread_fn+0x2f1/0x6b0 [ 4.822331] ? smpboot_unregister_percpu_thread+0x160/0x160 [ 4.823041] ? __kthread_parkme+0x80/0x100 [ 4.823571] ? smpboot_unregister_percpu_thread+0x160/0x160 [ 4.824301] kthread+0x2b5/0x3b0 [ 4.824723] ? kthread_create_on_node+0xd0/0xd0 [ 4.825304] ret_from_fork+0x35/0x40 Signed-off-by: Zekun Shen <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit 07edfec upstream. At CPU-hotplug time, unbind_worker() may preempt a worker while it is waking up. In that case the following scenario can happen: unbind_workers() wq_worker_running() -------------- ------------------- if (!(worker->flags & WORKER_NOT_RUNNING)) //PREEMPTED by unbind_workers worker->flags |= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_inc(&worker->pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of 1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == 1", further queued work may end up not being served, because no worker is awaken at work insert time. This raises rcutorture writer stalls for example. Fix this with disabling preemption in the right place in wq_worker_running(). It's worth noting that if the worker migrates and runs concurrently with unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update due to set_cpus_allowed_ptr() acquiring/releasing rq->lock. Fixes: 6d25be5 ("sched/core, workqueues: Distangle worker accounting from rq lock") Reviewed-by: Lai Jiangshan <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases multiple times and eventually it will wrap around the maximum number (i.e., 255). This patch prevents this by adding a boundary check with L2CAP_MAX_CONF_RSP Btmon log: Bluetooth monitor ver 5.64 = Note: Linux version 6.1.0-rc2 (x86_64) 0.264594 = Note: Bluetooth subsystem version 2.22 0.264636 @ MGMT Open: btmon (privileged) version 1.22 {0x0001} 0.272191 = New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0) [hci0] 13.877604 @ RAW Open: 9496 (privileged) version 2.22 {0x0002} 13.890741 = Open Index: 00:00:00:00:00:00 [hci0] 13.900426 (...) > ACL Data RX: Handle 200 flags 0x00 dlen 1033 #32 [hci0] 14.273106 invalid packet size (12 != 1033) 08 00 01 00 02 01 04 00 01 10 ff ff ............ > ACL Data RX: Handle 200 flags 0x00 dlen 1547 #33 [hci0] 14.273561 invalid packet size (14 != 1547) 0a 00 01 00 04 01 06 00 40 00 00 00 00 00 ........@..... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 #34 [hci0] 14.274390 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04 ........@....... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 #35 [hci0] 14.274932 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00 ........@....... = bluetoothd: Bluetooth daemon 5.43 14.401828 > ACL Data RX: Handle 200 flags 0x00 dlen 1033 #36 [hci0] 14.275753 invalid packet size (12 != 1033) 08 00 01 00 04 01 04 00 40 00 00 00 ........@... Signed-off-by: Sungwoo Kim <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
[ Upstream commit bcd7026 ] By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases multiple times and eventually it will wrap around the maximum number (i.e., 255). This patch prevents this by adding a boundary check with L2CAP_MAX_CONF_RSP Btmon log: Bluetooth monitor ver 5.64 = Note: Linux version 6.1.0-rc2 (x86_64) 0.264594 = Note: Bluetooth subsystem version 2.22 0.264636 @ MGMT Open: btmon (privileged) version 1.22 {0x0001} 0.272191 = New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0) [hci0] 13.877604 @ RAW Open: 9496 (privileged) version 2.22 {0x0002} 13.890741 = Open Index: 00:00:00:00:00:00 [hci0] 13.900426 (...) > ACL Data RX: Handle 200 flags 0x00 dlen 1033 #32 [hci0] 14.273106 invalid packet size (12 != 1033) 08 00 01 00 02 01 04 00 01 10 ff ff ............ > ACL Data RX: Handle 200 flags 0x00 dlen 1547 #33 [hci0] 14.273561 invalid packet size (14 != 1547) 0a 00 01 00 04 01 06 00 40 00 00 00 00 00 ........@..... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 #34 [hci0] 14.274390 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04 ........@....... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 #35 [hci0] 14.274932 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00 ........@....... = bluetoothd: Bluetooth daemon 5.43 14.401828 > ACL Data RX: Handle 200 flags 0x00 dlen 1033 #36 [hci0] 14.275753 invalid packet size (12 != 1033) 08 00 01 00 04 01 04 00 40 00 00 00 ........@... Signed-off-by: Sungwoo Kim <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit bcd7026 ] By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases multiple times and eventually it will wrap around the maximum number (i.e., 255). This patch prevents this by adding a boundary check with L2CAP_MAX_CONF_RSP Btmon log: Bluetooth monitor ver 5.64 = Note: Linux version 6.1.0-rc2 (x86_64) 0.264594 = Note: Bluetooth subsystem version 2.22 0.264636 @ MGMT Open: btmon (privileged) version 1.22 {0x0001} 0.272191 = New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0) [hci0] 13.877604 @ RAW Open: 9496 (privileged) version 2.22 {0x0002} 13.890741 = Open Index: 00:00:00:00:00:00 [hci0] 13.900426 (...) > ACL Data RX: Handle 200 flags 0x00 dlen 1033 raspberrypi#32 [hci0] 14.273106 invalid packet size (12 != 1033) 08 00 01 00 02 01 04 00 01 10 ff ff ............ > ACL Data RX: Handle 200 flags 0x00 dlen 1547 raspberrypi#33 [hci0] 14.273561 invalid packet size (14 != 1547) 0a 00 01 00 04 01 06 00 40 00 00 00 00 00 ........@..... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 raspberrypi#34 [hci0] 14.274390 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04 ........@....... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 raspberrypi#35 [hci0] 14.274932 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00 ........@....... = bluetoothd: Bluetooth daemon 5.43 14.401828 > ACL Data RX: Handle 200 flags 0x00 dlen 1033 raspberrypi#36 [hci0] 14.275753 invalid packet size (12 != 1033) 08 00 01 00 04 01 04 00 40 00 00 00 ........@... Signed-off-by: Sungwoo Kim <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit e0bead9 upstream. Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into avic_set_virtual_apic_mode() and invert the dependency being said functions to avoid calling avic_vcpu_{load,put}() and avic_set_pi_irte_mode() when "only" setting the virtual APIC mode. avic_set_virtual_apic_mode() is invoked from common x86 with preemption enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling those and updating IRTE stuff is unnecessary as the only reason avic_set_virtual_apic_mode() is called is to handle transitions between xAPIC and x2APIC that don't also toggle APICv activation. And if activation doesn't change, there's no need to fiddle with the physical APIC ID table or update IRTE. The "full" refresh is guaranteed to be called if activation changes in this case as the only call to the "set" path is: kvm_vcpu_update_apicv(vcpu); static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu); and kvm_vcpu_update_apicv() invokes the refresh if activation changes: if (apic->apicv_active == activate) goto out; apic->apicv_active = activate; kvm_apic_update_apicv(vcpu); static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu); Rename the helper to reflect that it is also called during "refresh". WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd] CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34 Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022 RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd] avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd] avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd] kvm_lapic_set_base+0x149/0x1a0 [kvm] kvm_set_apic_base+0x8f/0xd0 [kvm] kvm_set_msr_common+0xa3a/0xdc0 [kvm] svm_set_msr+0x364/0x6b0 [kvm_amd] __kvm_set_msr+0xb8/0x1c0 [kvm] kvm_emulate_wrmsr+0x58/0x1d0 [kvm] msr_interception+0x1c/0x30 [kvm_amd] svm_invoke_exit_handler+0x31/0x100 [kvm_amd] svm_handle_exit+0xfc/0x160 [kvm_amd] vcpu_enter_guest+0x21bb/0x23e0 [kvm] vcpu_run+0x92/0x450 [kvm] kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm] kvm_vcpu_ioctl+0x559/0x620 [kvm] Fixes: 05c4fe8 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode") Cc: [email protected] Cc: Suravee Suthikulpanit <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit e0bead9 upstream. Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into avic_set_virtual_apic_mode() and invert the dependency being said functions to avoid calling avic_vcpu_{load,put}() and avic_set_pi_irte_mode() when "only" setting the virtual APIC mode. avic_set_virtual_apic_mode() is invoked from common x86 with preemption enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling those and updating IRTE stuff is unnecessary as the only reason avic_set_virtual_apic_mode() is called is to handle transitions between xAPIC and x2APIC that don't also toggle APICv activation. And if activation doesn't change, there's no need to fiddle with the physical APIC ID table or update IRTE. The "full" refresh is guaranteed to be called if activation changes in this case as the only call to the "set" path is: kvm_vcpu_update_apicv(vcpu); static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu); and kvm_vcpu_update_apicv() invokes the refresh if activation changes: if (apic->apicv_active == activate) goto out; apic->apicv_active = activate; kvm_apic_update_apicv(vcpu); static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu); Rename the helper to reflect that it is also called during "refresh". WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd] CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34 Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022 RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd] avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd] avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd] kvm_lapic_set_base+0x149/0x1a0 [kvm] kvm_set_apic_base+0x8f/0xd0 [kvm] kvm_set_msr_common+0xa3a/0xdc0 [kvm] svm_set_msr+0x364/0x6b0 [kvm_amd] __kvm_set_msr+0xb8/0x1c0 [kvm] kvm_emulate_wrmsr+0x58/0x1d0 [kvm] msr_interception+0x1c/0x30 [kvm_amd] svm_invoke_exit_handler+0x31/0x100 [kvm_amd] svm_handle_exit+0xfc/0x160 [kvm_amd] vcpu_enter_guest+0x21bb/0x23e0 [kvm] vcpu_run+0x92/0x450 [kvm] kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm] kvm_vcpu_ioctl+0x559/0x620 [kvm] Fixes: 05c4fe8 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode") Cc: [email protected] Cc: Suravee Suthikulpanit <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
The nexthop code expects a 31 bit hash, such as what is returned by fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash returned by skb_get_hash() can lead to problems related to the fact that 'int hash' is a negative number when the MSB is set. In the case of hash threshold nexthop groups, nexthop_select_path_hthr() will disproportionately select the first nexthop group entry. In the case of resilient nexthop groups, nexthop_select_path_res() may do an out of bounds access in nh_buckets[], for example: hash = -912054133 num_nh_buckets = 2 bucket_index = 65535 which leads to the following panic: BUG: unable to handle page fault for address: ffffc900025910c8 PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x1ee/0x5c0 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? __pfx_page_fault_oops+0x10/0x10 ? search_bpf_extables+0xfe/0x1c0 ? fixup_exception+0x3b/0x470 ? exc_page_fault+0xf6/0x110 ? asm_exc_page_fault+0x26/0x30 ? nexthop_select_path+0x197/0xbf0 ? nexthop_select_path+0x197/0xbf0 ? lock_is_held_type+0xe7/0x140 vxlan_xmit+0x5b2/0x2340 ? __lock_acquire+0x92b/0x3370 ? __pfx_vxlan_xmit+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_register_lock_class+0x10/0x10 ? skb_network_protocol+0xce/0x2d0 ? dev_hard_start_xmit+0xca/0x350 ? __pfx_vxlan_xmit+0x10/0x10 dev_hard_start_xmit+0xca/0x350 __dev_queue_xmit+0x513/0x1e20 ? __pfx___dev_queue_xmit+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x44/0x90 ? skb_push+0x4c/0x80 ? eth_header+0x81/0xe0 ? __pfx_eth_header+0x10/0x10 ? neigh_resolve_output+0x215/0x310 ? ip6_finish_output2+0x2ba/0xc90 ip6_finish_output2+0x2ba/0xc90 ? lock_release+0x236/0x3e0 ? ip6_mtu+0xbb/0x240 ? __pfx_ip6_finish_output2+0x10/0x10 ? find_held_lock+0x83/0xa0 ? lock_is_held_type+0xe7/0x140 ip6_finish_output+0x1ee/0x780 ip6_output+0x138/0x460 ? __pfx_ip6_output+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_ip6_finish_output+0x10/0x10 NF_HOOK.constprop.0+0xc0/0x420 ? __pfx_NF_HOOK.constprop.0+0x10/0x10 ? ndisc_send_skb+0x2c0/0x960 ? __pfx_lock_release+0x10/0x10 ? __local_bh_enable_ip+0x93/0x110 ? lock_is_held_type+0xe7/0x140 ndisc_send_skb+0x4be/0x960 ? __pfx_ndisc_send_skb+0x10/0x10 ? mark_held_locks+0x65/0x90 ? find_held_lock+0x83/0xa0 ndisc_send_ns+0xb0/0x110 ? __pfx_ndisc_send_ns+0x10/0x10 addrconf_dad_work+0x631/0x8e0 ? lock_acquire+0x180/0x3f0 ? __pfx_addrconf_dad_work+0x10/0x10 ? mark_held_locks+0x24/0x90 process_one_work+0x582/0x9c0 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? mark_held_locks+0x24/0x90 worker_thread+0x93/0x630 ? __kthread_parkme+0xdc/0x100 ? __pfx_worker_thread+0x10/0x10 kthread+0x1a5/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 RIP: 0000:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: CR2: ffffc900025910c8 ---[ end trace 0000000000000000 ]--- RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fix this problem by ensuring the MSB of hash is 0 using a right shift - the same approach used in fib_multipath_hash() and rt6_multipath_hash(). Fixes: 1274e1c ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Benjamin Poirier <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
[ Upstream commit 0756384 ] The nexthop code expects a 31 bit hash, such as what is returned by fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash returned by skb_get_hash() can lead to problems related to the fact that 'int hash' is a negative number when the MSB is set. In the case of hash threshold nexthop groups, nexthop_select_path_hthr() will disproportionately select the first nexthop group entry. In the case of resilient nexthop groups, nexthop_select_path_res() may do an out of bounds access in nh_buckets[], for example: hash = -912054133 num_nh_buckets = 2 bucket_index = 65535 which leads to the following panic: BUG: unable to handle page fault for address: ffffc900025910c8 PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x1ee/0x5c0 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? __pfx_page_fault_oops+0x10/0x10 ? search_bpf_extables+0xfe/0x1c0 ? fixup_exception+0x3b/0x470 ? exc_page_fault+0xf6/0x110 ? asm_exc_page_fault+0x26/0x30 ? nexthop_select_path+0x197/0xbf0 ? nexthop_select_path+0x197/0xbf0 ? lock_is_held_type+0xe7/0x140 vxlan_xmit+0x5b2/0x2340 ? __lock_acquire+0x92b/0x3370 ? __pfx_vxlan_xmit+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_register_lock_class+0x10/0x10 ? skb_network_protocol+0xce/0x2d0 ? dev_hard_start_xmit+0xca/0x350 ? __pfx_vxlan_xmit+0x10/0x10 dev_hard_start_xmit+0xca/0x350 __dev_queue_xmit+0x513/0x1e20 ? __pfx___dev_queue_xmit+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x44/0x90 ? skb_push+0x4c/0x80 ? eth_header+0x81/0xe0 ? __pfx_eth_header+0x10/0x10 ? neigh_resolve_output+0x215/0x310 ? ip6_finish_output2+0x2ba/0xc90 ip6_finish_output2+0x2ba/0xc90 ? lock_release+0x236/0x3e0 ? ip6_mtu+0xbb/0x240 ? __pfx_ip6_finish_output2+0x10/0x10 ? find_held_lock+0x83/0xa0 ? lock_is_held_type+0xe7/0x140 ip6_finish_output+0x1ee/0x780 ip6_output+0x138/0x460 ? __pfx_ip6_output+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_ip6_finish_output+0x10/0x10 NF_HOOK.constprop.0+0xc0/0x420 ? __pfx_NF_HOOK.constprop.0+0x10/0x10 ? ndisc_send_skb+0x2c0/0x960 ? __pfx_lock_release+0x10/0x10 ? __local_bh_enable_ip+0x93/0x110 ? lock_is_held_type+0xe7/0x140 ndisc_send_skb+0x4be/0x960 ? __pfx_ndisc_send_skb+0x10/0x10 ? mark_held_locks+0x65/0x90 ? find_held_lock+0x83/0xa0 ndisc_send_ns+0xb0/0x110 ? __pfx_ndisc_send_ns+0x10/0x10 addrconf_dad_work+0x631/0x8e0 ? lock_acquire+0x180/0x3f0 ? __pfx_addrconf_dad_work+0x10/0x10 ? mark_held_locks+0x24/0x90 process_one_work+0x582/0x9c0 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? mark_held_locks+0x24/0x90 worker_thread+0x93/0x630 ? __kthread_parkme+0xdc/0x100 ? __pfx_worker_thread+0x10/0x10 kthread+0x1a5/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 RIP: 0000:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: CR2: ffffc900025910c8 ---[ end trace 0000000000000000 ]--- RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fix this problem by ensuring the MSB of hash is 0 using a right shift - the same approach used in fib_multipath_hash() and rt6_multipath_hash(). Fixes: 1274e1c ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Benjamin Poirier <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 0756384 ] The nexthop code expects a 31 bit hash, such as what is returned by fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash returned by skb_get_hash() can lead to problems related to the fact that 'int hash' is a negative number when the MSB is set. In the case of hash threshold nexthop groups, nexthop_select_path_hthr() will disproportionately select the first nexthop group entry. In the case of resilient nexthop groups, nexthop_select_path_res() may do an out of bounds access in nh_buckets[], for example: hash = -912054133 num_nh_buckets = 2 bucket_index = 65535 which leads to the following panic: BUG: unable to handle page fault for address: ffffc900025910c8 PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x1ee/0x5c0 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? __pfx_page_fault_oops+0x10/0x10 ? search_bpf_extables+0xfe/0x1c0 ? fixup_exception+0x3b/0x470 ? exc_page_fault+0xf6/0x110 ? asm_exc_page_fault+0x26/0x30 ? nexthop_select_path+0x197/0xbf0 ? nexthop_select_path+0x197/0xbf0 ? lock_is_held_type+0xe7/0x140 vxlan_xmit+0x5b2/0x2340 ? __lock_acquire+0x92b/0x3370 ? __pfx_vxlan_xmit+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_register_lock_class+0x10/0x10 ? skb_network_protocol+0xce/0x2d0 ? dev_hard_start_xmit+0xca/0x350 ? __pfx_vxlan_xmit+0x10/0x10 dev_hard_start_xmit+0xca/0x350 __dev_queue_xmit+0x513/0x1e20 ? __pfx___dev_queue_xmit+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x44/0x90 ? skb_push+0x4c/0x80 ? eth_header+0x81/0xe0 ? __pfx_eth_header+0x10/0x10 ? neigh_resolve_output+0x215/0x310 ? ip6_finish_output2+0x2ba/0xc90 ip6_finish_output2+0x2ba/0xc90 ? lock_release+0x236/0x3e0 ? ip6_mtu+0xbb/0x240 ? __pfx_ip6_finish_output2+0x10/0x10 ? find_held_lock+0x83/0xa0 ? lock_is_held_type+0xe7/0x140 ip6_finish_output+0x1ee/0x780 ip6_output+0x138/0x460 ? __pfx_ip6_output+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_ip6_finish_output+0x10/0x10 NF_HOOK.constprop.0+0xc0/0x420 ? __pfx_NF_HOOK.constprop.0+0x10/0x10 ? ndisc_send_skb+0x2c0/0x960 ? __pfx_lock_release+0x10/0x10 ? __local_bh_enable_ip+0x93/0x110 ? lock_is_held_type+0xe7/0x140 ndisc_send_skb+0x4be/0x960 ? __pfx_ndisc_send_skb+0x10/0x10 ? mark_held_locks+0x65/0x90 ? find_held_lock+0x83/0xa0 ndisc_send_ns+0xb0/0x110 ? __pfx_ndisc_send_ns+0x10/0x10 addrconf_dad_work+0x631/0x8e0 ? lock_acquire+0x180/0x3f0 ? __pfx_addrconf_dad_work+0x10/0x10 ? mark_held_locks+0x24/0x90 process_one_work+0x582/0x9c0 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? mark_held_locks+0x24/0x90 worker_thread+0x93/0x630 ? __kthread_parkme+0xdc/0x100 ? __pfx_worker_thread+0x10/0x10 kthread+0x1a5/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 RIP: 0000:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: CR2: ffffc900025910c8 ---[ end trace 0000000000000000 ]--- RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fix this problem by ensuring the MSB of hash is 0 using a right shift - the same approach used in fib_multipath_hash() and rt6_multipath_hash(). Fixes: 1274e1c ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Benjamin Poirier <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 282c1d7 ] [ 567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int' [ 567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G OE 6.2.0-34-generic #34~22.04.1-Ubuntu [ 567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023 [ 567.614504] Workqueue: events send_exception_work_handler [amdgpu] [ 567.614748] Call Trace: [ 567.614750] <TASK> [ 567.614753] dump_stack_lvl+0x48/0x70 [ 567.614761] dump_stack+0x10/0x20 [ 567.614763] __ubsan_handle_shift_out_of_bounds+0x156/0x310 [ 567.614769] ? srso_alias_return_thunk+0x5/0x7f [ 567.614773] ? update_sd_lb_stats.constprop.0+0xf2/0x3c0 [ 567.614780] svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu] [ 567.615047] ? srso_alias_return_thunk+0x5/0x7f [ 567.615052] svm_migrate_to_ram+0x185/0x4d0 [amdgpu] [ 567.615286] do_swap_page+0x7b6/0xa30 [ 567.615291] ? srso_alias_return_thunk+0x5/0x7f [ 567.615294] ? __free_pages+0x119/0x130 [ 567.615299] handle_pte_fault+0x227/0x280 [ 567.615303] __handle_mm_fault+0x3c0/0x720 [ 567.615311] handle_mm_fault+0x119/0x330 [ 567.615314] ? lock_mm_and_find_vma+0x44/0x250 [ 567.615318] do_user_addr_fault+0x1a9/0x640 [ 567.615323] exc_page_fault+0x81/0x1b0 [ 567.615328] asm_exc_page_fault+0x27/0x30 [ 567.615332] RIP: 0010:__get_user_8+0x1c/0x30 Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Philip Yang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit be97d0d ] commit 31da94c ("riscv: add VMAP_STACK overflow detection") added support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to `shadow_stack` temporarily before switching finally to per-cpu `overflow_stack`. If two CPUs/harts are racing and end up in over flowing kernel stack, one or both will end up corrupting each other state because `shadow_stack` is not per-cpu. This patch optimizes per-cpu overflow stack switch by directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`. Following are the changes in this patch - Defines an asm macro to obtain per-cpu symbols in destination register. - In entry.S, when overflow is detected, per-cpu overflow stack is located using per-cpu asm macro. Computing per-cpu symbol requires a temporary register. x31 is saved away into CSR_SCRATCH (CSR_SCRATCH is anyways zero since we're in kernel). Please see Links for additional relevant disccussion and alternative solution. Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT` Kernel crash log below Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT Task stack: [0xff20000010a98000..0xff20000010a9c000] Overflow stack: [0xff600001f7d98370..0xff600001f7d99370] CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) epc : __memset+0x60/0xfc ra : recursive_loop+0x48/0xc6 [lkdtm] epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80 gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88 t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0 s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000 a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000 a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90 s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684 s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10 s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4 t5 : ffffffff815dbab8 t6 : ff20000010a9bb48 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f Kernel panic - not syncing: Kernel stack overflow CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80006754>] dump_backtrace+0x30/0x38 [<ffffffff808de798>] show_stack+0x40/0x4c [<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c [<ffffffff808ea2d8>] dump_stack+0x18/0x20 [<ffffffff808dec06>] panic+0x126/0x2fe [<ffffffff800065ea>] walk_stackframe+0x0/0xf0 [<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm] SMP: stopping secondary CPUs ---[ end Kernel panic - not syncing: Kernel stack overflow ]--- Cc: Guo Ren <[email protected]> Cc: Jisheng Zhang <[email protected]> Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Deepak Gupta <[email protected]> Co-developed-by: Sami Tolvanen <[email protected]> Signed-off-by: Sami Tolvanen <[email protected]> Acked-by: Guo Ren <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 282c1d7 ] [ 567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int' [ 567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G OE 6.2.0-34-generic #34~22.04.1-Ubuntu [ 567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023 [ 567.614504] Workqueue: events send_exception_work_handler [amdgpu] [ 567.614748] Call Trace: [ 567.614750] <TASK> [ 567.614753] dump_stack_lvl+0x48/0x70 [ 567.614761] dump_stack+0x10/0x20 [ 567.614763] __ubsan_handle_shift_out_of_bounds+0x156/0x310 [ 567.614769] ? srso_alias_return_thunk+0x5/0x7f [ 567.614773] ? update_sd_lb_stats.constprop.0+0xf2/0x3c0 [ 567.614780] svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu] [ 567.615047] ? srso_alias_return_thunk+0x5/0x7f [ 567.615052] svm_migrate_to_ram+0x185/0x4d0 [amdgpu] [ 567.615286] do_swap_page+0x7b6/0xa30 [ 567.615291] ? srso_alias_return_thunk+0x5/0x7f [ 567.615294] ? __free_pages+0x119/0x130 [ 567.615299] handle_pte_fault+0x227/0x280 [ 567.615303] __handle_mm_fault+0x3c0/0x720 [ 567.615311] handle_mm_fault+0x119/0x330 [ 567.615314] ? lock_mm_and_find_vma+0x44/0x250 [ 567.615318] do_user_addr_fault+0x1a9/0x640 [ 567.615323] exc_page_fault+0x81/0x1b0 [ 567.615328] asm_exc_page_fault+0x27/0x30 [ 567.615332] RIP: 0010:__get_user_8+0x1c/0x30 Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Philip Yang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit be97d0d ] commit 31da94c ("riscv: add VMAP_STACK overflow detection") added support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to `shadow_stack` temporarily before switching finally to per-cpu `overflow_stack`. If two CPUs/harts are racing and end up in over flowing kernel stack, one or both will end up corrupting each other state because `shadow_stack` is not per-cpu. This patch optimizes per-cpu overflow stack switch by directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`. Following are the changes in this patch - Defines an asm macro to obtain per-cpu symbols in destination register. - In entry.S, when overflow is detected, per-cpu overflow stack is located using per-cpu asm macro. Computing per-cpu symbol requires a temporary register. x31 is saved away into CSR_SCRATCH (CSR_SCRATCH is anyways zero since we're in kernel). Please see Links for additional relevant disccussion and alternative solution. Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT` Kernel crash log below Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT Task stack: [0xff20000010a98000..0xff20000010a9c000] Overflow stack: [0xff600001f7d98370..0xff600001f7d99370] CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) epc : __memset+0x60/0xfc ra : recursive_loop+0x48/0xc6 [lkdtm] epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80 gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88 t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0 s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000 a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000 a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90 s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684 s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10 s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4 t5 : ffffffff815dbab8 t6 : ff20000010a9bb48 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f Kernel panic - not syncing: Kernel stack overflow CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80006754>] dump_backtrace+0x30/0x38 [<ffffffff808de798>] show_stack+0x40/0x4c [<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c [<ffffffff808ea2d8>] dump_stack+0x18/0x20 [<ffffffff808dec06>] panic+0x126/0x2fe [<ffffffff800065ea>] walk_stackframe+0x0/0xf0 [<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm] SMP: stopping secondary CPUs ---[ end Kernel panic - not syncing: Kernel stack overflow ]--- Cc: Guo Ren <[email protected]> Cc: Jisheng Zhang <[email protected]> Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Deepak Gupta <[email protected]> Co-developed-by: Sami Tolvanen <[email protected]> Signed-off-by: Sami Tolvanen <[email protected]> Acked-by: Guo Ren <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 282c1d7 ] [ 567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int' [ 567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G OE 6.2.0-34-generic #34~22.04.1-Ubuntu [ 567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023 [ 567.614504] Workqueue: events send_exception_work_handler [amdgpu] [ 567.614748] Call Trace: [ 567.614750] <TASK> [ 567.614753] dump_stack_lvl+0x48/0x70 [ 567.614761] dump_stack+0x10/0x20 [ 567.614763] __ubsan_handle_shift_out_of_bounds+0x156/0x310 [ 567.614769] ? srso_alias_return_thunk+0x5/0x7f [ 567.614773] ? update_sd_lb_stats.constprop.0+0xf2/0x3c0 [ 567.614780] svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu] [ 567.615047] ? srso_alias_return_thunk+0x5/0x7f [ 567.615052] svm_migrate_to_ram+0x185/0x4d0 [amdgpu] [ 567.615286] do_swap_page+0x7b6/0xa30 [ 567.615291] ? srso_alias_return_thunk+0x5/0x7f [ 567.615294] ? __free_pages+0x119/0x130 [ 567.615299] handle_pte_fault+0x227/0x280 [ 567.615303] __handle_mm_fault+0x3c0/0x720 [ 567.615311] handle_mm_fault+0x119/0x330 [ 567.615314] ? lock_mm_and_find_vma+0x44/0x250 [ 567.615318] do_user_addr_fault+0x1a9/0x640 [ 567.615323] exc_page_fault+0x81/0x1b0 [ 567.615328] asm_exc_page_fault+0x27/0x30 [ 567.615332] RIP: 0010:__get_user_8+0x1c/0x30 Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Philip Yang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d6938c1 ] Inside decrement_ttl() upon discovering that the packet ttl has exceeded, __IP_INC_STATS and __IP6_INC_STATS macros can be called from preemptible context having the following backtrace: check_preemption_disabled: 48 callbacks suppressed BUG: using __this_cpu_add() in preemptible [00000000] code: curl/1177 caller is decrement_ttl+0x217/0x830 CPU: 5 PID: 1177 Comm: curl Not tainted 6.7.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xbd/0xe0 check_preemption_disabled+0xd1/0xe0 decrement_ttl+0x217/0x830 __ip_vs_get_out_rt+0x4e0/0x1ef0 ip_vs_nat_xmit+0x205/0xcd0 ip_vs_in_hook+0x9b1/0x26a0 nf_hook_slow+0xc2/0x210 nf_hook+0x1fb/0x770 __ip_local_out+0x33b/0x640 ip_local_out+0x2a/0x490 __ip_queue_xmit+0x990/0x1d10 __tcp_transmit_skb+0x288b/0x3d10 tcp_connect+0x3466/0x5180 tcp_v4_connect+0x1535/0x1bb0 __inet_stream_connect+0x40d/0x1040 inet_stream_connect+0x57/0xa0 __sys_connect_file+0x162/0x1a0 __sys_connect+0x137/0x160 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x6f/0x140 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7fe6dbbc34e0 Use the corresponding preemption-aware variants: IP_INC_STATS and IP6_INC_STATS. Found by Linux Verification Center (linuxtesting.org). Fixes: 8d8e20e ("ipvs: Decrement ttl") Signed-off-by: Fedor Pchelkin <[email protected]> Acked-by: Julian Anastasov <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d6938c1 ] Inside decrement_ttl() upon discovering that the packet ttl has exceeded, __IP_INC_STATS and __IP6_INC_STATS macros can be called from preemptible context having the following backtrace: check_preemption_disabled: 48 callbacks suppressed BUG: using __this_cpu_add() in preemptible [00000000] code: curl/1177 caller is decrement_ttl+0x217/0x830 CPU: 5 PID: 1177 Comm: curl Not tainted 6.7.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xbd/0xe0 check_preemption_disabled+0xd1/0xe0 decrement_ttl+0x217/0x830 __ip_vs_get_out_rt+0x4e0/0x1ef0 ip_vs_nat_xmit+0x205/0xcd0 ip_vs_in_hook+0x9b1/0x26a0 nf_hook_slow+0xc2/0x210 nf_hook+0x1fb/0x770 __ip_local_out+0x33b/0x640 ip_local_out+0x2a/0x490 __ip_queue_xmit+0x990/0x1d10 __tcp_transmit_skb+0x288b/0x3d10 tcp_connect+0x3466/0x5180 tcp_v4_connect+0x1535/0x1bb0 __inet_stream_connect+0x40d/0x1040 inet_stream_connect+0x57/0xa0 __sys_connect_file+0x162/0x1a0 __sys_connect+0x137/0x160 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x6f/0x140 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7fe6dbbc34e0 Use the corresponding preemption-aware variants: IP_INC_STATS and IP6_INC_STATS. Found by Linux Verification Center (linuxtesting.org). Fixes: 8d8e20e ("ipvs: Decrement ttl") Signed-off-by: Fedor Pchelkin <[email protected]> Acked-by: Julian Anastasov <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
scx: Convert user space schedulers to use __{s,u}{32,64} types
[ Upstream commit d6938c1 ] Inside decrement_ttl() upon discovering that the packet ttl has exceeded, __IP_INC_STATS and __IP6_INC_STATS macros can be called from preemptible context having the following backtrace: check_preemption_disabled: 48 callbacks suppressed BUG: using __this_cpu_add() in preemptible [00000000] code: curl/1177 caller is decrement_ttl+0x217/0x830 CPU: 5 PID: 1177 Comm: curl Not tainted 6.7.0+ raspberrypi#34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xbd/0xe0 check_preemption_disabled+0xd1/0xe0 decrement_ttl+0x217/0x830 __ip_vs_get_out_rt+0x4e0/0x1ef0 ip_vs_nat_xmit+0x205/0xcd0 ip_vs_in_hook+0x9b1/0x26a0 nf_hook_slow+0xc2/0x210 nf_hook+0x1fb/0x770 __ip_local_out+0x33b/0x640 ip_local_out+0x2a/0x490 __ip_queue_xmit+0x990/0x1d10 __tcp_transmit_skb+0x288b/0x3d10 tcp_connect+0x3466/0x5180 tcp_v4_connect+0x1535/0x1bb0 __inet_stream_connect+0x40d/0x1040 inet_stream_connect+0x57/0xa0 __sys_connect_file+0x162/0x1a0 __sys_connect+0x137/0x160 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x6f/0x140 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7fe6dbbc34e0 Use the corresponding preemption-aware variants: IP_INC_STATS and IP6_INC_STATS. Found by Linux Verification Center (linuxtesting.org). Fixes: 8d8e20e ("ipvs: Decrement ttl") Signed-off-by: Fedor Pchelkin <[email protected]> Acked-by: Julian Anastasov <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d6938c1 ] Inside decrement_ttl() upon discovering that the packet ttl has exceeded, __IP_INC_STATS and __IP6_INC_STATS macros can be called from preemptible context having the following backtrace: check_preemption_disabled: 48 callbacks suppressed BUG: using __this_cpu_add() in preemptible [00000000] code: curl/1177 caller is decrement_ttl+0x217/0x830 CPU: 5 PID: 1177 Comm: curl Not tainted 6.7.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xbd/0xe0 check_preemption_disabled+0xd1/0xe0 decrement_ttl+0x217/0x830 __ip_vs_get_out_rt+0x4e0/0x1ef0 ip_vs_nat_xmit+0x205/0xcd0 ip_vs_in_hook+0x9b1/0x26a0 nf_hook_slow+0xc2/0x210 nf_hook+0x1fb/0x770 __ip_local_out+0x33b/0x640 ip_local_out+0x2a/0x490 __ip_queue_xmit+0x990/0x1d10 __tcp_transmit_skb+0x288b/0x3d10 tcp_connect+0x3466/0x5180 tcp_v4_connect+0x1535/0x1bb0 __inet_stream_connect+0x40d/0x1040 inet_stream_connect+0x57/0xa0 __sys_connect_file+0x162/0x1a0 __sys_connect+0x137/0x160 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x6f/0x140 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7fe6dbbc34e0 Use the corresponding preemption-aware variants: IP_INC_STATS and IP6_INC_STATS. Found by Linux Verification Center (linuxtesting.org). Fixes: 8d8e20e ("ipvs: Decrement ttl") Signed-off-by: Fedor Pchelkin <[email protected]> Acked-by: Julian Anastasov <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
When l2tp tunnels use a socket provided by userspace, we can hit lockdep splats like the below when data is transmitted through another (unrelated) userspace socket which then gets routed over l2tp. This issue was previously discussed here: https://lore.kernel.org/netdev/[email protected]/ The solution is to have lockdep treat socket locks of l2tp tunnel sockets separately than those of standard INET sockets. To do so, use a different lockdep subclass where lock nesting is possible. ============================================ WARNING: possible recursive locking detected 6.10.0+ #34 Not tainted -------------------------------------------- iperf3/771 is trying to acquire lock: ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0 but task is already holding lock: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(slock-AF_INET/1); lock(slock-AF_INET/1); *** DEADLOCK *** May be due to missing lock nesting notation 10 locks held by iperf3/771: #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40 #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0 #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260 #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450 #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450 stack backtrace: CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x69/0xa0 dump_stack+0xc/0x20 __lock_acquire+0x135d/0x2600 ? srso_alias_return_thunk+0x5/0xfbef5 lock_acquire+0xc4/0x2a0 ? l2tp_xmit_skb+0x243/0x9d0 ? __skb_checksum+0xa3/0x540 _raw_spin_lock_nested+0x35/0x50 ? l2tp_xmit_skb+0x243/0x9d0 l2tp_xmit_skb+0x243/0x9d0 l2tp_eth_dev_xmit+0x3c/0xc0 dev_hard_start_xmit+0x11e/0x420 sch_direct_xmit+0xc3/0x640 __dev_queue_xmit+0x61c/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 __tcp_send_ack+0x1b8/0x340 tcp_send_ack+0x23/0x30 __tcp_ack_snd_check+0xa8/0x530 ? srso_alias_return_thunk+0x5/0xfbef5 tcp_rcv_established+0x412/0xd70 tcp_v4_do_rcv+0x299/0x420 tcp_v4_rcv+0x1991/0x1e10 ip_protocol_deliver_rcu+0x50/0x220 ip_local_deliver_finish+0x158/0x260 ip_local_deliver+0xc8/0xe0 ip_rcv+0xe5/0x1d0 ? __pfx_ip_rcv+0x10/0x10 __netif_receive_skb_one_core+0xce/0xe0 ? process_backlog+0x28b/0x9f0 __netif_receive_skb+0x34/0xd0 ? process_backlog+0x28b/0x9f0 process_backlog+0x2cb/0x9f0 __napi_poll.constprop.0+0x61/0x280 net_rx_action+0x332/0x670 ? srso_alias_return_thunk+0x5/0xfbef5 ? find_held_lock+0x2b/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 handle_softirqs+0xda/0x480 ? __dev_queue_xmit+0xa2c/0x1450 do_softirq+0xa1/0xd0 </IRQ> <TASK> __local_bh_enable_ip+0xc8/0xe0 ? __dev_queue_xmit+0xa2c/0x1450 __dev_queue_xmit+0xa48/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 tcp_write_xmit+0x766/0x2fb0 ? __entry_text_end+0x102ba9/0x102bad ? srso_alias_return_thunk+0x5/0xfbef5 ? __might_fault+0x74/0xc0 ? srso_alias_return_thunk+0x5/0xfbef5 __tcp_push_pending_frames+0x56/0x190 tcp_push+0x117/0x310 tcp_sendmsg_locked+0x14c1/0x1740 tcp_sendmsg+0x28/0x40 inet_sendmsg+0x5d/0x90 sock_write_iter+0x242/0x2b0 vfs_write+0x68d/0x800 ? __pfx_sock_write_iter+0x10/0x10 ksys_write+0xc8/0xf0 __x64_sys_write+0x3d/0x50 x64_sys_call+0xfaf/0x1f50 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4d143af992 Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0 RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992 RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005 RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0 </TASK> Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()") Suggested-by: Eric Dumazet <[email protected]> Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4 CC: [email protected] CC: [email protected] Signed-off-by: James Chapman <[email protected]> Signed-off-by: Tom Parkin <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
[ Upstream commit 86a41ea ] When l2tp tunnels use a socket provided by userspace, we can hit lockdep splats like the below when data is transmitted through another (unrelated) userspace socket which then gets routed over l2tp. This issue was previously discussed here: https://lore.kernel.org/netdev/[email protected]/ The solution is to have lockdep treat socket locks of l2tp tunnel sockets separately than those of standard INET sockets. To do so, use a different lockdep subclass where lock nesting is possible. ============================================ WARNING: possible recursive locking detected 6.10.0+ #34 Not tainted -------------------------------------------- iperf3/771 is trying to acquire lock: ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0 but task is already holding lock: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(slock-AF_INET/1); lock(slock-AF_INET/1); *** DEADLOCK *** May be due to missing lock nesting notation 10 locks held by iperf3/771: #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40 #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0 #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260 #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450 #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450 stack backtrace: CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x69/0xa0 dump_stack+0xc/0x20 __lock_acquire+0x135d/0x2600 ? srso_alias_return_thunk+0x5/0xfbef5 lock_acquire+0xc4/0x2a0 ? l2tp_xmit_skb+0x243/0x9d0 ? __skb_checksum+0xa3/0x540 _raw_spin_lock_nested+0x35/0x50 ? l2tp_xmit_skb+0x243/0x9d0 l2tp_xmit_skb+0x243/0x9d0 l2tp_eth_dev_xmit+0x3c/0xc0 dev_hard_start_xmit+0x11e/0x420 sch_direct_xmit+0xc3/0x640 __dev_queue_xmit+0x61c/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 __tcp_send_ack+0x1b8/0x340 tcp_send_ack+0x23/0x30 __tcp_ack_snd_check+0xa8/0x530 ? srso_alias_return_thunk+0x5/0xfbef5 tcp_rcv_established+0x412/0xd70 tcp_v4_do_rcv+0x299/0x420 tcp_v4_rcv+0x1991/0x1e10 ip_protocol_deliver_rcu+0x50/0x220 ip_local_deliver_finish+0x158/0x260 ip_local_deliver+0xc8/0xe0 ip_rcv+0xe5/0x1d0 ? __pfx_ip_rcv+0x10/0x10 __netif_receive_skb_one_core+0xce/0xe0 ? process_backlog+0x28b/0x9f0 __netif_receive_skb+0x34/0xd0 ? process_backlog+0x28b/0x9f0 process_backlog+0x2cb/0x9f0 __napi_poll.constprop.0+0x61/0x280 net_rx_action+0x332/0x670 ? srso_alias_return_thunk+0x5/0xfbef5 ? find_held_lock+0x2b/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 handle_softirqs+0xda/0x480 ? __dev_queue_xmit+0xa2c/0x1450 do_softirq+0xa1/0xd0 </IRQ> <TASK> __local_bh_enable_ip+0xc8/0xe0 ? __dev_queue_xmit+0xa2c/0x1450 __dev_queue_xmit+0xa48/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 tcp_write_xmit+0x766/0x2fb0 ? __entry_text_end+0x102ba9/0x102bad ? srso_alias_return_thunk+0x5/0xfbef5 ? __might_fault+0x74/0xc0 ? srso_alias_return_thunk+0x5/0xfbef5 __tcp_push_pending_frames+0x56/0x190 tcp_push+0x117/0x310 tcp_sendmsg_locked+0x14c1/0x1740 tcp_sendmsg+0x28/0x40 inet_sendmsg+0x5d/0x90 sock_write_iter+0x242/0x2b0 vfs_write+0x68d/0x800 ? __pfx_sock_write_iter+0x10/0x10 ksys_write+0xc8/0xf0 __x64_sys_write+0x3d/0x50 x64_sys_call+0xfaf/0x1f50 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4d143af992 Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0 RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992 RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005 RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0 </TASK> Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()") Suggested-by: Eric Dumazet <[email protected]> Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4 CC: [email protected] CC: [email protected] Signed-off-by: James Chapman <[email protected]> Signed-off-by: Tom Parkin <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 86a41ea ] When l2tp tunnels use a socket provided by userspace, we can hit lockdep splats like the below when data is transmitted through another (unrelated) userspace socket which then gets routed over l2tp. This issue was previously discussed here: https://lore.kernel.org/netdev/[email protected]/ The solution is to have lockdep treat socket locks of l2tp tunnel sockets separately than those of standard INET sockets. To do so, use a different lockdep subclass where lock nesting is possible. ============================================ WARNING: possible recursive locking detected 6.10.0+ #34 Not tainted -------------------------------------------- iperf3/771 is trying to acquire lock: ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0 but task is already holding lock: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(slock-AF_INET/1); lock(slock-AF_INET/1); *** DEADLOCK *** May be due to missing lock nesting notation 10 locks held by iperf3/771: #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40 #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0 #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260 #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450 #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450 stack backtrace: CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x69/0xa0 dump_stack+0xc/0x20 __lock_acquire+0x135d/0x2600 ? srso_alias_return_thunk+0x5/0xfbef5 lock_acquire+0xc4/0x2a0 ? l2tp_xmit_skb+0x243/0x9d0 ? __skb_checksum+0xa3/0x540 _raw_spin_lock_nested+0x35/0x50 ? l2tp_xmit_skb+0x243/0x9d0 l2tp_xmit_skb+0x243/0x9d0 l2tp_eth_dev_xmit+0x3c/0xc0 dev_hard_start_xmit+0x11e/0x420 sch_direct_xmit+0xc3/0x640 __dev_queue_xmit+0x61c/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 __tcp_send_ack+0x1b8/0x340 tcp_send_ack+0x23/0x30 __tcp_ack_snd_check+0xa8/0x530 ? srso_alias_return_thunk+0x5/0xfbef5 tcp_rcv_established+0x412/0xd70 tcp_v4_do_rcv+0x299/0x420 tcp_v4_rcv+0x1991/0x1e10 ip_protocol_deliver_rcu+0x50/0x220 ip_local_deliver_finish+0x158/0x260 ip_local_deliver+0xc8/0xe0 ip_rcv+0xe5/0x1d0 ? __pfx_ip_rcv+0x10/0x10 __netif_receive_skb_one_core+0xce/0xe0 ? process_backlog+0x28b/0x9f0 __netif_receive_skb+0x34/0xd0 ? process_backlog+0x28b/0x9f0 process_backlog+0x2cb/0x9f0 __napi_poll.constprop.0+0x61/0x280 net_rx_action+0x332/0x670 ? srso_alias_return_thunk+0x5/0xfbef5 ? find_held_lock+0x2b/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 handle_softirqs+0xda/0x480 ? __dev_queue_xmit+0xa2c/0x1450 do_softirq+0xa1/0xd0 </IRQ> <TASK> __local_bh_enable_ip+0xc8/0xe0 ? __dev_queue_xmit+0xa2c/0x1450 __dev_queue_xmit+0xa48/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 tcp_write_xmit+0x766/0x2fb0 ? __entry_text_end+0x102ba9/0x102bad ? srso_alias_return_thunk+0x5/0xfbef5 ? __might_fault+0x74/0xc0 ? srso_alias_return_thunk+0x5/0xfbef5 __tcp_push_pending_frames+0x56/0x190 tcp_push+0x117/0x310 tcp_sendmsg_locked+0x14c1/0x1740 tcp_sendmsg+0x28/0x40 inet_sendmsg+0x5d/0x90 sock_write_iter+0x242/0x2b0 vfs_write+0x68d/0x800 ? __pfx_sock_write_iter+0x10/0x10 ksys_write+0xc8/0xf0 __x64_sys_write+0x3d/0x50 x64_sys_call+0xfaf/0x1f50 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4d143af992 Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0 RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992 RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005 RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0 </TASK> Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()") Suggested-by: Eric Dumazet <[email protected]> Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4 CC: [email protected] CC: [email protected] Signed-off-by: James Chapman <[email protected]> Signed-off-by: Tom Parkin <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Fix a shift-out-of-bounds bug reported by UBSAN when running VM with MTE enabled host kernel. UBSAN: shift-out-of-bounds in arch/arm64/kvm/sys_regs.c:1988:14 shift exponent 33 is too large for 32-bit type 'int' CPU: 26 UID: 0 PID: 7629 Comm: qemu-kvm Not tainted 6.12.0-rc2 #34 Hardware name: IEI NF5280R7/Mitchell MB, BIOS 00.00. 2024-10-12 09:28:54 10/14/2024 Call trace: dump_backtrace+0xa0/0x128 show_stack+0x20/0x38 dump_stack_lvl+0x74/0x90 dump_stack+0x18/0x28 __ubsan_handle_shift_out_of_bounds+0xf8/0x1e0 reset_clidr+0x10c/0x1c8 kvm_reset_sys_regs+0x50/0x1c8 kvm_reset_vcpu+0xec/0x2b0 __kvm_vcpu_set_target+0x84/0x158 kvm_vcpu_set_target+0x138/0x168 kvm_arch_vcpu_ioctl_vcpu_init+0x40/0x2b0 kvm_arch_vcpu_ioctl+0x28c/0x4b8 kvm_vcpu_ioctl+0x4bc/0x7a8 __arm64_sys_ioctl+0xb4/0x100 invoke_syscall+0x70/0x100 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x3c/0x158 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x194/0x198 Fixes: 7af0c25 ("KVM: arm64: Normalize cache configuration") Cc: [email protected] Reviewed-by: Gavin Shan <[email protected]> Signed-off-by: Ilkka Koskinen <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
commit c6c167a upstream. Fix a shift-out-of-bounds bug reported by UBSAN when running VM with MTE enabled host kernel. UBSAN: shift-out-of-bounds in arch/arm64/kvm/sys_regs.c:1988:14 shift exponent 33 is too large for 32-bit type 'int' CPU: 26 UID: 0 PID: 7629 Comm: qemu-kvm Not tainted 6.12.0-rc2 #34 Hardware name: IEI NF5280R7/Mitchell MB, BIOS 00.00. 2024-10-12 09:28:54 10/14/2024 Call trace: dump_backtrace+0xa0/0x128 show_stack+0x20/0x38 dump_stack_lvl+0x74/0x90 dump_stack+0x18/0x28 __ubsan_handle_shift_out_of_bounds+0xf8/0x1e0 reset_clidr+0x10c/0x1c8 kvm_reset_sys_regs+0x50/0x1c8 kvm_reset_vcpu+0xec/0x2b0 __kvm_vcpu_set_target+0x84/0x158 kvm_vcpu_set_target+0x138/0x168 kvm_arch_vcpu_ioctl_vcpu_init+0x40/0x2b0 kvm_arch_vcpu_ioctl+0x28c/0x4b8 kvm_vcpu_ioctl+0x4bc/0x7a8 __arm64_sys_ioctl+0xb4/0x100 invoke_syscall+0x70/0x100 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x3c/0x158 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x194/0x198 Fixes: 7af0c25 ("KVM: arm64: Normalize cache configuration") Cc: [email protected] Reviewed-by: Gavin Shan <[email protected]> Signed-off-by: Ilkka Koskinen <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit c6c167a upstream. Fix a shift-out-of-bounds bug reported by UBSAN when running VM with MTE enabled host kernel. UBSAN: shift-out-of-bounds in arch/arm64/kvm/sys_regs.c:1988:14 shift exponent 33 is too large for 32-bit type 'int' CPU: 26 UID: 0 PID: 7629 Comm: qemu-kvm Not tainted 6.12.0-rc2 #34 Hardware name: IEI NF5280R7/Mitchell MB, BIOS 00.00. 2024-10-12 09:28:54 10/14/2024 Call trace: dump_backtrace+0xa0/0x128 show_stack+0x20/0x38 dump_stack_lvl+0x74/0x90 dump_stack+0x18/0x28 __ubsan_handle_shift_out_of_bounds+0xf8/0x1e0 reset_clidr+0x10c/0x1c8 kvm_reset_sys_regs+0x50/0x1c8 kvm_reset_vcpu+0xec/0x2b0 __kvm_vcpu_set_target+0x84/0x158 kvm_vcpu_set_target+0x138/0x168 kvm_arch_vcpu_ioctl_vcpu_init+0x40/0x2b0 kvm_arch_vcpu_ioctl+0x28c/0x4b8 kvm_vcpu_ioctl+0x4bc/0x7a8 __arm64_sys_ioctl+0xb4/0x100 invoke_syscall+0x70/0x100 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x3c/0x158 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x194/0x198 Fixes: 7af0c25 ("KVM: arm64: Normalize cache configuration") Cc: [email protected] Reviewed-by: Gavin Shan <[email protected]> Signed-off-by: Ilkka Koskinen <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
trying to unzip the current zip ( downloaded from the "zip" link on main page ) fails with this error :
( more exactly dowloaded from https://github.com/raspberrypi/linux/zipball/rpi-patches )
unzip raspberrypi-linux-b683e89.zip
finishing deferred symbolic links:
raspberrypi-linux-b683e89/arch/microblaze/boot/dts/system.dts -> ../../platform/generic/system.dts
raspberrypi-linux-b683e89/drivers/scsi/aacraid/linit.c -> /^J *^IAdaptec AAC series RAID controller driver^J *^I(c) Copyright 2001 Red Hat Inc.^J *^J * based on the old aacraid driver that is..^J * Adaptec aacraid device driver for Linux.^J *^J * Copyright (c) 2000-2010 Adaptec, Inc.^J * 2010 PMC-Sierra, Inc. ([email protected])^J *^J * This program is free software; you can redistribute it and/or modify^J * it under the terms of the GNU General Public License as published by^J * the Free Software Foundation; either version 2, or (at your option)^J * any later version.^J *^J * This program is distributed in the hope that it will be useful,^J * but WITHOUT ANY WARRANTY; without even the implied warranty of^J * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the^J * GNU General Public License for more details.^J *^J * You should have received a copy of the GNU General Public License^J * along with this program; see the file COPYING. If not, write to^J * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.^J *^J * Module Name:^J * linit.c^J *^J * Abstract: Linux Driver entry module for Adaptec RAID Array Controller^J */^J^J^J#include <linux/compat.h>^J#include <linux/blkdev.h>^J#include <linux/completion.h>^J#include <linux/init.h>^J#include <linux/interrupt.h>^J#include <linux/kernel.h>^J#include <linux/module.h>^J#include <linux/moduleparam.h>^J#include <linux/pci.h>^J#include <linux/pci-aspm.h>^J#include <linux/slab.h>^J#include <linux/mutex.h>^J#include <linux/spinlock.h>^J#include <linux/syscalls.h>^J#include <linux/delay.h>^J#include <linux/kthread.h>^J^J#include <scsi/scsi.h>^J#include <scsi/scsi_cmnd.h>^J#include <scsi/scsi_device.h>^J#include <scsi/scsi_host.h>^J#include <scsi/scsi_tcq.h>^J#include <scsi/scsicam.h>^J#include <scsi/scsi_eh.h>^J^J#include "aacraid.h"^J^J#define AAC_DRIVER_VERSION^I^I"1.1-7"^J#ifndef AAC_DRIVER_BRANCH^J#define AAC_DRIVER_BRANCH^I^I""^J#endif^J#define AAC_DRIVERNAME^I^I^I"aacraid"^J^J#ifdef AAC_DRIVER_BUILD^J#define str(x) #x^J#define str(x) str(x)^J#define AAC_DRIVER_FULL_VERSION^IAAC_DRIVER_VERSION "[" str(AAC_DRIVER_BUILD) "]" AAC_DRIVER_BRANCH^J#else^J#define AAC_DRIVER_FULL_VERSION^IAAC_DRIVER_VERSION AAC_DRIVER_BRANCH^J#endif^J^JMODULE_AUTHOR("Red Hat Inc and Adaptec");^JMODULE_DESCRIPTION("Dell PERC2, 2/Si, 3/Si, 3/Di, "^J^I^I "Adaptec Advanced Raid Products, "^J^I^I "HP NetRAID-4M, IBM ServeRAID & ICP SCSI driver");^JMODULE_LICENSE("GPL");^JMODULE_VERSION(AAC_DRIVER_FULL_VERSION);^J^Jstatic DEFINE_MUTEX(aac_mutex);^Jstatic LIST_HEAD(aac_devices);^Jstatic int aac_cfg_major = -1;^Jchar aac_driver_version[] = AAC_DRIVER_FULL_VERSION;^J^J/^J * Because of the way Linux names scsi devices, the order in this table has^J * become important. Check for on-board Raid first, add-in cards second.^J ^J * Note: The last field is used to index into aac_drivers below.^J */^J#ifdef DECLARE_PCI_DEVICE_TABLE^Jstatic DECLARE_PCI_DEVICE_TABLE(aac_pci_tbl) = {^J#elif defined(__devinitconst)^Jstatic const struct pci_device_id aac_pci_tbl[] _devinitconst = {^J#else^Jstatic const struct pci_device_id aac_pci_tbl[] devinitdata = {^J#endif^J^I{ 0x1028, 0x0001, 0x1028, 0x0001, 0, 0, 0 }, / PERC 2/Si (Iguana/PERC2Si) /^J^I{ 0x1028, 0x0002, 0x1028, 0x0002, 0, 0, 1 }, / PERC 3/Di (Opal/PERC3Di) /^J^I{ 0x1028, 0x0003, 0x1028, 0x0003, 0, 0, 2 }, / PERC 3/Si (SlimFast/PERC3Si /^J^I{ 0x1028, 0x0004, 0x1028, 0x00d0, 0, 0, 3 }, / PERC 3/Di (Iguana FlipChip/PERC3DiF /^J^I{ 0x1028, 0x0002, 0x1028, 0x00d1, 0, 0, 4 }, / PERC 3/Di (Viper/PERC3DiV) /^J^I{ 0x1028, 0x0002, 0x1028, 0x00d9, 0, 0, 5 }, / PERC 3/Di (Lexus/PERC3DiL) /^J^I{ 0x1028, 0x000a, 0x1028, 0x0106, 0, 0, 6 }, / PERC 3/Di (Jaguar/PERC3DiJ) /^J^I{ 0x1028, 0x000a, 0x1028, 0x011b, 0, 0, 7 }, / PERC 3/Di (Dagger/PERC3DiD) /^J^I{ 0x1028, 0x000a, 0x1028, 0x0121, 0, 0, 8 }, / PERC 3/Di (Boxster/PERC3DiB) /^J^I{ 0x9005, 0x0283, 0x9005, 0x0283, 0, 0, 9 }, / catapult /^J^I{ 0x9005, 0x0284, 0x9005, 0x0284, 0, 0, 10 }, / tomcat /^J^I{ 0x9005, 0x0285, 0x9005, 0x0286, 0, 0, 11 }, / Adaptec 2120S (Crusader) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0285, 0, 0, 12 }, / Adaptec 2200S (Vulcan) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0287, 0, 0, 13 }, / Adaptec 2200S (Vulcan-2m) /^J^I{ 0x9005, 0x0285, 0x17aa, 0x0286, 0, 0, 14 }, / Legend S220 (Legend Crusader) /^J^I{ 0x9005, 0x0285, 0x17aa, 0x0287, 0, 0, 15 }, / Legend S230 (Legend Vulcan) /^J^J^I{ 0x9005, 0x0285, 0x9005, 0x0288, 0, 0, 16 }, / Adaptec 3230S (Harrier) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0289, 0, 0, 17 }, / Adaptec 3240S (Tornado) /^J^I{ 0x9005, 0x0285, 0x9005, 0x028a, 0, 0, 18 }, / ASR-2020ZCR SCSI PCI-X ZCR (Skyhawk) /^J^I{ 0x9005, 0x0285, 0x9005, 0x028b, 0, 0, 19 }, / ASR-2025ZCR SCSI SO-DIMM PCI-X ZCR (Terminator) /^J^I{ 0x9005, 0x0286, 0x9005, 0x028c, 0, 0, 20 }, / ASR-2230S + ASR-2230SLP PCI-X (Lancer) /^J^I{ 0x9005, 0x0286, 0x9005, 0x028d, 0, 0, 21 }, / ASR-2130S (Lancer) /^J^I{ 0x9005, 0x0286, 0x9005, 0x029b, 0, 0, 22 }, / AAR-2820SA (Intruder) /^J^I{ 0x9005, 0x0286, 0x9005, 0x029c, 0, 0, 23 }, / AAR-2620SA (Intruder) /^J^I{ 0x9005, 0x0286, 0x9005, 0x029d, 0, 0, 24 }, / AAR-2420SA (Intruder) /^J^I{ 0x9005, 0x0286, 0x9005, 0x029e, 0, 0, 25 }, / ICP9024RO (Lancer) /^J^I{ 0x9005, 0x0286, 0x9005, 0x029f, 0, 0, 26 }, / ICP9014RO (Lancer) /^J^I{ 0x9005, 0x0286, 0x9005, 0x02a0, 0, 0, 27 }, / ICP9047MA (Lancer) /^J^I{ 0x9005, 0x0286, 0x9005, 0x02a1, 0, 0, 28 }, / ICP9087MA (Lancer) /^J^I{ 0x9005, 0x0286, 0x9005, 0x02a3, 0, 0, 29 }, / ICP5445AU (Hurricane44) /^J^I{ 0x9005, 0x0285, 0x9005, 0x02a4, 0, 0, 30 }, / ICP9085LI (Marauder-X) /^J^I{ 0x9005, 0x0285, 0x9005, 0x02a5, 0, 0, 31 }, / ICP5085BR (Marauder-E) /^J^I{ 0x9005, 0x0286, 0x9005, 0x02a6, 0, 0, 32 }, / ICP9067MA (Intruder-6) /^J^I{ 0x9005, 0x0287, 0x9005, 0x0800, 0, 0, 33 }, / Themisto Jupiter Platform /^J^I{ 0x9005, 0x0200, 0x9005, 0x0200, 0, 0, 33 }, / Themisto Jupiter Platform /^J^I{ 0x9005, 0x0286, 0x9005, 0x0800, 0, 0, 34 }, / Callisto Jupiter Platform /^J^I{ 0x9005, 0x0285, 0x9005, 0x028e, 0, 0, 35 }, / ASR-2020SA SATA PCI-X ZCR (Skyhawk) /^J^I{ 0x9005, 0x0285, 0x9005, 0x028f, 0, 0, 36 }, / ASR-2025SA SATA SO-DIMM PCI-X ZCR (Terminator) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0290, 0, 0, 37 }, / AAR-2410SA PCI SATA 4ch (Jaguar II) /^J^I{ 0x9005, 0x0285, 0x1028, 0x0291, 0, 0, 38 }, / CERC SATA RAID 2 PCI SATA 6ch (DellCorsair) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0292, 0, 0, 39 }, / AAR-2810SA PCI SATA 8ch (Corsair-8) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0293, 0, 0, 40 }, / AAR-21610SA PCI SATA 16ch (Corsair-16) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0294, 0, 0, 41 }, / ESD SO-DIMM PCI-X SATA ZCR (Prowler) /^J^I{ 0x9005, 0x0285, 0x103C, 0x3227, 0, 0, 42 }, / AAR-2610SA PCI SATA 6ch /^J^I{ 0x9005, 0x0285, 0x9005, 0x0296, 0, 0, 43 }, / ASR-2240S (SabreExpress) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0297, 0, 0, 44 }, / ASR-4005 /^J^I{ 0x9005, 0x0285, 0x1014, 0x02F2, 0, 0, 45 }, / IBM 8i (AvonPark) /^J^I{ 0x9005, 0x0285, 0x1014, 0x0312, 0, 0, 45 }, / IBM 8i (AvonPark Lite) /^J^I{ 0x9005, 0x0286, 0x1014, 0x9580, 0, 0, 46 }, / IBM 8k/8k-l8 (Aurora) /^J^I{ 0x9005, 0x0286, 0x1014, 0x9540, 0, 0, 47 }, / IBM 8k/8k-l4 (Aurora Lite) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0298, 0, 0, 48 }, / ASR-4000 (BlackBird) /^J^I{ 0x9005, 0x0285, 0x9005, 0x0299, 0, 0, 49 }, / ASR-4800SAS (Marauder-X) /^J^I{ 0x9005, 0x0285, 0x9005, 0x029a, 0, 0, 50 }, / ASR-4805SAS (Marauder-E) /^J^I{ 0x9005, 0x0286, 0x9005, 0x02a2, 0, 0, 51 }, / ASR-3800 (Hurricane44) /^J^J^I{ 0x9005, 0x0285, 0x1028, 0x0287, 0, 0, 52 }, / Perc 320/DC/^J^I{ 0x1011, 0x0046, 0x9005, 0x0365, 0, 0, 53 }, / Adaptec 5400S (Mustang)/^J^I{ 0x1011, 0x0046, 0x9005, 0x0364, 0, 0, 54 }, / Adaptec 5400S (Mustang)/^J^I{ 0x1011, 0x0046, 0x9005, 0x1364, 0, 0, 55 }, /_ Dell PERC2/QC /^J^I{ 0x1011, 0x0046, 0x103c, 0x10c2, 0, 0, 56 }, / HP NetRAID-4M /^J^J^I{ 0x9005, 0x0285, 0x1028, PCI_ANY_ID, 0, 0, 57 }, / Dell Catchall /^J^I{ 0x9005, 0x0285, 0x17aa, PCI_ANY_ID, 0, 0, 58 }, / Legend Catchall /^J^I{ 0x9005, 0x0285, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 59 }, / Adaptec Catch All /^J^I{ 0x9005, 0x0286, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 60 }, / Adaptec Rocket Catch All /^J^I{ 0x9005, 0x0288, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 61 }, / Adaptec NEMER/ARK Catch All /^J^I{ 0x9005, 0x028b, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 62 }, / Adaptec PMC Catch All /^J^I{ 0,}^J};^JMODULE_DEVICE_TABLE(pci, aac_pci_tbl);^J^J/^J * dmb - For now we add the number of channels to this structure.^J * In the future we should add a fib that reports the number of channels^J * for the card. At that time we can remove the channels from here^J /^Jstatic struct aac_driver_ident aac_drivers[] = {^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 2/Si (Iguana/PERC2Si) /^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 3/Di (Opal/PERC3Di) /^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 3/Si (SlimFast/PERC3Si /^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 3/Di (Iguana FlipChip/PERC3DiF /^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 3/Di (Viper/PERC3DiV) /^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 3/Di (Lexus/PERC3DiL) /^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 1, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 3/Di (Jaguar/PERC3DiJ) /^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 3/Di (Dagger/PERC3DiD) /^J^I{ aac_rx_init, "percraid", "DELL ", "PERCRAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / PERC 3/Di (Boxster/PERC3DiB) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "catapult ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / catapult /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "tomcat ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / tomcat /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "Adaptec 2120S ", 1, AAC_QUIRK_31BIT | AAC_QUIRK_34SG },^I^I / Adaptec 2120S (Crusader) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "Adaptec 2200S ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG },^I^I / Adaptec 2200S (Vulcan) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "Adaptec 2200S ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / Adaptec 2200S (Vulcan-2m) /^J^I{ aac_rx_init, "aacraid", "Legend ", "Legend S220 ", 1, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / Legend S220 (Legend Crusader) /^J^I{ aac_rx_init, "aacraid", "Legend ", "Legend S230 ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / Legend S230 (Legend Vulcan) /^J^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "Adaptec 3230S ", 2 }, / Adaptec 3230S (Harrier) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "Adaptec 3240S ", 2 }, / Adaptec 3240S (Tornado) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-2020ZCR ", 2 }, / ASR-2020ZCR SCSI PCI-X ZCR (Skyhawk) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-2025ZCR ", 2 }, / ASR-2025ZCR SCSI SO-DIMM PCI-X ZCR (Terminator) /^J^I{ aac_rkt_init, "aacraid", "ADAPTEC ", "ASR-2230S PCI-X ", 2 }, / ASR-2230S + ASR-2230SLP PCI-X (Lancer) /^J^I{ aac_rkt_init, "aacraid", "ADAPTEC ", "ASR-2130S PCI-X ", 1 }, / ASR-2130S (Lancer) /^J^I{ aac_rkt_init, "aacraid", "ADAPTEC ", "AAR-2820SA ", 1 }, / AAR-2820SA (Intruder) /^J^I{ aac_rkt_init, "aacraid", "ADAPTEC ", "AAR-2620SA ", 1 }, / AAR-2620SA (Intruder) /^J^I{ aac_rkt_init, "aacraid", "ADAPTEC ", "AAR-2420SA ", 1 }, / AAR-2420SA (Intruder) /^J^I{ aac_rkt_init, "aacraid", "ICP ", "ICP9024RO ", 2 }, / ICP9024RO (Lancer) /^J^I{ aac_rkt_init, "aacraid", "ICP ", "ICP9014RO ", 1 }, / ICP9014RO (Lancer) /^J^I{ aac_rkt_init, "aacraid", "ICP ", "ICP9047MA ", 1 }, / ICP9047MA (Lancer) /^J^I{ aac_rkt_init, "aacraid", "ICP ", "ICP9087MA ", 1 }, / ICP9087MA (Lancer) /^J^I{ aac_rkt_init, "aacraid", "ICP ", "ICP5445AU ", 1 }, / ICP5445AU (Hurricane44) /^J^I{ aac_rx_init, "aacraid", "ICP ", "ICP9085LI ", 1 }, / ICP9085LI (Marauder-X) /^J^I{ aac_rx_init, "aacraid", "ICP ", "ICP5085BR ", 1 }, / ICP5085BR (Marauder-E) /^J^I{ aac_rkt_init, "aacraid", "ICP ", "ICP9067MA ", 1 }, / ICP9067MA (Intruder-6) /^J^I{ NULL , "aacraid", "ADAPTEC ", "Themisto ", 0, AAC_QUIRK_SLAVE }, / Jupiter Platform /^J^I{ aac_rkt_init, "aacraid", "ADAPTEC ", "Callisto ", 2, AAC_QUIRK_MASTER }, / Jupiter Platform /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-2020SA ", 1 }, / ASR-2020SA SATA PCI-X ZCR (Skyhawk) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-2025SA ", 1 }, / ASR-2025SA SATA SO-DIMM PCI-X ZCR (Terminator) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "AAR-2410SA SATA ", 1, AAC_QUIRK_17SG }, / AAR-2410SA PCI SATA 4ch (Jaguar II) /^J^I{ aac_rx_init, "aacraid", "DELL ", "CERC SR2 ", 1, AAC_QUIRK_17SG }, / CERC SATA RAID 2 PCI SATA 6ch (DellCorsair) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "AAR-2810SA SATA ", 1, AAC_QUIRK_17SG }, / AAR-2810SA PCI SATA 8ch (Corsair-8) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "AAR-21610SA SATA", 1, AAC_QUIRK_17SG }, / AAR-21610SA PCI SATA 16ch (Corsair-16) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-2026ZCR ", 1 }, / ESD SO-DIMM PCI-X SATA ZCR (Prowler) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "AAR-2610SA ", 1 }, / SATA 6Ch (Bearcat) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-2240S ", 1 }, / ASR-2240S (SabreExpress) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-4005 ", 1 }, / ASR-4005 /^J^I{ aac_rx_init, "ServeRAID","IBM ", "ServeRAID 8i ", 1 }, / IBM 8i (AvonPark) /^J^I{ aac_rkt_init, "ServeRAID","IBM ", "ServeRAID 8k-l8 ", 1 }, / IBM 8k/8k-l8 (Aurora) /^J^I{ aac_rkt_init, "ServeRAID","IBM ", "ServeRAID 8k-l4 ", 1 }, / IBM 8k/8k-l4 (Aurora Lite) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-4000 ", 1 }, / ASR-4000 (BlackBird & AvonPark) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-4800SAS ", 1 }, / ASR-4800SAS (Marauder-X) /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "ASR-4805SAS ", 1 }, / ASR-4805SAS (Marauder-E) /^J^I{ aac_rkt_init, "aacraid", "ADAPTEC ", "ASR-3800 ", 1 }, / ASR-3800 (Hurricane44) /^J^J^I{ aac_rx_init, "percraid", "DELL ", "PERC 320/DC ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG }, / Perc 320/DC_/^J^I{ aac_sa_init, "aacraid", "ADAPTEC ", "Adaptec 5400S ", 4, AAC_QUIRK_34SG }, /_ Adaptec 5400S (Mustang)/^J^I{ aac_sa_init, "aacraid", "ADAPTEC ", "AAC-364 ", 4, AAC_QUIRK_34SG }, / Adaptec 5400S (Mustang)/^J^I{ aac_sa_init, "percraid", "DELL ", "PERCRAID ", 4, AAC_QUIRK_34SG }, / Dell PERC2/QC /^J^I{ aac_sa_init, "hpnraid", "HP ", "NetRAID ", 4, AAC_QUIRK_34SG }, / HP NetRAID-4M /^J^J^I{ aac_rx_init, "aacraid", "DELL ", "RAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / Dell Catchall /^J^I{ aac_rx_init, "aacraid", "Legend ", "RAID ", 2, AAC_QUIRK_31BIT | AAC_QUIRK_34SG | AAC_QUIRK_SCSI_32 }, / Legend Catchall /^J^I{ aac_rx_init, "aacraid", "ADAPTEC ", "RAID ", 2 }, / Adaptec Catch All /^J^I{ aac_rkt_init, "aacraid", "ADAPTEC ", "RAID ", 2 }, / Adaptec Rocket Catch All /^J^I{ aac_nark_init, "aacraid", "ADAPTEC ", "RAID ", 2 }, / Adaptec NEMER/ARK Catch All /^J^I{ aac_src_init, "aacraid", "ADAPTEC ", "RAID ", 2 } / Adaptec PMC Catch All _/^J};^J^J/__^J *^Iaac_queuecommand^I-^Iqueue a SCSI command^J *^I@cmd:^I^ISCSI command to queue^J *^I@done:^I^IFunction to call on command completion^J *^J *^IQueues a command for execution by the associated Host Adapter.^J *^J *^ITODO: unify with aac_scsi_cmd().^J */^J^Jstatic int aac_queuecommand_lck(struct scsi_cmnd *cmd, void (_done)(struct scsi_cmnd *)...
symlink error: File name too long
The text was updated successfully, but these errors were encountered: