-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rumble Support Not Working in fftest #1
Comments
The following lines can be found in dmesg: [Feb25 19:39] (null): strong_mag=49152 weak_mag=49152 |
Nevermind about this issue. I looked at your code, it seems the functionality is not yet implemented. I will close this ticket. |
Yeah sorry I'd been pushing unfinished work. I just got rumble working, so the latest patch should support it. |
Hi, I tested your latest patch. Rumble is working. Thank you for your awesome work. I am waiting for your driver to support gyro. |
[ Upstream commit 2a5ff07 ] We keep receiving syzbot reports [1] that show that tunnels do not play the rcu/IFF_UP rules properly. At device dismantle phase, gro_cells_destroy() will be called only after a full rcu grace period is observed after IFF_UP has been cleared. This means that IFF_UP needs to be tested before queueing packets into netif_rx() or gro_cells. This patch implements the test in gro_cells_receive() because too many callers do not seem to bother enough. [1] BUG: unable to handle kernel paging request at fffff4ca0b9ffffe PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.0.0+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline] RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline] RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline] RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline] RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78 Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 <42> 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03 RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02 RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0 RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645 R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000 R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75 kobject: 'loop2' (000000004bd7d84a): kobject_uevent_env FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0 Call Trace: kobject: 'loop2' (000000004bd7d84a): fill_kobj_path: path = '/devices/virtual/block/loop2' ip_tunnel_dev_free+0x19/0x60 net/ipv4/ip_tunnel.c:1010 netdev_run_todo+0x51c/0x7d0 net/core/dev.c:8970 rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:116 ip_tunnel_delete_nets+0x423/0x5f0 net/ipv4/ip_tunnel.c:1124 vti_exit_batch_net+0x23/0x30 net/ipv4/ip_vti.c:495 ops_exit_list.isra.0+0x105/0x160 net/core/net_namespace.c:156 cleanup_net+0x3fb/0x960 net/core/net_namespace.c:551 process_one_work+0x98e/0x1790 kernel/workqueue.c:2173 worker_thread+0x98/0xe40 kernel/workqueue.c:2319 kthread+0x357/0x430 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Modules linked in: CR2: fffff4ca0b9ffffe [ end trace 513fc9c1338d1cb3 ] RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline] RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline] RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline] RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline] RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78 Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 <42> 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03 RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02 RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0 RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645 R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000 kobject: 'loop3' (00000000e4ee57a6): kobject_uevent_env R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75 FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0 Fixes: c9e6bc6 ("net: add gro_cells infrastructure") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 1e02796 ] syzbot found another add_timer() issue, this time in net/hsr [1] Let's use mod_timer() which is safe. [1] kernel BUG at kernel/time/timer.c:1136! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 15909 Comm: syz-executor.3 Not tainted 5.0.0+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 kobject: 'loop2' (00000000f5629718): kobject_uevent_env RIP: 0010:add_timer kernel/time/timer.c:1136 [inline] RIP: 0010:add_timer+0x654/0xbe0 kernel/time/timer.c:1134 Code: 0f 94 c5 31 ff 44 89 ee e8 09 61 0f 00 45 84 ed 0f 84 77 fd ff ff e8 bb 5f 0f 00 e8 07 10 a0 ff e9 68 fd ff ff e8 ac 5f 0f 00 <0f> 0b e8 a5 5f 0f 00 0f 0b e8 9e 5f 0f 00 4c 89 b5 58 ff ff ff e9 RSP: 0018:ffff8880656eeca0 EFLAGS: 00010246 kobject: 'loop2' (00000000f5629718): fill_kobj_path: path = '/devices/virtual/block/loop2' RAX: 0000000000040000 RBX: 1ffff1100caddd9a RCX: ffffc9000c436000 RDX: 0000000000040000 RSI: ffffffff816056c4 RDI: ffff88806a2f6cc8 RBP: ffff8880656eed58 R08: ffff888067f4a300 R09: ffff888067f4abc8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88806a2f6cc0 R13: dffffc0000000000 R14: 0000000000000001 R15: ffff8880656eed30 FS: 00007fc2019bf700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000738000 CR3: 0000000067e8e000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: hsr_check_announce net/hsr/hsr_device.c:99 [inline] hsr_check_carrier_and_operstate+0x567/0x6f0 net/hsr/hsr_device.c:120 hsr_netdev_notify+0x297/0xa00 net/hsr/hsr_main.c:51 notifier_call_chain+0xc7/0x240 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1739 call_netdevice_notifiers_extack net/core/dev.c:1751 [inline] call_netdevice_notifiers net/core/dev.c:1765 [inline] dev_open net/core/dev.c:1436 [inline] dev_open+0x143/0x160 net/core/dev.c:1424 team_port_add drivers/net/team/team.c:1203 [inline] team_add_slave+0xa07/0x15d0 drivers/net/team/team.c:1933 do_set_master net/core/rtnetlink.c:2358 [inline] do_set_master+0x1d4/0x230 net/core/rtnetlink.c:2332 do_setlink+0x966/0x3510 net/core/rtnetlink.c:2493 rtnl_setlink+0x271/0x3b0 net/core/rtnetlink.c:2747 rtnetlink_rcv_msg+0x465/0xb00 net/core/rtnetlink.c:5192 netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2485 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5210 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1925 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xdd/0x130 net/socket.c:632 sock_write_iter+0x27c/0x3e0 net/socket.c:923 call_write_iter include/linux/fs.h:1869 [inline] do_iter_readv_writev+0x5e0/0x8e0 fs/read_write.c:680 do_iter_write fs/read_write.c:956 [inline] do_iter_write+0x184/0x610 fs/read_write.c:937 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1001 do_writev+0xf6/0x290 fs/read_write.c:1036 __do_sys_writev fs/read_write.c:1109 [inline] __se_sys_writev fs/read_write.c:1106 [inline] __x64_sys_writev+0x75/0xb0 fs/read_write.c:1106 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457f29 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fc2019bec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457f29 RDX: 0000000000000001 RSI: 00000000200000c0 RDI: 0000000000000003 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc2019bf6d4 R13: 00000000004c4a60 R14: 00000000004dd218 R15: 00000000ffffffff Fixes: f421436 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Cc: Arvid Brodin <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit ee74d0b ] In case x25_connect() fails and frees the socket neighbour, we also need to undo the change done to x25->state. Before my last bug fix, we had use-after-free so this patch fixes a latent bug. syzbot report : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 16137 Comm: syz-executor.1 Not tainted 5.0.0+ #117 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:x25_write_internal+0x1e8/0xdf0 net/x25/x25_subr.c:173 Code: 00 40 88 b5 e0 fe ff ff 0f 85 01 0b 00 00 48 8b 8b 80 04 00 00 48 ba 00 00 00 00 00 fc ff df 48 8d 79 1c 48 89 fe 48 c1 ee 03 <0f> b6 34 16 48 89 fa 83 e2 07 83 c2 03 40 38 f2 7c 09 40 84 f6 0f RSP: 0018:ffff888076717a08 EFLAGS: 00010207 RAX: ffff88805f2f2292 RBX: ffff8880a0ae6000 RCX: 0000000000000000 kobject: 'loop5' (0000000018d0d0ee): kobject_uevent_env RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 000000000000001c RBP: ffff888076717b40 R08: ffff8880950e0580 R09: ffffed100be5e46d R10: ffffed100be5e46c R11: ffff88805f2f2363 R12: ffff888065579840 kobject: 'loop5' (0000000018d0d0ee): fill_kobj_path: path = '/devices/virtual/block/loop5' R13: 1ffff1100ece2f47 R14: 0000000000000013 R15: 0000000000000013 FS: 00007fb88cf43700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9a42a41028 CR3: 0000000087a67000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: x25_release+0xd0/0x340 net/x25/af_x25.c:658 __sock_release+0xd3/0x2b0 net/socket.c:579 sock_close+0x1b/0x30 net/socket.c:1162 __fput+0x2df/0x8d0 fs/file_table.c:278 ____fput+0x16/0x20 fs/file_table.c:309 task_work_run+0x14a/0x1c0 kernel/task_work.c:113 get_signal+0x1961/0x1d50 kernel/signal.c:2388 do_signal+0x87/0x1940 arch/x86/kernel/signal.c:816 exit_to_usermode_loop+0x244/0x2c0 arch/x86/entry/common.c:162 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline] syscall_return_slowpath arch/x86/entry/common.c:268 [inline] do_syscall_64+0x52d/0x610 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457f29 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fb88cf42c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: fffffffffffffe00 RBX: 0000000000000003 RCX: 0000000000457f29 RDX: 0000000000000012 RSI: 0000000020000080 RDI: 0000000000000004 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb88cf436d4 R13: 00000000004be462 R14: 00000000004cec98 R15: 00000000ffffffff Modules linked in: Fixes: 95d6ebd ("net/x25: fix use-after-free in x25_device_event()") Signed-off-by: Eric Dumazet <[email protected]> Cc: andrew hendry <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit e15ce4b ] As part of unloading a device, the driver switches from FW command event mode to FW command polling mode. Part of switching over to polling mode is freeing the command context array memory (unfortunately, currently, without NULLing the command context array pointer). The reset flow calls "complete" to complete all outstanding fw commands (if we are in event mode). The check for event vs. polling mode here is to test if the command context array pointer is NULL. If the reset flow is activated after the switch to polling mode, it will attempt (incorrectly) to complete all the commands in the context array -- because the pointer was not NULLed when the driver switched over to polling mode. As a result, we have a use-after-free situation, which results in a kernel crash. For example: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90 PGD 0 Oops: 0000 [#1] SMP Modules linked in: netconsole nfsv3 nfs_acl nfs lockd grace ... CPU: 2 PID: 940 Comm: kworker/2:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: events hv_eject_device_work [pci_hyperv] task: ffff8d1734ca0fd0 ti: ffff8d17354bc000 task.ti: ffff8d17354bc000 RIP: 0010:[<ffffffff876c4a8e>] [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90 RSP: 0018:ffff8d17354bfa38 EFLAGS: 00010082 RAX: 0000000000000000 RBX: ffff8d17362d42c8 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8d17362d42c8 RBP: ffff8d17354bfa70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000298 R11: ffff8d173610e000 R12: ffff8d17362d42d0 R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8d1802680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000f16d8000 CR4: 00000000001406e0 Call Trace: [<ffffffff876c7adc>] complete+0x3c/0x50 [<ffffffffc04242f0>] mlx4_cmd_wake_completions+0x70/0x90 [mlx4_core] [<ffffffffc041e7b1>] mlx4_enter_error_state+0xe1/0x380 [mlx4_core] [<ffffffffc041fa4b>] mlx4_comm_cmd+0x29b/0x360 [mlx4_core] [<ffffffffc041ff51>] __mlx4_cmd+0x441/0x920 [mlx4_core] [<ffffffff877f62b1>] ? __slab_free+0x81/0x2f0 [<ffffffff87951384>] ? __radix_tree_lookup+0x84/0xf0 [<ffffffffc043a8eb>] mlx4_free_mtt_range+0x5b/0xb0 [mlx4_core] [<ffffffffc043a957>] mlx4_mtt_cleanup+0x17/0x20 [mlx4_core] [<ffffffffc04272c7>] mlx4_free_eq+0xa7/0x1c0 [mlx4_core] [<ffffffffc042803e>] mlx4_cleanup_eq_table+0xde/0x130 [mlx4_core] [<ffffffffc0433e08>] mlx4_unload_one+0x118/0x300 [mlx4_core] [<ffffffffc0434191>] mlx4_remove_one+0x91/0x1f0 [mlx4_core] The fix is to set the command context array pointer to NULL after freeing the array. Fixes: f5aef5a ("net/mlx4_core: Activate reset flow upon fatal command cases") Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 4c404ce ] Previous to commit 22b5c0b ("vsock/virtio: fix kernel panic after device hot-unplug"), vsock_core_init() was called from virtio_vsock_probe(). Now, virtio_transport_reset_no_sock() can be called before vsock_core_init() has the chance to run. [Wed Feb 27 14:17:09 2019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 [Wed Feb 27 14:17:09 2019] #PF error: [normal kernel read fault] [Wed Feb 27 14:17:09 2019] PGD 0 P4D 0 [Wed Feb 27 14:17:09 2019] Oops: 0000 [#1] SMP PTI [Wed Feb 27 14:17:09 2019] CPU: 3 PID: 59 Comm: kworker/3:1 Not tainted 5.0.0-rc7-390-generic-hvi #390 [Wed Feb 27 14:17:09 2019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [Wed Feb 27 14:17:09 2019] Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport] [Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 [vmw_vsock_virtio_transport_common] [Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28 [Wed Feb 27 14:17:09 2019] RSP: 0018:ffffb42701ab7d40 EFLAGS: 00010282 [Wed Feb 27 14:17:09 2019] RAX: 0000000000000000 RBX: ffff9d79637ee080 RCX: 0000000000000003 [Wed Feb 27 14:17:09 2019] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9d79637ee080 [Wed Feb 27 14:17:09 2019] RBP: ffffb42701ab7d78 R08: ffff9d796fae70e0 R09: ffff9d796f403500 [Wed Feb 27 14:17:09 2019] R10: ffffb42701ab7d90 R11: 0000000000000000 R12: ffff9d7969d09240 [Wed Feb 27 14:17:09 2019] R13: ffff9d79624e6840 R14: ffff9d7969d09318 R15: ffff9d796d48ff80 [Wed Feb 27 14:17:09 2019] FS: 0000000000000000(0000) GS:ffff9d796fac0000(0000) knlGS:0000000000000000 [Wed Feb 27 14:17:09 2019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Wed Feb 27 14:17:09 2019] CR2: 0000000000000110 CR3: 0000000427f22000 CR4: 00000000000006e0 [Wed Feb 27 14:17:09 2019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [Wed Feb 27 14:17:09 2019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [Wed Feb 27 14:17:09 2019] Call Trace: [Wed Feb 27 14:17:09 2019] virtio_transport_recv_pkt+0x63/0x820 [vmw_vsock_virtio_transport_common] [Wed Feb 27 14:17:09 2019] ? kfree+0x17e/0x190 [Wed Feb 27 14:17:09 2019] ? detach_buf_split+0x145/0x160 [Wed Feb 27 14:17:09 2019] ? __switch_to_asm+0x40/0x70 [Wed Feb 27 14:17:09 2019] virtio_transport_rx_work+0xa0/0x106 [vmw_vsock_virtio_transport] [Wed Feb 27 14:17:09 2019] NET: Registered protocol family 40 [Wed Feb 27 14:17:09 2019] process_one_work+0x167/0x410 [Wed Feb 27 14:17:09 2019] worker_thread+0x4d/0x460 [Wed Feb 27 14:17:09 2019] kthread+0x105/0x140 [Wed Feb 27 14:17:09 2019] ? rescuer_thread+0x360/0x360 [Wed Feb 27 14:17:09 2019] ? kthread_destroy_worker+0x50/0x50 [Wed Feb 27 14:17:09 2019] ret_from_fork+0x35/0x40 [Wed Feb 27 14:17:09 2019] Modules linked in: vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common input_leds vsock serio_raw i2c_piix4 mac_hid qemu_fw_cfg autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net psmouse drm net_failover pata_acpi virtio_blk failover floppy Fixes: 22b5c0b ("vsock/virtio: fix kernel panic after device hot-unplug") Reported-by: Alexandru Herghelegiu <[email protected]> Signed-off-by: Adalbert Lazăr <[email protected]> Co-developed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit f16eb8a upstream. If SSDT overlay is loaded via ConfigFS and then unloaded the device, we would like to have OF modalias for, already gone. Thus, acpi_get_name() returns no allocated buffer for such case and kernel crashes afterwards: ACPI: Host-directed Dynamic ACPI Table Unload ads7950 spi-PRP0001:00: Dropping the link to regulator.0 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 80000000070d6067 P4D 80000000070d6067 PUD 70d0067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 40 Comm: kworker/u4:2 Not tainted 5.0.0+ #96 Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 Workqueue: kacpi_hotplug acpi_device_del_work_fn RIP: 0010:create_of_modalias.isra.1+0x4c/0x150 Code: 00 00 48 89 44 24 18 31 c0 48 8d 54 24 08 48 c7 44 24 10 00 00 00 00 48 c7 44 24 08 ff ff ff ff e8 7a b0 03 00 48 8b 4c 24 10 <0f> b6 01 84 c0 74 27 48 c7 c7 00 09 f4 a5 0f b6 f0 8d 50 20 f6 04 RSP: 0000:ffffa51040297c10 EFLAGS: 00010246 RAX: 0000000000001001 RBX: 0000000000000785 RCX: 0000000000000000 RDX: 0000000000001001 RSI: 0000000000000286 RDI: ffffa2163dc042e0 RBP: ffffa216062b1196 R08: 0000000000001001 R09: ffffa21639873000 R10: ffffffffa606761d R11: 0000000000000001 R12: ffffa21639873218 R13: ffffa2163deb5060 R14: ffffa216063d1010 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffa2163e000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000007114000 CR4: 00000000001006f0 Call Trace: __acpi_device_uevent_modalias+0xb0/0x100 spi_uevent+0xd/0x40 ... In order to fix above let create_of_modalias() check the status returned by acpi_get_name() and bail out in case of failure. Fixes: 8765c5b ("ACPI / scan: Rework modalias creation when "compatible" is present") Link: https://bugzilla.kernel.org/show_bug.cgi?id=201381 Reported-by: Ferry Toth <[email protected]> Tested-by: Ferry Toth<[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Mika Westerberg <[email protected]> Cc: 4.1+ <[email protected]> # 4.1+ Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 32e36bf upstream. When using SCSI passthrough in combination with the iSCSI target driver then cmd->t_state_lock may be obtained from interrupt context. Hence, all code that obtains cmd->t_state_lock from thread context must disable interrupts first. This patch avoids that lockdep reports the following: WARNING: inconsistent lock state 4.18.0-dbg+ #1 Not tainted -------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. iscsi_ttx/1800 [HC1[1]:SC0[2]:HE0:SE0] takes: 000000006e7b0ceb (&(&cmd->t_state_lock)->rlock){?...}, at: target_complete_cmd+0x47/0x2c0 [target_core_mod] {HARDIRQ-ON-W} state was registered at: lock_acquire+0xd2/0x260 _raw_spin_lock+0x32/0x50 iscsit_close_connection+0x97e/0x1020 [iscsi_target_mod] iscsit_take_action_for_connection_exit+0x108/0x200 [iscsi_target_mod] iscsi_target_rx_thread+0x180/0x190 [iscsi_target_mod] kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 irq event stamp: 1281 hardirqs last enabled at (1279): [<ffffffff970ade79>] __local_bh_enable_ip+0xa9/0x160 hardirqs last disabled at (1281): [<ffffffff97a008a5>] interrupt_entry+0xb5/0xd0 softirqs last enabled at (1278): [<ffffffff977cd9a1>] lock_sock_nested+0x51/0xc0 softirqs last disabled at (1280): [<ffffffffc07a6e04>] ip6_finish_output2+0x124/0xe40 [ipv6] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&cmd->t_state_lock)->rlock); <Interrupt> lock(&(&cmd->t_state_lock)->rlock);
commit 1cec3f2 upstream. This fixes a longstanding lockdep warning triggered by fstests/btrfs/011. Circular locking dependency check reports warning[1], that's because the btrfs_scrub_dev() calls the stack #0 below with, the fs_info::scrub_lock held. The test case leading to this warning: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /btrfs $ btrfs scrub start -B /btrfs In fact we have fs_info::scrub_workers_refcnt to track if the init and destroy of the scrub workers are needed. So once we have incremented and decremented the fs_info::scrub_workers_refcnt value in the thread, its ok to drop the scrub_lock, and then actually do the btrfs_destroy_workqueue() part. So this patch drops the scrub_lock before calling btrfs_destroy_workqueue(). [359.258534] ====================================================== [359.260305] WARNING: possible circular locking dependency detected [359.261938] 5.0.0-rc6-default #461 Not tainted [359.263135] ------------------------------------------------------ [359.264672] btrfs/20975 is trying to acquire lock: [359.265927] 00000000d4d32bea ((wq_completion)"%s-%s""btrfs", name){+.+.}, at: flush_workqueue+0x87/0x540 [359.268416] [359.268416] but task is already holding lock: [359.270061] 0000000053ea26a6 (&fs_info->scrub_lock){+.+.}, at: btrfs_scrub_dev+0x322/0x590 [btrfs] [359.272418] [359.272418] which lock already depends on the new lock. [359.272418] [359.274692] [359.274692] the existing dependency chain (in reverse order) is: [359.276671] [359.276671] -> #3 (&fs_info->scrub_lock){+.+.}: [359.278187] __mutex_lock+0x86/0x9c0 [359.279086] btrfs_scrub_pause+0x31/0x100 [btrfs] [359.280421] btrfs_commit_transaction+0x1e4/0x9e0 [btrfs] [359.281931] close_ctree+0x30b/0x350 [btrfs] [359.283208] generic_shutdown_super+0x64/0x100 [359.284516] kill_anon_super+0x14/0x30 [359.285658] btrfs_kill_super+0x12/0xa0 [btrfs] [359.286964] deactivate_locked_super+0x29/0x60 [359.288242] cleanup_mnt+0x3b/0x70 [359.289310] task_work_run+0x98/0xc0 [359.290428] exit_to_usermode_loop+0x83/0x90 [359.291445] do_syscall_64+0x15b/0x180 [359.292598] entry_SYSCALL_64_after_hwframe+0x49/0xbe [359.294011] [359.294011] -> #2 (sb_internal#2){.+.+}: [359.295432] __sb_start_write+0x113/0x1d0 [359.296394] start_transaction+0x369/0x500 [btrfs] [359.297471] btrfs_finish_ordered_io+0x2aa/0x7c0 [btrfs] [359.298629] normal_work_helper+0xcd/0x530 [btrfs] [359.299698] process_one_work+0x246/0x610 [359.300898] worker_thread+0x3c/0x390 [359.302020] kthread+0x116/0x130 [359.303053] ret_from_fork+0x24/0x30 [359.304152] [359.304152] -> #1 ((work_completion)(&work->normal_work)){+.+.}: [359.306100] process_one_work+0x21f/0x610 [359.307302] worker_thread+0x3c/0x390 [359.308465] kthread+0x116/0x130 [359.309357] ret_from_fork+0x24/0x30 [359.310229] [359.310229] -> #0 ((wq_completion)"%s-%s""btrfs", name){+.+.}: [359.311812] lock_acquire+0x90/0x180 [359.312929] flush_workqueue+0xaa/0x540 [359.313845] drain_workqueue+0xa1/0x180 [359.314761] destroy_workqueue+0x17/0x240 [359.315754] btrfs_destroy_workqueue+0x57/0x200 [btrfs] [359.317245] scrub_workers_put+0x2c/0x60 [btrfs] [359.318585] btrfs_scrub_dev+0x336/0x590 [btrfs] [359.319944] btrfs_dev_replace_by_ioctl.cold.19+0x179/0x1bb [btrfs] [359.321622] btrfs_ioctl+0x28a4/0x2e40 [btrfs] [359.322908] do_vfs_ioctl+0xa2/0x6d0 [359.324021] ksys_ioctl+0x3a/0x70 [359.325066] __x64_sys_ioctl+0x16/0x20 [359.326236] do_syscall_64+0x54/0x180 [359.327379] entry_SYSCALL_64_after_hwframe+0x49/0xbe [359.328772] [359.328772] other info that might help us debug this: [359.328772] [359.330990] Chain exists of: [359.330990] (wq_completion)"%s-%s""btrfs", name --> sb_internal#2 --> &fs_info->scrub_lock [359.330990] [359.334376] Possible unsafe locking scenario: [359.334376] [359.336020] CPU0 CPU1 [359.337070] ---- ---- [359.337821] lock(&fs_info->scrub_lock); [359.338506] lock(sb_internal#2); [359.339506] lock(&fs_info->scrub_lock); [359.341461] lock((wq_completion)"%s-%s""btrfs", name); [359.342437] [359.342437] *** DEADLOCK *** [359.342437] [359.343745] 1 lock held by btrfs/20975: [359.344788] #0: 0000000053ea26a6 (&fs_info->scrub_lock){+.+.}, at: btrfs_scrub_dev+0x322/0x590 [btrfs] [359.346778] [359.346778] stack backtrace: [359.347897] CPU: 0 PID: 20975 Comm: btrfs Not tainted 5.0.0-rc6-default #461 [359.348983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014 [359.350501] Call Trace: [359.350931] dump_stack+0x67/0x90 [359.351676] print_circular_bug.isra.37.cold.56+0x15c/0x195 [359.353569] check_prev_add.constprop.44+0x4f9/0x750 [359.354849] ? check_prev_add.constprop.44+0x286/0x750 [359.356505] __lock_acquire+0xb84/0xf10 [359.357505] lock_acquire+0x90/0x180 [359.358271] ? flush_workqueue+0x87/0x540 [359.359098] flush_workqueue+0xaa/0x540 [359.359912] ? flush_workqueue+0x87/0x540 [359.360740] ? drain_workqueue+0x1e/0x180 [359.361565] ? drain_workqueue+0xa1/0x180 [359.362391] drain_workqueue+0xa1/0x180 [359.363193] destroy_workqueue+0x17/0x240 [359.364539] btrfs_destroy_workqueue+0x57/0x200 [btrfs] [359.365673] scrub_workers_put+0x2c/0x60 [btrfs] [359.366618] btrfs_scrub_dev+0x336/0x590 [btrfs] [359.367594] ? start_transaction+0xa1/0x500 [btrfs] [359.368679] btrfs_dev_replace_by_ioctl.cold.19+0x179/0x1bb [btrfs] [359.369545] btrfs_ioctl+0x28a4/0x2e40 [btrfs] [359.370186] ? __lock_acquire+0x263/0xf10 [359.370777] ? kvm_clock_read+0x14/0x30 [359.371392] ? kvm_sched_clock_read+0x5/0x10 [359.372248] ? sched_clock+0x5/0x10 [359.372786] ? sched_clock_cpu+0xc/0xc0 [359.373662] ? do_vfs_ioctl+0xa2/0x6d0 [359.374552] do_vfs_ioctl+0xa2/0x6d0 [359.375378] ? do_sigaction+0xff/0x250 [359.376233] ksys_ioctl+0x3a/0x70 [359.376954] __x64_sys_ioctl+0x16/0x20 [359.377772] do_syscall_64+0x54/0x180 [359.378841] entry_SYSCALL_64_after_hwframe+0x49/0xbe [359.380422] RIP: 0033:0x7f5429296a97 Backporting to older kernels: scrub_nocow_workers must be freed the same way as the others. CC: [email protected] # 4.4+ Signed-off-by: Anand Jain <[email protected]> [ update changelog ] Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit bbe54ea upstream. Commit 0e157e5 ("PCI/PME: Implement runtime PM callbacks") tried to solve an issue where the hierarchy immediately wakes up when it is transitioned into D3cold. However, it turns out to prevent PME propagation on some systems that do not support D3cold. I looked more closely at what might cause the immediate wakeup. It happens when the ACPI power resource of the root port is turned off. The AML code associated with the _OFF() method of the ACPI power resource starts a PCIe L2/L3 Ready transition and waits for it to complete. Right after the L2/L3 Ready transition is started the root port receives a PME from the downstream port. The simplest hierarchy where this happens looks like this: 00:1d.0 PCIe Root Port ^ | v 05:00.0 PCIe switch #1 upstream port 06:01.0 PCIe switch #1 downstream hotplug port ^ | v 08:00.0 PCIe switch #2 upstream port It seems that the PCIe link between the two switches, before PME_Turn_Off/PME_TO_Ack is complete for the whole hierarchy, goes inactive and triggers PME towards the root port bringing it back to D0. The L2/L3 Ready sequence is described in PCIe r4.0 spec sections 5.2 and 5.3.3 but unfortunately they do not state what happens if DLLSCE is enabled during the sequence. Disabling Data Link Layer State Changed event (DLLSCE) seems to prevent the issue and still allows the downstream hotplug port to notice when a device is plugged/unplugged. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202593 Fixes: 0e157e5 ("PCI/PME: Implement runtime PM callbacks") Signed-off-by: Mika Westerberg <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Rafael J. Wysocki <[email protected]> CC: [email protected] # v4.20+ Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit bc5add0 upstream. When disabling and removing a receive context, it is possible for an asynchronous event (i.e IRQ) to occur. Because of this, there is a race between cleaning up the context, and the context being used by the asynchronous event. cpu 0 (context cleanup) rc->ref_count-- (ref_count == 0) hfi1_rcd_free() cpu 1 (IRQ (with rcd index)) rcd_get_by_index() lock ref_count+++ <-- reference count race (WARNING) return rcd unlock cpu 0 hfi1_free_ctxtdata() <-- incorrect free location lock remove rcd from array unlock free rcd This race will cause the following WARNING trace: WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 Call Trace: dump_stack+0x19/0x1b __warn+0xd8/0x100 warn_slowpath_null+0x1d/0x20 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] is_rcv_urgent_int+0x24/0x90 [hfi1] general_interrupt+0x1b6/0x210 [hfi1] __handle_irq_event_percpu+0x44/0x1c0 handle_irq_event_percpu+0x32/0x80 handle_irq_event+0x3c/0x60 handle_edge_irq+0x7f/0x150 handle_irq+0xe4/0x1a0 do_IRQ+0x4d/0xf0 common_interrupt+0x162/0x162 The race can also lead to a use after free which could be similar to: general protection fault: 0000 1 SMP CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230 Call Trace: ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_aio_write+0xba/0x110 [hfi1] do_sync_readv_writev+0x7b/0xd0 do_readv_writev+0xce/0x260 ? handle_mm_fault+0x39d/0x9b0 ? pick_next_task_fair+0x5f/0x1b0 ? sched_clock_cpu+0x85/0xc0 ? __schedule+0x13a/0x890 vfs_writev+0x35/0x60 SyS_writev+0x7f/0x110 system_call_fastpath+0x22/0x27 Use the appropriate kref API to verify access. Reorder context cleanup to ensure context removal before cleanup occurs correctly. Cc: [email protected] # v4.14.0+ Fixes: f683c80 ("IB/hfi1: Resolve kernel panics by reference counting receive contexts") Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Michael J. Ruhl <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…override commit 785c9f4 upstream. Platform driver driver_override field should not be initialized from const memory because the core later kfree() it. If driver_override is manually set later through sysfs, kfree() of old value leads to: $ echo "new_value" > /sys/bus/platform/drivers/.../driver_override kernel BUG at ../mm/slub.c:3960! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kfree) from [<c058e8c0>] (platform_set_driver_override+0x84/0xac) (platform_set_driver_override) from [<c058e908>] (driver_override_store+0x20/0x34) (driver_override_store) from [<c031f778>] (kernfs_fop_write+0x100/0x1dc) (kernfs_fop_write) from [<c0296de8>] (__vfs_write+0x2c/0x17c) (__vfs_write) from [<c02970c4>] (vfs_write+0xa4/0x188) (vfs_write) from [<c02972e8>] (ksys_write+0x4c/0xac) (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28) The clk-exynos5-subcmu driver uses override only for the purpose of creating meaningful names for children devices (matching names of power domains, e.g. DISP, MFC). The driver_override was not developed for this purpose so just switch to default names of devices to fix the issue. Fixes: b06a532 ("clk: samsung: Add Exynos5 sub-CMU clock driver") Cc: <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 401e7e8 upstream. When we excute the following commands, we got oops rmmod ipmi_si cat /proc/ioports [ 1623.482380] Unable to handle kernel paging request at virtual address ffff00000901d478 [ 1623.482382] Mem abort info: [ 1623.482383] ESR = 0x96000007 [ 1623.482385] Exception class = DABT (current EL), IL = 32 bits [ 1623.482386] SET = 0, FnV = 0 [ 1623.482387] EA = 0, S1PTW = 0 [ 1623.482388] Data abort info: [ 1623.482389] ISV = 0, ISS = 0x00000007 [ 1623.482390] CM = 0, WnR = 0 [ 1623.482393] swapper pgtable: 4k pages, 48-bit VAs, pgdp = 00000000d7d94a66 [ 1623.482395] [ffff00000901d478] pgd=000000dffbfff003, pud=000000dffbffe003, pmd=0000003f5d06e003, pte=0000000000000000 [ 1623.482399] Internal error: Oops: 96000007 [#1] SMP [ 1623.487407] Modules linked in: ipmi_si(E) nls_utf8 isofs rpcrdma ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm dm_mirror dm_region_hash dm_log iw_cm dm_mod aes_ce_blk crypto_simd cryptd aes_ce_cipher ses ghash_ce sha2_ce enclosure sha256_arm64 sg sha1_ce hisi_sas_v2_hw hibmc_drm sbsa_gwdt hisi_sas_main ip_tables mlx5_ib ib_uverbs marvell ib_core mlx5_core ixgbe mdio hns_dsaf ipmi_devintf hns_enet_drv ipmi_msghandler hns_mdio [last unloaded: ipmi_si] [ 1623.532410] CPU: 30 PID: 11438 Comm: cat Kdump: loaded Tainted: G E 5.0.0-rc3+ #168 [ 1623.541498] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.37 11/21/2017 [ 1623.548822] pstate: a0000005 (NzCv daif -PAN -UAO) [ 1623.553684] pc : string+0x28/0x98 [ 1623.557040] lr : vsnprintf+0x368/0x5e8 [ 1623.560837] sp : ffff000013213a80 [ 1623.564191] x29: ffff000013213a80 x28: ffff00001138abb5 [ 1623.569577] x27: ffff000013213c18 x26: ffff805f67d06049 [ 1623.574963] x25: 0000000000000000 x24: ffff00001138abb5 [ 1623.580349] x23: 0000000000000fb7 x22: ffff0000117ed000 [ 1623.585734] x21: ffff000011188fd8 x20: ffff805f67d07000 [ 1623.591119] x19: ffff805f67d06061 x18: ffffffffffffffff [ 1623.596505] x17: 0000000000000200 x16: 0000000000000000 [ 1623.601890] x15: ffff0000117ed748 x14: ffff805f67d07000 [ 1623.607276] x13: ffff805f67d0605e x12: 0000000000000000 [ 1623.612661] x11: 0000000000000000 x10: 0000000000000000 [ 1623.618046] x9 : 0000000000000000 x8 : 000000000000000f [ 1623.623432] x7 : ffff805f67d06061 x6 : fffffffffffffffe [ 1623.628817] x5 : 0000000000000012 x4 : ffff00000901d478 [ 1623.634203] x3 : ffff0a00ffffff04 x2 : ffff805f67d07000 [ 1623.639588] x1 : ffff805f67d07000 x0 : ffffffffffffffff [ 1623.644974] Process cat (pid: 11438, stack limit = 0x000000008d4cbc10) [ 1623.651592] Call trace: [ 1623.654068] string+0x28/0x98 [ 1623.657071] vsnprintf+0x368/0x5e8 [ 1623.660517] seq_vprintf+0x70/0x98 [ 1623.668009] seq_printf+0x7c/0xa0 [ 1623.675530] r_show+0xc8/0xf8 [ 1623.682558] seq_read+0x330/0x440 [ 1623.689877] proc_reg_read+0x78/0xd0 [ 1623.697346] __vfs_read+0x60/0x1a0 [ 1623.704564] vfs_read+0x94/0x150 [ 1623.711339] ksys_read+0x6c/0xd8 [ 1623.717939] __arm64_sys_read+0x24/0x30 [ 1623.725077] el0_svc_common+0x120/0x148 [ 1623.732035] el0_svc_handler+0x30/0x40 [ 1623.738757] el0_svc+0x8/0xc [ 1623.744520] Code: d1000406 aa0103e2 54000149 b4000080 (39400085) [ 1623.753441] ---[ end trace f91b6a4937de9835 ]--- [ 1623.760871] Kernel panic - not syncing: Fatal exception [ 1623.768935] SMP: stopping secondary CPUs [ 1623.775718] Kernel Offset: disabled [ 1623.781998] CPU features: 0x002,21006008 [ 1623.788777] Memory Limit: none [ 1623.798329] Starting crashdump kernel... [ 1623.805202] Bye! If io_setup is called successful in try_smi_init() but try_smi_init() goes out_err before calling ipmi_register_smi(), so ipmi_unregister_smi() will not be called while removing module. It leads to the resource that allocated in io_setup() can not be freed, but the name(DEVICE_NAME) of resource is freed while removing the module. It causes use-after-free when cat /proc/ioports. Fix this by calling io_cleanup() while try_smi_init() goes to out_err. and don't call io_cleanup() until io_setup() returns successful to avoid warning prints. Fixes: 93c303d ("ipmi_si: Clean up shutdown a bit") Cc: [email protected] Reported-by: NuoHan Qiao <[email protected]> Suggested-by: Corey Minyard <[email protected]> Signed-off-by: Yang Yingliang <[email protected]> Signed-off-by: Corey Minyard <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 9951379 upstream. Some users see panics like the following when performing fstrim on a bcached volume: [ 529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 530.183928] #PF error: [normal kernel read fault] [ 530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0 [ 530.750887] Oops: 0000 [#1] SMP PTI [ 530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3 [ 531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015 [ 531.693137] RIP: 0010:blk_queue_split+0x148/0x620 [ 531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48 8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3 [ 532.838634] RSP: 0018:ffffb9b708df39b0 EFLAGS: 00010246 [ 533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000 [ 533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000 [ 533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000 [ 534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000 [ 534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020 [ 534.833319] FS: 00007efec26e3840(0000) GS:ffff940d1f480000(0000) knlGS:0000000000000000 [ 535.224098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0 [ 535.851759] Call Trace: [ 535.970308] ? mempool_alloc_slab+0x15/0x20 [ 536.174152] ? bch_data_insert+0x42/0xd0 [bcache] [ 536.403399] blk_mq_make_request+0x97/0x4f0 [ 536.607036] generic_make_request+0x1e2/0x410 [ 536.819164] submit_bio+0x73/0x150 [ 536.980168] ? submit_bio+0x73/0x150 [ 537.149731] ? bio_associate_blkg_from_css+0x3b/0x60 [ 537.391595] ? _cond_resched+0x1a/0x50 [ 537.573774] submit_bio_wait+0x59/0x90 [ 537.756105] blkdev_issue_discard+0x80/0xd0 [ 537.959590] ext4_trim_fs+0x4a9/0x9e0 [ 538.137636] ? ext4_trim_fs+0x4a9/0x9e0 [ 538.324087] ext4_ioctl+0xea4/0x1530 [ 538.497712] ? _copy_to_user+0x2a/0x40 [ 538.679632] do_vfs_ioctl+0xa6/0x600 [ 538.853127] ? __do_sys_newfstat+0x44/0x70 [ 539.051951] ksys_ioctl+0x6d/0x80 [ 539.212785] __x64_sys_ioctl+0x1a/0x20 [ 539.394918] do_syscall_64+0x5a/0x110 [ 539.568674] entry_SYSCALL_64_after_hwframe+0x44/0xa9 We have observed it where both: 1) LVM/devmapper is involved (bcache backing device is LVM volume) and 2) writeback cache is involved (bcache cache_mode is writeback) On one machine, we can reliably reproduce it with: # echo writeback > /sys/block/bcache0/bcache/cache_mode (not sure whether above line is required) # mount /dev/bcache0 /test # for i in {0..10}; do file="$(mktemp /test/zero.XXX)" dd if=/dev/zero of="$file" bs=1M count=256 sync rm $file done # fstrim -v /test Observing this with tracepoints on, we see the following writes: fstrim-18019 [022] .... 91107.302026: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4260112 + 196352 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302050: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4456464 + 262144 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302075: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 4718608 + 81920 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302094: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5324816 + 180224 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302121: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5505040 + 262144 hit 0 bypass 1 fstrim-18019 [022] .... 91107.302145: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 5767184 + 81920 hit 0 bypass 1 fstrim-18019 [022] .... 91107.308777: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0 DS 6373392 + 180224 hit 1 bypass 0 <crash> Note the final one has different hit/bypass flags. This is because in should_writeback(), we were hitting a case where the partial stripe condition was returning true and so should_writeback() was returning true early. If that hadn't been the case, it would have hit the would_skip test, and as would_skip == s->iop.bypass == true, should_writeback() would have returned false. Looking at the git history from 'commit 72c2706 ("bcache: Write out full stripes")', it looks like the idea was to optimise for raid5/6: * If a stripe is already dirty, force writes to that stripe to writeback mode - to help build up full stripes of dirty data To fix this issue, make sure that should_writeback() on a discard op never returns true. More details of debugging: https://www.spinics.net/lists/linux-bcache/msg06996.html Previous reports: - https://bugzilla.kernel.org/show_bug.cgi?id=201051 - https://bugzilla.kernel.org/show_bug.cgi?id=196103 - https://www.spinics.net/lists/linux-bcache/msg06885.html (Coly Li: minor modification to follow maximum 75 chars per line rule) Cc: Kent Overstreet <[email protected]> Cc: [email protected] Fixes: 72c2706 ("bcache: Write out full stripes") Signed-off-by: Daniel Axtens <[email protected]> Signed-off-by: Coly Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 78de14c upstream. If fbdev setup has failed, lastclose will give a NULL pointer deref: [ 77.794295] [drm:drm_lastclose] [ 77.794414] [drm:drm_lastclose] driver lastclose completed [ 77.794660] Unable to handle kernel NULL pointer dereference at virtual address 00000014 [ 77.809460] pgd = b376b71b [ 77.818275] [00000014] *pgd=175ba831, *pte=00000000, *ppte=00000000 [ 77.830813] Internal error: Oops: 17 [#1] ARM [ 77.840963] Modules linked in: mi0283qt mipi_dbi tinydrm raspberrypi_hwmon gpio_backlight backlight snd_bcm2835(C) bcm2835_rng rng_core [ 77.865203] CPU: 0 PID: 527 Comm: lt-modetest Tainted: G C 5.0.0-rc1+ #1 [ 77.879525] Hardware name: BCM2835 [ 77.889185] PC is at restore_fbdev_mode+0x20/0x164 [ 77.900261] LR is at drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0x9c [ 78.002446] Process lt-modetest (pid: 527, stack limit = 0x7a3d5c14) [ 78.291030] Backtrace: [ 78.300815] [<c04f2d0c>] (restore_fbdev_mode) from [<c04f4708>] (drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0x9c) [ 78.319095] r9:d8a8a288 r8:d891acf0 r7:d7697910 r6:00000000 r5:d891ac00 r4:d891ac00 [ 78.334432] [<c04f46b4>] (drm_fb_helper_restore_fbdev_mode_unlocked) from [<c04f47e8>] (drm_fbdev_client_restore+0x18/0x20) [ 78.353296] r8:d76978c0 r7:d7697910 r6:d7697950 r5:d7697800 r4:d891ac00 r3:c04f47d0 [ 78.368689] [<c04f47d0>] (drm_fbdev_client_restore) from [<c051b6b4>] (drm_client_dev_restore+0x7c/0xc0) [ 78.385982] [<c051b638>] (drm_client_dev_restore) from [<c04f8fd0>] (drm_lastclose+0xc4/0xd4) [ 78.402332] r8:d76978c0 r7:d7471080 r6:c0e0c088 r5:d8a85e00 r4:d7697800 [ 78.416688] [<c04f8f0c>] (drm_lastclose) from [<c04f9088>] (drm_release+0xa8/0x10c) [ 78.431929] r5:d8a85e00 r4:d7697800 [ 78.442989] [<c04f8fe0>] (drm_release) from [<c02640c4>] (__fput+0x104/0x1c8) [ 78.457740] r8:d5ccea10 r7:d96cfb10 r6:00000008 r5:d74c1b90 r4:d8a8a280 [ 78.472043] [<c0263fc0>] (__fput) from [<c02641ec>] (____fput+0x18/0x1c) [ 78.486363] r10:00000006 r9:d7722000 r8:c01011c4 r7:00000000 r6:c0ebac6c r5:d892a340 [ 78.501869] r4:d8a8a280 [ 78.512002] [<c02641d4>] (____fput) from [<c013ef1c>] (task_work_run+0x98/0xac) [ 78.527186] [<c013ee84>] (task_work_run) from [<c010cc54>] (do_work_pending+0x4f8/0x570) [ 78.543238] r7:d7722030 r6:00000004 r5:d7723fb0 r4:00000000 [ 78.556825] [<c010c75c>] (do_work_pending) from [<c0101034>] (slow_work_pending+0xc/0x20) [ 78.674256] ---[ end trace 70d3a60cf739be3b ]--- Fix by using drm_fb_helper_lastclose() which checks if fbdev is in use. Fixes: 9060d7f ("drm/fb-helper: Finish the generic fbdev emulation") Cc: [email protected] Signed-off-by: Noralf Trønnes <[email protected]> Reviewed-by: Gerd Hoffmann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 2b77158 upstream. This reverts commit b189e75. Unable to handle kernel paging request at virtual address c8358000 pgd = efa405c3 [c8358000] *pgd=00000000 Internal error: Oops: 805 [#1] PREEMPT ARM CPU: 0 PID: 711 Comm: kworker/0:2 Not tainted 4.20.0+ #30 Hardware name: Freescale i.MX27 (Device Tree Support) Workqueue: events mxcmci_datawork PC is at mxcmci_datawork+0xbc/0x2ac LR is at mxcmci_datawork+0xac/0x2ac pc : [<c04e33c8>] lr : [<c04e33b8>] psr: 60000013 sp : c6c93f08 ip : 24004180 fp : 00000008 r10: c8358000 r9 : c78b3e24 r8 : c6c92000 r7 : 00000000 r6 : c7bb8680 r5 : c7bb86d4 r4 : c78b3de0 r3 : 00002502 r2 : c090b2e0 r1 : 00000880 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005317f Table: a68a8000 DAC: 00000055 Process kworker/0:2 (pid: 711, stack limit = 0x389543bc) Stack: (0xc6c93f08 to 0xc6c94000) 3f00: c7bb86d4 00000000 00000000 c6cbfde0 c7bb86d4 c7ee4200 3f20: 00000000 c0907ea8 00000000 c7bb86d8 c0907ea8 c012077c c6cbfde0 c7bb86d4 3f40: c6cbfde0 c6c92000 c6cbfdf4 c09280ba c0907ea8 c090b2e0 c0907ebc c0120c18 3f60: c6cbfde0 00000000 00000000 c6cbb580 c7ba7c40 c7837edc c6cbb598 00000000 3f80: c6cbfde0 c01208f8 00000000 c01254fc c7ba7c40 c0125400 00000000 00000000 3fa0: 00000000 00000000 00000000 c01010d0 00000000 00000000 00000000 00000000 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [<c04e33c8>] (mxcmci_datawork) from [<c012077c>] (process_one_work+0x1f0/0x338) [<c012077c>] (process_one_work) from [<c0120c18>] (worker_thread+0x320/0x474) [<c0120c18>] (worker_thread) from [<c01254fc>] (kthread+0xfc/0x118) [<c01254fc>] (kthread) from [<c01010d0>] (ret_from_fork+0x14/0x24) Exception stack(0xc6c93fb0 to 0xc6c93ff8) 3fa0: 00000000 00000000 00000000 00000000 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 Code: e3500000 1a000059 e5153050 e5933038 (e48a3004) ---[ end trace 54ca629b75f0e737 ]--- note: kworker/0:2[711] exited with preempt_count 1 Signed-off-by: Alexander Shiyan <[email protected]> Fixes: b189e75 ("mmc: mxcmmc: handle highmem pages") Cc: [email protected] Signed-off-by: Ulf Hansson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4e50ce0 upstream. Take into account that sg->offset can be bigger than PAGE_SIZE when setting segment sg->dma_address. Otherwise sg->dma_address will point at diffrent page, what makes DMA not possible with erros like this: xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020] xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020] xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020] xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020] xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020] Additinally with wrong sg->dma_address unmap_sg will free wrong pages, what what can cause crashes like this: Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon pfn:39e8b1 Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000() Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000 Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E) xhci_hcd(E) Feb 28 19:27:45 kernel: scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E) Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G B W E 4.20.12-arch1-1-custom #1 Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018 Feb 28 19:27:45 kernel: Call Trace: Feb 28 19:27:45 kernel: dump_stack+0x5c/0x80 Feb 28 19:27:45 kernel: bad_page.cold.29+0x7f/0xb2 Feb 28 19:27:45 kernel: __free_pages_ok+0x2c0/0x2d0 Feb 28 19:27:45 kernel: skb_release_data+0x96/0x180 Feb 28 19:27:45 kernel: __kfree_skb+0xe/0x20 Feb 28 19:27:45 kernel: tcp_recvmsg+0x894/0xc60 Feb 28 19:27:45 kernel: ? reuse_swap_page+0x120/0x340 Feb 28 19:27:45 kernel: ? ptep_set_access_flags+0x23/0x30 Feb 28 19:27:45 kernel: inet_recvmsg+0x5b/0x100 Feb 28 19:27:45 kernel: __sys_recvfrom+0xc3/0x180 Feb 28 19:27:45 kernel: ? handle_mm_fault+0x10a/0x250 Feb 28 19:27:45 kernel: ? syscall_trace_enter+0x1d3/0x2d0 Feb 28 19:27:45 kernel: ? __audit_syscall_exit+0x22a/0x290 Feb 28 19:27:45 kernel: __x64_sys_recvfrom+0x24/0x30 Feb 28 19:27:45 kernel: do_syscall_64+0x5b/0x170 Feb 28 19:27:45 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: [email protected] Reported-and-tested-by: Jan Viktorin <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Stanislaw Gruszka <[email protected]> Fixes: 80187fd ('iommu/amd: Optimize map_sg and unmap_sg') Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 17605af upstream. Since scsi_device_quiesce() skips SCSI devices that have another state than RUNNING, OFFLINE or TRANSPORT_OFFLINE, scsi_device_resume() should not complain about SCSI devices that have been skipped. Hence this patch. This patch avoids that the following warning appears during resume: WARNING: CPU: 3 PID: 1039 at blk_clear_pm_only+0x2a/0x30 CPU: 3 PID: 1039 Comm: kworker/u8:49 Not tainted 5.0.0+ #1 Hardware name: LENOVO 4180F42/4180F42, BIOS 83ET75WW (1.45 ) 05/10/2013 Workqueue: events_unbound async_run_entry_fn RIP: 0010:blk_clear_pm_only+0x2a/0x30 Call Trace: ? scsi_device_resume+0x28/0x50 ? scsi_dev_type_resume+0x2b/0x80 ? async_run_entry_fn+0x2c/0xd0 ? process_one_work+0x1f0/0x3f0 ? worker_thread+0x28/0x3c0 ? process_one_work+0x3f0/0x3f0 ? kthread+0x10c/0x130 ? __kthread_create_on_node+0x150/0x150 ? ret_from_fork+0x1f/0x30 Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Ming Lei <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Martin Steigerwald <[email protected]> Cc: <[email protected]> Reported-by: Jisheng Zhang <[email protected]> Tested-by: Jisheng Zhang <[email protected]> Fixes: 3a0a529 ("block, scsi: Make SCSI quiesce and resume work reliably") # v4.15 Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit fa30dde upstream. We see the following NULL pointer dereference while running xfstests generic/475: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10 RIP: 0010:ext4_do_update_inode+0x4ec/0x760 ... Call Trace: ? jbd2_journal_get_write_access+0x42/0x50 ? __ext4_journal_get_write_access+0x2c/0x70 ? ext4_truncate+0x186/0x3f0 ext4_mark_iloc_dirty+0x61/0x80 ext4_mark_inode_dirty+0x62/0x1b0 ext4_truncate+0x186/0x3f0 ? unmap_mapping_pages+0x56/0x100 ext4_setattr+0x817/0x8b0 notify_change+0x1df/0x430 do_truncate+0x5e/0x90 ? generic_permission+0x12b/0x1a0 This is triggered because the NULL pointer handle->h_transaction was dereferenced in function ext4_update_inode_fsync_trans(). I found that the h_transaction was set to NULL in jbd2__journal_restart but failed to attached to a new transaction while the journal is aborted. Fix this by checking the handle before updating the inode. Fixes: b436b9b ("ext4: Wait for proper transaction commit on fsync") Signed-off-by: Jiufei Xue <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4843298 upstream. Thread A Thread B - __fput - f2fs_release_file - drop_inmem_pages - mutex_lock(&fi->inmem_lock) - __revoke_inmem_pages - lock_page(page) - open - f2fs_setattr - truncate_setsize - truncate_inode_pages_range - lock_page(page) - truncate_cleanup_page - f2fs_invalidate_page - drop_inmem_page - mutex_lock(&fi->inmem_lock); We may encounter above ABBA deadlock as reported by Kyungtae Kim: I'm reporting a bug in linux-4.17.19: "INFO: task hung in drop_inmem_page" (no reproducer) I think this might be somehow related to the following: https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ ========================================= INFO: task syz-executor7:10822 blocked for more than 120 seconds. Not tainted 4.17.19 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor7 D27024 10822 6346 0x00000004 Call Trace: context_switch kernel/sched/core.c:2867 [inline] __schedule+0x721/0x1e60 kernel/sched/core.c:3515 schedule+0x88/0x1c0 kernel/sched/core.c:3559 schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617 __mutex_lock_common kernel/locking/mutex.c:833 [inline] __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893 mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908 drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327 f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401 do_invalidatepage mm/truncate.c:165 [inline] truncate_cleanup_page+0x261/0x330 mm/truncate.c:187 truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367 truncate_inode_pages mm/truncate.c:478 [inline] truncate_pagecache+0x6d/0x90 mm/truncate.c:801 truncate_setsize+0x81/0xa0 mm/truncate.c:826 f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781 notify_change+0xa62/0xe80 fs/attr.c:313 do_truncate+0x12e/0x1e0 fs/open.c:63 do_last fs/namei.c:2955 [inline] path_openat+0x2042/0x29f0 fs/namei.c:3505 do_filp_open+0x1bd/0x2c0 fs/namei.c:3540 do_sys_open+0x35e/0x4e0 fs/open.c:1101 __do_sys_open fs/open.c:1119 [inline] __se_sys_open fs/open.c:1114 [inline] __x64_sys_open+0x89/0xc0 fs/open.c:1114 do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4497b9 RSP: 002b:00007f734e459c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 RAX: ffffffffffffffda RBX: 00007f734e45a6cc RCX: 00000000004497b9 RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080 RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e45a700 INFO: task syz-executor7:10858 blocked for more than 120 seconds. Not tainted 4.17.19 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor7 D28880 10858 6346 0x00000004 Call Trace: context_switch kernel/sched/core.c:2867 [inline] __schedule+0x721/0x1e60 kernel/sched/core.c:3515 schedule+0x88/0x1c0 kernel/sched/core.c:3559 __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline] rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594 call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117 __down_write arch/x86/include/asm/rwsem.h:142 [inline] down_write+0x58/0xa0 kernel/locking/rwsem.c:72 inode_lock include/linux/fs.h:713 [inline] do_truncate+0x120/0x1e0 fs/open.c:61 do_last fs/namei.c:2955 [inline] path_openat+0x2042/0x29f0 fs/namei.c:3505 do_filp_open+0x1bd/0x2c0 fs/namei.c:3540 do_sys_open+0x35e/0x4e0 fs/open.c:1101 __do_sys_open fs/open.c:1119 [inline] __se_sys_open fs/open.c:1114 [inline] __x64_sys_open+0x89/0xc0 fs/open.c:1114 do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4497b9 RSP: 002b:00007f734e3b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 RAX: ffffffffffffffda RBX: 00007f734e3b56cc RCX: 00000000004497b9 RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080 RBP: 000000000071c238 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e3b5700 INFO: task syz-executor5:10829 blocked for more than 120 seconds. Not tainted 4.17.19 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor5 D28760 10829 6308 0x80000002 Call Trace: context_switch kernel/sched/core.c:2867 [inline] __schedule+0x721/0x1e60 kernel/sched/core.c:3515 schedule+0x88/0x1c0 kernel/sched/core.c:3559 io_schedule+0x21/0x80 kernel/sched/core.c:5179 wait_on_page_bit_common mm/filemap.c:1100 [inline] __lock_page+0x2b5/0x390 mm/filemap.c:1273 lock_page include/linux/pagemap.h:483 [inline] __revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231 drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306 f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556 __fput+0x2c7/0x780 fs/file_table.c:209 ____fput+0x1a/0x20 fs/file_table.c:243 task_work_run+0x151/0x1d0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x8ba/0x30a0 kernel/exit.c:865 do_group_exit+0x13b/0x3a0 kernel/exit.c:968 get_signal+0x6bb/0x1650 kernel/signal.c:2482 do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810 exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4497b9 RSP: 002b:00007f1c68e74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 000000000071bf80 RCX: 00000000004497b9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000071bf80 RBP: 000000000071bf80 R08: 0000000000000000 R09: 000000000071bf58 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f1c68e759c0 R15: 00007f1c68e75700 This patch tries to use trylock_page to mitigate such deadlock condition for fix. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 3897b6f upstream. Parity page is incorrectly unmapped in finish_parity_scrub(), triggering a reference counter bug on i386, i.e.: [ 157.662401] kernel BUG at mm/highmem.c:349! [ 157.666725] invalid opcode: 0000 [#1] SMP PTI The reason is that kunmap(p_page) was completely left out, so we never did an unmap for the p_page and the loop unmapping the rbio page was iterating over the wrong number of stripes: unmapping should be done with nr_data instead of rbio->real_stripes. Test case to reproduce the bug: - create a raid5 btrfs filesystem: # mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde - mount it: # mount /dev/sdb /mnt - run btrfs scrub in a loop: # while :; do btrfs scrub start -BR /mnt; done BugLink: https://bugs.launchpad.net/bugs/1812845 Fixes: 5a6ac9e ("Btrfs, raid56: support parity scrub on raid56") CC: [email protected] # 4.4+ Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Andrea Righi <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 0ccc387 upstream. Back in commit a89ca6f ("Btrfs: fix fsync after truncate when no_holes feature is enabled") I added an assertion that is triggered when an inline extent is found to assert that the length of the (uncompressed) data the extent represents is the same as the i_size of the inode, since that is true most of the time I couldn't find or didn't remembered about any exception at that time. Later on the assertion was expanded twice to deal with a case of a compressed inline extent representing a range that matches the sector size followed by an expanding truncate, and another case where fallocate can update the i_size of the inode without adding or updating existing extents (if the fallocate range falls entirely within the first block of the file). These two expansion/fixes of the assertion were done by commit 7ed586d ("Btrfs: fix assertion on fsync of regular file when using no-holes feature") and commit 6399fb5 ("Btrfs: fix assertion failure during fsync in no-holes mode"). These however missed the case where an falloc expands the i_size of an inode to exactly the sector size and inline extent exists, for example: $ mkfs.btrfs -f -O no-holes /dev/sdc $ mount /dev/sdc /mnt $ xfs_io -f -c "pwrite -S 0xab 0 1096" /mnt/foobar wrote 1096/1096 bytes at offset 0 1 KiB, 1 ops; 0.0002 sec (4.448 MiB/sec and 4255.3191 ops/sec) $ xfs_io -c "falloc 1096 3000" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar Segmentation fault $ dmesg [701253.602385] assertion failed: len == i_size || (len == fs_info->sectorsize && btrfs_file_extent_compression(leaf, extent) != BTRFS_COMPRESS_NONE) || (len < i_size && i_size < fs_info->sectorsize), file: fs/btrfs/tree-log.c, line: 4727 [701253.602962] ------------[ cut here ]------------ [701253.603224] kernel BUG at fs/btrfs/ctree.h:3533! [701253.603503] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [701253.603774] CPU: 2 PID: 7192 Comm: xfs_io Tainted: G W 5.0.0-rc8-btrfs-next-45 #1 [701253.604054] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [701253.604650] RIP: 0010:assfail.constprop.23+0x18/0x1a [btrfs] (...) [701253.605591] RSP: 0018:ffffbb48c186bc48 EFLAGS: 00010286 [701253.605914] RAX: 00000000000000de RBX: ffff921d0a7afc08 RCX: 0000000000000000 [701253.606244] RDX: 0000000000000000 RSI: ffff921d36b16868 RDI: ffff921d36b16868 [701253.606580] RBP: ffffbb48c186bcf0 R08: 0000000000000000 R09: 0000000000000000 [701253.606913] R10: 0000000000000003 R11: 0000000000000000 R12: ffff921d05d2de18 [701253.607247] R13: ffff921d03b54000 R14: 0000000000000448 R15: ffff921d059ecf80 [701253.607769] FS: 00007f14da906700(0000) GS:ffff921d36b00000(0000) knlGS:0000000000000000 [701253.608163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [701253.608516] CR2: 000056087ea9f278 CR3: 00000002268e8001 CR4: 00000000003606e0 [701253.608880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [701253.609250] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [701253.609608] Call Trace: [701253.609994] btrfs_log_inode+0xdfb/0xe40 [btrfs] [701253.610383] btrfs_log_inode_parent+0x2be/0xa60 [btrfs] [701253.610770] ? do_raw_spin_unlock+0x49/0xc0 [701253.611150] btrfs_log_dentry_safe+0x4a/0x70 [btrfs] [701253.611537] btrfs_sync_file+0x3b2/0x440 [btrfs] [701253.612010] ? do_sysinfo+0xb0/0xf0 [701253.612552] do_fsync+0x38/0x60 [701253.612988] __x64_sys_fsync+0x10/0x20 [701253.613360] do_syscall_64+0x60/0x1b0 [701253.613733] entry_SYSCALL_64_after_hwframe+0x49/0xbe [701253.614103] RIP: 0033:0x7f14da4e66d0 (...) [701253.615250] RSP: 002b:00007fffa670fdb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a [701253.615647] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f14da4e66d0 [701253.616047] RDX: 000056087ea9c260 RSI: 000056087ea9c260 RDI: 0000000000000003 [701253.616450] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000010 [701253.616854] R10: 000000000000009b R11: 0000000000000246 R12: 000056087ea9c260 [701253.617257] R13: 000056087ea9c240 R14: 0000000000000000 R15: 000056087ea9dd10 (...) [701253.619941] ---[ end trace e088d74f132b6da5 ]--- Updating the assertion again to allow for this particular case would result in a meaningless assertion, plus there is currently no risk of logging content that would result in any corruption after a log replay if the size of the data encoded in an inline extent is greater than the inode's i_size (which is not currently possibe either with or without compression), therefore just remove the assertion. CC: [email protected] # 4.4+ Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 23da958 upstream. Syzkaller reports: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599 Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91 RSP: 0018:ffff8881d828f238 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267 RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178 RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259 R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4 R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000 FS: 00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629 get_subdir fs/proc/proc_sysctl.c:1022 [inline] __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335 br_netfilter_init+0xbc/0x1000 [br_netfilter] do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003 RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004 Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73] Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 770020de38961fd0 ]--- A new dir entry can be created in get_subdir and its 'header->parent' is set to NULL. Only after insert_header success, it will be set to 'dir', otherwise 'header->parent' is set to NULL and drop_sysctl_table is called. However in err handling path of get_subdir, drop_sysctl_table also be called on 'new->header' regardless its value of parent pointer. Then put_links is called, which triggers NULL-ptr deref when access member of header->parent. In fact we have multiple error paths which call drop_sysctl_table() there, upon failure on insert_links() we also call drop_sysctl_table().And even in the successful case on __register_sysctl_table() we still always call drop_sysctl_table().This patch fix it. Link: http://lkml.kernel.org/r/[email protected] Fixes: 0e47c99 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: YueHaibing <[email protected]> Reported-by: Hulk Robot <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Cc: Kees Cook <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Al Viro <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: <[email protected]> [3.4+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit d947075 upstream. Chandan reported that fstests' generic/026 test hit a crash: BUG: Unable to handle kernel data access at 0xc00000062ac40000 Faulting instruction address: 0xc000000000092240 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1 NIP: c000000000092240 LR: c00000000066a55c CTR: 0000000000000000 REGS: c00000062c0c3430 TRAP: 0300 Not tainted (5.0.0-rc2-next-20190115-00001-g6de6dba64dda) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44000842 XER: 20000000 CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0 GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660 GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4 GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0 GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002 GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000 GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000 NIP memcmp+0x120/0x690 LR xfs_attr3_leaf_lookup_int+0x53c/0x5b0 Call Trace: xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable) xfs_da3_node_lookup_int+0x32c/0x5a0 xfs_attr_node_addname+0x170/0x6b0 xfs_attr_set+0x2ac/0x340 __xfs_set_acl+0xf0/0x230 xfs_set_acl+0xd0/0x160 set_posix_acl+0xc0/0x130 posix_acl_xattr_set+0x68/0x110 __vfs_setxattr+0xa4/0x110 __vfs_setxattr_noperm+0xac/0x240 vfs_setxattr+0x128/0x130 setxattr+0x248/0x600 path_setxattr+0x108/0x120 sys_setxattr+0x28/0x40 system_call+0x5c/0x70 Instruction dump: 7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000 4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040 The instruction dump decodes as: subfic r6,r5,8 rlwinm r6,r6,3,0,28 ldbrx r9,0,r3 ldbrx r10,0,r4 <- Which shows us doing an 8 byte load from c00000062ac3fff9, which crosses the page boundary at c00000062ac40000 and faults. It's not OK for memcmp to read past the end of the source or destination buffers if that would cross a page boundary, because we don't know that the next page is mapped. As pointed out by Segher, we can read past the end of the source or destination as long as we don't cross a 4K boundary, because that's our minimum page size on all platforms. The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get there we know that s1 is 8-byte aligned and we have at least 1 byte to read, so a single 8-byte load won't read past the end of s1 and cross a page boundary. But we have to be more careful with s2. So check if it's within 8 bytes of a 4K boundary and if so go to the byte-by-byte loop. Fixes: 2d9ee32 ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()") Cc: [email protected] # v4.19+ Reported-by: Chandan Rajendra <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Reviewed-by: Segher Boessenkool <[email protected]> Tested-by: Chandan Rajendra <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit bc31d0c ] We have a customer reporting crashes in lock_get_status() with many "Leaked POSIX lock" messages preceeding the crash. Leaked POSIX lock on dev=0x0:0x56 ... Leaked POSIX lock on dev=0x0:0x56 ... Leaked POSIX lock on dev=0x0:0x56 ... Leaked POSIX lock on dev=0x0:0x53 ... Leaked POSIX lock on dev=0x0:0x53 ... Leaked POSIX lock on dev=0x0:0x53 ... Leaked POSIX lock on dev=0x0:0x53 ... POSIX: fl_owner=ffff8900e7b79380 fl_flags=0x1 fl_type=0x1 fl_pid=20709 Leaked POSIX lock on dev=0x0:0x4b ino... Leaked locks on dev=0x0:0x4b ino=0xf911400000029: POSIX: fl_owner=ffff89f41c870e00 fl_flags=0x1 fl_type=0x1 fl_pid=19592 stack segment: 0000 [#1] SMP Modules linked in: binfmt_misc msr tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 arc4 ecb auth_rpcgss nfsv4 md4 nfs nls_utf8 lockd grace cifs sunrpc ccm dns_resolver fscache af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock xfs libcrc32c sb_edac edac_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng vmw_balloon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr vmxnet3 i2c_piix4 vmw_vmci shpchp fjes processor button ac btrfs xor raid6_pq sr_mod cdrom ata_generic sd_mod ata_piix vmwgfx crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw ahci libahci drm libata vmw_pvscsi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes CPU: 6 PID: 28250 Comm: lsof Not tainted 4.4.156-94.64-default #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016 task: ffff88a345f28740 ti: ffff88c74005c000 task.ti: ffff88c74005c000 RIP: 0010:[<ffffffff8125dcab>] [<ffffffff8125dcab>] lock_get_status+0x9b/0x3b0 RSP: 0018:ffff88c74005fd90 EFLAGS: 00010202 RAX: ffff89bde83e20ae RBX: ffff89e870003d18 RCX: 0000000049534f50 RDX: ffffffff81a3541f RSI: ffffffff81a3544e RDI: ffff89bde83e20ae RBP: 0026252423222120 R08: 0000000020584953 R09: 000000000000ffff R10: 0000000000000000 R11: ffff88c74005fc70 R12: ffff89e5ca7b1340 R13: 00000000000050e5 R14: ffff89e870003d30 R15: ffff89e5ca7b1340 FS: 00007fafd64be800(0000) GS:ffff89f41fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001c80018 CR3: 000000a522048000 CR4: 0000000000360670 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: 0000000000000208 ffffffff81a3d6b6 ffff89e870003d30 ffff89e870003d18 ffff89e5ca7b1340 ffff89f41738d7c0 ffff89e870003d30 ffff89e5ca7b1340 ffffffff8125e08f 0000000000000000 ffff89bc22b67d00 ffff88c74005ff28 Call Trace: [<ffffffff8125e08f>] locks_show+0x2f/0x70 [<ffffffff81230ad1>] seq_read+0x251/0x3a0 [<ffffffff81275bbc>] proc_reg_read+0x3c/0x70 [<ffffffff8120e456>] __vfs_read+0x26/0x140 [<ffffffff8120e9da>] vfs_read+0x7a/0x120 [<ffffffff8120faf2>] SyS_read+0x42/0xa0 [<ffffffff8161cbc3>] entry_SYSCALL_64_fastpath+0x1e/0xb7 When Linux closes a FD (close(), close-on-exec, dup2(), ...) it calls filp_close() which also removes all posix locks. The lock struct is initialized like so in filp_close() and passed down to cifs ... lock.fl_type = F_UNLCK; lock.fl_flags = FL_POSIX | FL_CLOSE; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; ... Note the FL_CLOSE flag, which hints the VFS code that this unlocking is done for closing the fd. filp_close() locks_remove_posix(filp, id); vfs_lock_file(filp, F_SETLK, &lock, NULL); return filp->f_op->lock(filp, cmd, fl) => cifs_lock() rc = cifs_setlk(file, flock, type, wait_flag, posix_lck, lock, unlock, xid); rc = server->ops->mand_unlock_range(cfile, flock, xid); if (flock->fl_flags & FL_POSIX && !rc) rc = locks_lock_file_wait(file, flock) Notice how we don't call locks_lock_file_wait() which does the generic VFS lock/unlock/wait work on the inode if rc != 0. If we are closing the handle, the SMB server is supposed to remove any locks associated with it. Similarly, cifs.ko frees and wakes up any lock and lock waiter when closing the file: cifs_close() cifsFileInfo_put(file->private_data) /* * Delete any outstanding lock records. We'll lose them when the file * is closed anyway. */ down_write(&cifsi->lock_sem); list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) { list_del(&li->llist); cifs_del_lock_waiters(li); kfree(li); } list_del(&cifs_file->llist->llist); kfree(cifs_file->llist); up_write(&cifsi->lock_sem); So we can safely ignore unlocking failures in cifs_lock() if they happen with the FL_CLOSE flag hint set as both the server and the client take care of it during the actual closing. This is not a proper fix for the unlocking failure but it's safe and it seems to prevent the lock leakages and crashes the customer experiences. Signed-off-by: Aurelien Aptel <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Steve French <[email protected]> Acked-by: Pavel Shilovsky <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d8dbb58 ] if secmark rules fail to unpack a double free happens resulting in the following oops [ 1295.584074] audit: type=1400 audit(1549970525.256:51): apparmor="STATUS" info="failed to unpack profile secmark rules" error=-71 profile="unconfined" name="/root/test" pid=29882 comm="apparmor_parser" name="/root/test" offset=120 [ 1374.042334] ------------[ cut here ]------------ [ 1374.042336] kernel BUG at mm/slub.c:294! [ 1374.042404] invalid opcode: 0000 [#1] SMP PTI [ 1374.042436] CPU: 0 PID: 29921 Comm: apparmor_parser Not tainted 4.20.7-042007-generic #201902061234 [ 1374.042461] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 1374.042489] RIP: 0010:kfree+0x164/0x180 [ 1374.042502] Code: 74 05 41 0f b6 72 51 4c 89 d7 e8 37 cd f8 ff eb 8b 41 b8 01 00 00 00 48 89 d9 48 89 da 4c 89 d6 e8 11 f6 ff ff e9 72 ff ff ff <0f> 0b 49 8b 42 08 a8 01 75 c2 0f 0b 48 8b 3d a9 f4 19 01 e9 c5 fe [ 1374.042552] RSP: 0018:ffffaf7b812d7b90 EFLAGS: 00010246 [ 1374.042568] RAX: ffff91e437679200 RBX: ffff91e437679200 RCX: ffff91e437679200 [ 1374.042589] RDX: 00000000000088b6 RSI: ffff91e43da27060 RDI: ffff91e43d401a80 [ 1374.042609] RBP: ffffaf7b812d7ba8 R08: 0000000000027080 R09: ffffffffa6627a6d [ 1374.042629] R10: ffffd3af41dd9e40 R11: ffff91e43a1740dc R12: ffff91e3f52e8000 [ 1374.042650] R13: ffffffffa6627a6d R14: ffffffffffffffb9 R15: 0000000000000001 [ 1374.042675] FS: 00007f928df77740(0000) GS:ffff91e43da00000(0000) knlGS:0000000000000000 [ 1374.042697] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1374.042714] CR2: 000055a0c3ab6b50 CR3: 0000000079ed8004 CR4: 0000000000360ef0 [ 1374.042737] Call Trace: [ 1374.042750] kzfree+0x2d/0x40 [ 1374.042763] aa_free_profile+0x12b/0x270 [ 1374.042776] unpack_profile+0xc1/0xf10 [ 1374.042790] aa_unpack+0x115/0x4e0 [ 1374.042802] aa_replace_profiles+0x8e/0xcc0 [ 1374.042817] ? kvmalloc_node+0x6d/0x80 [ 1374.042831] ? __check_object_size+0x166/0x192 [ 1374.042845] policy_update+0xcf/0x1b0 [ 1374.042858] profile_load+0x7d/0xa0 [ 1374.042871] __vfs_write+0x3a/0x190 [ 1374.042883] ? apparmor_file_permission+0x1a/0x20 [ 1374.042899] ? security_file_permission+0x31/0xc0 [ 1374.042918] ? _cond_resched+0x19/0x30 [ 1374.042931] vfs_write+0xab/0x1b0 [ 1374.042963] ksys_write+0x55/0xc0 [ 1374.043004] __x64_sys_write+0x1a/0x20 [ 1374.043046] do_syscall_64+0x5a/0x110 [ 1374.043087] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 9caafbe ("apparmor: Parse secmark policy") Reported-by: Alex Murray <[email protected]> Signed-off-by: John Johansen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 6988f31 ] Replace superfluous VM_BUG_ON() with comment about correct usage. Technically reverts commit 1d148e2 ("mm: add VM_BUG_ON_PAGE() to page_mapcount()"), but context lines have changed. Function isolate_migratepages_block() runs some checks out of lru_lock when choose pages for migration. After checking PageLRU() it checks extra page references by comparing page_count() and page_mapcount(). Between these two checks page could be removed from lru, freed and taken by slab. As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount(). Race window is tiny. For certain workload this happens around once a year. page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0 flags: 0x500000000008100(slab|head) raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(PageSlab(page)) ------------[ cut here ]------------ kernel BUG at ./include/linux/mm.h:628! invalid opcode: 0000 [#1] SMP NOPTI CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G W 4.19.109-27 #1 Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019 RIP: 0010:isolate_migratepages_block+0x986/0x9b0 The code in isolate_migratepages_block() was added in commit 119d6d5 ("mm, compaction: avoid isolating pinned pages") before adding VM_BUG_ON into page_mapcount(). This race has been predicted in 2015 by Vlastimil Babka (see link below). [[email protected]: comment tweaks, per Hugh] Fixes: 1d148e2 ("mm: add VM_BUG_ON_PAGE() to page_mapcount()") Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/159032779896.957378.7852761411265662220.stgit@buzz Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/linux-mm/158937872515.474360.5066096871639561424.stgit@buzz/T/ (v1) Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit bf71bc1 ] The Debian kernel v5.6 triggers this kernel panic: Kernel panic - not syncing: Bad Address (null pointer deref?) Bad Address (null pointer deref?): Code=26 (Data memory access rights trap) at addr 0000000000000000 CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-2-parisc64 #1 Debian 5.6.14-1 IAOQ[0]: mem_init+0xb0/0x150 IAOQ[1]: mem_init+0xb4/0x150 RP(r2): start_kernel+0x6c8/0x1190 Backtrace: [<0000000040101ab4>] start_kernel+0x6c8/0x1190 [<0000000040108574>] start_parisc+0x158/0x1b8 on a HP-PARISC rp3440 machine with this memory layout: Memory Ranges: 0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB 1) Start 0x0000004040000000 End 0x00000040ffdfffff Size 3070 MB Fix the crash by avoiding virt_to_page() and similar functions in mem_init() until the memory zones have been fully set up. Signed-off-by: Helge Deller <[email protected]> Cc: [email protected] # v5.0+ Signed-off-by: Sasha Levin <[email protected]>
commit e2d4a80 upstream. On a non-forwarding 802.11s link between two fairly busy neighboring nodes (iperf with -P 16 at ~850MBit/s TCP; 1733.3 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 4), so with frequent PREQ retries, usually after around 30-40 seconds the following crash would occur: [ 1110.822428] Unable to handle kernel read from unreadable memory at virtual address 00000000 [ 1110.830786] Mem abort info: [ 1110.833573] Exception class = IABT (current EL), IL = 32 bits [ 1110.839494] SET = 0, FnV = 0 [ 1110.842546] EA = 0, S1PTW = 0 [ 1110.845678] user pgtable: 4k pages, 48-bit VAs, pgd = ffff800076386000 [ 1110.852204] [0000000000000000] *pgd=00000000f6322003, *pud=00000000f62de003, *pmd=0000000000000000 [ 1110.861167] Internal error: Oops: 86000004 [#1] PREEMPT SMP [ 1110.866730] Modules linked in: pppoe ppp_async batman_adv ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 usb_storage xhci_plat_hcd xhci_pci xhci_hcd dwc3 usbcore usb_common [ 1110.932190] Process swapper/3 (pid: 0, stack limit = 0xffff0000090c8000) [ 1110.938884] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.162 #0 [ 1110.944965] Hardware name: LS1043A RGW Board (DT) [ 1110.949658] task: ffff8000787a81c0 task.stack: ffff0000090c8000 [ 1110.955568] PC is at 0x0 [ 1110.958097] LR is at call_timer_fn.isra.27+0x24/0x78 [ 1110.963055] pc : [<0000000000000000>] lr : [<ffff0000080ff29c>] pstate: 00400145 [ 1110.970440] sp : ffff00000801be10 [ 1110.973744] x29: ffff00000801be10 x28: ffff000008bf7018 [ 1110.979047] x27: ffff000008bf87c8 x26: ffff000008c160c0 [ 1110.984352] x25: 0000000000000000 x24: 0000000000000000 [ 1110.989657] x23: dead000000000200 x22: 0000000000000000 [ 1110.994959] x21: 0000000000000000 x20: 0000000000000101 [ 1111.000262] x19: ffff8000787a81c0 x18: 0000000000000000 [ 1111.005565] x17: ffff0000089167b0 x16: 0000000000000058 [ 1111.010868] x15: ffff0000089167b0 x14: 0000000000000000 [ 1111.016172] x13: ffff000008916788 x12: 0000000000000040 [ 1111.021475] x11: ffff80007fda9af0 x10: 0000000000000001 [ 1111.026777] x9 : ffff00000801bea0 x8 : 0000000000000004 [ 1111.032080] x7 : 0000000000000000 x6 : ffff80007fda9aa8 [ 1111.037383] x5 : ffff00000801bea0 x4 : 0000000000000010 [ 1111.042685] x3 : ffff00000801be98 x2 : 0000000000000614 [ 1111.047988] x1 : 0000000000000000 x0 : 0000000000000000 [ 1111.053290] Call trace: [ 1111.055728] Exception stack(0xffff00000801bcd0 to 0xffff00000801be10) [ 1111.062158] bcc0: 0000000000000000 0000000000000000 [ 1111.069978] bce0: 0000000000000614 ffff00000801be98 0000000000000010 ffff00000801bea0 [ 1111.077798] bd00: ffff80007fda9aa8 0000000000000000 0000000000000004 ffff00000801bea0 [ 1111.085618] bd20: 0000000000000001 ffff80007fda9af0 0000000000000040 ffff000008916788 [ 1111.093437] bd40: 0000000000000000 ffff0000089167b0 0000000000000058 ffff0000089167b0 [ 1111.101256] bd60: 0000000000000000 ffff8000787a81c0 0000000000000101 0000000000000000 [ 1111.109075] bd80: 0000000000000000 dead000000000200 0000000000000000 0000000000000000 [ 1111.116895] bda0: ffff000008c160c0 ffff000008bf87c8 ffff000008bf7018 ffff00000801be10 [ 1111.124715] bdc0: ffff0000080ff29c ffff00000801be10 0000000000000000 0000000000400145 [ 1111.132534] bde0: ffff8000787a81c0 ffff00000801bde8 0000ffffffffffff 000001029eb19be8 [ 1111.140353] be00: ffff00000801be10 0000000000000000 [ 1111.145220] [< (null)>] (null) [ 1111.149917] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398 [ 1111.155741] [<ffff000008081938>] __do_softirq+0x100/0x1fc [ 1111.161130] [<ffff0000080a2e28>] irq_exit+0x80/0xd8 [ 1111.166002] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0 [ 1111.171825] [<ffff000008081678>] gic_handle_irq+0x68/0xb0 [ 1111.177213] Exception stack(0xffff0000090cbe30 to 0xffff0000090cbf70) [ 1111.183642] be20: 0000000000000020 0000000000000000 [ 1111.191461] be40: 0000000000000001 0000000000000000 00008000771af000 0000000000000000 [ 1111.199281] be60: ffff000008c95180 0000000000000000 ffff000008c19360 ffff0000090cbef0 [ 1111.207101] be80: 0000000000000810 0000000000000400 0000000000000098 ffff000000000000 [ 1111.214920] bea0: 0000000000000001 ffff0000089167b0 0000000000000000 ffff0000089167b0 [ 1111.222740] bec0: 0000000000000000 ffff000008c198e8 ffff000008bf7018 ffff000008c19000 [ 1111.230559] bee0: 0000000000000000 0000000000000000 ffff8000787a81c0 ffff000008018000 [ 1111.238380] bf00: ffff00000801c000 ffff00000913ba34 ffff8000787a81c0 ffff0000090cbf70 [ 1111.246199] bf20: ffff0000080857cc ffff0000090cbf70 ffff0000080857d0 0000000000400145 [ 1111.254020] bf40: ffff000008018000 ffff00000801c000 ffffffffffffffff ffff0000080fa574 [ 1111.261838] bf60: ffff0000090cbf70 ffff0000080857d0 [ 1111.266706] [<ffff0000080832e8>] el1_irq+0xe8/0x18c [ 1111.271576] [<ffff0000080857d0>] arch_cpu_idle+0x10/0x18 [ 1111.276880] [<ffff0000080d7de4>] do_idle+0xec/0x1b8 [ 1111.281748] [<ffff0000080d8020>] cpu_startup_entry+0x20/0x28 [ 1111.287399] [<ffff00000808f81c>] secondary_start_kernel+0x104/0x110 [ 1111.293662] Code: bad PC value [ 1111.296710] ---[ end trace 555b6ca4363c3edd ]--- [ 1111.301318] Kernel panic - not syncing: Fatal exception in interrupt [ 1111.307661] SMP: stopping secondary CPUs [ 1111.311574] Kernel Offset: disabled [ 1111.315053] CPU features: 0x0002000 [ 1111.318530] Memory Limit: none [ 1111.321575] Rebooting in 3 seconds.. With some added debug output / delays we were able to push the crash from the timer callback runner into the callback function and by that shedding some light on which object holding the timer gets corrupted: [ 401.720899] Unable to handle kernel read from unreadable memory at virtual address 00000868 [...] [ 402.335836] [<ffff0000088fafa4>] _raw_spin_lock_bh+0x14/0x48 [ 402.341548] [<ffff000000dbe684>] mesh_path_timer+0x10c/0x248 [mac80211] [ 402.348154] [<ffff0000080ff29c>] call_timer_fn.isra.27+0x24/0x78 [ 402.354150] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398 [ 402.359974] [<ffff000008081938>] __do_softirq+0x100/0x1fc [ 402.365362] [<ffff0000080a2e28>] irq_exit+0x80/0xd8 [ 402.370231] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0 [ 402.376053] [<ffff000008081678>] gic_handle_irq+0x68/0xb0 The issue happens due to the following sequence of events: 1) mesh_path_start_discovery(): -> spin_unlock_bh(&mpath->state_lock) before mesh_path_sel_frame_tx() 2) mesh_path_free_rcu() -> del_timer_sync(&mpath->timer) [...] -> kfree_rcu(mpath) 3) mesh_path_start_discovery(): -> mod_timer(&mpath->timer, ...) [...] -> rcu_read_unlock() 4) mesh_path_free_rcu()'s kfree_rcu(): -> kfree(mpath) 5) mesh_path_timer() starts after timeout, using freed mpath object So a use-after-free issue due to a timer re-arming bug caused by an early spin-unlocking. This patch fixes this issue by re-checking if mpath is about to be free'd and if so bails out of re-arming the timer. Cc: [email protected] Fixes: 050ac52 ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol") Cc: Simon Wunderlich <[email protected]> Signed-off-by: Linus Lüssing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 8874347 upstream. The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is 4 294 967 296 or 0x100000000 which is no problem on 64 bit systems. The patch does not change the later overall result of 0x100000 for MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new calculation yields the same result, but does not require 64 bit arithmetic. On 32 bit systems the old calculation suffers from an arithmetic overflow in that intermediate term in braces: 4UL aka unsigned long int is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does not fit in 4 bytes), the in braces result is truncated to zero, the following right shift does not alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit systems. That wrong value is a problem in a comparision against MAX_DMA32_PFN in the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if swiotlb should be active. That comparison yields the opposite result, when compiling on 32 bit systems. This was not possible before 1b7e03e ("x86, NUMA: Enable emulation on 32bit too") when that MAX_DMA32_PFN was first made visible to x86_32 (and which landed in v3.0). In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on x86-32. However if one has set CONFIG_IOMMU_INTEL, since c5a5dc4 ("iommu/vt-d: Don't switch off swiotlb if bounce page is used") there's a dependency on CONFIG_SWIOTLB, which was not necessarily active before. That landed in v5.4, where we noticed it in the fli4l Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64 bit kernel configs there (I could not find out why, so let's just say historical reasons). The effect is at boot time 64 MiB (default size) were allocated for bounce buffers now, which is a noticeable amount of memory on small systems like pcengines ALIX 2D3 with 256 MiB memory, which are still frequently used as home routers. We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4 (LTS) in fli4l and got that kernel messages for example: Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018 … Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem) … PCI-DMA: Using software bounce buffering for IO (SWIOTLB) software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB) The initial analysis and the suggested fix was done by user 'sourcejedi' at stackoverflow and explicitly marked as GPLv2 for inclusion in the Linux kernel: https://unix.stackexchange.com/a/520525/50007 The new calculation, which does not suffer from that overflow, is the same as for arch/mips now as suggested by Robin Murphy. The fix was tested by fli4l users on round about two dozen different systems, including both 32 and 64 bit archs, bare metal and virtualized machines. [ bp: Massage commit message. ] Fixes: 1b7e03e ("x86, NUMA: Enable emulation on 32bit too") Reported-by: Alan Jenkins <[email protected]> Suggested-by: Robin Murphy <[email protected]> Signed-off-by: Alexander Dahl <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Link: https://unix.stackexchange.com/q/520065/50007 Link: https://web.nettworks.org/bugs/browse/FFL-2560 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit c95c5f5 upstream. Here is the steps to reproduce the problem: ip netns add foo ip netns add bar ip -n foo link add xfrmi0 type xfrm dev lo if_id 42 ip -n foo link set xfrmi0 netns bar ip netns del foo ip netns del bar Which results to: [ 186.686395] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bd3: 0000 [#1] SMP PTI [ 186.687665] CPU: 7 PID: 232 Comm: kworker/u16:2 Not tainted 5.6.0+ #1 [ 186.688430] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 186.689420] Workqueue: netns cleanup_net [ 186.689903] RIP: 0010:xfrmi_dev_uninit+0x1b/0x4b [xfrm_interface] [ 186.690657] Code: 44 f6 ff ff 31 c0 5b 5d 41 5c 41 5d 41 5e c3 48 8d 8f c0 08 00 00 8b 05 ce 14 00 00 48 8b 97 d0 08 00 00 48 8b 92 c0 0e 00 00 <48> 8b 14 c2 48 8b 02 48 85 c0 74 19 48 39 c1 75 0c 48 8b 87 c0 08 [ 186.692838] RSP: 0018:ffffc900003b7d68 EFLAGS: 00010286 [ 186.693435] RAX: 000000000000000d RBX: ffff8881b0f31000 RCX: ffff8881b0f318c0 [ 186.694334] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000246 RDI: ffff8881b0f31000 [ 186.695190] RBP: ffffc900003b7df0 R08: ffff888236c07740 R09: 0000000000000040 [ 186.696024] R10: ffffffff81fce1b8 R11: 0000000000000002 R12: ffffc900003b7d80 [ 186.696859] R13: ffff8881edcc6a40 R14: ffff8881a1b6e780 R15: ffffffff81ed47c8 [ 186.697738] FS: 0000000000000000(0000) GS:ffff888237dc0000(0000) knlGS:0000000000000000 [ 186.698705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 186.699408] CR2: 00007f2129e93148 CR3: 0000000001e0a000 CR4: 00000000000006e0 [ 186.700221] Call Trace: [ 186.700508] rollback_registered_many+0x32b/0x3fd [ 186.701058] ? __rtnl_unlock+0x20/0x3d [ 186.701494] ? arch_local_irq_save+0x11/0x17 [ 186.702012] unregister_netdevice_many+0x12/0x55 [ 186.702594] default_device_exit_batch+0x12b/0x150 [ 186.703160] ? prepare_to_wait_exclusive+0x60/0x60 [ 186.703719] cleanup_net+0x17d/0x234 [ 186.704138] process_one_work+0x196/0x2e8 [ 186.704652] worker_thread+0x1a4/0x249 [ 186.705087] ? cancel_delayed_work+0x92/0x92 [ 186.705620] kthread+0x105/0x10f [ 186.706000] ? __kthread_bind_mask+0x57/0x57 [ 186.706501] ret_from_fork+0x35/0x40 [ 186.706978] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace fscache sunrpc button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic 8139too ide_cd_mod cdrom ide_gd_mod ata_generic ata_piix libata scsi_mod piix psmouse i2c_piix4 ide_core 8139cp i2c_core mii floppy [ 186.710423] ---[ end trace 463bba18105537e5 ]--- The problem is that x-netns xfrm interface are not removed when the link netns is removed. This causes later this oops when thoses interfaces are removed. Let's add a handler to remove all interfaces related to a netns when this netns is removed. Fixes: f203b76 ("xfrm: Add virtual xfrm interfaces") Reported-by: Christophe Gouault <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: Steffen Klassert <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit f6a23d8 upstream. This patch is to fix a crash: [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access [ ] general protection fault: 0000 [#1] SMP KASAN PTI [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0 [ ] Call Trace: [ ] xfrm6_local_error+0x1eb/0x300 [ ] xfrm_local_error+0x95/0x130 [ ] __xfrm6_output+0x65f/0xb50 [ ] xfrm6_output+0x106/0x46f [ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel] [ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan] [ ] vxlan_xmit+0x6a0/0x4276 [vxlan] [ ] dev_hard_start_xmit+0x165/0x820 [ ] __dev_queue_xmit+0x1ff0/0x2b90 [ ] ip_finish_output2+0xd3e/0x1480 [ ] ip_do_fragment+0x182d/0x2210 [ ] ip_output+0x1d0/0x510 [ ] ip_send_skb+0x37/0xa0 [ ] raw_sendmsg+0x1b4c/0x2b80 [ ] sock_sendmsg+0xc0/0x110 This occurred when sending a v4 skb over vxlan6 over ipsec, in which case skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries to get ipv6 info from a ipv4 sk. This issue was actually fixed by Commit 628e341 ("xfrm: make local error reporting more robust"), but brought back by Commit 844d487 ("xfrm: choose protocol family by skb protocol"). So to fix it, we should call xfrm6_local_error() only when skb->protocol is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6. Fixes: 844d487 ("xfrm: choose protocol family by skb protocol") Reported-by: Xiumei Mu <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: Steffen Klassert <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 2b86cb8 upstream. Be there a platform with the following layout: Regular NIC | +----> DSA master for switch port | +----> DSA master for another switch port After changing DSA back to static lockdep class keys in commit 1a33e10 ("net: partially revert dynamic lockdep key changes"), this kernel splat can be seen: [ 13.361198] ============================================ [ 13.366524] WARNING: possible recursive locking detected [ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted [ 13.377874] -------------------------------------------- [ 13.383201] swapper/0/0 is trying to acquire lock: [ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.397879] [ 13.397879] but task is already holding lock: [ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.413593] [ 13.413593] other info that might help us debug this: [ 13.420140] Possible unsafe locking scenario: [ 13.420140] [ 13.426075] CPU0 [ 13.428523] ---- [ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.440924] [ 13.440924] *** DEADLOCK *** [ 13.440924] [ 13.446860] May be due to missing lock nesting notation [ 13.446860] [ 13.453668] 6 locks held by swapper/0/0: [ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400 [ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560 [ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10 [ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.512000] [ 13.512000] stack backtrace: [ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 [ 13.530421] Call trace: [ 13.532871] dump_backtrace+0x0/0x1d8 [ 13.536539] show_stack+0x24/0x30 [ 13.539862] dump_stack+0xe8/0x150 [ 13.543271] __lock_acquire+0x1030/0x1678 [ 13.547290] lock_acquire+0xf8/0x458 [ 13.550873] _raw_spin_lock+0x44/0x58 [ 13.554543] __dev_queue_xmit+0x84c/0xbe0 [ 13.558562] dev_queue_xmit+0x24/0x30 [ 13.562232] dsa_slave_xmit+0xe0/0x128 [ 13.565988] dev_hard_start_xmit+0xf4/0x448 [ 13.570182] __dev_queue_xmit+0x808/0xbe0 [ 13.574200] dev_queue_xmit+0x24/0x30 [ 13.577869] neigh_resolve_output+0x15c/0x220 [ 13.582237] ip6_finish_output2+0x244/0xb10 [ 13.586430] __ip6_finish_output+0x1dc/0x298 [ 13.590709] ip6_output+0x84/0x358 [ 13.594116] mld_sendpack+0x2bc/0x560 [ 13.597786] mld_ifc_timer_expire+0x210/0x390 [ 13.602153] call_timer_fn+0xcc/0x400 [ 13.605822] run_timer_softirq+0x588/0x6e0 [ 13.609927] __do_softirq+0x118/0x590 [ 13.613597] irq_exit+0x13c/0x148 [ 13.616918] __handle_domain_irq+0x6c/0xc0 [ 13.621023] gic_handle_irq+0x6c/0x160 [ 13.624779] el1_irq+0xbc/0x180 [ 13.627927] cpuidle_enter_state+0xb4/0x4d0 [ 13.632120] cpuidle_enter+0x3c/0x50 [ 13.635703] call_cpuidle+0x44/0x78 [ 13.639199] do_idle+0x228/0x2c8 [ 13.642433] cpu_startup_entry+0x2c/0x48 [ 13.646363] rest_init+0x1ac/0x280 [ 13.649773] arch_call_rest_init+0x14/0x1c [ 13.653878] start_kernel+0x490/0x4bc Lockdep keys themselves were added in commit ab92d68 ("net: core: add generic lockdep keys"), and it's very likely that this splat existed since then, but I have no real way to check, since this stacked platform wasn't supported by mainline back then. >From Taehee's own words: This patch was considered that all stackable devices have LLTX flag. But the dsa doesn't have LLTX, so this splat happened. After this patch, dsa shares the same lockdep class key. On the nested dsa interface architecture, which you illustrated, the same lockdep class key will be used in __dev_queue_xmit() because dsa doesn't have LLTX. So that lockdep detects deadlock because the same lockdep class key is used recursively although actually the different locks are used. There are some ways to fix this problem. 1. using NETIF_F_LLTX flag. If possible, using the LLTX flag is a very clear way for it. But I'm so sorry I don't know whether the dsa could have LLTX or not. 2. using dynamic lockdep again. It means that each interface uses a separate lockdep class key. So, lockdep will not detect recursive locking. But this way has a problem that it could consume lockdep class key too many. Currently, lockdep can have 8192 lockdep class keys. - you can see this number with the following command. cat /proc/lockdep_stats lock-classes: 1251 [max: 8192] ... The [max: 8192] means that the maximum number of lockdep class keys. If too many lockdep class keys are registered, lockdep stops to work. So, using a dynamic(separated) lockdep class key should be considered carefully. In addition, updating lockdep class key routine might have to be existing. (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key()) 3. Using lockdep subclass. A lockdep class key could have 8 subclasses. The different subclass is considered different locks by lockdep infrastructure. But "lock-classes" is not counted by subclasses. So, it could avoid stopping lockdep infrastructure by an overflow of lockdep class keys. This approach should also have an updating lockdep class key routine. (lockdep_set_subclass()) 4. Using nonvalidate lockdep class key. The lockdep infrastructure supports nonvalidate lockdep class key type. It means this lockdep is not validated by lockdep infrastructure. So, the splat will not happen but lockdep couldn't detect real deadlock case because lockdep really doesn't validate it. I think this should be used for really special cases. (lockdep_set_novalidate_class()) Further discussion here: https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ There appears to be no negative side-effect to declaring lockless TX for the DSA virtual interfaces, which means they handle their own locking. So that's what we do to make the splat go away. Patch tested in a wide variety of cases: unicast, multicast, PTP, etc. Fixes: ab92d68 ("net: core: add generic lockdep keys") Suggested-by: Taehee Yoo <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit bad8e64 ] On commit 6ac9311 ("blktrace: use existing disk debugfs directory") merged on v4.12 Omar fixed the original blktrace code for request-based drivers (multiqueue). This however left in place a possible crash, if you happen to abuse blktrace while racing to remove / add a device. We used to use asynchronous removal of the request_queue, and with that the issue was easier to reproduce. Now that we have reverted to synchronous removal of the request_queue, the issue is still possible to reproduce, its however just a bit more difficult. We essentially run two instances of break-blktrace which add/remove a loop device, and setup a blktrace and just never tear the blktrace down. We do this twice in parallel. This is easily reproduced with the script run_0004.sh from break-blktrace [0]. We can end up with two types of panics each reflecting where we race, one a failed blktrace setup: [ 252.426751] debugfs: Directory 'loop0' with parent 'block' already present! [ 252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 252.436592] #PF: supervisor write access in kernel mode [ 252.439822] #PF: error_code(0x0002) - not-present page [ 252.442967] PGD 0 P4D 0 [ 252.444656] Oops: 0002 [#1] SMP NOPTI [ 252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 252.456343] RIP: 0010:down_write+0x15/0x40 [ 252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246 [ 252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000 [ 252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001 [ 252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000 [ 252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0 [ 252.473346] FS: 00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000 [ 252.475225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0 [ 252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 252.479866] Call Trace: [ 252.480322] simple_recursive_removal+0x4e/0x2e0 [ 252.481078] ? debugfs_remove+0x60/0x60 [ 252.481725] ? relay_destroy_buf+0x77/0xb0 [ 252.482662] debugfs_remove+0x40/0x60 [ 252.483518] blk_remove_buf_file_callback+0x5/0x10 [ 252.484328] relay_close_buf+0x2e/0x60 [ 252.484930] relay_open+0x1ce/0x2c0 [ 252.485520] do_blk_trace_setup+0x14f/0x2b0 [ 252.486187] __blk_trace_setup+0x54/0xb0 [ 252.486803] blk_trace_ioctl+0x90/0x140 [ 252.487423] ? do_sys_openat2+0x1ab/0x2d0 [ 252.488053] blkdev_ioctl+0x4d/0x260 [ 252.488636] block_ioctl+0x39/0x40 [ 252.489139] ksys_ioctl+0x87/0xc0 [ 252.489675] __x64_sys_ioctl+0x16/0x20 [ 252.490380] do_syscall_64+0x52/0x180 [ 252.491032] entry_SYSCALL_64_after_hwframe+0x44/0xa9 And the other on the device removal: [ 128.528940] debugfs: Directory 'loop0' with parent 'block' already present! [ 128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 128.619537] #PF: supervisor write access in kernel mode [ 128.622700] #PF: error_code(0x0002) - not-present page [ 128.625842] PGD 0 P4D 0 [ 128.627585] Oops: 0002 [#1] SMP NOPTI [ 128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 128.640471] RIP: 0010:down_write+0x15/0x40 [ 128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246 [ 128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000 [ 128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0 [ 128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000 [ 128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0 [ 128.660500] FS: 00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000 [ 128.662204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0 [ 128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 128.667282] Call Trace: [ 128.667801] simple_recursive_removal+0x4e/0x2e0 [ 128.668663] ? debugfs_remove+0x60/0x60 [ 128.669368] debugfs_remove+0x40/0x60 [ 128.669985] blk_trace_free+0xd/0x50 [ 128.670593] __blk_trace_remove+0x27/0x40 [ 128.671274] blk_trace_shutdown+0x30/0x40 [ 128.671935] blk_release_queue+0x95/0xf0 [ 128.672589] kobject_put+0xa5/0x1b0 [ 128.673188] disk_release+0xa2/0xc0 [ 128.673786] device_release+0x28/0x80 [ 128.674376] kobject_put+0xa5/0x1b0 [ 128.674915] loop_remove+0x39/0x50 [loop] [ 128.675511] loop_control_ioctl+0x113/0x130 [loop] [ 128.676199] ksys_ioctl+0x87/0xc0 [ 128.676708] __x64_sys_ioctl+0x16/0x20 [ 128.677274] do_syscall_64+0x52/0x180 [ 128.677823] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The common theme here is: debugfs: Directory 'loop0' with parent 'block' already present This crash happens because of how blktrace uses the debugfs directory where it places its files. Upon init we always create the same directory which would be needed by blktrace but we only do this for make_request drivers (multiqueue) block drivers. When you race a removal of these devices with a blktrace setup you end up in a situation where the make_request recursive debugfs removal will sweep away the blktrace files and then later blktrace will also try to remove individual dentries which are already NULL. The inverse is also possible and hence the two types of use after frees. We don't create the block debugfs directory on init for these types of block devices: * request-based block driver block devices * every possible partition * scsi-generic And so, this race should in theory only be possible with make_request drivers. We can fix the UAF by simply re-using the debugfs directory for make_request drivers (multiqueue) and only creating the ephemeral directory for the other type of block devices. The new clarifications on relying on the q->blk_trace_mutex *and* also checking for q->blk_trace *prior* to processing a blktrace ensures the debugfs directories are only created if no possible directory name clashes are possible. This goes tested with: o nvme partitions o ISCSI with tgt, and blktracing against scsi-generic with: o block o tape o cdrom o media changer o blktests This patch is part of the work which disputes the severity of CVE-2019-19770 which shows this issue is not a core debugfs issue, but a misuse of debugfs within blktace. Fixes: 6ac9311 ("blktrace: use existing disk debugfs directory") Reported-by: [email protected] Signed-off-by: Luis Chamberlain <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: Omar Sandoval <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Nicolai Stange <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: yu kuai <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e0f1a30 ] When, at probe time, an SCMI communication failure inhibits the capacity to query power domains states, such domains should be skipped. Registering partially initialized SCMI power domains with genpd will causes kernel panic. arm-scmi timed out in resp(caller: scmi_power_state_get+0xa4/0xd0) scmi-power-domain scmi_dev.2: failed to get state for domain 9 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000009f3691000 [0000000000000000] pgd=00000009f1ca0003, p4d=00000009f1ca0003, pud=00000009f35ea003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP CPU: 2 PID: 381 Comm: bash Not tainted 5.8.0-rc1-00011-gebd118c2cca8 #2 Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jan 3 2020 Internal error: Oops: 96000006 [#1] PREEMPT SMP pstate: 80000005 (Nzcv daif -PAN -UAO BTYPE=--) pc : of_genpd_add_provider_onecell+0x98/0x1f8 lr : of_genpd_add_provider_onecell+0x48/0x1f8 Call trace: of_genpd_add_provider_onecell+0x98/0x1f8 scmi_pm_domain_probe+0x174/0x1e8 scmi_dev_probe+0x90/0xe0 really_probe+0xe4/0x448 driver_probe_device+0xfc/0x168 device_driver_attach+0x7c/0x88 bind_store+0xe8/0x128 drv_attr_store+0x2c/0x40 sysfs_kf_write+0x4c/0x60 kernfs_fop_write+0x114/0x230 __vfs_write+0x24/0x50 vfs_write+0xbc/0x1e0 ksys_write+0x70/0xf8 __arm64_sys_write+0x24/0x30 el0_svc_common.constprop.3+0x94/0x160 do_el0_svc+0x2c/0x98 el0_sync_handler+0x148/0x1a8 el0_sync+0x158/0x180 Do not register any power domain that failed to be queried with genpd. Fixes: 898216c ("firmware: arm_scmi: add device power domain support using genpd") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Cristian Marussi <[email protected]> Signed-off-by: Sudeep Holla <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 60f80d6 ] reproduction steps: ``` node1 # mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda /dev/sdb node2 # mdadm -A /dev/md0 /dev/sda /dev/sdb node1 # mdadm -G /dev/md0 -b none mdadm: failed to remove clustered bitmap. node1 # mdadm -S --scan ^C <==== mdadm hung & kernel crash ``` kernel stack: ``` [ 335.230657] general protection fault: 0000 [#1] SMP NOPTI [...] [ 335.230848] Call Trace: [ 335.230873] ? unlock_all_bitmaps+0x5/0x70 [md_cluster] [ 335.230886] unlock_all_bitmaps+0x3d/0x70 [md_cluster] [ 335.230899] leave+0x10f/0x190 [md_cluster] [ 335.230932] ? md_super_wait+0x93/0xa0 [md_mod] [ 335.230947] ? leave+0x5/0x190 [md_cluster] [ 335.230973] md_cluster_stop+0x1a/0x30 [md_mod] [ 335.230999] md_bitmap_free+0x142/0x150 [md_mod] [ 335.231013] ? _cond_resched+0x15/0x40 [ 335.231025] ? mutex_lock+0xe/0x30 [ 335.231056] __md_stop+0x1c/0xa0 [md_mod] [ 335.231083] do_md_stop+0x160/0x580 [md_mod] [ 335.231119] ? 0xffffffffc05fb078 [ 335.231148] md_ioctl+0xa04/0x1930 [md_mod] [ 335.231165] ? filename_lookup+0xf2/0x190 [ 335.231179] blkdev_ioctl+0x93c/0xa10 [ 335.231205] ? _cond_resched+0x15/0x40 [ 335.231214] ? __check_object_size+0xd4/0x1a0 [ 335.231224] block_ioctl+0x39/0x40 [ 335.231243] do_vfs_ioctl+0xa0/0x680 [ 335.231253] ksys_ioctl+0x70/0x80 [ 335.231261] __x64_sys_ioctl+0x16/0x20 [ 335.231271] do_syscall_64+0x65/0x1f0 [ 335.231278] entry_SYSCALL_64_after_hwframe+0x44/0xa9 ``` Signed-off-by: Zhao Heming <[email protected]> Signed-off-by: Song Liu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit ab0db04 ] When running with -o enospc_debug you can get the following splat if one of the dump_space_info's trip ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ #20 Tainted: G OE ------------------------------------------------------ dd/563090 is trying to acquire lock: ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs] but task is already holding lock: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&cache->lock){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs] find_free_extent+0x7ef/0x13b0 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x340 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x530 [btrfs] btrfs_cow_block+0x106/0x210 [btrfs] commit_cowonly_roots+0x55/0x300 [btrfs] btrfs_commit_transaction+0x4ed/0xac0 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x36/0x70 cleanup_mnt+0x104/0x160 task_work_run+0x5f/0x90 __prepare_exit_to_usermode+0x1bd/0x1c0 do_syscall_64+0x5e/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&space_info->lock){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs] btrfs_inode_rsv_release+0x4f/0x170 [btrfs] btrfs_clear_delalloc_extent+0x155/0x480 [btrfs] clear_state_bit+0x81/0x1a0 [btrfs] __clear_extent_bit+0x25c/0x5d0 [btrfs] clear_extent_bit+0x15/0x20 [btrfs] btrfs_invalidatepage+0x2b7/0x3c0 [btrfs] truncate_cleanup_page+0x47/0xe0 truncate_inode_pages_range+0x238/0x840 truncate_pagecache+0x44/0x60 btrfs_setattr+0x202/0x5e0 [btrfs] notify_change+0x33b/0x490 do_truncate+0x76/0xd0 path_openat+0x687/0xa10 do_filp_open+0x91/0x100 do_sys_openat2+0x215/0x2d0 do_sys_open+0x44/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&tree->lock#2){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 find_first_extent_bit+0x32/0x150 [btrfs] write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs] __btrfs_write_out_cache+0x172/0x480 [btrfs] btrfs_write_out_cache+0x7a/0xf0 [btrfs] btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs] commit_cowonly_roots+0x245/0x300 [btrfs] btrfs_commit_transaction+0x4ed/0xac0 [btrfs] close_ctree+0xf9/0x2f5 [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x36/0x70 cleanup_mnt+0x104/0x160 task_work_run+0x5f/0x90 __prepare_exit_to_usermode+0x1bd/0x1c0 do_syscall_64+0x5e/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&ctl->tree_lock){+.+.}-{2:2}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 _raw_spin_lock+0x25/0x30 btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_space_info+0xf4/0x120 [btrfs] btrfs_reserve_extent+0x176/0x180 [btrfs] __btrfs_prealloc_file_range+0x145/0x550 [btrfs] cache_save_setup+0x28d/0x3b0 [btrfs] btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs] btrfs_commit_transaction+0xcc/0xac0 [btrfs] btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs] btrfs_file_write_iter+0x3cf/0x610 [btrfs] new_sync_write+0x11e/0x1b0 vfs_write+0x1c9/0x200 ksys_write+0x68/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &ctl->tree_lock --> &space_info->lock --> &cache->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&cache->lock); lock(&space_info->lock); lock(&cache->lock); lock(&ctl->tree_lock); *** DEADLOCK *** 6 locks held by dd/563090: #0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200 #1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs] #2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs] #3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs] #4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs] #5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs] stack backtrace: CPU: 3 PID: 563090 Comm: dd Tainted: G OE 5.8.0-rc5+ #20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? wake_up_klogd.part.0+0x30/0x40 lock_acquire+0xab/0x360 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs] _raw_spin_lock+0x25/0x30 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_space_info+0xf4/0x120 [btrfs] btrfs_reserve_extent+0x176/0x180 [btrfs] __btrfs_prealloc_file_range+0x145/0x550 [btrfs] ? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs] cache_save_setup+0x28d/0x3b0 [btrfs] btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs] btrfs_commit_transaction+0xcc/0xac0 [btrfs] ? start_transaction+0xe0/0x5b0 [btrfs] btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs] ? ktime_get_coarse_real_ts64+0xa8/0xd0 ? trace_hardirqs_on+0x1c/0xe0 btrfs_file_write_iter+0x3cf/0x610 [btrfs] new_sync_write+0x11e/0x1b0 vfs_write+0x1c9/0x200 ksys_write+0x68/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because we're holding the block_group->lock while trying to dump the free space cache. However we don't need this lock, we just need it to read the values for the printk, so move the free space cache dumping outside of the block group lock. Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 3cbdc8d ] Adding an msm_gem_object object to the inactive_list before completing its initialization is a bad idea because shrinker may pick it up from the inactive_list. Fix this by making sure that the initialization is complete before moving the msm_obj object to the inactive list. This patch fixes the below error: [10027.553044] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000068 [10027.573305] Mem abort info: [10027.590160] ESR = 0x96000006 [10027.597905] EC = 0x25: DABT (current EL), IL = 32 bits [10027.614430] SET = 0, FnV = 0 [10027.624427] EA = 0, S1PTW = 0 [10027.632722] Data abort info: [10027.638039] ISV = 0, ISS = 0x00000006 [10027.647459] CM = 0, WnR = 0 [10027.654345] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001e3a6a000 [10027.672681] [0000000000000068] pgd=0000000198c31003, pud=0000000198c31003, pmd=0000000000000000 [10027.693900] Internal error: Oops: 96000006 [#1] PREEMPT SMP [10027.738261] CPU: 3 PID: 214 Comm: kswapd0 Tainted: G S 5.4.40 #1 [10027.745766] Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT) [10027.752472] pstate: 80c00009 (Nzcv daif +PAN +UAO) [10027.757409] pc : mutex_is_locked+0x14/0x2c [10027.761626] lr : msm_gem_shrinker_count+0x70/0xec [10027.766454] sp : ffffffc011323ad0 [10027.769867] x29: ffffffc011323ad0 x28: ffffffe677e4b878 [10027.775324] x27: 0000000000000cc0 x26: 0000000000000000 [10027.780783] x25: ffffff817114a708 x24: 0000000000000008 [10027.786242] x23: ffffff8023ab7170 x22: 0000000000000001 [10027.791701] x21: ffffff817114a080 x20: 0000000000000119 [10027.797160] x19: 0000000000000068 x18: 00000000000003bc [10027.802621] x17: 0000000004a34210 x16: 00000000000000c0 [10027.808083] x15: 0000000000000000 x14: 0000000000000000 [10027.813542] x13: ffffffe677e0a3c0 x12: 0000000000000000 [10027.819000] x11: 0000000000000000 x10: ffffff8174b94340 [10027.824461] x9 : 0000000000000000 x8 : 0000000000000000 [10027.829919] x7 : 00000000000001fc x6 : ffffffc011323c88 [10027.835373] x5 : 0000000000000001 x4 : ffffffc011323d80 [10027.840832] x3 : ffffffff0477b348 x2 : 0000000000000000 [10027.846290] x1 : ffffffc011323b68 x0 : 0000000000000068 [10027.851748] Call trace: [10027.854264] mutex_is_locked+0x14/0x2c [10027.858121] msm_gem_shrinker_count+0x70/0xec [10027.862603] shrink_slab+0xc0/0x4b4 [10027.866187] shrink_node+0x4a8/0x818 [10027.869860] kswapd+0x624/0x890 [10027.873097] kthread+0x11c/0x12c [10027.876424] ret_from_fork+0x10/0x18 [10027.880102] Code: f9000bf3 910003fd aa0003f3 d503201f (f9400268) [10027.886362] ---[ end trace df5849a1a3543251 ]--- [10027.891518] Kernel panic - not syncing: Fatal exception Signed-off-by: Akhil P Oommen <[email protected]> Signed-off-by: Rob Clark <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 6eeb997 ] This driver may take a regular spinlock when a raw spinlock (irq_desc->lock) is already taken which results in the following lockdep splat: ============================= [ BUG: Invalid wait context ] 5.7.0-rc7 #1 Not tainted ----------------------------- swapper/0/0 is trying to lock: ffffff800303b798 (&chip_data->lock){....}-{3:3}, at: mtk_sysirq_set_type+0x48/0xc0 other info that might help us debug this: context-{5:5} 2 locks held by swapper/0/0: #0: ffffff800302ee68 (&desc->request_mutex){....}-{4:4}, at: __setup_irq+0xc4/0x8a0 #1: ffffff800302ecf0 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xe4/0x8a0 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc7 #1 Hardware name: Pumpkin MT8516 (DT) Call trace: dump_backtrace+0x0/0x180 show_stack+0x14/0x20 dump_stack+0xd0/0x118 __lock_acquire+0x8c8/0x2270 lock_acquire+0xf8/0x470 _raw_spin_lock_irqsave+0x50/0x78 mtk_sysirq_set_type+0x48/0xc0 __irq_set_trigger+0x58/0x170 __setup_irq+0x420/0x8a0 request_threaded_irq+0xd8/0x190 timer_of_init+0x1e8/0x2c4 mtk_gpt_init+0x5c/0x1dc timer_probe+0x74/0xf4 time_init+0x14/0x44 start_kernel+0x394/0x4f0 Replace the spinlock_t with raw_spinlock_t to avoid this warning. Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 0a3b3c2 ] A large process running on a heavily loaded system can encounter the following RCU CPU stall warning: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 3-....: (20998 ticks this GP) idle=4ea/1/0x4000000000000002 softirq=556558/556558 fqs=5190 (t=21013 jiffies g=1005461 q=132576) NMI backtrace for cpu 3 CPU: 3 PID: 501900 Comm: aio-free-ring-w Kdump: loaded Not tainted 5.2.9-108_fbk12_rc3_3858_gb83b75af7909 #1 Hardware name: Wiwynn HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016 Call Trace: <IRQ> dump_stack+0x46/0x60 nmi_cpu_backtrace.cold.3+0x13/0x50 ? lapic_can_unplug_cpu.cold.27+0x34/0x34 nmi_trigger_cpumask_backtrace+0xba/0xca rcu_dump_cpu_stacks+0x99/0xc7 rcu_sched_clock_irq.cold.87+0x1aa/0x397 ? tick_sched_do_timer+0x60/0x60 update_process_times+0x28/0x60 tick_sched_timer+0x37/0x70 __hrtimer_run_queues+0xfe/0x270 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x5e/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:kmem_cache_free+0x223/0x300 Code: 88 00 00 00 0f 85 ca 00 00 00 41 8b 55 18 31 f6 f7 da 41 f6 45 0a 02 40 0f 94 c6 83 c6 05 9c 41 5e fa e8 a0 a7 01 00 41 56 9d <49> 8b 47 08 a8 03 0f 85 87 00 00 00 65 48 ff 08 e9 3d fe ff ff 65 RSP: 0018:ffffc9000e8e3da8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 RAX: 0000000000020000 RBX: ffff88861b9de960 RCX: 0000000000000030 RDX: fffffffffffe41e8 RSI: 000060777fe3a100 RDI: 000000000001be18 RBP: ffffea00186e7780 R08: ffffffffffffffff R09: ffffffffffffffff R10: ffff88861b9dea28 R11: ffff88887ffde000 R12: ffffffff81230a1f R13: ffff888854684dc0 R14: 0000000000000206 R15: ffff8888547dbc00 ? remove_vma+0x4f/0x60 remove_vma+0x4f/0x60 exit_mmap+0xd6/0x160 mmput+0x4a/0x110 do_exit+0x278/0xae0 ? syscall_trace_enter+0x1d3/0x2b0 ? handle_mm_fault+0xaa/0x1c0 do_group_exit+0x3a/0xa0 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 And on a PREEMPT=n kernel, the "while (vma)" loop in exit_mmap() can run for a very long time given a large process. This commit therefore adds a cond_resched() to this loop, providing RCU any needed quiescent states. Cc: Andrew Morton <[email protected]> Cc: <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…uest_t.handle [ Upstream commit f8f12bd ] The request_t 'handle' member is 32-bits wide, hence use wrt_reg_dword(). Change the cast in the wrt_reg_byte() call to make it clear that a regular pointer is casted to an __iomem pointer. Note: 'pkt' points to I/O memory for the qlafx00 adapter family and to coherent memory for all other adapter families. This patch fixes the following Coverity complaint: CID 358864 (#1 of 1): Reliance on integer endianness (INCOMPATIBLE_CAST) incompatible_cast: Pointer &pkt->handle points to an object whose effective type is unsigned int (32 bits, unsigned) but is dereferenced as a narrower unsigned short (16 bits, unsigned). This may lead to unexpected results depending on machine endianness. Link: https://lore.kernel.org/r/[email protected] Fixes: 8ae6d9c ("[SCSI] qla2xxx: Enhancements to support ISPFx00.") Cc: Nilesh Javali <[email protected]> Cc: Quinn Tran <[email protected]> Cc: Himanshu Madhani <[email protected]> Cc: Martin Wilck <[email protected]> Cc: Roman Bolshakov <[email protected]> Reviewed-by: Daniel Wagner <[email protected]> Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 33a06f1 ] When gadget registration fails, one should not call usb_del_gadget_udc(). Ensure this by setting gadget->udc to NULL. Also in case of a failure there is no need to disable low-level hardware, so return immiedetly instead of jumping to error_init label. This fixes the following kernel NULL ptr dereference on gadget failure (can be easily triggered with g_mass_storage without any module parameters): dwc2 12480000.hsotg: dwc2_check_params: Invalid parameter besl=1 dwc2 12480000.hsotg: dwc2_check_params: Invalid parameter g_np_tx_fifo_size=1024 dwc2 12480000.hsotg: EPs: 16, dedicated fifos, 7808 entries in SPRAM Mass Storage Function, version: 2009/09/11 LUN: removable file: (no medium) no file given for LUN0 g_mass_storage 12480000.hsotg: failed to start g_mass_storage: -22 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000104 pgd = (ptrval) [00000104] *pgd=00000000 Internal error: Oops: 805 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.8.0-rc5 #3133 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events deferred_probe_work_func PC is at usb_del_gadget_udc+0x38/0xc4 LR is at __mutex_lock+0x31c/0xb18 ... Process kworker/0:1 (pid: 12, stack limit = 0x(ptrval)) Stack: (0xef121db0 to 0xef122000) ... [<c076bf3c>] (usb_del_gadget_udc) from [<c0726bec>] (dwc2_hsotg_remove+0x10/0x20) [<c0726bec>] (dwc2_hsotg_remove) from [<c0711208>] (dwc2_driver_probe+0x57c/0x69c) [<c0711208>] (dwc2_driver_probe) from [<c06247c0>] (platform_drv_probe+0x6c/0xa4) [<c06247c0>] (platform_drv_probe) from [<c0621df4>] (really_probe+0x200/0x48c) [<c0621df4>] (really_probe) from [<c06221e8>] (driver_probe_device+0x78/0x1fc) [<c06221e8>] (driver_probe_device) from [<c061fcd4>] (bus_for_each_drv+0x74/0xb8) [<c061fcd4>] (bus_for_each_drv) from [<c0621b54>] (__device_attach+0xd4/0x16c) [<c0621b54>] (__device_attach) from [<c0620c98>] (bus_probe_device+0x88/0x90) [<c0620c98>] (bus_probe_device) from [<c06211b0>] (deferred_probe_work_func+0x3c/0xd0) [<c06211b0>] (deferred_probe_work_func) from [<c0149280>] (process_one_work+0x234/0x7dc) [<c0149280>] (process_one_work) from [<c014986c>] (worker_thread+0x44/0x51c) [<c014986c>] (worker_thread) from [<c0150b1c>] (kthread+0x158/0x1a0) [<c0150b1c>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20) Exception stack(0xef121fb0 to 0xef121ff8) ... ---[ end trace 9724c2fc7cc9c982 ]--- While fixing this also fix the double call to dwc2_lowlevel_hw_disable() if dr_mode is set to USB_DR_MODE_PERIPHERAL. In such case low-level hardware is already disabled before calling usb_add_gadget_udc(). That function correctly preserves low-level hardware state, there is no need for the second unconditional dwc2_lowlevel_hw_disable() call. Fixes: 207324a ("usb: dwc2: Postponed gadget registration to the udc class driver") Acked-by: Minas Harutyunyan <[email protected]> Signed-off-by: Marek Szyprowski <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit edd7dd2 ] Booting Linux with a Conner CP3200 drive attached to the MESH SCSI bus results in EH measures and a panic: [ 25.499838] mesh: configured for synchronous 5 MB/s [ 25.787154] mesh: performing initial bus reset... [ 29.867115] scsi host0: MESH [ 29.929527] mesh: target 0 synchronous at 3.6 MB/s [ 29.998763] scsi 0:0:0:0: Direct-Access CONNER CP3200-200mb-3.5 4040 PQ: 0 ANSI: 1 CCS [ 31.989975] sd 0:0:0:0: [sda] 415872 512-byte logical blocks: (213 MB/203 MiB) [ 32.070975] sd 0:0:0:0: [sda] Write Protect is off [ 32.137197] sd 0:0:0:0: [sda] Mode Sense: 5b 00 00 08 [ 32.209661] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 32.332708] sda: [mac] sda1 sda2 sda3 [ 32.417733] sd 0:0:0:0: [sda] Attached SCSI disk ... snip ... [ 76.687067] mesh_abort((ptrval)) [ 76.743606] mesh: state at (ptrval), regs at (ptrval), dma at (ptrval) [ 76.810798] ct=6000 seq=86 bs=4017 fc= 0 exc= 0 err= 0 im= 7 int= 0 sp=85 [ 76.880720] dma stat=84e0 cmdptr=1f73d000 [ 76.941387] phase=4 msgphase=0 conn_tgt=0 data_ptr=24576 [ 77.005567] dma_st=1 dma_ct=0 n_msgout=0 [ 77.065456] target 0: req=(ptrval) goes_out=0 saved_ptr=0 [ 77.130512] mesh_abort((ptrval)) [ 77.187670] mesh: state at (ptrval), regs at (ptrval), dma at (ptrval) [ 77.255594] ct=6000 seq=86 bs=4017 fc= 0 exc= 0 err= 0 im= 7 int= 0 sp=85 [ 77.325778] dma stat=84e0 cmdptr=1f73d000 [ 77.387239] phase=4 msgphase=0 conn_tgt=0 data_ptr=24576 [ 77.453665] dma_st=1 dma_ct=0 n_msgout=0 [ 77.515900] target 0: req=(ptrval) goes_out=0 saved_ptr=0 [ 77.582902] mesh_host_reset [ 88.187083] Kernel panic - not syncing: mesh: double DMA start ! [ 88.254510] CPU: 0 PID: 358 Comm: scsi_eh_0 Not tainted 5.6.13-pmac #1 [ 88.323302] Call Trace: [ 88.378854] [e16ddc58] [c0027080] panic+0x13c/0x308 (unreliable) [ 88.446221] [e16ddcb8] [c02b2478] mesh_start.part.12+0x130/0x414 [ 88.513298] [e16ddcf8] [c02b2fc8] mesh_queue+0x54/0x70 [ 88.577097] [e16ddd18] [c02a1848] scsi_send_eh_cmnd+0x374/0x384 [ 88.643476] [e16dddc8] [c02a1938] scsi_eh_tur+0x5c/0xb8 [ 88.707878] [e16dddf8] [c02a1ab8] scsi_eh_test_devices+0x124/0x178 [ 88.775663] [e16dde28] [c02a2094] scsi_eh_ready_devs+0x588/0x8a8 [ 88.843124] [e16dde98] [c02a31d8] scsi_error_handler+0x344/0x520 [ 88.910697] [e16ddf08] [c00409c8] kthread+0xe4/0xe8 [ 88.975166] [e16ddf38] [c000f234] ret_from_kernel_thread+0x14/0x1c [ 89.044112] Rebooting in 180 seconds.. In theory, a panic can happen after a bus or host reset with dma_started flag set. Fix this by halting the DMA before reinitializing the host. Don't assume that ms->current_req is set when halt_dma() is invoked as it may not hold for bus or host reset. BTW, this particular Conner drive can be made to work by inhibiting disconnect/reselect with 'mesh.resel_targets=0'. Link: https://lore.kernel.org/r/3952bc691e150a7128b29120999b6092071b039a.1595460351.git.fthain@telegraphics.com.au Fixes: 1da177e ("Linux-2.6.12-rc2") Cc: Paul Mackerras <[email protected]> Reported-and-tested-by: Stan Johnson <[email protected]> Signed-off-by: Finn Thain <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 4e8c36c ] Unregister from suspend notifications and cancel suspend preparations before running hci_dev_do_close. Otherwise, the suspend notifier may race with unregister and cause cmd_timeout even after hdev has been freed. Below is the trace from when this panic was seen: [ 832.578518] Bluetooth: hci_core.c:hci_cmd_timeout() hci0: command 0x0c05 tx timeout [ 832.586200] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 832.586203] #PF: supervisor read access in kernel mode [ 832.586205] #PF: error_code(0x0000) - not-present page [ 832.586206] PGD 0 P4D 0 [ 832.586210] PM: suspend exit [ 832.608870] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 832.613232] CPU: 3 PID: 10755 Comm: kworker/3:7 Not tainted 5.4.44-04894-g1e9dbb96a161 #1 [ 832.630036] Workqueue: events hci_cmd_timeout [bluetooth] [ 832.630046] RIP: 0010:__queue_work+0xf0/0x374 [ 832.630051] RSP: 0018:ffff9b5285f1fdf8 EFLAGS: 00010046 [ 832.674033] RAX: ffff8a97681bac00 RBX: 0000000000000000 RCX: ffff8a976a000600 [ 832.681162] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff8a976a000748 [ 832.688289] RBP: ffff9b5285f1fe38 R08: 0000000000000000 R09: ffff8a97681bac00 [ 832.695418] R10: 0000000000000002 R11: ffff8a976a0006d8 R12: ffff8a9745107600 [ 832.698045] usb 1-6: new full-speed USB device number 119 using xhci_hcd [ 832.702547] R13: ffff8a9673658850 R14: 0000000000000040 R15: 000000000000001e [ 832.702549] FS: 0000000000000000(0000) GS:ffff8a976af80000(0000) knlGS:0000000000000000 [ 832.702550] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 832.702550] CR2: 0000000000000000 CR3: 000000010415a000 CR4: 00000000003406e0 [ 832.702551] Call Trace: [ 832.702558] queue_work_on+0x3f/0x68 [ 832.702562] process_one_work+0x1db/0x396 [ 832.747397] worker_thread+0x216/0x375 [ 832.751147] kthread+0x138/0x140 [ 832.754377] ? pr_cont_work+0x58/0x58 [ 832.758037] ? kthread_blkcg+0x2e/0x2e [ 832.761787] ret_from_fork+0x22/0x40 [ 832.846191] ---[ end trace fa93f466da517212 ]--- Fixes: 9952d90 ("Bluetooth: Handle PM_SUSPEND_PREPARE and PM_POST_SUSPEND") Signed-off-by: Abhishek Pandit-Subedi <[email protected]> Reviewed-by: Miao-chen Chou <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit c1055b7 ] A VF's mailbox mutex is not getting initialized by nicvf_probe() until after it is first used. And such usage is resulting in... [ 28.270927] ------------[ cut here ]------------ [ 28.270934] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 28.270980] WARNING: CPU: 9 PID: 675 at kernel/locking/mutex.c:938 __mutex_lock+0xdac/0x12f0 [ 28.270985] Modules linked in: ast(+) nicvf(+) i2c_algo_bit drm_vram_helper drm_ttm_helper ttm nicpf(+) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ixgbe(+) sg thunder_bgx mdio i2c_thunderx mdio_thunder thunder_xcv mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [ 28.271064] CPU: 9 PID: 675 Comm: systemd-udevd Not tainted 4.18.0+ #1 [ 28.271070] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS F02 08/06/2019 [ 28.271078] pstate: 60000005 (nZCv daif -PAN -UAO) [ 28.271086] pc : __mutex_lock+0xdac/0x12f0 [ 28.271092] lr : __mutex_lock+0xdac/0x12f0 [ 28.271097] sp : ffff800d42146fb0 [ 28.271103] x29: ffff800d42146fb0 x28: 0000000000000000 [ 28.271113] x27: ffff800d24361180 x26: dfff200000000000 [ 28.271122] x25: 0000000000000000 x24: 0000000000000002 [ 28.271132] x23: ffff20001597cc80 x22: ffff2000139e9848 [ 28.271141] x21: 0000000000000000 x20: 1ffff001a8428e0c [ 28.271151] x19: ffff200015d5d000 x18: 1ffff001ae0f2184 [ 28.271160] x17: 0000000000000000 x16: 0000000000000000 [ 28.271170] x15: ffff800d70790c38 x14: ffff20001597c000 [ 28.271179] x13: ffff20001597cc80 x12: ffff040002b2f779 [ 28.271189] x11: 1fffe40002b2f778 x10: ffff040002b2f778 [ 28.271199] x9 : 0000000000000000 x8 : 00000000f1f1f1f1 [ 28.271208] x7 : 00000000f2f2f2f2 x6 : 0000000000000000 [ 28.271217] x5 : 1ffff001ae0f2186 x4 : 1fffe400027eb03c [ 28.271227] x3 : dfff200000000000 x2 : ffff1001a8428dbe [ 28.271237] x1 : c87fdfac7ea11d00 x0 : 0000000000000000 [ 28.271246] Call trace: [ 28.271254] __mutex_lock+0xdac/0x12f0 [ 28.271261] mutex_lock_nested+0x3c/0x50 [ 28.271297] nicvf_send_msg_to_pf+0x40/0x3a0 [nicvf] [ 28.271316] nicvf_register_misc_interrupt+0x20c/0x328 [nicvf] [ 28.271334] nicvf_probe+0x508/0xda0 [nicvf] [ 28.271344] local_pci_probe+0xc4/0x180 [ 28.271352] pci_device_probe+0x3ec/0x528 [ 28.271363] driver_probe_device+0x21c/0xb98 [ 28.271371] device_driver_attach+0xe8/0x120 [ 28.271379] __driver_attach+0xe0/0x2a0 [ 28.271386] bus_for_each_dev+0x118/0x190 [ 28.271394] driver_attach+0x48/0x60 [ 28.271401] bus_add_driver+0x328/0x558 [ 28.271409] driver_register+0x148/0x398 [ 28.271416] __pci_register_driver+0x14c/0x1b0 [ 28.271437] nicvf_init_module+0x54/0x10000 [nicvf] [ 28.271447] do_one_initcall+0x18c/0xc18 [ 28.271457] do_init_module+0x18c/0x618 [ 28.271464] load_module+0x2bc0/0x4088 [ 28.271472] __se_sys_finit_module+0x110/0x188 [ 28.271479] __arm64_sys_finit_module+0x70/0xa0 [ 28.271490] el0_svc_handler+0x15c/0x380 [ 28.271496] el0_svc+0x8/0xc [ 28.271502] irq event stamp: 52649 [ 28.271513] hardirqs last enabled at (52649): [<ffff200011b4d790>] _raw_spin_unlock_irqrestore+0xc0/0xd8 [ 28.271522] hardirqs last disabled at (52648): [<ffff200011b4d3c4>] _raw_spin_lock_irqsave+0x3c/0xf0 [ 28.271530] softirqs last enabled at (52330): [<ffff200010082af4>] __do_softirq+0xacc/0x117c [ 28.271540] softirqs last disabled at (52313): [<ffff20001019b354>] irq_exit+0x3cc/0x500 [ 28.271545] ---[ end trace a9b90324c8a0d4ee ]--- This problem is resolved by moving the call to mutex_init() up earlier in nicvf_probe(). Fixes: 609ea65 ("net: thunderx: add mutex to protect mailbox from concurrent calls for same VF") Signed-off-by: Dean Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 1980c05 ] syzbot reported this issue where in the vsock_poll() we find the socket state at TCP_ESTABLISHED, but 'transport' is null: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097] CPU: 0 PID: 8227 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:vsock_poll+0x75a/0x8e0 net/vmw_vsock/af_vsock.c:1038 Call Trace: sock_poll+0x159/0x460 net/socket.c:1266 vfs_poll include/linux/poll.h:90 [inline] do_pollfd fs/select.c:869 [inline] do_poll fs/select.c:917 [inline] do_sys_poll+0x607/0xd40 fs/select.c:1011 __do_sys_poll fs/select.c:1069 [inline] __se_sys_poll fs/select.c:1057 [inline] __x64_sys_poll+0x18c/0x440 fs/select.c:1057 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This issue can happen if the TCP_ESTABLISHED state is set after we read the vsk->transport in the vsock_poll(). We could put barriers to synchronize, but this can only happen during connection setup, so we can simply check that 'transport' is valid. Fixes: c0cfa2d ("vsock: add multi-transports support") Reported-and-tested-by: [email protected] Signed-off-by: Stefano Garzarella <[email protected]> Reviewed-by: Jorgen Hansen <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 18c850f upstream. There's long existed a lockdep splat because we open our bdev's under the ->device_list_mutex at mount time, which acquires the bd_mutex. Usually this goes unnoticed, but if you do loopback devices at all suddenly the bd_mutex comes with a whole host of other dependencies, which results in the splat when you mount a btrfs file system. ====================================================== WARNING: possible circular locking dependency detected 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted ------------------------------------------------------ systemd-journal/509 is trying to acquire lock: ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs] but task is already holding lock: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (sb_pagefaults){.+.+}-{0:0}: __sb_start_write+0x13e/0x220 btrfs_page_mkwrite+0x59/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 -> #5 (&mm->mmap_lock#2){++++}-{3:3}: __might_fault+0x60/0x80 _copy_from_user+0x20/0xb0 get_sg_io_hdr+0x9a/0xb0 scsi_cmd_ioctl+0x1ea/0x2f0 cdrom_ioctl+0x3c/0x12b4 sr_block_ioctl+0xa4/0xd0 block_ioctl+0x3f/0x50 ksys_ioctl+0x82/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #4 (&cd->lock){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 sr_block_open+0xa2/0x180 __blkdev_get+0xdd/0x550 blkdev_get+0x38/0x150 do_dentry_open+0x16b/0x3e0 path_openat+0x3c9/0xa00 do_filp_open+0x75/0x100 do_sys_openat2+0x8a/0x140 __x64_sys_openat+0x46/0x70 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #3 (&bdev->bd_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 __blkdev_get+0x6a/0x550 blkdev_get+0x85/0x150 blkdev_get_by_path+0x2c/0x70 btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] open_fs_devices+0x88/0x240 [btrfs] btrfs_open_devices+0x92/0xa0 [btrfs] btrfs_mount_root+0x250/0x490 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x119/0x380 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x8c6/0xca0 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_run_dev_stats+0x36/0x420 [btrfs] commit_cowonly_roots+0x91/0x2d0 [btrfs] btrfs_commit_transaction+0x4e6/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_commit_transaction+0x48e/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}: __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 __mutex_lock+0x7b/0x820 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sb_pagefaults); lock(&mm->mmap_lock#2); lock(sb_pagefaults); lock(&fs_info->reloc_mutex); *** DEADLOCK *** 3 locks held by systemd-journal/509: #0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0 #1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] #2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs] stack backtrace: CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x92/0xc8 check_noncircular+0x134/0x150 __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] __mutex_lock+0x7b/0x820 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? kvm_sched_clock_read+0x14/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0xc/0xb0 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] ? sched_clock+0x5/0x10 do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x7fa3972fdbfe Code: Bad RIP value. Fix this by not holding the ->device_list_mutex at this point. The device_list_mutex exists to protect us from modifying the device list while the file system is running. However it can also be modified by doing a scan on a device. But this action is specifically protected by the uuid_mutex, which we are holding here. We cannot race with opening at this point because we have the ->s_mount lock held during the mount. Not having the ->device_list_mutex here is perfectly safe as we're not going to change the devices at this point. CC: [email protected] # 4.19+ Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> [ add some comments ] Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 01d01ca upstream. We are currently getting this lockdep splat in btrfs/161: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ #20 Tainted: G E ------------------------------------------------------ mount/678048 is trying to acquire lock: ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs] but task is already holding lock: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __mutex_lock+0x8b/0x8f0 btrfs_init_new_device+0x2d2/0x1240 [btrfs] btrfs_ioctl+0x1de/0x2d20 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 __mutex_lock+0x8b/0x8f0 clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] btrfs_mount_root.cold+0x13/0xfa [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); *** DEADLOCK *** 3 locks held by mount/678048: #0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs] #2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] stack backtrace: CPU: 2 PID: 678048 Comm: mount Tainted: G E 5.8.0-rc5+ #20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? asm_sysvec_apic_timer_interrupt+0x12/0x20 lock_acquire+0xab/0x360 ? clone_fs_devices+0x4d/0x170 [btrfs] __mutex_lock+0x8b/0x8f0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? rcu_read_lock_sched_held+0x52/0x60 ? cpumask_next+0x16/0x20 ? module_assert_mutex_or_preempt+0x14/0x40 ? __module_address+0x28/0xf0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? static_obj+0x4f/0x60 ? lockdep_init_map_waits+0x43/0x200 ? clone_fs_devices+0x4d/0x170 [btrfs] clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] ? super_setup_bdi_name+0x79/0xd0 btrfs_mount_root.cold+0x13/0xfa [btrfs] ? vfs_parse_fs_string+0x84/0xb0 ? rcu_read_lock_sched_held+0x52/0x60 ? kfree+0x2b5/0x310 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] ? cred_has_capability+0x7c/0x120 ? rcu_read_lock_sched_held+0x52/0x60 ? legacy_get_tree+0x30/0x50 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 ? memdup_user+0x4e/0x90 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because btrfs_read_chunk_tree() can come upon DEV_EXTENT's and then read the device, which takes the device_list_mutex. The device_list_mutex needs to be taken before the chunk_mutex, so this is a problem. We only really need the chunk mutex around adding the chunk, so move the mutex around read_one_chunk. An argument could be made that we don't even need the chunk_mutex here as it's during mount, and we are protected by various other locks. However we already have special rules for ->device_list_mutex, and I'd rather not have another special case for ->chunk_mutex. CC: [email protected] # 4.19+ Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit a47bd78 upstream. Dave hit this splat during testing btrfs/078: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc6-default+ #1191 Not tainted ------------------------------------------------------ kswapd0/75 is trying to acquire lock: ffffa040e9d04ff8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] but task is already holding lock: ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x56f/0xaa0 lock_acquire+0xa3/0x440 fs_reclaim_acquire.part.0+0x25/0x30 __kmalloc_track_caller+0x49/0x330 kstrdup+0x2e/0x60 __kernfs_new_node.constprop.0+0x44/0x250 kernfs_new_node+0x25/0x50 kernfs_create_link+0x34/0xa0 sysfs_do_create_link_sd+0x5e/0xd0 btrfs_sysfs_add_devices_dir+0x65/0x100 [btrfs] btrfs_init_new_device+0x44c/0x12b0 [btrfs] btrfs_ioctl+0xc3c/0x25c0 [btrfs] ksys_ioctl+0x68/0xa0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __lock_acquire+0x56f/0xaa0 lock_acquire+0xa3/0x440 __mutex_lock+0xa0/0xaf0 btrfs_chunk_alloc+0x137/0x3e0 [btrfs] find_free_extent+0xb44/0xfb0 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x143/0x7a0 [btrfs] btrfs_cow_block+0x15f/0x310 [btrfs] push_leaf_right+0x150/0x240 [btrfs] split_leaf+0x3cd/0x6d0 [btrfs] btrfs_search_slot+0xd14/0xf70 [btrfs] btrfs_insert_empty_items+0x64/0xc0 [btrfs] __btrfs_commit_inode_delayed_items+0xb2/0x840 [btrfs] btrfs_async_run_delayed_root+0x10e/0x1d0 [btrfs] btrfs_work_helper+0x2f9/0x650 [btrfs] process_one_work+0x22c/0x600 worker_thread+0x50/0x3b0 kthread+0x137/0x150 ret_from_fork+0x1f/0x30 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: check_prev_add+0x98/0xa20 validate_chain+0xa8c/0x2a00 __lock_acquire+0x56f/0xaa0 lock_acquire+0xa3/0x440 __mutex_lock+0xa0/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] btrfs_evict_inode+0x3bf/0x560 [btrfs] evict+0xd6/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x121/0x1a0 do_shrink_slab+0x175/0x420 shrink_slab+0xb1/0x2e0 shrink_node+0x192/0x600 balance_pgdat+0x31f/0x750 kswapd+0x206/0x510 kthread+0x137/0x150 ret_from_fork+0x1f/0x30 other info that might help us debug this: Chain exists of: &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(&fs_info->chunk_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/75: #0: ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffff8b0b50b8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x54/0x2e0 #2: ffffa040e057c0e8 (&type->s_umount_key#26){++++}-{3:3}, at: trylock_super+0x16/0x50 stack backtrace: CPU: 2 PID: 75 Comm: kswapd0 Not tainted 5.8.0-rc6-default+ #1191 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack+0x78/0xa0 check_noncircular+0x16f/0x190 check_prev_add+0x98/0xa20 validate_chain+0xa8c/0x2a00 __lock_acquire+0x56f/0xaa0 lock_acquire+0xa3/0x440 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] __mutex_lock+0xa0/0xaf0 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] ? __lock_acquire+0x56f/0xaa0 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] ? lock_acquire+0xa3/0x440 ? btrfs_evict_inode+0x138/0x560 [btrfs] ? btrfs_evict_inode+0x2fe/0x560 [btrfs] ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs] btrfs_evict_inode+0x3bf/0x560 [btrfs] evict+0xd6/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x121/0x1a0 do_shrink_slab+0x175/0x420 shrink_slab+0xb1/0x2e0 shrink_node+0x192/0x600 balance_pgdat+0x31f/0x750 kswapd+0x206/0x510 ? _raw_spin_unlock_irqrestore+0x3e/0x50 ? finish_wait+0x90/0x90 ? balance_pgdat+0x750/0x750 kthread+0x137/0x150 ? kthread_stop+0x2a0/0x2a0 ret_from_fork+0x1f/0x30 This is because we're holding the chunk_mutex while adding this device and adding its sysfs entries. We actually hold different locks in different places when calling this function, the dev_replace semaphore for instance in dev replace, so instead of moving this call around simply wrap it's operations in NOFS. CC: [email protected] # 4.14+ Reported-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…ression commit 1e6e238 upstream. [BUG] There is a bug report of NULL pointer dereference caused in compress_file_extent(): Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs] NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs] LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] Call Trace: [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable) [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs] [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs] [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0 [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660 [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0 [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c ---[ end trace f16954aa20d822f6 ]--- [CAUSE] For the following execution route of compress_file_range(), it's possible to hit NULL pointer dereference: compress_file_extent() |- pages = NULL; |- start = async_chunk->start = 0; |- end = async_chunk = 4095; |- nr_pages = 1; |- inode_need_compress() == false; <<< Possible, see later explanation | Now, we have nr_pages = 1, pages = NULL |- cont: |- ret = cow_file_range_inline(); |- if (ret <= 0) { |- for (i = 0; i < nr_pages; i++) { |- WARN_ON(pages[i]->mapping); <<< Crash To enter above call execution branch, we need the following race: Thread 1 (chattr) | Thread 2 (writeback) --------------------------+------------------------------ | btrfs_run_delalloc_range | |- inode_need_compress = true | |- cow_file_range_async() btrfs_ioctl_set_flag() | |- binode_flags |= | BTRFS_INODE_NOCOMPRESS | | compress_file_range() | |- inode_need_compress = false | |- nr_page = 1 while pages = NULL | | Then hit the crash [FIX] This patch will fix it by checking @pages before doing accessing it. This patch is only designed as a hot fix and easy to backport. More elegant fix may make btrfs only check inode_need_compress() once to avoid such race, but that would be another story. Reported-by: Luciano Chavez <[email protected]> Fixes: 4d3a800 ("btrfs: merge nr_pages input and output parameter in compress_pages") CC: [email protected] # 4.14.x: cecc8d9: btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range CC: [email protected] # 4.14+ Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit c92d30e upstream. In commit f3b98e3 ("media: vsp1: Provide support for extended command pools"), the vsp pointer used for referencing the VSP1 device structure from a command pool during vsp1_dl_ext_cmd_pool_destroy() was not populated. Correctly assign the pointer to prevent the following null-pointer-dereference when removing the device: [*] h3ulcb-kf #> echo fea28000.vsp > /sys/bus/platform/devices/fea28000.vsp/driver/unbind Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000007318be000 [0000000000000028] pgd=00000007333a1003, pud=00000007333a6003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 486 Comm: sh Not tainted 5.7.0-rc6-arm64-renesas-00118-ge644645abf47 #185 Hardware name: Renesas H3ULCB Kingfisher board based on r8a77951 (DT) pstate: 40000005 (nZcv daif -PAN -UAO) pc : vsp1_dlm_destroy+0xe4/0x11c lr : vsp1_dlm_destroy+0xc8/0x11c sp : ffff800012963b60 x29: ffff800012963b60 x28: ffff0006f83fc440 x27: 0000000000000000 x26: ffff0006f5e13e80 x25: ffff0006f5e13ed0 x24: ffff0006f5e13ed0 x23: ffff0006f5e13ed0 x22: dead000000000122 x21: ffff0006f5e3a080 x20: ffff0006f5df2938 x19: ffff0006f5df2980 x18: 0000000000000003 x17: 0000000000000000 x16: 0000000000000016 x15: 0000000000000003 x14: 00000000000393c0 x13: ffff800011a5ec18 x12: ffff800011d8d000 x11: ffff0006f83fcc68 x10: ffff800011a53d70 x9 : ffff8000111f3000 x8 : 0000000000000000 x7 : 0000000000210d00 x6 : 0000000000000000 x5 : ffff800010872e60 x4 : 0000000000000004 x3 : 0000000078068000 x2 : ffff800012781000 x1 : 0000000000002c00 x0 : 0000000000000000 Call trace: vsp1_dlm_destroy+0xe4/0x11c vsp1_wpf_destroy+0x10/0x20 vsp1_entity_destroy+0x24/0x4c vsp1_destroy_entities+0x54/0x130 vsp1_remove+0x1c/0x40 platform_drv_remove+0x28/0x50 __device_release_driver+0x178/0x220 device_driver_detach+0x44/0xc0 unbind_store+0xe0/0x104 drv_attr_store+0x20/0x30 sysfs_kf_write+0x48/0x70 kernfs_fop_write+0x148/0x230 __vfs_write+0x18/0x40 vfs_write+0xdc/0x1c4 ksys_write+0x68/0xf0 __arm64_sys_write+0x18/0x20 el0_svc_common.constprop.0+0x70/0x170 do_el0_svc+0x20/0x80 el0_sync_handler+0x134/0x1b0 el0_sync+0x140/0x180 Code: b40000c2 f9403a60 d2800084 a9400663 (f9401400) ---[ end trace 3875369841fb288a ]--- Fixes: f3b98e3 ("media: vsp1: Provide support for extended command pools") Cc: [email protected] # v4.19+ Signed-off-by: Eugeniu Rosca <[email protected]> Reviewed-by: Kieran Bingham <[email protected]> Tested-by: Kieran Bingham <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 3a5139f upstream. The routine cma_init_reserved_areas is designed to activate all reserved cma areas. It quits when it first encounters an error. This can leave some areas in a state where they are reserved but not activated. There is no feedback to code which performed the reservation. Attempting to allocate memory from areas in such a state will result in a BUG. Modify cma_init_reserved_areas to always attempt to activate all areas. The called routine, cma_activate_area is responsible for leaving the area in a valid state. No one is making active use of returned error codes, so change the routine to void. How to reproduce: This example uses kernelcore, hugetlb and cma as an easy way to reproduce. However, this is a more general cma issue. Two node x86 VM 16GB total, 8GB per node Kernel command line parameters, kernelcore=4G hugetlb_cma=8G Related boot time messages, hugetlb_cma: reserve 8192 MiB, up to 4096 MiB per node cma: Reserved 4096 MiB at 0x0000000100000000 hugetlb_cma: reserved 4096 MiB on node 0 cma: Reserved 4096 MiB at 0x0000000300000000 hugetlb_cma: reserved 4096 MiB on node 1 cma: CMA area hugetlb could not be activated # echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... Call Trace: bitmap_find_next_zero_area_off+0x51/0x90 cma_alloc+0x1a5/0x310 alloc_fresh_huge_page+0x78/0x1a0 alloc_pool_huge_page+0x6f/0xf0 set_max_huge_pages+0x10c/0x250 nr_hugepages_store_common+0x92/0x120 ? __kmalloc+0x171/0x270 kernfs_fop_write+0xc1/0x1a0 vfs_write+0xc7/0x1f0 ksys_write+0x5f/0xe0 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: c64be2b ("drivers: add Contiguous Memory Allocator") Signed-off-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Acked-by: Barry Song <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Kyungmin Park <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit cb36e29 ] When watchdog device is being registered, it calls misc_register that makes watchdog available for systemd to open. This is a data race scenario, because when device is open it may still have device struct not initialized - this in turn causes a crash. This patch moves device initialization before misc_register call and it solves the problem printed below. ------------[ cut here ]------------ WARNING: CPU: 3 PID: 1 at lib/kobject.c:612 kobject_get+0x50/0x54 kobject: '(null)' ((ptrval)): is not initialized, yet kobject_get() is being called. Modules linked in: k2_reset_status(O) davinci_wdt(+) sfn_platform_hwbcn(O) fsmddg_sfn(O) clk_misc_mmap(O) clk_sw_bcn(O) fsp_reset(O) cma_mod(O) slave_sup_notif(O) fpga_master(O) latency(O+) evnotify(O) enable_arm_pmu(O) xge(O) rio_mport_cdev br_netfilter bridge stp llc nvrd_checksum(O) ipv6 CPU: 3 PID: 1 Comm: systemd Tainted: G O 4.19.113-g2579778-fsm4_k2 #1 Hardware name: Keystone [<c02126c4>] (unwind_backtrace) from [<c020da94>] (show_stack+0x18/0x1c) [<c020da94>] (show_stack) from [<c07f87d8>] (dump_stack+0xb4/0xe8) [<c07f87d8>] (dump_stack) from [<c0221f70>] (__warn+0xfc/0x114) [<c0221f70>] (__warn) from [<c0221fd8>] (warn_slowpath_fmt+0x50/0x74) [<c0221fd8>] (warn_slowpath_fmt) from [<c07fd394>] (kobject_get+0x50/0x54) [<c07fd394>] (kobject_get) from [<c0602ce8>] (get_device+0x1c/0x24) [<c0602ce8>] (get_device) from [<c06961e0>] (watchdog_open+0x90/0xf0) [<c06961e0>] (watchdog_open) from [<c06001dc>] (misc_open+0x130/0x17c) [<c06001dc>] (misc_open) from [<c0388228>] (chrdev_open+0xec/0x1a8) [<c0388228>] (chrdev_open) from [<c037fa98>] (do_dentry_open+0x204/0x3cc) [<c037fa98>] (do_dentry_open) from [<c0391e2c>] (path_openat+0x330/0x1148) [<c0391e2c>] (path_openat) from [<c0394518>] (do_filp_open+0x78/0xec) [<c0394518>] (do_filp_open) from [<c0381100>] (do_sys_open+0x130/0x1f4) [<c0381100>] (do_sys_open) from [<c0201000>] (ret_fast_syscall+0x0/0x28) Exception stack(0xd2ceffa8 to 0xd2cefff0) ffa0: b6f69968 00000000 ffffff9c b6ebd210 000a0001 00000000 ffc0: b6f69968 00000000 00000000 00000142 fffffffd ffffffff 00b65530 bed7bb78 ffe0: 00000142 bed7ba70 b6cc2503 b6cc41d6 ---[ end trace 7b16eb105513974f ]--- ------------[ cut here ]------------ WARNING: CPU: 3 PID: 1 at lib/refcount.c:153 kobject_get+0x24/0x54 refcount_t: increment on 0; use-after-free. Modules linked in: k2_reset_status(O) davinci_wdt(+) sfn_platform_hwbcn(O) fsmddg_sfn(O) clk_misc_mmap(O) clk_sw_bcn(O) fsp_reset(O) cma_mod(O) slave_sup_notif(O) fpga_master(O) latency(O+) evnotify(O) enable_arm_pmu(O) xge(O) rio_mport_cdev br_netfilter bridge stp llc nvrd_checksum(O) ipv6 CPU: 3 PID: 1 Comm: systemd Tainted: G W O 4.19.113-g2579778-fsm4_k2 #1 Hardware name: Keystone [<c02126c4>] (unwind_backtrace) from [<c020da94>] (show_stack+0x18/0x1c) [<c020da94>] (show_stack) from [<c07f87d8>] (dump_stack+0xb4/0xe8) [<c07f87d8>] (dump_stack) from [<c0221f70>] (__warn+0xfc/0x114) [<c0221f70>] (__warn) from [<c0221fd8>] (warn_slowpath_fmt+0x50/0x74) [<c0221fd8>] (warn_slowpath_fmt) from [<c07fd368>] (kobject_get+0x24/0x54) [<c07fd368>] (kobject_get) from [<c0602ce8>] (get_device+0x1c/0x24) [<c0602ce8>] (get_device) from [<c06961e0>] (watchdog_open+0x90/0xf0) [<c06961e0>] (watchdog_open) from [<c06001dc>] (misc_open+0x130/0x17c) [<c06001dc>] (misc_open) from [<c0388228>] (chrdev_open+0xec/0x1a8) [<c0388228>] (chrdev_open) from [<c037fa98>] (do_dentry_open+0x204/0x3cc) [<c037fa98>] (do_dentry_open) from [<c0391e2c>] (path_openat+0x330/0x1148) [<c0391e2c>] (path_openat) from [<c0394518>] (do_filp_open+0x78/0xec) [<c0394518>] (do_filp_open) from [<c0381100>] (do_sys_open+0x130/0x1f4) [<c0381100>] (do_sys_open) from [<c0201000>] (ret_fast_syscall+0x0/0x28) Exception stack(0xd2ceffa8 to 0xd2cefff0) ffa0: b6f69968 00000000 ffffff9c b6ebd210 000a0001 00000000 ffc0: b6f69968 00000000 00000000 00000142 fffffffd ffffffff 00b65530 bed7bb78 ffe0: 00000142 bed7ba70 b6cc2503 b6cc41d6 ---[ end trace 7b16eb1055139750 ]--- Fixes: 72139df ("watchdog: Fix the race between the release of watchdog_core_data and cdev") Reviewed-by: Guenter Roeck <[email protected]> Reviewed-by: Alexander Sverdlin <[email protected]> Signed-off-by: Krzysztof Sobota <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
… set [ Upstream commit 1101c87 ] We received an error report that perf-record caused 'Segmentation fault' on a newly system (e.g. on the new installed ubuntu). (gdb) backtrace #0 __read_once_size (size=4, res=<synthetic pointer>, p=0x14) at /root/0-jinyao/acme/tools/include/linux/compiler.h:139 #1 atomic_read (v=0x14) at /root/0-jinyao/acme/tools/include/asm/../../arch/x86/include/asm/atomic.h:28 #2 refcount_read (r=0x14) at /root/0-jinyao/acme/tools/include/linux/refcount.h:65 #3 perf_mmap__read_init (map=map@entry=0x0) at mmap.c:177 #4 0x0000561ce5c0de39 in perf_evlist__poll_thread (arg=0x561ce68584d0) at util/sideband_evlist.c:62 #5 0x00007fad78491609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #6 0x00007fad7823c103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The root cause is, evlist__add_bpf_sb_event() just returns 0 if HAVE_LIBBPF_SUPPORT is not defined (inline function path). So it will not create a valid evsel for side-band event. But perf-record still creates BPF side band thread to process the side-band event, then the error happpens. We can reproduce this issue by removing the libelf-dev. e.g. 1. apt-get remove libelf-dev 2. perf record -a -- sleep 1 root@test:~# ./perf record -a -- sleep 1 perf: Segmentation fault Obtained 6 stack frames. ./perf(+0x28eee8) [0x5562d6ef6ee8] /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fbfdc65f210] ./perf(+0x342e74) [0x5562d6faae74] ./perf(+0x257e39) [0x5562d6ebfe39] /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fbfdc990609] /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fbfdc73b103] Segmentation fault (core dumped) To fix this issue, 1. We either install the missing libraries to let HAVE_LIBBPF_SUPPORT be defined. e.g. apt-get install libelf-dev and install other related libraries. 2. Use this patch to skip the side-band event setup if HAVE_LIBBPF_SUPPORT is not set. Committer notes: The side band thread is not used just with BPF, it is also used with --switch-output-event, so narrow the ifdef to the BPF specific part. Fixes: 23cbb41 ("perf record: Move side band evlist setup to separate routine") Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit 58c1721 upstream. This fixes the following use-after-free problem in case an MST down message times out, while waiting for the response for it: [ 449.022841] [drm:drm_dp_mst_wait_tx_reply.isra.26] timedout msg send 0000000080ba7fa2 2 0 [ 449.022898] ------------[ cut here ]------------ [ 449.022903] list_add corruption. prev->next should be next (ffff88847dae32c0), but was 6b6b6b6b6b6b6b6b. (prev=ffff88847db1c140). [ 449.022931] WARNING: CPU: 2 PID: 22 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 [ 449.022935] Modules linked in: asix usbnet mii snd_hda_codec_hdmi mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep e1000e snd_hda_core ptp snd_pcm pps_core mei_me mei intel_lpss_pci prime_numbers [ 449.022966] CPU: 2 PID: 22 Comm: kworker/2:0 Not tainted 5.7.0-rc3-CI-Patchwork_17536+ #1 [ 449.022970] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2457.A16.1912270059 12/27/2019 [ 449.022976] Workqueue: events_long drm_dp_mst_link_probe_work [ 449.022982] RIP: 0010:__list_add_valid+0x4d/0x70 [ 449.022987] Code: c3 48 89 d1 48 c7 c7 f0 e7 32 82 48 89 c2 e8 3a 49 b7 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 40 e8 32 82 e8 23 49 b7 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 90 e8 32 82 e8 [ 449.022991] RSP: 0018:ffffc900001abcb0 EFLAGS: 00010286 [ 449.022995] RAX: 0000000000000000 RBX: ffff88847dae2d58 RCX: 0000000000000001 [ 449.022999] RDX: 0000000080000001 RSI: ffff88849d914978 RDI: 00000000ffffffff [ 449.023002] RBP: ffff88847dae32c0 R08: ffff88849d914978 R09: 0000000000000000 [ 449.023006] R10: ffffc900001abcb8 R11: 0000000000000000 R12: ffff888490d98400 [ 449.023009] R13: ffff88847dae3230 R14: ffff88847db1c140 R15: ffff888490d98540 [ 449.023013] FS: 0000000000000000(0000) GS:ffff88849ff00000(0000) knlGS:0000000000000000 [ 449.023017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 449.023021] CR2: 00007fb96fafdc63 CR3: 0000000005610004 CR4: 0000000000760ee0 [ 449.023025] PKRU: 55555554 [ 449.023028] Call Trace: [ 449.023034] drm_dp_queue_down_tx+0x59/0x110 [ 449.023041] ? rcu_read_lock_sched_held+0x4d/0x80 [ 449.023050] ? kmem_cache_alloc_trace+0x2a6/0x2d0 [ 449.023060] drm_dp_send_link_address+0x74/0x870 [ 449.023065] ? __slab_free+0x3e1/0x5c0 [ 449.023071] ? lockdep_hardirqs_on+0xe0/0x1c0 [ 449.023078] ? lockdep_hardirqs_on+0xe0/0x1c0 [ 449.023097] drm_dp_check_and_send_link_address+0x9a/0xc0 [ 449.023106] drm_dp_mst_link_probe_work+0x9e/0x160 [ 449.023117] process_one_work+0x268/0x600 [ 449.023124] ? __schedule+0x307/0x8d0 [ 449.023139] worker_thread+0x37/0x380 [ 449.023149] ? process_one_work+0x600/0x600 [ 449.023153] kthread+0x140/0x160 [ 449.023159] ? kthread_park+0x80/0x80 [ 449.023169] ret_from_fork+0x24/0x50 Fixes: d308a88 ("drm/dp_mst: Kill the second sideband tx slot, save the world") Cc: Lyude Paul <[email protected]> Cc: Sean Paul <[email protected]> Cc: Wayne Lin <[email protected]> Cc: <[email protected]> # v3.17+ Signed-off-by: Imre Deak <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Hi,
I compiled your latest code with rumble support using nintendo switch pro controller.
fftest shows the following capabilities
Device /dev/input/event16 opened
Features:
[1B 00 00 00 00 00 00 00 ]
[00 00 ]
Force feedback periodic effects: Square, Triangle, Sine,
[00 00 00 00 00 00 00 00 00 00 03 07 01 00 00 00 ]
Setting master gain to 75% ... OK
Uploading effect #0 (Periodic sinusoidal) ... OK (id 0)
Uploading effect #1 (Constant) ... Error: Invalid argument
Uploading effect #2 (Spring) ... Error: Invalid argument
Uploading effect #3 (Damper) ... Error: Invalid argument
Uploading effect #4 (Strong rumble, with heavy motor) ... OK (id 1)
Uploading effect #5 (Weak rumble, with light motor) ... OK (id 2)
When I tested #0, #4, #5, none of them provided any feedback. Do you know how I can test your code?
The text was updated successfully, but these errors were encountered: