(MP)TFO support #59

matttbe · 2020-07-16T14:54:34Z

There is a section about that in the RFC: https://www.rfc-editor.org/rfc/rfc8684.html#name-tcp-fast-open-and-mptcp

TCP Fast Open (TFO) is an experimental TCP extension, described in [RFC7413], which has been introduced to allow the sending of data one RTT earlier than with regular TCP. This is considered a valuable gain, as very short connections are very common, especially for HTTP request/response schemes. It achieves this by sending the SYN segment together with the application's data and allowing the listener to reply immediately with data after the SYN/ACK. [RFC7413] secures this mechanism by using a new TCP option that includes a cookie that is negotiated in a preceding connection.
When using TFO in conjunction with MPTCP, there are two key points to take into account, as detailed below (see RFC)

(Feature from the initial roadmap)

This fixes regression on device unplug and/or driver unload. [ 65.681501 < 0.000004>] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 65.681504 < 0.000003>] #PF: supervisor write access in kernel mode [ 65.681506 < 0.000002>] #PF: error_code(0x0002) - not-present page [ 65.681507 < 0.000001>] PGD 7c9437067 P4D 7c9437067 PUD 7c9db7067 PMD 0 [ 65.681511 < 0.000004>] Oops: 0002 [#1] SMP NOPTI [ 65.681512 < 0.000001>] CPU: 8 PID: 127 Comm: kworker/8:1 Tainted: G W O 5.9.0-rc2-dev+ #59 [ 65.681514 < 0.000002>] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 4406 02/28/2019 [ 65.681525 < 0.000011>] Workqueue: events drm_connector_free_work_fn [drm] [ 65.681535 < 0.000010>] RIP: 0010:drm_atomic_private_obj_fini+0x11/0x60 [drm] [ 65.681537 < 0.000002>] Code: de 4c 89 e7 e8 70 f2 ba f8 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 90 0f 1f 44 00 00 48 8b 47 08 48 8b 17 55 48 89 e5 53 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 fb 48 89 [ 65.681541 < 0.000004>] RSP: 0018:ffffa5fa805efdd8 EFLAGS: 00010246 [ 65.681542 < 0.000001>] RAX: 0000000000000000 RBX: ffff9a4b094654d8 RCX: 0000000000000000 [ 65.681544 < 0.000002>] RDX: 0000000000000000 RSI: ffffffffba197bc2 RDI: ffff9a4b094654d8 [ 65.681545 < 0.000001>] RBP: ffffa5fa805efde0 R08: ffffffffba197b82 R09: 0000000000000040 [ 65.681547 < 0.000002>] R10: ffffa5fa805efdc8 R11: 000000000000007f R12: ffff9a4b09465888 [ 65.681549 < 0.000002>] R13: ffff9a4b36f20010 R14: ffff9a4b36f20290 R15: ffff9a4b3a692840 [ 65.681551 < 0.000002>] FS: 0000000000000000(0000) GS:ffff9a4b3ea00000(0000) knlGS:0000000000000000 [ 65.681553 < 0.000002>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 65.681554 < 0.000001>] CR2: 0000000000000008 CR3: 00000007c9c82000 CR4: 00000000003506e0 [ 65.681556 < 0.000002>] Call Trace: [ 65.681561 < 0.000005>] drm_dp_mst_topology_mgr_destroy+0xc4/0xe0 [drm_kms_helper] [ 65.681612 < 0.000051>] amdgpu_dm_connector_destroy+0x3d/0x110 [amdgpu] [ 65.681622 < 0.000010>] drm_connector_free_work_fn+0x78/0x90 [drm] [ 65.681624 < 0.000002>] process_one_work+0x164/0x410 [ 65.681626 < 0.000002>] worker_thread+0x4d/0x450 [ 65.681628 < 0.000002>] ? rescuer_thread+0x390/0x390 [ 65.681630 < 0.000002>] kthread+0x10a/0x140 [ 65.681632 < 0.000002>] ? kthread_unpark+0x70/0x70 [ 65.681634 < 0.000002>] ret_from_fork+0x22/0x30 This reverts commit 1545fbf. Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]

Since commit f959dcd ("dma-direct: Fix potential NULL pointer dereference") an error is reported when we load vdpa_sim and virtio-vdpa: [ 129.351207] net eth0: Unexpected TXQ (0) queue failure: -12 It seems that dma_mask is not initialized. This patch initializes dma_mask() and calls dma_set_mask_and_coherent() to fix the problem. Full log: [ 128.548628] ------------[ cut here ]------------ [ 128.553268] WARNING: CPU: 23 PID: 1105 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x14c/0x1d0 [ 128.562139] Modules linked in: virtio_net net_failover failover virtio_vdpa vdpa_sim vringh vhost_iotlb vdpa xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rfkill intel_rapl_msr intel_rapl_common isst_if_common sunrpc skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit irqbypass drm_kms_helper crct10dif_pclmul crc32_pclmul syscopyarea ghash_clmulni_intel iTCO_wdt sysfillrect iTCO_vendor_support sysimgblt rapl fb_sys_fops dcdbas intel_cstate drm acpi_ipmi ipmi_si mei_me dell_smbios intel_uncore ipmi_devintf mei i2c_i801 dell_wmi_descriptor wmi_bmof pcspkr lpc_ich i2c_smbus ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg ahci libahci libata megaraid_sas tg3 crc32c_intel wmi dm_mirror dm_region_hash dm_log [ 128.562188] dm_mod [ 128.651334] CPU: 23 PID: 1105 Comm: NetworkManager Tainted: G S I 5.10.0-rc1+ #59 [ 128.659939] Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.8.1 06/30/2020 [ 128.667419] RIP: 0010:dma_map_page_attrs+0x14c/0x1d0 [ 128.672384] Code: 1c 25 28 00 00 00 0f 85 97 00 00 00 48 83 c4 10 5b 5d 41 5c 41 5d c3 4c 89 da eb d7 48 89 f2 48 2b 50 18 48 89 d0 eb 8d 0f 0b <0f> 0b 48 c7 c0 ff ff ff ff eb c3 48 89 d9 48 8b 40 40 e8 2d a0 aa [ 128.691131] RSP: 0018:ffffae0f0151f3c8 EFLAGS: 00010246 [ 128.696357] RAX: ffffffffc06b7400 RBX: 00000000000005fa RCX: 0000000000000000 [ 128.703488] RDX: 0000000000000040 RSI: ffffcee3c7861200 RDI: ffff9e2bc16cd000 [ 128.710620] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000 [ 128.717754] R10: 0000000000000002 R11: 0000000000000000 R12: ffff9e472cb291f8 [ 128.724886] R13: ffff9e2bc14da780 R14: ffff9e472bc20000 R15: ffff9e2bc1b14940 [ 128.732020] FS: 00007f887bae23c0(0000) GS:ffff9e4ac01c0000(0000) knlGS:0000000000000000 [ 128.740105] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 128.745852] CR2: 0000562bc09de998 CR3: 00000003c156c006 CR4: 00000000007706e0 [ 128.752982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 128.760114] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 128.767247] PKRU: 55555554 [ 128.769961] Call Trace: [ 128.772418] virtqueue_add+0x81e/0xb00 [ 128.776176] virtqueue_add_inbuf_ctx+0x26/0x30 [ 128.780625] try_fill_recv+0x3a2/0x6e0 [virtio_net] [ 128.785509] virtnet_open+0xf9/0x180 [virtio_net] [ 128.790217] __dev_open+0xe8/0x180 [ 128.793620] __dev_change_flags+0x1a7/0x210 [ 128.797808] dev_change_flags+0x21/0x60 [ 128.801646] do_setlink+0x328/0x10e0 [ 128.805227] ? __nla_validate_parse+0x121/0x180 [ 128.809757] ? __nla_parse+0x21/0x30 [ 128.813338] ? inet6_validate_link_af+0x5c/0xf0 [ 128.817871] ? cpumask_next+0x17/0x20 [ 128.821535] ? __snmp6_fill_stats64.isra.54+0x6b/0x110 [ 128.826676] ? __nla_validate_parse+0x47/0x180 [ 128.831120] __rtnl_newlink+0x541/0x8e0 [ 128.834962] ? __nla_reserve+0x38/0x50 [ 128.838713] ? security_sock_rcv_skb+0x2a/0x40 [ 128.843158] ? netlink_deliver_tap+0x2c/0x1e0 [ 128.847518] ? netlink_attachskb+0x1d8/0x220 [ 128.851793] ? skb_queue_tail+0x1b/0x50 [ 128.855641] ? fib6_clean_node+0x43/0x170 [ 128.859652] ? _cond_resched+0x15/0x30 [ 128.863406] ? kmem_cache_alloc_trace+0x3a3/0x420 [ 128.868110] rtnl_newlink+0x43/0x60 [ 128.871602] rtnetlink_rcv_msg+0x12c/0x380 [ 128.875701] ? rtnl_calcit.isra.39+0x110/0x110 [ 128.880147] netlink_rcv_skb+0x50/0x100 [ 128.883987] netlink_unicast+0x1a5/0x280 [ 128.887913] netlink_sendmsg+0x23d/0x470 [ 128.891839] sock_sendmsg+0x5b/0x60 [ 128.895331] ____sys_sendmsg+0x1ef/0x260 [ 128.899255] ? copy_msghdr_from_user+0x5c/0x90 [ 128.903702] ___sys_sendmsg+0x7c/0xc0 [ 128.907369] ? dev_forward_change+0x130/0x130 [ 128.911731] ? sysctl_head_finish.part.29+0x24/0x40 [ 128.916616] ? new_sync_write+0x11f/0x1b0 [ 128.920628] ? mntput_no_expire+0x47/0x240 [ 128.924727] __sys_sendmsg+0x57/0xa0 [ 128.928309] do_syscall_64+0x33/0x40 [ 128.931887] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 128.936937] RIP: 0033:0x7f88792e3857 [ 128.940518] Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48 [ 128.959263] RSP: 002b:00007ffdca60dea0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 128.966827] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f88792e3857 [ 128.973960] RDX: 0000000000000000 RSI: 00007ffdca60def0 RDI: 000000000000000c [ 128.981095] RBP: 00007ffdca60def0 R08: 0000000000000000 R09: 0000000000000000 [ 128.988224] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000000 [ 128.995357] R13: 0000000000000000 R14: 00007ffdca60e0a8 R15: 00007ffdca60e09c [ 129.002492] CPU: 23 PID: 1105 Comm: NetworkManager Tainted: G S I 5.10.0-rc1+ #59 [ 129.011093] Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.8.1 06/30/2020 [ 129.018571] Call Trace: [ 129.021027] dump_stack+0x57/0x6a [ 129.024346] __warn.cold.14+0xe/0x3d [ 129.027925] ? dma_map_page_attrs+0x14c/0x1d0 [ 129.032283] report_bug+0xbd/0xf0 [ 129.035602] handle_bug+0x44/0x80 [ 129.038922] exc_invalid_op+0x13/0x60 [ 129.042589] asm_exc_invalid_op+0x12/0x20 [ 129.046602] RIP: 0010:dma_map_page_attrs+0x14c/0x1d0 [ 129.051566] Code: 1c 25 28 00 00 00 0f 85 97 00 00 00 48 83 c4 10 5b 5d 41 5c 41 5d c3 4c 89 da eb d7 48 89 f2 48 2b 50 18 48 89 d0 eb 8d 0f 0b <0f> 0b 48 c7 c0 ff ff ff ff eb c3 48 89 d9 48 8b 40 40 e8 2d a0 aa [ 129.070311] RSP: 0018:ffffae0f0151f3c8 EFLAGS: 00010246 [ 129.075536] RAX: ffffffffc06b7400 RBX: 00000000000005fa RCX: 0000000000000000 [ 129.082669] RDX: 0000000000000040 RSI: ffffcee3c7861200 RDI: ffff9e2bc16cd000 [ 129.089803] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000 [ 129.096936] R10: 0000000000000002 R11: 0000000000000000 R12: ffff9e472cb291f8 [ 129.104068] R13: ffff9e2bc14da780 R14: ffff9e472bc20000 R15: ffff9e2bc1b14940 [ 129.111200] virtqueue_add+0x81e/0xb00 [ 129.114952] virtqueue_add_inbuf_ctx+0x26/0x30 [ 129.119399] try_fill_recv+0x3a2/0x6e0 [virtio_net] [ 129.124280] virtnet_open+0xf9/0x180 [virtio_net] [ 129.128984] __dev_open+0xe8/0x180 [ 129.132390] __dev_change_flags+0x1a7/0x210 [ 129.136575] dev_change_flags+0x21/0x60 [ 129.140415] do_setlink+0x328/0x10e0 [ 129.143994] ? __nla_validate_parse+0x121/0x180 [ 129.148528] ? __nla_parse+0x21/0x30 [ 129.152107] ? inet6_validate_link_af+0x5c/0xf0 [ 129.156639] ? cpumask_next+0x17/0x20 [ 129.160306] ? __snmp6_fill_stats64.isra.54+0x6b/0x110 [ 129.165443] ? __nla_validate_parse+0x47/0x180 [ 129.169890] __rtnl_newlink+0x541/0x8e0 [ 129.173731] ? __nla_reserve+0x38/0x50 [ 129.177483] ? security_sock_rcv_skb+0x2a/0x40 [ 129.181928] ? netlink_deliver_tap+0x2c/0x1e0 [ 129.186286] ? netlink_attachskb+0x1d8/0x220 [ 129.190560] ? skb_queue_tail+0x1b/0x50 [ 129.194401] ? fib6_clean_node+0x43/0x170 [ 129.198411] ? _cond_resched+0x15/0x30 [ 129.202163] ? kmem_cache_alloc_trace+0x3a3/0x420 [ 129.206869] rtnl_newlink+0x43/0x60 [ 129.210361] rtnetlink_rcv_msg+0x12c/0x380 [ 129.214462] ? rtnl_calcit.isra.39+0x110/0x110 [ 129.218908] netlink_rcv_skb+0x50/0x100 [ 129.222747] netlink_unicast+0x1a5/0x280 [ 129.226672] netlink_sendmsg+0x23d/0x470 [ 129.230599] sock_sendmsg+0x5b/0x60 [ 129.234090] ____sys_sendmsg+0x1ef/0x260 [ 129.238015] ? copy_msghdr_from_user+0x5c/0x90 [ 129.242461] ___sys_sendmsg+0x7c/0xc0 [ 129.246128] ? dev_forward_change+0x130/0x130 [ 129.250487] ? sysctl_head_finish.part.29+0x24/0x40 [ 129.255368] ? new_sync_write+0x11f/0x1b0 [ 129.259381] ? mntput_no_expire+0x47/0x240 [ 129.263478] __sys_sendmsg+0x57/0xa0 [ 129.267058] do_syscall_64+0x33/0x40 [ 129.270639] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 129.275689] RIP: 0033:0x7f88792e3857 [ 129.279268] Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48 [ 129.298015] RSP: 002b:00007ffdca60dea0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 129.305581] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f88792e3857 [ 129.312712] RDX: 0000000000000000 RSI: 00007ffdca60def0 RDI: 000000000000000c [ 129.319846] RBP: 00007ffdca60def0 R08: 0000000000000000 R09: 0000000000000000 [ 129.326978] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000000 [ 129.334109] R13: 0000000000000000 R14: 00007ffdca60e0a8 R15: 00007ffdca60e09c [ 129.341249] ---[ end trace c551e8028fbaf59d ]--- [ 129.351207] net eth0: Unexpected TXQ (0) queue failure: -12 [ 129.360445] net eth0: Unexpected TXQ (0) queue failure: -12 [ 129.824428] net eth0: Unexpected TXQ (0) queue failure: -12 Fixes: 2c53d0f ("vdpasim: vDPA device simulator") Signed-off-by: Laurent Vivier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Cc: [email protected] Acked-by: Jason Wang <[email protected]>

In case of memory pressure the MPTCP xmit path keeps at most a single skb in the tx cache, eventually freeing additional ones. The associated counter for forward memory is not update accordingly, and that causes the following splat: WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0 RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003 RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7 R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100 R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85 FS: 0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0 Call Trace: __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547 mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272 process_one_work+0x896/0x1170 kernel/workqueue.c:2275 worker_thread+0x605/0x1350 kernel/workqueue.c:2421 kthread+0x344/0x410 kernel/kthread.c:292 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296 At close time, as reported by syzkaller/Christoph. This change address the issue properly updating the fwd allocated memory counter in the error path. Reported-by: Christoph Paasch <[email protected]> Closes: #136 Fixes: 724cfd2 ("mptcp: allocate TX skbs in msk context") Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>

In case of memory pressure the MPTCP xmit path keeps at most a single skb in the tx cache, eventually freeing additional ones. The associated counter for forward memory is not update accordingly, and that causes the following splat: WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 multipath-tcp#59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0 RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003 RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7 R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100 R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85 FS: 0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0 Call Trace: __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547 mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272 process_one_work+0x896/0x1170 kernel/workqueue.c:2275 worker_thread+0x605/0x1350 kernel/workqueue.c:2421 kthread+0x344/0x410 kernel/kthread.c:292 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296 At close time, as reported by syzkaller/Christoph. This change address the issue properly updating the fwd allocated memory counter in the error path. Reported-by: Christoph Paasch <[email protected]> Closes: multipath-tcp#136 Fixes: 724cfd2 ("mptcp: allocate TX skbs in msk context") Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>

In case of memory pressure the MPTCP xmit path keeps at most a single skb in the tx cache, eventually freeing additional ones. The associated counter for forward memory is not update accordingly, and that causes the following splat: WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0 RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003 RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7 R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100 R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85 FS: 0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0 Call Trace: __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547 mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272 process_one_work+0x896/0x1170 kernel/workqueue.c:2275 worker_thread+0x605/0x1350 kernel/workqueue.c:2421 kthread+0x344/0x410 kernel/kthread.c:292 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296 At close time, as reported by syzkaller/Christoph. This change address the issue properly updating the fwd allocated memory counter in the error path. Reported-by: Christoph Paasch <[email protected]> Closes: #136 Fixes: 724cfd2 ("mptcp: allocate TX skbs in msk context") Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>

In case of memory pressure the MPTCP xmit path keeps at most a single skb in the tx cache, eventually freeing additional ones. The associated counter for forward memory is not update accordingly, and that causes the following splat: WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0 RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003 RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7 R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100 R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85 FS: 0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0 Call Trace: __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547 mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272 process_one_work+0x896/0x1170 kernel/workqueue.c:2275 worker_thread+0x605/0x1350 kernel/workqueue.c:2421 kthread+0x344/0x410 kernel/kthread.c:292 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296 At close time, as reported by syzkaller/Christoph. This change address the issue properly updating the fwd allocated memory counter in the error path. Reported-by: Christoph Paasch <[email protected]> Closes: #136 Fixes: 724cfd2 ("mptcp: allocate TX skbs in msk context") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Cifsd for next

matttbe · 2021-11-18T16:05:09Z

Dmytro Shytyi is working on that.

The rtla osnoise tool is an interface for the osnoise tracer. The osnoise tracer dispatches a kernel thread per-cpu. These threads read the time in a loop while with preemption, softirqs and IRQs enabled, thus allowing all the sources of osnoise during its execution. The osnoise threads take note of the entry and exit point of any source of interferences, increasing a per-cpu interference counter. The osnoise tracer also saves an interference counter for each source of interference. The rtla osnoise top mode displays information about the periodic summary from the osnoise tracer. One example of rtla osnoise top output is: [root@alien ~]# rtla osnoise top -c 0-3 -d 1m -q -r 900000 -P F:1 Operating System Noise duration: 0 00:01:00 | time is in us CPU Period Runtime Noise % CPU Aval Max Noise Max Single HW NMI IRQ Softirq Thread 0 #58 52200000 1031 99.99802 91 60 0 0 52285 0 101 1 #59 53100000 5 99.99999 5 5 0 9 53122 0 18 2 #59 53100000 7 99.99998 7 7 0 8 53115 0 18 3 #59 53100000 8274 99.98441 277 23 0 9 53778 0 660 "rtla osnoise top --help" works and provide information about the available options. Link: https://lkml.kernel.org/r/0d796993abf587ae5a170bb8415c49368d4999e1.1639158831.git.bristot@kernel.org Cc: Tao Zhou <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Clark Williams <[email protected]> Cc: John Kacur <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>

matttbe · 2022-07-28T15:31:26Z

Note: @dmytroshytyi and @bhesmans are working on that.

ieee80211_tx_ba_session_handle_start() may get NULL for sdata when a deauthentication is ongoing. Here a trace triggering the race with the hostapd test multi_ap_fronthaul_on_ap: (gdb) list *drv_ampdu_action+0x46 0x8b16 is in drv_ampdu_action (net/mac80211/driver-ops.c:396). 391 int ret = -EOPNOTSUPP; 392 393 might_sleep(); 394 395 sdata = get_bss_sdata(sdata); 396 if (!check_sdata_in_driver(sdata)) 397 return -EIO; 398 399 trace_drv_ampdu_action(local, sdata, params); 400 wlan0: moving STA 02:00:00:00:03:00 to state 3 wlan0: associated wlan0: deauthenticating from 02:00:00:00:03:00 by local choice (Reason: 3=DEAUTH_LEAVING) wlan3.sta1: Open BA session requested for 02:00:00:00:00:00 tid 0 wlan3.sta1: dropped frame to 02:00:00:00:00:00 (unauthorized port) wlan0: moving STA 02:00:00:00:03:00 to state 2 wlan0: moving STA 02:00:00:00:03:00 to state 1 wlan0: Removed STA 02:00:00:00:03:00 wlan0: Destroyed STA 02:00:00:00:03:00 BUG: unable to handle page fault for address: fffffffffffffb48 PGD 11814067 P4D 11814067 PUD 11816067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 2 PID: 133397 Comm: kworker/u16:1 Tainted: G W 6.1.0-rc8-wt+ #59 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 Workqueue: phy3 ieee80211_ba_session_work [mac80211] RIP: 0010:drv_ampdu_action+0x46/0x280 [mac80211] Code: 53 48 89 f3 be 89 01 00 00 e8 d6 43 bf ef e8 21 46 81 f0 83 bb a0 1b 00 00 04 75 0e 48 8b 9b 28 0d 00 00 48 81 eb 10 0e 00 00 <8b> 93 58 09 00 00 f6 c2 20 0f 84 3b 01 00 00 8b 05 dd 1c 0f 00 85 RSP: 0018:ffffc900025ebd20 EFLAGS: 00010287 RAX: 0000000000000000 RBX: fffffffffffff1f0 RCX: ffff888102228240 RDX: 0000000080000000 RSI: ffffffff918c5de0 RDI: ffff888102228b40 RBP: ffffc900025ebd40 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: ffff888118c18ec0 R13: 0000000000000000 R14: ffffc900025ebd60 R15: ffff888018b7efb8 FS: 0000000000000000(0000) GS:ffff88817a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffb48 CR3: 0000000105228006 CR4: 0000000000170ee0 Call Trace: <TASK> ieee80211_tx_ba_session_handle_start+0xd0/0x190 [mac80211] ieee80211_ba_session_work+0xff/0x2e0 [mac80211] process_one_work+0x29f/0x620 worker_thread+0x4d/0x3d0 ? process_one_work+0x620/0x620 kthread+0xfb/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Signed-off-by: Alexander Wetzel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Signed-off-by: Johannes Berg <[email protected]>

l2cap_sock_release(sk) frees sk. However, sk's children are still alive and point to the already free'd sk's address. To fix this, l2cap_sock_release(sk) also cleans sk's children. ================================================================== BUG: KASAN: use-after-free in l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650 Read of size 8 at addr ffff888104617aa8 by task kworker/u3:0/276 CPU: 0 PID: 276 Comm: kworker/u3:0 Not tainted 6.2.0-00001-gef397bd4d5fb-dirty #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: hci2 hci_rx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x72/0x95 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x175/0x478 mm/kasan/report.c:417 kasan_report+0xb1/0x130 mm/kasan/report.c:517 l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650 l2cap_chan_ready+0x10e/0x1e0 net/bluetooth/l2cap_core.c:1386 l2cap_config_req+0x753/0x9f0 net/bluetooth/l2cap_core.c:4480 l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:5739 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:6509 [inline] l2cap_recv_frame+0xe2e/0x43c0 net/bluetooth/l2cap_core.c:7788 l2cap_recv_acldata+0x6ed/0x7e0 net/bluetooth/l2cap_core.c:8506 hci_acldata_packet net/bluetooth/hci_core.c:3813 [inline] hci_rx_work+0x66e/0xbc0 net/bluetooth/hci_core.c:4048 process_one_work+0x4ea/0x8e0 kernel/workqueue.c:2289 worker_thread+0x364/0x8e0 kernel/workqueue.c:2436 kthread+0x1b9/0x200 kernel/kthread.c:376 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308 </TASK> Allocated by task 288: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:374 [inline] __kasan_kmalloc+0x82/0x90 mm/kasan/common.c:383 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0x140 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x113/0x1f0 net/core/sock.c:2040 sk_alloc+0x36/0x3c0 net/core/sock.c:2093 l2cap_sock_alloc.constprop.0+0x39/0x1c0 net/bluetooth/l2cap_sock.c:1852 l2cap_sock_create+0x10d/0x220 net/bluetooth/l2cap_sock.c:1898 bt_sock_create+0x183/0x290 net/bluetooth/af_bluetooth.c:132 __sock_create+0x226/0x380 net/socket.c:1518 sock_create net/socket.c:1569 [inline] __sys_socket_create net/socket.c:1606 [inline] __sys_socket_create net/socket.c:1591 [inline] __sys_socket+0x112/0x200 net/socket.c:1639 __do_sys_socket net/socket.c:1652 [inline] __se_sys_socket net/socket.c:1650 [inline] __x64_sys_socket+0x40/0x50 net/socket.c:1650 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 288: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:523 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free mm/kasan/common.c:200 [inline] __kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook mm/slub.c:1807 [inline] slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x88/0x1f0 mm/slub.c:3800 sk_prot_free net/core/sock.c:2076 [inline] __sk_destruct+0x347/0x430 net/core/sock.c:2168 sk_destruct+0x9c/0xb0 net/core/sock.c:2183 __sk_free+0x82/0x220 net/core/sock.c:2194 sk_free+0x7c/0xa0 net/core/sock.c:2205 sock_put include/net/sock.h:1991 [inline] l2cap_sock_kill+0x256/0x2b0 net/bluetooth/l2cap_sock.c:1257 l2cap_sock_release+0x1a7/0x220 net/bluetooth/l2cap_sock.c:1428 __sock_release+0x80/0x150 net/socket.c:650 sock_close+0x19/0x30 net/socket.c:1368 __fput+0x17a/0x5c0 fs/file_table.c:320 task_work_run+0x132/0x1c0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x113/0x120 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:296 do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc The buggy address belongs to the object at ffff888104617800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 680 bytes inside of 1024-byte region [ffff888104617800, ffff888104617c00) The buggy address belongs to the physical page: page:00000000dbca6a80 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888104614000 pfn:0x104614 head:00000000dbca6a80 order:2 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0 flags: 0x200000000010200(slab|head|node=0|zone=2) raw: 0200000000010200 ffff888100041dc0 ffffea0004212c10 ffffea0004234b10 raw: ffff888104614000 0000000000080002 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888104617980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888104617a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888104617a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888104617b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888104617b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Ack: This bug is found by FuzzBT with a modified Syzkaller. Other contributors are Ruoyu Wu and Hui Peng. Signed-off-by: Sungwoo Kim <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>

Christoph reported a splat hinting at a corrupted snd_una: WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> When fallback to TCP happens early on client socket, and the mptcp worker (dumbly) tries mptcp-level re-injection, the snd_una is not yet initialized, and the worker tries to use it to clean-up the send buffer. Address the issue always initializing snd_una for active sockets at connect time. Reported-by: Christoph Paasch <[email protected]> Closes: #485 Signed-off-by: Paolo Abeni <[email protected]> Message-Id: <2b6d11d0210f683cc0de8bb47a923d568cfa8f47.1712942966.git.pabeni@redhat.com>

Christoph reported a splat hinting at a corrupted snd_una: WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> When fallback to TCP happens early on client socket, and the mptcp worker (dumbly) tries mptcp-level re-injection, the snd_una is not yet initialized, and the worker tries to use it to clean-up the send buffer. Address the issue always initializing snd_una for active sockets at establish . Reported-by: Christoph Paasch <[email protected]> Closes: #485 Fixes: 8fd7380 ("mptcp: fallback in case of simultaneous connect") Signed-off-by: Paolo Abeni <[email protected]> Message-Id: <864efc65050c64f3876e75376bbe587bdf42e2eb.1713189887.git.pabeni@redhat.com>

Christoph reported a splat hinting at a corrupted snd_una: WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> When fallback to TCP happens early on a client socket, snd_nxt is not yet initialized and any incoming ack will copy such value into snd_una. If the mptcp worker (dumbly) tries mptcp-level re-injection after such ack, that would unconditionally trigger a send buffer cleanup using 'bad' snd_una values. We could easily disable re-injection for fallback sockets, but such dumb behavior already helped catching a few subtle issues and a very low to zero impact in practice. Instead address the issue always initializing snd_nxt (and write_seq, for consistency) at connect time. Reported-by: Christoph Paasch <[email protected]> Tested-by: Christoph Paasch <[email protected]> Closes: #485 Fixes: 8fd7380 ("mptcp: fallback in case of simultaneous connect") Signed-off-by: Paolo Abeni <[email protected]> Message-Id: <3b0ca23eaa8906f967a841f55bafed48d0f895cb.1713367699.git.pabeni@redhat.com>

Christoph reported a splat hinting at a corrupted snd_una: WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> When fallback to TCP happens early on a client socket, snd_nxt is not yet initialized and any incoming ack will copy such value into snd_una. If the mptcp worker (dumbly) tries mptcp-level re-injection after such ack, that would unconditionally trigger a send buffer cleanup using 'bad' snd_una values. We could easily disable re-injection for fallback sockets, but such dumb behavior already helped catching a few subtle issues and a very low to zero impact in practice. Instead address the issue always initializing snd_nxt (and write_seq, for consistency) at connect time. Fixes: 8fd7380 ("mptcp: fallback in case of simultaneous connect") Reported-by: Christoph Paasch <[email protected]> Closes: #485 Tested-by: Christoph Paasch <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]>

Christoph reported a splat hinting at a corrupted snd_una: WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> When fallback to TCP happens early on a client socket, snd_nxt is not yet initialized and any incoming ack will copy such value into snd_una. If the mptcp worker (dumbly) tries mptcp-level re-injection after such ack, that would unconditionally trigger a send buffer cleanup using 'bad' snd_una values. We could easily disable re-injection for fallback sockets, but such dumb behavior already helped catching a few subtle issues and a very low to zero impact in practice. Instead address the issue always initializing snd_nxt (and write_seq, for consistency) at connect time. Fixes: 8fd7380 ("mptcp: fallback in case of simultaneous connect") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: #485 Tested-by: Christoph Paasch <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Link: https://lore.kernel.org/r/20240429-upstream-net-20240429-mptcp-snd_nxt-init-connect-v1-1-59ceac0a7dcb@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>

25216af ("PCI: Add managed pcim_intx()") moved the allocation step for pci_intx()'s device resource from pcim_enable_device() to pcim_intx(). As before, pcim_enable_device() sets pci_dev.is_managed to true; and it is never set to false again. Due to the lifecycle of a struct pci_dev, it can happen that a second driver obtains the same pci_dev after a first driver ran. If one driver uses pcim_enable_device() and the other doesn't, this causes the other driver to run into managed pcim_intx(), which will try to allocate when called for the first time. Allocations might sleep, so calling pci_intx() while holding spinlocks becomes then invalid, which causes lockdep warnings and could cause deadlocks: ======================================================== WARNING: possible irq lock inversion dependency detected 6.11.0-rc6+ #59 Tainted: G W -------------------------------------------------------- CPU 0/KVM/1537 just changed the state of lock: ffffa0f0cff965f0 (&vdev->irqlock){-...}-{2:2}, at: vfio_intx_handler+0x21/0xd0 [vfio_pci_core] but this lock took another, HARDIRQ-unsafe lock in the past: (fs_reclaim){+.+.}-{0:0} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); local_irq_disable(); lock(&vdev->irqlock); lock(fs_reclaim); <Interrupt> lock(&vdev->irqlock); *** DEADLOCK *** Have pcim_enable_device()'s release function, pcim_disable_device(), set pci_dev.is_managed to false so that subsequent drivers using the same struct pci_dev do not implicitly run into managed code. Link: https://lore.kernel.org/r/[email protected] Fixes: 25216af ("PCI: Add managed pcim_intx()") Reported-by: Alex Williamson <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Suggested-by: Alex Williamson <[email protected]> Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Alex Williamson <[email protected]> Reviewed-by: Damien Le Moal <[email protected]>

matttbe added the enhancement label Jul 16, 2020

dcaratti pushed a commit to dcaratti/mptcp_net-next that referenced this issue Sep 2, 2021

Merge pull request multipath-tcp#59 from namjaejeon/cifsd-for-next

7c4ed5d

Cifsd for next

matttbe changed the title ~~TFO support~~ (MP)TFO support Nov 18, 2021

matttbe closed this as completed Dec 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(MP)TFO support #59

(MP)TFO support #59

matttbe commented Jul 16, 2020 •

edited

Loading

matttbe commented Nov 18, 2021

matttbe commented Jul 28, 2022

(MP)TFO support #59

(MP)TFO support #59

Comments

matttbe commented Jul 16, 2020 • edited Loading

matttbe commented Nov 18, 2021

matttbe commented Jul 28, 2022

matttbe commented Jul 16, 2020 •

edited

Loading