MP_PRIO support #51

matttbe · 2020-07-13T14:44:11Z

As defined in RFC8684:

Within a local MPTCP implementation, a host may use any local policy it wishes to decide how to share the traffic to be sent over the available paths.
In the typical use case, where the goal is to maximize throughput, all available paths will be used simultaneously for data transfer, using coupled congestion control as described in [RFC6356]. It is expected, however, that other use cases will appear.
For instance, one possibility is an "all-or-nothing" approach, i.e., have a second path ready for use in the event of failure of the first path, but alternatives could include entirely saturating one path before using an additional path (the "overflow" case). Such choices would be most likely based on the monetary cost of links but may also be based on properties such as the delay or jitter of links, where stability (of delay or bandwidth) is more important than throughput. Application requirements such as these are discussed in detail in [RFC6897].
The ability to make effective choices at the sender requires full knowledge of the path "cost", which is unlikely to be the case. It would be desirable for a receiver to be able to signal their own preferences for paths, since they will often be the multihomed party and may have to pay for metered incoming bandwidth.
To enable this behavior, the MP_JOIN option (see Section 3.2) contains the "B" bit, which allows a host to indicate to its peer that this path should be treated as a backup path to use only in the event of failure of other working subflows (i.e., a subflow where the receiver has indicated that B=1 SHOULD NOT be used to send data unless there are no usable subflows where B=0).
In the event that the available set of paths changes, a host may wish to signal a change in priority of subflows to the peer (e.g., a subflow that was previously set as a backup should now take priority over all remaining subflows). Therefore, the MP_PRIO option, shown in Figure 11, can be used to change the "B" flag of the subflow on which it is sent.
                       1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +---------------+---------------+-------+-----+-+
  |     Kind      |     Length    |Subtype|(rsv)|B|
  +---------------+---------------+-------+-----+-+
Figure 11: Change Subflow Priority (MP_PRIO) Option

Another use of the MP_PRIO option is to set the "B" flag on a subflow to cleanly "retire" its use before closing it and removing it with REMOVE_ADDR (Section 3.4.2) -- for example, to support make-before-break session continuity, where new subflows are added before the previously used subflows are closed.
It should be noted that the backup flag is a request from a data receiver to a data sender only, and the data sender SHOULD adhere to these requests. A host cannot assume that the data sender will do so, however, since local policies -- or technical difficulties -- may override MP_PRIO requests. Note also that this signal applies to a single direction, and so the sender of this option could choose to continue using the subflow to send data even if it has signaled B=1 to the other host.

https://www.rfc-editor.org/rfc/rfc8684.html#sec_policy

If needed, new tickets can be created to track: receiver/sender side only and packetdrill support.

kernel: feature support
kernel: selftests
modification of iproute2 → See issue iproute2: change backup mode (MP_PRIO) for active connections #158
and ideally packetdrill tests

(Feature from the initial roadmap)

The text was updated successfully, but these errors were encountered:

With LOCKDEP enabled, CTI driver triggers the following splat due to uninitialized lock class for dynamically allocated attribute objects. [ 5.372901] coresight etm0: CPU0: ETM v4.0 initialized [ 5.376694] coresight etm1: CPU1: ETM v4.0 initialized [ 5.380785] coresight etm2: CPU2: ETM v4.0 initialized [ 5.385851] coresight etm3: CPU3: ETM v4.0 initialized [ 5.389808] BUG: key ffff00000564a798 has not been registered! [ 5.392456] ------------[ cut here ]------------ [ 5.398195] DEBUG_LOCKS_WARN_ON(1) [ 5.398233] WARNING: CPU: 1 PID: 32 at kernel/locking/lockdep.c:4623 lockdep_init_map_waits+0x14c/0x260 [ 5.406149] Modules linked in: [ 5.415411] CPU: 1 PID: 32 Comm: kworker/1:1 Not tainted 5.9.0-12034-gbbe85027ce80 #51 [ 5.418553] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.426453] Workqueue: events amba_deferred_retry_func [ 5.433299] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--) [ 5.438252] pc : lockdep_init_map_waits+0x14c/0x260 [ 5.444410] lr : lockdep_init_map_waits+0x14c/0x260 [ 5.449007] sp : ffff800012bbb720 ... [ 5.531561] Call trace: [ 5.536847] lockdep_init_map_waits+0x14c/0x260 [ 5.539027] __kernfs_create_file+0xa8/0x1c8 [ 5.543539] sysfs_add_file_mode_ns+0xd0/0x208 [ 5.548054] internal_create_group+0x118/0x3c8 [ 5.552307] internal_create_groups+0x58/0xb8 [ 5.556733] sysfs_create_groups+0x2c/0x38 [ 5.561160] device_add+0x2d8/0x768 [ 5.565148] device_register+0x28/0x38 [ 5.568537] coresight_register+0xf8/0x320 [ 5.572358] cti_probe+0x1b0/0x3f0 ... Fix this by initializing the attributes when they are allocated. Fixes: 3c5597e ("coresight: cti: Add connection information to sysfs") Reported-by: Leo Yan <[email protected]> Tested-by: Leo Yan <[email protected]> Cc: Mike Leach <[email protected]> Cc: Mathieu Poirier <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Cc: stable <[email protected]> Signed-off-by: Mathieu Poirier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

If a user unbinds and re-binds a NC-SI aware driver the kernel will attempt to register the netlink interface at runtime. The structure is marked __ro_after_init so registration fails spectacularly at this point. # echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/unbind # echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/bind ftgmac100 1e660000.ethernet: Read MAC address 52:54:00:12:34:56 from chip ftgmac100 1e660000.ethernet: Using NCSI interface 8<--- cut here --- Unable to handle kernel paging request at virtual address 80a8f858 pgd = 8c768dd6 [80a8f858] *pgd=80a0841e(bad) Internal error: Oops: 80d [#1] SMP ARM CPU: 0 PID: 116 Comm: sh Not tainted 5.10.0-rc3-next-20201111-00003-gdd25b227ec1e #51 Hardware name: Generic DT based system PC is at genl_register_family+0x1f8/0x6d4 LR is at 0xff26ffff pc : [<8073f930>] lr : [<ff26ffff>] psr: 20000153 sp : 8553bc80 ip : 81406244 fp : 8553bd04 r10: 8085d12c r9 : 80a8f73c r8 : 85739000 r7 : 00000017 r6 : 80a8f860 r5 : 80c8ab98 r4 : 80a8f858 r3 : 00000000 r2 : 00000000 r1 : 81406130 r0 : 00000017 Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none Control: 00c5387d Table: 85524008 DAC: 00000051 Process sh (pid: 116, stack limit = 0x1f1988d6) ... Backtrace: [<8073f738>] (genl_register_family) from [<80860ac0>] (ncsi_init_netlink+0x20/0x48) r10:8085d12c r9:80c8fb0c r8:85739000 r7:00000000 r6:81218000 r5:85739000 r4:8121c000 [<80860aa0>] (ncsi_init_netlink) from [<8085d740>] (ncsi_register_dev+0x1b0/0x210) r5:8121c400 r4:8121c000 [<8085d590>] (ncsi_register_dev) from [<805a8060>] (ftgmac100_probe+0x6e0/0x778) r10:00000004 r9:80950228 r8:8115bc10 r7:8115ab00 r6:9eae2c24 r5:813b6f88 r4:85739000 [<805a7980>] (ftgmac100_probe) from [<805355ec>] (platform_drv_probe+0x58/0xa8) r9:80c76bb0 r8:00000000 r7:80cd4974 r6:80c76bb0 r5:8115bc10 r4:00000000 [<80535594>] (platform_drv_probe) from [<80532d58>] (really_probe+0x204/0x514) r7:80cd4974 r6:00000000 r5:80cd4868 r4:8115bc10 Jakub pointed out that ncsi_register_dev is obviously broken, because there is only one family so it would never work if there was more than one ncsi netdev. Fix the crash by registering the netlink family once on boot, and drop the code to unregister it. Fixes: 955dc68 ("net/ncsi: Add generic netlink family") Signed-off-by: Joel Stanley <[email protected]> Reviewed-by: Samuel Mendoza-Jonas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

matttbe · 2020-12-16T21:59:56Z

Support in MPTCP has been added by @geliangtang , thanks!

a56be1d: mptcp: add the outgoing MP_PRIO support
661b20b: mptcp: add the incoming MP_PRIO support
ef8d413: mptcp: add set_flags command in PM netlink
bda492f: selftests: mptcp: add set_flags command in pm_nl_ctl
cdd087e: mptcp: add the mibs for MP_PRIO
6a9e7c4: selftests: mptcp: add the MP_PRIO testcases
5b9a807: "squashed" in "mptcp: add the outgoing MP_PRIO support"

The remaining bit is the modification of iproute2 (and ideally packetdrill tests :) )

matttbe · 2021-01-28T17:39:29Z

Assigning to @dcaratti as well for the Packetdrill part ;-)
Thanks for that!

basic functional test that uses inbound mp_prio to flip the 'backup' flag in the MP_JOIN server scenario. Closes: multipath-tcp/mptcp_net-next#51 Signed-off-by: Davide Caratti <[email protected]>

basic functional test that uses inbound mp_prio to flip the 'backup' flag in the MP_JOIN server scenario. Link: multipath-tcp/mptcp_net-next#51 Signed-off-by: Davide Caratti <[email protected]>

matttbe · 2021-02-11T14:40:16Z

iproute2 support has been moved to a new issue: #158
We can then close this one.

cifsd-fixes

Commit 3df98d7 ("lsm,selinux: pass flowi_common instead of flowi to the LSM hooks") introduced flowi{4,6}_to_flowi_common() functions which cause UBSAN warning when building with LLVM 11.0.1 on Ubuntu 21.04. ================================================================================ UBSAN: object-size-mismatch in ./include/net/flow.h:197:33 member access within address ffffc9000109fbd8 with insufficient space for an object of type 'struct flowi' CPU: 2 PID: 7410 Comm: systemd-resolve Not tainted 5.14.0 #51 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 Call Trace: dump_stack_lvl+0x103/0x171 ubsan_type_mismatch_common+0x1de/0x390 __ubsan_handle_type_mismatch_v1+0x41/0x50 udp_sendmsg+0xda2/0x1300 ? ip_skb_dst_mtu+0x1f0/0x1f0 ? sock_rps_record_flow+0xe/0x200 ? inet_send_prepare+0x2d/0x90 sock_sendmsg+0x49/0x80 ____sys_sendmsg+0x269/0x370 __sys_sendmsg+0x15e/0x1d0 ? syscall_enter_from_user_mode+0xf0/0x1b0 do_syscall_64+0x3d/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f7081a50497 Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffc153870f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f7081a50497 RDX: 0000000000000000 RSI: 00007ffc15387140 RDI: 000000000000000c RBP: 00007ffc15387140 R08: 0000563f29a5e4fc R09: 000000000000cd28 R10: 0000563f29a68a30 R11: 0000000000000246 R12: 000000000000000c R13: 0000000000000001 R14: 0000563f29a68a30 R15: 0000563f29a5e50c ================================================================================ I don't think we need to call flowi{4,6}_to_flowi() from these functions because the first member of "struct flowi4" and "struct flowi6" is struct flowi_common __fl_common; while the first member of "struct flowi" is union { struct flowi_common __fl_common; struct flowi4 ip4; struct flowi6 ip6; struct flowidn dn; } u; which should point to the same address without access to "struct flowi". Signed-off-by: Tetsuo Handa <[email protected]> Signed-off-by: David S. Miller <[email protected]>

This is arm64 version of commit fec56f5 ("bpf: Introduce BPF trampoline"). A bpf trampoline converts native calling convention to bpf calling convention and is used to implement various bpf features, such as fentry, fexit, fmod_ret and struct_ops. This patch does essentially the same thing that bpf trampoline does on x86. Tested on Raspberry Pi 4B and qemu: #18 /1 bpf_tcp_ca/dctcp:OK #18 /2 bpf_tcp_ca/cubic:OK #18 /3 bpf_tcp_ca/invalid_license:OK #18 /4 bpf_tcp_ca/dctcp_fallback:OK #18 /5 bpf_tcp_ca/rel_setsockopt:OK #18 bpf_tcp_ca:OK #51 /1 dummy_st_ops/dummy_st_ops_attach:OK #51 /2 dummy_st_ops/dummy_init_ret_value:OK #51 /3 dummy_st_ops/dummy_init_ptr_arg:OK #51 /4 dummy_st_ops/dummy_multiple_args:OK #51 dummy_st_ops:OK #57 /1 fexit_bpf2bpf/target_no_callees:OK #57 /2 fexit_bpf2bpf/target_yes_callees:OK #57 /3 fexit_bpf2bpf/func_replace:OK #57 /4 fexit_bpf2bpf/func_replace_verify:OK #57 /5 fexit_bpf2bpf/func_sockmap_update:OK #57 /6 fexit_bpf2bpf/func_replace_return_code:OK #57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK #57 /8 fexit_bpf2bpf/func_replace_multi:OK #57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK #57 fexit_bpf2bpf:OK #237 xdp_bpf2bpf:OK Signed-off-by: Xu Kuohai <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: KP Singh <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]

The TPM code registers put_device() as a devm cleanup handler, and casts the reference to the right function pointer type for this to be permitted by the compiler. However, under kCFI, this is rejected at runtime, resulting in a splat like CFI failure at devm_action_release+0x24/0x3c (target: put_device+0x0/0x24; expected type: 0xa488ebfc) Internal error: Oops - CFI: 0000000000000000 [#1] PREEMPT SMP Modules linked in: ... CPU: 20 PID: 454 Comm: systemd-udevd Not tainted 6.1.0-rc1+ #51 Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #1 Oct 3 2022 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : devm_action_release+0x24/0x3c lr : devres_release_all+0xb4/0x114 sp : ffff800009bb3630 x29: ffff800009bb3630 x28: 0000000000000000 x27: 0000000000000011 x26: ffffaa6f9922c0c8 x25: 0000000000000002 x24: 000000000000000f x23: ffff800009bb3648 x22: ffff7aefc3be2100 x21: ffff7aefc3be2e00 x20: 0000000000000005 x19: ffff7aefc1e1ec10 x18: ffff800009af70a8 x17: 00000000a488ebfc x16: 0000000094ee7df3 x15: 0000000000000000 x14: 4075c5c2ef7affff x13: e46a91c5c5e2ef42 x12: ffff7aefc2c57540 x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000100000000 x8 : ffffaa6fa09b39b4 x7 : 7f7f7f7f7f7f7f7f x6 : 8000000000000000 x5 : 000000008020000e x4 : ffff7aefc2c57500 x3 : ffff800009bb3648 x2 : ffff800009bb3648 x1 : ffff7aefc3be2e80 x0 : ffff7aefc3bb7000 Call trace: devm_action_release+0x24/0x3c devres_release_all+0xb4/0x114 really_probe+0xb0/0x49c __driver_probe_device+0x114/0x180 driver_probe_device+0x48/0x1ec __driver_attach+0x118/0x284 bus_for_each_dev+0x94/0xe4 driver_attach+0x24/0x34 bus_add_driver+0x10c/0x220 driver_register+0x78/0x118 __platform_driver_register+0x24/0x34 init_module+0x20/0xfe4 [tpm_tis_synquacer] do_one_initcall+0xd4/0x248 do_init_module+0x44/0x28c load_module+0x16b4/0x1920 Fix this by going through a helper function of the correct type. Signed-off-by: Ard Biesheuvel <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]>

As the mention in commmit f7452a7 ("RDMA/rtrs-srv: fix memory leak by missing kobject free"), it was intended to remove the kobject_del for srv_path->kobj. f7452a7 said: >This patch moves kobject_del() into free_sess() so that the kobject of > rtrs_srv_sess can be freed. This patch also move rtrs_srv_destroy_once_sysfs_root_folders back to 'if (srv_path->kobj.state_in_sysfs)' block to avoid a 'held lock freed!' A kernel panic will be triggered by following script ----------------------- $ while true do echo "sessname=foo path=ip:<ip address> device_path=/dev/nvme0n1" > /sys/devices/virtual/rnbd-client/ctl/map_device echo "normal" > /sys/block/rnbd0/rnbd/unmap_device done ----------------------- The bisection pointed to commit 6af4609 ("RDMA/rtrs-srv: Fix several issues in rtrs_srv_destroy_path_files") at last. rnbd_server L777: </dev/nvme0n1@foo>: Opened device 'nvme0n1' general protection fault, probably for non-canonical address 0x765f766564753aea: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3558 Comm: systemd-udevd Kdump: loaded Not tainted 6.1.0-rc3-roce-flush+ #51 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:kernfs_dop_revalidate+0x36/0x180 Code: 00 00 41 55 41 54 55 53 48 8b 47 68 48 89 fb 48 85 c0 0f 84 db 00 00 00 48 8b a8 60 04 00 00 48 8b 45 30 48 85 c0 48 0f 44 c5 <4c> 8b 60 78 49 81 c4 d8 00 00 00 4c 89 e7 e8 b7 78 7b 00 8b 05 3d RSP: 0018:ffffaf1700b67c78 EFLAGS: 00010206 RAX: 765f766564753a72 RBX: ffff89e2830849c0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff89e2830849c0 RBP: ffff89e280361bd0 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000065 R11: 0000000000000000 R12: ffff89e2830849c0 R13: ffff89e283084888 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f FS: 00007f13fbce7b40(0000) GS:ffff89e2bbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f93e055d340 CR3: 0000000104664002 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> lookup_fast+0x7b/0x100 walk_component+0x21/0x160 link_path_walk.part.0+0x24d/0x390 path_openat+0xad/0x9a0 do_filp_open+0xa9/0x150 ? lock_release+0x13c/0x2e0 ? _raw_spin_unlock+0x29/0x50 ? alloc_fd+0x124/0x1f0 do_sys_openat2+0x9b/0x160 __x64_sys_openat+0x54/0xa0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f13fc9d701b Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25 RSP: 002b:00007ffddf242640 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f13fc9d701b RDX: 0000000000080000 RSI: 00007ffddf2427c0 RDI: 00000000ffffff9c RBP: 00007ffddf2427c0 R08: 00007f13fcc5b440 R09: 21b2131aa64b1ef2 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000080000 R13: 00007ffddf2427c0 R14: 000055ed13be8db0 R15: 0000000000000000 Fixes: 6af4609 ("RDMA/rtrs-srv: Fix several issues in rtrs_srv_destroy_path_files") Acked-by: Guoqing Jiang <[email protected]> Signed-off-by: Li Zhijian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: Jack Wang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>

ice_qp_dis() intends to stop a given queue pair that is a target of xsk pool attach/detach. One of the steps is to disable interrupts on these queues. It currently is broken in a way that txq irq is turned off *after* HW flush which in turn takes no effect. ice_qp_dis(): -> ice_qvec_dis_irq() --> disable rxq irq --> flush hw -> ice_vsi_stop_tx_ring() -->disable txq irq Below splat can be triggered by following steps: - start xdpsock WITHOUT loading xdp prog - run xdp_rxq_info with XDP_TX action on this interface - start traffic - terminate xdpsock [ 256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018 [ 256.319560] #PF: supervisor read access in kernel mode [ 256.324775] #PF: error_code(0x0000) - not-present page [ 256.329994] PGD 0 P4D 0 [ 256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G OE 6.2.0-rc5+ #51 [ 256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice] [ 256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44 [ 256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206 [ 256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f [ 256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80 [ 256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000 [ 256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000 [ 256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600 [ 256.421990] FS: 0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000 [ 256.430207] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0 [ 256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 256.457770] PKRU: 55555554 [ 256.460529] Call Trace: [ 256.463015] <TASK> [ 256.465157] ? ice_xmit_zc+0x6e/0x150 [ice] [ 256.469437] ice_napi_poll+0x46d/0x680 [ice] [ 256.473815] ? _raw_spin_unlock_irqrestore+0x1b/0x40 [ 256.478863] __napi_poll+0x29/0x160 [ 256.482409] net_rx_action+0x136/0x260 [ 256.486222] __do_softirq+0xe8/0x2e5 [ 256.489853] ? smpboot_thread_fn+0x2c/0x270 [ 256.494108] run_ksoftirqd+0x2a/0x50 [ 256.497747] smpboot_thread_fn+0x1c1/0x270 [ 256.501907] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 256.506594] kthread+0xea/0x120 [ 256.509785] ? __pfx_kthread+0x10/0x10 [ 256.513597] ret_from_fork+0x29/0x50 [ 256.517238] </TASK> In fact, irqs were not disabled and napi managed to be scheduled and run while xsk_pool pointer was still valid, but SW ring of xdp_buff pointers was already freed. To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). Also while at it, remove redundant ice_clean_rx_ring() call - this is handled in ice_qp_clean_rings(). Fixes: 2d4238f ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <[email protected]> Reviewed-by: Larysa Zaremba <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Acked-by: John Fastabend <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: David S. Miller <[email protected]>

The `cls_redirect` test triggers a kernel panic like: # ./test_progs -t cls_redirect Can't find bpf_testmod.ko kernel module: -2 WARNING! Selftests relying on bpf_testmod.ko will be skipped. [ 30.938489] CPU 3 Unable to handle kernel paging request at virtual address fffffffffd814de0, era == ffff800002009fb8, ra == ffff800002009f9c [ 30.939331] Oops[#1]: [ 30.939513] CPU: 3 PID: 1260 Comm: test_progs Not tainted 6.7.0-rc2-loong-devel-g2f56bb0d2327 #35 a896aca3f4164f09cc346f89f2e09832e07be5f6 [ 30.939732] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 [ 30.939901] pc ffff800002009fb8 ra ffff800002009f9c tp 9000000104da4000 sp 9000000104da7ab0 [ 30.940038] a0 fffffffffd814de0 a1 9000000104da7a68 a2 0000000000000000 a3 9000000104da7c10 [ 30.940183] a4 9000000104da7c14 a5 0000000000000002 a6 0000000000000021 a7 00005555904d7f90 [ 30.940321] t0 0000000000000110 t1 0000000000000000 t2 fffffffffd814de0 t3 0004c4b400000000 [ 30.940456] t4 ffffffffffffffff t5 00000000c3f63600 t6 0000000000000000 t7 0000000000000000 [ 30.940590] t8 000000000006d803 u0 0000000000000020 s9 9000000104da7b10 s0 900000010504c200 [ 30.940727] s1 fffffffffd814de0 s2 900000010504c200 s3 9000000104da7c10 s4 9000000104da7ad0 [ 30.940866] s5 0000000000000000 s6 90000000030e65bc s7 9000000104da7b44 s8 90000000044f6fc0 [ 30.941015] ra: ffff800002009f9c bpf_prog_846803e5ae81417f_cls_redirect+0xa0/0x590 [ 30.941535] ERA: ffff800002009fb8 bpf_prog_846803e5ae81417f_cls_redirect+0xbc/0x590 [ 30.941696] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 30.942224] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 30.942330] EUEN: 00000003 (+FPE +SXE -ASXE -BTE) [ 30.942453] ECFG: 00071c1c (LIE=2-4,10-12 VS=7) [ 30.942612] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) [ 30.942764] BADV: fffffffffd814de0 [ 30.942854] PRID: 0014c010 (Loongson-64bit, Loongson-3A5000) [ 30.942974] Modules linked in: [ 30.943078] Process test_progs (pid: 1260, threadinfo=00000000ce303226, task=000000007d10bb76) [ 30.943306] Stack : 900000010a064000 90000000044f6fc0 9000000104da7b48 0000000000000000 [ 30.943495] 0000000000000000 9000000104da7c14 9000000104da7c10 900000010504c200 [ 30.943626] 0000000000000001 ffff80001b88c000 9000000104da7b70 90000000030e6668 [ 30.943785] 0000000000000000 9000000104da7b58 ffff80001b88c048 9000000003d05000 [ 30.943936] 900000000303ac88 0000000000000000 0000000000000000 9000000104da7b70 [ 30.944091] 0000000000000000 0000000000000001 0000000731eeab00 0000000000000000 [ 30.944245] ffff80001b88c000 0000000000000000 0000000000000000 54b99959429f83b8 [ 30.944402] ffff80001b88c000 90000000044f6fc0 9000000101d70000 ffff80001b88c000 [ 30.944538] 000000000000005a 900000010504c200 900000010a064000 900000010a067000 [ 30.944697] 9000000104da7d88 0000000000000000 9000000003d05000 90000000030e794c [ 30.944852] ... [ 30.944924] Call Trace: [ 30.945120] [<ffff800002009fb8>] bpf_prog_846803e5ae81417f_cls_redirect+0xbc/0x590 [ 30.945650] [<90000000030e6668>] bpf_test_run+0x1ec/0x2f8 [ 30.945958] [<90000000030e794c>] bpf_prog_test_run_skb+0x31c/0x684 [ 30.946065] [<90000000026d4f68>] __sys_bpf+0x678/0x2724 [ 30.946159] [<90000000026d7288>] sys_bpf+0x20/0x2c [ 30.946253] [<90000000032dd224>] do_syscall+0x7c/0x94 [ 30.946343] [<9000000002541c5c>] handle_syscall+0xbc/0x158 [ 30.946492] [ 30.946549] Code: 0015030e 5c0009c0 5001d000 <28c00304> 02c00484 29c00304 00150009 2a42d2e4 0280200d [ 30.946793] [ 30.946971] ---[ end trace 0000000000000000 ]--- [ 32.093225] Kernel panic - not syncing: Fatal exception in interrupt [ 32.093526] Kernel relocated by 0x2320000 [ 32.093630] .text @ 0x9000000002520000 [ 32.093725] .data @ 0x9000000003400000 [ 32.093792] .bss @ 0x9000000004413200 [ 34.971998] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- This is because we signed-extend function return values. When subprog mode is enabled, we have: cls_redirect() -> get_global_metrics() returns pcpu ptr 0xfffffefffc00b480 The pointer returned is later signed-extended to 0xfffffffffc00b480 at `BPF_JMP | BPF_EXIT`. During BPF prog run, this triggers unhandled page fault and a kernel panic. Drop the unnecessary signed-extension on return values like other architectures do. With this change, we have: # ./test_progs -t cls_redirect Can't find bpf_testmod.ko kernel module: -2 WARNING! Selftests relying on bpf_testmod.ko will be skipped. #51/1 cls_redirect/cls_redirect_inlined:OK #51/2 cls_redirect/IPv4 TCP accept unknown (no hops, flags: SYN):OK #51/3 cls_redirect/IPv6 TCP accept unknown (no hops, flags: SYN):OK #51/4 cls_redirect/IPv4 TCP accept unknown (no hops, flags: ACK):OK #51/5 cls_redirect/IPv6 TCP accept unknown (no hops, flags: ACK):OK #51/6 cls_redirect/IPv4 TCP forward unknown (one hop, flags: ACK):OK #51/7 cls_redirect/IPv6 TCP forward unknown (one hop, flags: ACK):OK #51/8 cls_redirect/IPv4 TCP accept known (one hop, flags: ACK):OK #51/9 cls_redirect/IPv6 TCP accept known (one hop, flags: ACK):OK #51/10 cls_redirect/IPv4 UDP accept unknown (no hops, flags: none):OK #51/11 cls_redirect/IPv6 UDP accept unknown (no hops, flags: none):OK #51/12 cls_redirect/IPv4 UDP forward unknown (one hop, flags: none):OK #51/13 cls_redirect/IPv6 UDP forward unknown (one hop, flags: none):OK #51/14 cls_redirect/IPv4 UDP accept known (one hop, flags: none):OK #51/15 cls_redirect/IPv6 UDP accept known (one hop, flags: none):OK #51/16 cls_redirect/cls_redirect_subprogs:OK #51/17 cls_redirect/IPv4 TCP accept unknown (no hops, flags: SYN):OK #51/18 cls_redirect/IPv6 TCP accept unknown (no hops, flags: SYN):OK #51/19 cls_redirect/IPv4 TCP accept unknown (no hops, flags: ACK):OK #51/20 cls_redirect/IPv6 TCP accept unknown (no hops, flags: ACK):OK #51/21 cls_redirect/IPv4 TCP forward unknown (one hop, flags: ACK):OK #51/22 cls_redirect/IPv6 TCP forward unknown (one hop, flags: ACK):OK #51/23 cls_redirect/IPv4 TCP accept known (one hop, flags: ACK):OK #51/24 cls_redirect/IPv6 TCP accept known (one hop, flags: ACK):OK #51/25 cls_redirect/IPv4 UDP accept unknown (no hops, flags: none):OK #51/26 cls_redirect/IPv6 UDP accept unknown (no hops, flags: none):OK #51/27 cls_redirect/IPv4 UDP forward unknown (one hop, flags: none):OK #51/28 cls_redirect/IPv6 UDP forward unknown (one hop, flags: none):OK #51/29 cls_redirect/IPv4 UDP accept known (one hop, flags: none):OK #51/30 cls_redirect/IPv6 UDP accept known (one hop, flags: none):OK #51/31 cls_redirect/cls_redirect_dynptr:OK #51/32 cls_redirect/IPv4 TCP accept unknown (no hops, flags: SYN):OK #51/33 cls_redirect/IPv6 TCP accept unknown (no hops, flags: SYN):OK #51/34 cls_redirect/IPv4 TCP accept unknown (no hops, flags: ACK):OK #51/35 cls_redirect/IPv6 TCP accept unknown (no hops, flags: ACK):OK #51/36 cls_redirect/IPv4 TCP forward unknown (one hop, flags: ACK):OK #51/37 cls_redirect/IPv6 TCP forward unknown (one hop, flags: ACK):OK #51/38 cls_redirect/IPv4 TCP accept known (one hop, flags: ACK):OK #51/39 cls_redirect/IPv6 TCP accept known (one hop, flags: ACK):OK #51/40 cls_redirect/IPv4 UDP accept unknown (no hops, flags: none):OK #51/41 cls_redirect/IPv6 UDP accept unknown (no hops, flags: none):OK #51/42 cls_redirect/IPv4 UDP forward unknown (one hop, flags: none):OK #51/43 cls_redirect/IPv6 UDP forward unknown (one hop, flags: none):OK #51/44 cls_redirect/IPv4 UDP accept known (one hop, flags: none):OK #51/45 cls_redirect/IPv6 UDP accept known (one hop, flags: none):OK #51 cls_redirect:OK Summary: 1/45 PASSED, 0 SKIPPED, 0 FAILED Fixes: 5dc6155 ("LoongArch: Add BPF JIT support") Signed-off-by: Hengqi Chen <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

A crash was found when dumping SMC-R connections. It can be reproduced by following steps: - environment: two RNICs on both sides. - run SMC-R between two sides, now a SMC_LGR_SYMMETRIC type link group will be created. - set the first RNIC down on either side and link group will turn to SMC_LGR_ASYMMETRIC_LOCAL then. - run 'smcss -R' and the crash will be triggered. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000101fdd067 P4D 8000000101fdd067 PUD 10ce46067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 1810 Comm: smcss Kdump: loaded Tainted: G W E 6.7.0-rc6+ #51 RIP: 0010:__smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x66/0x150 ? exc_page_fault+0x69/0x140 ? asm_exc_page_fault+0x26/0x30 ? __smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag] smc_diag_dump_proto+0xd0/0xf0 [smc_diag] smc_diag_dump+0x26/0x60 [smc_diag] netlink_dump+0x19f/0x320 __netlink_dump_start+0x1dc/0x300 smc_diag_handler_dump+0x6a/0x80 [smc_diag] ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag] sock_diag_rcv_msg+0x121/0x140 ? __pfx_sock_diag_rcv_msg+0x10/0x10 netlink_rcv_skb+0x5a/0x110 sock_diag_rcv+0x28/0x40 netlink_unicast+0x22a/0x330 netlink_sendmsg+0x240/0x4a0 __sock_sendmsg+0xb0/0xc0 ____sys_sendmsg+0x24e/0x300 ? copy_msghdr_from_user+0x62/0x80 ___sys_sendmsg+0x7c/0xd0 ? __do_fault+0x34/0x1a0 ? do_read_fault+0x5f/0x100 ? do_fault+0xb0/0x110 __sys_sendmsg+0x4d/0x80 do_syscall_64+0x45/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 When the first RNIC is set down, the lgr->lnk[0] will be cleared and an asymmetric link will be allocated in lgr->link[SMC_LINKS_PER_LGR_MAX - 1] by smc_llc_alloc_alt_link(). Then when we try to dump SMC-R connections in __smc_diag_dump(), the invalid lgr->lnk[0] will be accessed, resulting in this issue. So fix it by accessing the right link. Fixes: f16a7dd ("smc: netlink interface for SMC sockets") Reported-by: henaumars <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616 Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Tony Lu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Until now, the generic weak kgdb_roundup_cpus() has been used for kgdb on RISCV. A custom one allows to debug CPUs that are stuck with interrupts disabled with NMI support in the future. And using an IPI is better than the generic one since it avoids the potential situation described in the generic kgdb_call_nmi_hook(). As Andrew pointed out, once there is NMI support, we can easily extend this and the CPU backtrace support to use NMIs. After this patch, the kgdb test show that: # echo g > /proc/sysrq-trigger [2]kdb> btc btc: cpu status: Currently on cpu 2 Available cpus: 0-1(-), 2, 3(-) Stack traceback for pid 0 0xffffffff81c13a40 0 0 1 0 - 0xffffffff81c14510 swapper/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-g3120273055b6-dirty #51 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80006c48>] dump_backtrace+0x28/0x30 [<ffffffff80fceb38>] show_stack+0x38/0x44 [<ffffffff80fe6a04>] dump_stack_lvl+0x58/0x7a [<ffffffff80fe6a3e>] dump_stack+0x18/0x20 [<ffffffff801143fa>] kgdb_cpu_enter+0x682/0x6b2 [<ffffffff801144ca>] kgdb_nmicallback+0xa0/0xac [<ffffffff8000a392>] handle_IPI+0x9c/0x120 [<ffffffff800a2baa>] handle_percpu_devid_irq+0xa4/0x1e4 [<ffffffff8009cca8>] generic_handle_domain_irq+0x28/0x36 [<ffffffff800a9e5c>] ipi_mux_process+0xe8/0x110 [<ffffffff806e1e30>] imsic_handle_irq+0xf8/0x13a [<ffffffff8009cca8>] generic_handle_domain_irq+0x28/0x36 [<ffffffff806dff12>] riscv_intc_aia_irq+0x2e/0x40 [<ffffffff80fe6ab0>] handle_riscv_irq+0x54/0x86 [<ffffffff80ff2e4a>] call_on_irq_stack+0x32/0x40 Rebased on Ryo Takakura's "RISC-V: Enable IPI CPU Backtrace" patch. Signed-off-by: Jinjie Ruan <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>

matttbe added the enhancement label Jul 13, 2020

geliangtang self-assigned this Nov 3, 2020

matttbe assigned dcaratti Jan 28, 2021

dcaratti mentioned this issue Feb 2, 2021

MP_PRIO smoketest multipath-tcp/packetdrill#39

Merged

matttbe mentioned this issue Feb 11, 2021

iproute2: change backup mode (MP_PRIO) for active connections #158

Closed

2 tasks

matttbe closed this as completed Feb 11, 2021

dcaratti pushed a commit to dcaratti/mptcp_net-next that referenced this issue Sep 2, 2021

Merge pull request multipath-tcp#51 from namjaejeon/cifsd-for-next

10a96b2

cifsd-fixes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MP_PRIO support #51

MP_PRIO support #51

matttbe commented Jul 13, 2020 •

edited

Loading

matttbe commented Dec 16, 2020

matttbe commented Jan 28, 2021

matttbe commented Feb 11, 2021

MP_PRIO support #51

MP_PRIO support #51

Comments

matttbe commented Jul 13, 2020 • edited Loading

matttbe commented Dec 16, 2020

matttbe commented Jan 28, 2021

matttbe commented Feb 11, 2021

matttbe commented Jul 13, 2020 •

edited

Loading