Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge pull request #1 from torvalds/master #122

Closed
wants to merge 1 commit into from

Conversation

rakesh-gopal
Copy link

pulling latest linux code

pulling latest linux code
torvalds pushed a commit that referenced this pull request Nov 22, 2014
Make sure the netlink group exists, otherwise you can trigger an out
of bound array memory access from the netlink_bind() path. This splat
can only be triggered only by superuser.

[  180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
[  180.204249] index 9 is out of range for type 'int [9]'
[  180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
[  180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
+04/01/2014
[  180.206498]  0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
[  180.207220]  ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
[  180.207887]  ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
[  180.208639] Call Trace:
[  180.208857] dump_stack (lib/dump_stack.c:52)
[  180.209370] ubsan_epilogue (lib/ubsan.c:174)
[  180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
[  180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
[  180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
[  180.211495] SYSC_bind (net/socket.c:1541)

Moreover, define the missing nf_tables and nf_acct multicast groups too.

Reported-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
CkNoSFeRaTU pushed a commit to CkNoSFeRaTU/linux that referenced this pull request Aug 25, 2016
laijs pushed a commit to laijs/linux that referenced this pull request Feb 13, 2017
Add a VDE backend driver for virtio-net
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Mar 9, 2017
We have to avoid taking ww_mutex inside the shrinker as we use it as a
plain mutex type and so need to avoid recursive deadlocks:

[  602.771969] =================================
[  602.771970] [ INFO: inconsistent lock state ]
[  602.771973] 4.10.0gpudebug+ torvalds#122 Not tainted
[  602.771974] ---------------------------------
[  602.771975] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[  602.771978] kswapd0/40 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  602.771979]  (reservation_ww_class_mutex){+.+.?.}, at: [<ffffffffa054680a>] i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772020] {RECLAIM_FS-ON-W} state was registered at:
[  602.772024]   mark_held_locks+0x76/0x90
[  602.772026]   lockdep_trace_alloc+0xb8/0xc0
[  602.772028]   __kmalloc_track_caller+0x5d/0x130
[  602.772031]   krealloc+0x89/0xb0
[  602.772033]   reservation_object_reserve_shared+0xaf/0xd0
[  602.772055]   i915_gem_do_execbuffer.isra.35+0x1413/0x18b0 [i915]
[  602.772075]   i915_gem_execbuffer2+0x10e/0x1d0 [i915]
[  602.772078]   drm_ioctl+0x291/0x480
[  602.772079]   do_vfs_ioctl+0x695/0x6f0
[  602.772081]   SyS_ioctl+0x3c/0x70
[  602.772084]   entry_SYSCALL_64_fastpath+0x18/0xad
[  602.772085] irq event stamp: 5197423
[  602.772088] hardirqs last  enabled at (5197423): [<ffffffff8116751d>] kfree+0xdd/0x170
[  602.772091] hardirqs last disabled at (5197422): [<ffffffff811674f9>] kfree+0xb9/0x170
[  602.772095] softirqs last  enabled at (5190992): [<ffffffff8107bfe1>] __do_softirq+0x221/0x280
[  602.772097] softirqs last disabled at (5190575): [<ffffffff8107c294>] irq_exit+0x64/0xc0
[  602.772099]
               other info that might help us debug this:
[  602.772100]  Possible unsafe locking scenario:

[  602.772101]        CPU0
[  602.772101]        ----
[  602.772102]   lock(reservation_ww_class_mutex);
[  602.772104]   <Interrupt>
[  602.772105]     lock(reservation_ww_class_mutex);
[  602.772107]
                *** DEADLOCK ***

[  602.772109] 2 locks held by kswapd0/40:
[  602.772110]  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811337b5>] shrink_slab.constprop.62+0x35/0x280
[  602.772116]  #1:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0553957>] i915_gem_shrinker_lock+0x27/0x60 [i915]
[  602.772141]
               stack backtrace:
[  602.772144] CPU: 2 PID: 40 Comm: kswapd0 Not tainted 4.10.0gpudebug+ torvalds#122
[  602.772145] Hardware name: LENOVO 42433ZG/42433ZG, BIOS 8AET64WW (1.44 ) 07/26/2013
[  602.772147] Call Trace:
[  602.772151]  dump_stack+0x68/0xa1
[  602.772153]  print_usage_bug+0x1d4/0x1f0
[  602.772155]  mark_lock+0x390/0x530
[  602.772157]  ? print_irq_inversion_bug+0x200/0x200
[  602.772159]  __lock_acquire+0x405/0x1260
[  602.772181]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772183]  lock_acquire+0x60/0x80
[  602.772205]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772207]  mutex_lock_nested+0x69/0x760
[  602.772229]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772231]  ? kfree+0xdd/0x170
[  602.772253]  ? i915_gem_object_wait+0x163/0x410 [i915]
[  602.772255]  ? trace_hardirqs_on_caller+0x18d/0x1c0
[  602.772256]  ? trace_hardirqs_on+0xd/0x10
[  602.772278]  i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772300]  i915_gem_object_unbind+0x5e/0x130 [i915]
[  602.772323]  i915_gem_shrink+0x22d/0x3d0 [i915]
[  602.772347]  i915_gem_shrinker_scan+0x3f/0x80 [i915]
[  602.772349]  shrink_slab.constprop.62+0x1ad/0x280
[  602.772352]  shrink_node+0x52/0x80
[  602.772355]  kswapd+0x427/0x5c0
[  602.772358]  kthread+0x122/0x130
[  602.772360]  ? try_to_free_pages+0x270/0x270
[  602.772362]  ? kthread_stop+0x70/0x70
[  602.772365]  ret_from_fork+0x2e/0x40

v2: Add commentary about the pruning being opportunistic

Reported-by: Jan Nordholz <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99977#c10
Fixes: e54ca97 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Matthew Auld <[email protected]>
Reviewed-by: Joonas Lahtinen <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Mar 13, 2017
GIT 5e5b7b03826a265c97d63df49311a18432542ceb

commit dd3bf50b35e3e111d6325207177b12af88aec824
Author: Sebastian Reichel <[email protected]>
Date:   Thu Mar 2 01:27:09 2017 +0100

    rtc: cpcap: new rtc driver
    
    This driver supports the Motorola CPCAP PMIC found on
    some of Motorola's mobile phones, such as the Droid 4.
    
    Tested-by: Tony Lindgren <[email protected]>
    Signed-off-by: Sebastian Reichel <[email protected]>
    Acked-by: Rob Herring <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 0522de00929e9e9ee51235fc40035179e4d45381
Author: Sebastian Reichel <[email protected]>
Date:   Thu Mar 2 01:27:08 2017 +0100

    dt-bindings: Add vendor prefix for Motorola
    
    Motorola was involved in semiconductor and mobile phone business.
    The "motorola," prefix is already used by a couple of bindings:
    
    * rtc/rtc-cmos.txt
    * mfd/motorola-cpcap.txt
    * regulator/cpcap-regulator.txt
    
    Apart from that it is used in the DT file for the Droid 4 mobile
    phone.
    
    Acked-by: Rob Herring <[email protected]>
    Signed-off-by: Sebastian Reichel <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 0c749eacaa80a36b9d2b74c6c99c61baa0196eb9
Author: Dmitry Torokhov <[email protected]>
Date:   Wed Mar 1 15:34:38 2017 -0800

    rtc: omap: mark PM methods as __maybe_unused
    
    Instead of using #ifdef guards around PM methods, let's annotate
    them as __maybe_unused, as it provides better compile coverage.
    
    Also drop empty stub for omap_rtc_runtime_resume().
    
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit b9de1a1dae8c78de05a59e40632aa8241ef78e1a
Author: Dmitry Torokhov <[email protected]>
Date:   Wed Mar 1 15:33:23 2017 -0800

    rtc: omap: remove incorrect __exit markups
    
    Even if bus is not hot-pluggable, devices can be unbound from the
    driver via sysfs, so we should not be using __exit annotations on
    remove() methods. The only exception is drivers registered with
    platform_driver_probe(), which specifically disables sysfs bind/unbind
    attributes.
    
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Reviewed-By: Sebastian Reichel <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit ff764b88e63a097a6a8383f0f1f76e7fa5a5e096
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:24 2017 -0300

    rtc: rs5c372: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit eb235c561d04ea64d19a0d3e1413f9c3bc25596c
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:23 2017 -0300

    rtc: m41t80: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit ffbecfbdbeeebe9749398fe9eb9eab2dd47ddff4
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:22 2017 -0300

    rtc: rx8581: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 21f274327080271dc5adb7ae9852ffa5e028ce55
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:21 2017 -0300

    rtc: s35390a: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 0383550402352c0036129524f751d8182e837960
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:20 2017 -0300

    rtc: isl1208: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit abac12e1105e7f3572f641db7583547069c1deaf
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:19 2017 -0300

    rtc: ds1374: Set .of_match_table to OF device ID table
    
    The driver has a OF device ID table but the struct i2c_driver
    .of_match_table field is not set.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 23194ac0992be4f4f6ebd0926dac473ed03568d0
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:18 2017 -0300

    rtc: rtc-ds1672: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 4dfbd1378dceaa418ec1634428ffe038e0870b66
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:17 2017 -0300

    rtc: ds3232: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 81b779ce5f3bb1df54d0accd83834a1c90a32121
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:16 2017 -0300

    rtc: rx8010: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 7ef6d2c266e5b0eb09ba0d5f5aa2dfc11df19745
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:15 2017 -0300

    rtc: ds1307: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 36138c70e69c9272bd7e0be517f1ac7ad0c4f0df
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:14 2017 -0300

    rtc: bq32k: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit e696a1dd7e80ee64fd930982d9e1c6b152635534
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:13 2017 -0300

    rtc: rv3029: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit 740ad8f43c78a8dc3ec73515b7b847ca0edac781
Author: Javier Martinez Canillas <[email protected]>
Date:   Fri Mar 3 11:29:12 2017 -0300

    rtc: rv8803: Add OF device ID table
    
    The driver doesn't have a struct of_device_id table but supported devices
    are registered via Device Trees. This is working on the assumption that a
    I2C device registered via OF will always match a legacy I2C device ID and
    that the MODALIAS reported will always be of the form i2c:<device>.
    
    But this could change in the future so the correct approach is to have an
    OF device ID table if the devices are registered via OF.
    
    Signed-off-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>

commit aa805dad9ae66cc4f5106d004b6bb5102fd84434
Author: Paul E. McKenney <[email protected]>
Date:   Wed Mar 8 15:32:29 2017 -0800

    cpu: Move RCU earlier in x86 CPU-online procedure
    
    Running rcutorture on v4.11-rc1 results in the following splat on x86
    on kernels built with both CPU hotplug and lockdep:
    
    [   30.694013] =============================
    [   30.694013] WARNING: suspicious RCU usage
    [   30.694013] 4.11.0-rc1+ #1 Not tainted
    [   30.694013] -----------------------------
    [   30.694013] /home/git/linux-2.6-tip/kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held!
    [   30.694013]
    [   30.694013] other info that might help us debug this:
    [   30.694013]
    [   30.694013]
    [   30.694013] RCU used illegally from offline CPU!
    [   30.694013] rcu_scheduler_active = 2, debug_locks = 0
    [   30.694013] no locks held by swapper/1/0.
    [   30.694013]
    [   30.694013] stack backtrace:
    [   30.694013] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc1+ #1
    [   30.694013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [   30.694013] Call Trace:
    [   30.694013]  dump_stack+0x67/0x99
    [   30.694013]  lockdep_rcu_suspicious+0xe7/0x120
    [   30.694013]  get_work_pool+0x82/0x90
    [   30.694013]  __queue_work+0x70/0x5f0
    [   30.694013]  queue_work_on+0x33/0x70
    [   30.694013]  clear_sched_clock_stable+0x33/0x40
    [   30.694013]  early_init_intel+0xe7/0x2f0
    [   30.694013]  init_intel+0x11/0x350
    [   30.694013]  identify_cpu+0x344/0x5a0
    [   30.694013]  identify_secondary_cpu+0x18/0x80
    [   30.694013]  smp_store_cpu_info+0x39/0x40
    [   30.694013]  start_secondary+0x4e/0x100
    [   30.694013]  start_cpu+0x14/0x14
    
    This is caused by the fact that smp_store_cpu_info() indirectly invokes
    schedule_work(), which wants to use RCU.  But RCU isn't informed of the
    incoming CPU until the later call to notify_cpu_starting(), which causes
    lockdep to complain bitterly about the use of RCU by the premature call
    to schedule_work().  It is said to be unwise to move the call to
    notify_cpu_starting() to precede that to smp_store_cpu_info(), so this
    commit adds a ARCH_RCU_ONLINE_EARLY Kconfig option that is selected by
    the x86 architecture.  This option suppresses the call to rcu_cpu_starting()
    from notify_cpu_starting(), and a call to rcu_cpu_starting() is added
    preceding the call to smp_store_cpu_info().
    
    Signed-off-by: Paul E. McKenney <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Frederic Weisbecker <[email protected]>
    Cc: Thomas Gleixner <[email protected]>

commit 4b9502e63b5e2b1b5ef491919d3219b9440fe0b3
Author: Elena Reshetova <[email protected]>
Date:   Wed Mar 8 10:00:40 2017 +0200

    kernel: convert css_set.refcount from atomic_t to refcount_t
    
    refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.
    
    Signed-off-by: Elena Reshetova <[email protected]>
    Signed-off-by: Hans Liljestrand <[email protected]>
    Signed-off-by: Kees Cook <[email protected]>
    Signed-off-by: David Windsor <[email protected]>
    Signed-off-by: Tejun Heo <[email protected]>

commit 86b04268d4b25c3a00fa5f77ceebabecf681040a
Author: Daniel Vetter <[email protected]>
Date:   Wed Mar 1 10:52:26 2017 +0100

    drm/i915: Fix up verify_encoder_state
    
    The trouble here is that looking at all connector->state in the
    verifier isn't good, because that's run from the commit work, which
    doesn't hold the connection_mutex. Which means we're only allowed to
    look at states in our atomic update.
    
    The simple fix for future proofing would be to switch over to
    drm_for_each_connector_in_state, but that has the problem that the
    verification then fails if not all connectors are in the state. And we
    also need to be careful to check both old and new encoders, and not
    screw things up when an encoder gets reassigned.
    
    Note that this isn't the full fix, since we still look at
    connector->state. To fix that, we need Maarten's patch series to
    switch over to state pointers within drm_atomic_state, but that's a
    different series.
    
    v2: Use oldnew iterator (Maarten).
    
    v3: Rebase onto the iter_get/put->iter_begin/end rename.
    
    Cc: Thierry Reding <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Reviewed-by: Maarten Lankhorst <[email protected]>
    Signed-off-by: Daniel Vetter <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

commit f9e905cabdccfda0be094debbf21cb2cc45b8c51
Author: Daniel Vetter <[email protected]>
Date:   Wed Mar 1 10:52:25 2017 +0100

    drm/i915: use for_each_intel_connector_iter in intel_display.c
    
    This gets rid of the last users of for_each_intel_connector(), remove
    that too.
    
    At first I wasn't sure whether the 2 loops in the modeset state
    checker should instead only loop over the connectors in the atomic
    commit. But we never add connectors to an atomic update if they don't
    (or won't have) a CRTC assigned, which means there'd be a gap in check
    coverage. Hence loop over everything on those too.
    
    v2: Rebase onto the iter_get/put->iter_begin/end rename.
    
    Cc: Thierry Reding <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Reviewed-by: Maarten Lankhorst <[email protected]>
    Signed-off-by: Daniel Vetter <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

commit 51ec53da407a7a6e1c28e973524a15d521f96c7a
Author: Daniel Vetter <[email protected]>
Date:   Wed Mar 1 10:52:24 2017 +0100

    drm/i915: Make intel_get_pipe_from_connector atomic
    
    Drive-by fixup while looking at all the connector_list walkers -
    holding connection_mutex does actually _not_ give you locking to look
    at the legacy drm_connector->encoder->crtc pointer chain. That one is
    solely owned by the atomic commit workers. Instead we must inspect the
    atomic state.
    
    Reviewed-by: Maarten Lankhorst <[email protected]>
    Signed-off-by: Daniel Vetter <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

commit f57c84212d8da657def31ca04a898a2e98396cfa
Author: Daniel Vetter <[email protected]>
Date:   Wed Mar 1 10:52:23 2017 +0100

    drm/i915: use drm_connector_list_iter in intel_opregion.c
    
    One case where I nuked a now unecessary locking, otherwise all just
    boring stuff.
    
    v2: Rebase onto the iter_get/put->iter_begin/end rename.
    
    Cc: Thierry Reding <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Cc: Jani Nikula <[email protected]>
    Reviewed-by: Maarten Lankhorst <[email protected]>
    Signed-off-by: Daniel Vetter <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

commit cc3ca4f33d922f3de3fb8e3ca3708f258eb953b1
Author: Daniel Vetter <[email protected]>
Date:   Wed Mar 1 10:52:22 2017 +0100

    drm/i915: use drm_connector_list_iter in intel_hotplug.c
    
    Nothing special, just rote conversion.
    
    v2: Rebase onto the iter_get/put->iter_begin/end rename.
    
    Cc: Thierry Reding <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Reviewed-by: Maarten Lankhorst <[email protected]>
    Signed-off-by: Daniel Vetter <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

commit 3f6a5e1ea6b91ec08548101010fe0ab33c25950b
Author: Daniel Vetter <[email protected]>
Date:   Wed Mar 1 10:52:21 2017 +0100

    drm/i915: Use drm_connector_list_iter in debugfs
    
    While at it also try to reduce the locking a bit to what's really just
    needed instead of everything that we could possibly lock.
    
    Added a new for_each_intel_connector_iter which includes the cast to
    intel_connector.
    
    Otherwise just plain transformation with nothing special going on.
    
    v2: Review from Maarten:
    - Stick with modeset_lock_all in sink_crc, it looks at crtc->state.
    - Fix up early loop exit in i915_displayport_test_active_write.
    
    v3: Rebase onto the iter_get/put->iter_begin/end rename.
    
    Cc: Thierry Reding <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Reviewed-by: Maarten Lankhorst <[email protected]> (v2)
    Signed-off-by: Daniel Vetter <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

commit 0044c533cf940b600c4ed01c5a66e9d6cc421c23
Author: Paul E. McKenney <[email protected]>
Date:   Wed Mar 8 13:44:37 2017 -0800

    clock: Fix smp_processor_id() in preemptible bug
    
    The v4.11-rc1 kernel emits the following splat in some configurations:
    
    [   43.681891] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/3:1/49
    [   43.682511] caller is debug_smp_processor_id+0x17/0x20
    [   43.682893] CPU: 0 PID: 49 Comm: kworker/3:1 Not tainted 4.11.0-rc1+ #1
    [   43.683382] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [   43.683497] Workqueue: events __clear_sched_clock_stable
    [   43.683497] Call Trace:
    [   43.683497]  dump_stack+0x4f/0x69
    [   43.683497]  check_preemption_disabled+0xd9/0xf0
    [   43.683497]  debug_smp_processor_id+0x17/0x20
    [   43.683497]  __clear_sched_clock_stable+0x11/0x60
    [   43.683497]  process_one_work+0x146/0x430
    [   43.683497]  worker_thread+0x126/0x490
    [   43.683497]  kthread+0xfc/0x130
    [   43.683497]  ? process_one_work+0x430/0x430
    [   43.683497]  ? kthread_create_on_node+0x40/0x40
    [   43.683497]  ? umh_complete+0x30/0x30
    [   43.683497]  ? call_usermodehelper_exec_async+0x12a/0x130
    [   43.683497]  ret_from_fork+0x29/0x40
    [   43.689244] sched_clock: Marking unstable (43688244724, 179505618)<-(43867750342, 0)
    
    This happens because workqueue handlers run with preemption enabled
    by default and the new this_scd() function accesses per-CPU variables.
    This commit therefore disables preemption across this call to this_scd()
    and to the uses of the pointer that it returns.
    
    Signed-off-by: Paul E. McKenney <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Frederic Weisbecker <[email protected]>

commit 5b5554c51d38a227c4c5bee6e57026e4ebb0bde9
Author: Chris Wilson <[email protected]>
Date:   Wed Mar 8 14:22:38 2017 +0000

    drm/i915: Check for an invalid seqno before __i915_gem_request_started
    
    __i915_gem_request_started() asserts that the seqno is valid, but
    i915_spin_request() was not checking before querying whether the request
    had started.
    
    Reported-by: Michał Winiarski <[email protected]>
    Fixes: 754c9fd57649 ("drm/i915: Protect the request->global_seqno with the engine->timeline lock")
    Signed-off-by: Chris Wilson <[email protected]>
    Cc: Tvrtko Ursulin <[email protected]>
    Cc: Michał Winiarski <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
    Reviewed-by: Michał Winiarski <[email protected]>
    Reviewed-by: Tvrtko Ursulin <[email protected]>

commit f166244a6dafdbb6c4e9eba77a04fd55a5bf9657
Author: Chris Wilson <[email protected]>
Date:   Wed Mar 8 13:26:29 2017 +0000

    drm/i915: Purge i915_gem_object_is_dead()
    
    i915_gem_object_is_dead() was a temporary lockdep aide whilst
    transitioning to a new locking structure for obj->mm. Since commit
    1233e2db199d ("drm/i915: Move object backing storage manipulation to its
    own locking") it is now unused and should be removed.
    
    Signed-off-by: Chris Wilson <[email protected]>
    Cc: Joonas Lahtinen <[email protected]>
    Reviewed-by: Joonas Lahtinen <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

commit 03d1cac6eefef523fa6118f156e811252e6276b4
Author: Chris Wilson <[email protected]>
Date:   Wed Mar 8 13:26:28 2017 +0000

    drm/i915: Avoiding recursing on ww_mutex inside shrinker
    
    We have to avoid taking ww_mutex inside the shrinker as we use it as a
    plain mutex type and so need to avoid recursive deadlocks:
    
    [  602.771969] =================================
    [  602.771970] [ INFO: inconsistent lock state ]
    [  602.771973] 4.10.0gpudebug+ #122 Not tainted
    [  602.771974] ---------------------------------
    [  602.771975] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
    [  602.771978] kswapd0/40 [HC0[0]:SC0[0]:HE1:SE1] takes:
    [  602.771979]  (reservation_ww_class_mutex){+.+.?.}, at: [<ffffffffa054680a>] i915_gem_object_wait+0x39a/0x410 [i915]
    [  602.772020] {RECLAIM_FS-ON-W} state was registered at:
    [  602.772024]   mark_held_locks+0x76/0x90
    [  602.772026]   lockdep_trace_alloc+0xb8/0xc0
    [  602.772028]   __kmalloc_track_caller+0x5d/0x130
    [  602.772031]   krealloc+0x89/0xb0
    [  602.772033]   reservation_object_reserve_shared+0xaf/0xd0
    [  602.772055]   i915_gem_do_execbuffer.isra.35+0x1413/0x18b0 [i915]
    [  602.772075]   i915_gem_execbuffer2+0x10e/0x1d0 [i915]
    [  602.772078]   drm_ioctl+0x291/0x480
    [  602.772079]   do_vfs_ioctl+0x695/0x6f0
    [  602.772081]   SyS_ioctl+0x3c/0x70
    [  602.772084]   entry_SYSCALL_64_fastpath+0x18/0xad
    [  602.772085] irq event stamp: 5197423
    [  602.772088] hardirqs last  enabled at (5197423): [<ffffffff8116751d>] kfree+0xdd/0x170
    [  602.772091] hardirqs last disabled at (5197422): [<ffffffff811674f9>] kfree+0xb9/0x170
    [  602.772095] softirqs last  enabled at (5190992): [<ffffffff8107bfe1>] __do_softirq+0x221/0x280
    [  602.772097] softirqs last disabled at (5190575): [<ffffffff8107c294>] irq_exit+0x64/0xc0
    [  602.772099]
                   other info that might help us debug this:
    [  602.772100]  Possible unsafe locking scenario:
    
    [  602.772101]        CPU0
    [  602.772101]        ----
    [  602.772102]   lock(reservation_ww_class_mutex);
    [  602.772104]   <Interrupt>
    [  602.772105]     lock(reservation_ww_class_mutex);
    [  602.772107]
                    *** DEADLOCK ***
    
    [  602.772109] 2 locks held by kswapd0/40:
    [  602.772110]  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811337b5>] shrink_slab.constprop.62+0x35/0x280
    [  602.772116]  #1:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0553957>] i915_gem_shrinker_lock+0x27/0x60 [i915]
    [  602.772141]
                   stack backtrace:
    [  602.772144] CPU: 2 PID: 40 Comm: kswapd0 Not tainted 4.10.0gpudebug+ #122
    [  602.772145] Hardware name: LENOVO 42433ZG/42433ZG, BIOS 8AET64WW (1.44 ) 07/26/2013
    [  602.772147] Call Trace:
    [  602.772151]  dump_stack+0x68/0xa1
    [  602.772153]  print_usage_bug+0x1d4/0x1f0
    [  602.772155]  mark_lock+0x390/0x530
    [  602.772157]  ? print_irq_inversion_bug+0x200/0x200
    [  602.772159]  __lock_acquire+0x405/0x1260
    [  602.772181]  ? i915_gem_object_wait+0x39a/0x410 [i915]
    [  602.772183]  lock_acquire+0x60/0x80
    [  602.772205]  ? i915_gem_object_wait+0x39a/0x410 [i915]
    [  602.772207]  mutex_lock_nested+0x69/0x760
    [  602.772229]  ? i915_gem_object_wait+0x39a/0x410 [i915]
    [  602.772231]  ? kfree+0xdd/0x170
    [  602.772253]  ? i915_gem_object_wait+0x163/0x410 [i915]
    [  602.772255]  ? trace_hardirqs_on_caller+0x18d/0x1c0
    [  602.772256]  ? trace_hardirqs_on+0xd/0x10
    [  602.772278]  i915_gem_object_wait+0x39a/0x410 [i915]
    [  602.772300]  i915_gem_object_unbind+0x5e/0x130 [i915]
    [  602.772323]  i915_gem_shrink+0x22d/0x3d0 [i915]
    [  602.772347]  i915_gem_shrinker_scan+0x3f/0x80 [i915]
    [  602.772349]  shrink_slab.constprop.62+0x1ad/0x280
    [  602.772352]  shrink_node+0x52/0x80
    [  602.772355]  kswapd+0x427/0x5c0
    [  602.772358]  kthread+0x122/0x130
    [  602.772360]  ? try_to_free_pages+0x270/0x270
    [  602.772362]  ? kthread_stop+0x70/0x70
    [  602.772365]  ret_from_fork+0x2e/0x40
    
    v2: Add commentary about the pruning being opportunistic
    
    Reported-by: Jan Nordholz <[email protected]>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99977#c10
    Fixes: e54ca9774777 ("drm/i915: Remove completed fences after a wait")
    Signed-off-by: Chris Wilson <[email protected]>
    Cc: Joonas Lahtinen <[email protected]>
    Cc: Matthew Auld <[email protected]>
    Reviewed-by: Joonas Lahtinen <[email protected]>
    Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

commit 7ea79ae86c28e729d51fa5703b093d27cca25217
Author: Rafał Miłecki <[email protected]>
Date:   Mon Mar 6 06:19:45 2017 +0100

    leds: gpio: use OF variant of LED registering function
    
    In leds-gpio we support LEDs specified in DT so we should use
    (devm_)of_led_classdev_register. This allows passing DT node as argument
    for use by the LED subsystem.
    
    Signed-off-by: Rafał Miłecki <[email protected]>
    Acked-by: Pavel Machek <[email protected]>
    Signed-off-by: Jacek Anaszewski <[email protected]>

commit 442c609830e98919faa78b797e9b89c53bab9cbf
Author: Rafał Miłecki <[email protected]>
Date:   Mon Mar 6 06:19:44 2017 +0100

    leds: core: add OF variants of LED registering functions
    
    These new functions allow passing an additional device_node argument
    that will be internally set for created LED device. Thanks to this LED
    core code and triggers will be able to access DT node for reading extra
    info.
    
    The easiest solution for achieving this was reworking old functions to
    more generic ones & adding simple defines for API compatibility.
    
    Signed-off-by: Rafał Miłecki <[email protected]>
    Acked-by: Pavel Machek <[email protected]>
    Signed-off-by: Jacek Anaszewski <[email protected]>

commit de43e82734a4fe12779cdec47ab23a9bb247a608
Author: Thierry Reding <[email protected]>
Date:   Thu Feb 23 18:30:53 2017 +0100

    arm64: tegra: Add GPIO expanders on P2771
    
    The P2771 development board expands the number of GPIOs via two I2C
    chips.
    
    Acked-by: Jon Hunter <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>

commit f7c967f33d82388bc005bbe7a8356178e768b213
Author: Thierry Reding <[email protected]>
Date:   Thu Feb 23 18:30:52 2017 +0100

    arm64: tegra: Add power monitors on P2771
    
    The P2771 development board comes with two power monitors that can be
    used to determine power consumption in different parts of the board.
    
    Acked-by: Jon Hunter <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>

commit b2c1233559946ec9e083f1013a615e7e8265b80d
Author: Thierry Reding <[email protected]>
Date:   Thu Feb 23 18:30:51 2017 +0100

    arm64: tegra: Add GPIO keys on P2771
    
    The P2771 has three keys (power, volume up and volume down) that are
    connected to pins on the AON GPIO controller.
    
    Acked-by: Jon Hunter <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>

commit 525e43e0170e0dc2b3dcee31117cf1a0d9bcc61e
Author: Thierry Reding <[email protected]>
Date:   Thu Feb 23 18:30:50 2017 +0100

    arm64: tegra: Enable current monitors on P3310
    
    The P3310 processor module contains two current monitors that can be
    used to determine the current flow across various parts of the board
    design.
    
    Acked-by: Jon Hunter <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>

commit d40b681817d21ea873a422d5cca30095e7a8027c
Author: Thierry Reding <[email protected]>
Date:   Wed Mar 8 14:31:45 2017 +0100

    arm64: tegra: Enable SD/MMC slot on P2771
    
    The P3310 processor module makes provisions for exposing the SDMMC1
    controller via a standard SD/MMC slot, which the P2771 supports. Hook
    up the power supply provided on the P2771 carrier board and enable
    the device tree node.
    
    Acked-by: Jon Hunter <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>

commit e4d97d5d0267598c974a213882006949d162ee1a
Author: Thierry Reding <[email protected]>
Date:   Thu Feb 23 18:30:49 2017 +0100

    arm64: tegra: Enable SDHCI controllers on P3110
    
    The P3110 processor module wires one of the SDHCI controllers to an on-
    board eMMC and exposes another set of SD/MMC signals on the connector to
    support an external SD/MMC card. A third controller is connected to the
    SDIO pins of an M.2 KEY E connector.
    
    Acked-by: Jon Hunter <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>

commit c17786438583adac0e0cd35e5b135a7064a7770e
Author: Thierry Reding <[email protected]>
Date:   Thu Feb 23 18:30:48 2017 +0100

    arm64: tegra: Add initial power tree for P3310
    
    Enable the Maxim MAX77620 PMIC found on P3310 and add some fixed
    regulators to model the power tree.
    
    Acked-by: Jon Hunter <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>

commit 854014236290364f9bec84ba56b043bb697d4f0e
Author: Thierry Reding <[email protected]>
Date:   Thu Feb 23 18:11:57 2017 +0100

    soc/tegra: Implement Tegra186 PMC support
    
    The power management controller on Tegra186 has changed in backwards-
    incompatible ways with respect to earlier generations. This implements a
    new driver that supports inversion of the PMU interrupt as well as the
    "recovery", "bootloader" and "forced-recovery" reboot commands.
    
    Acked-by: Rob Herring <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>

commit c1183db8853d14df390748aa6baa69734db151aa
Author: Taehee Yoo <[email protected]>
Date:   Tue Mar 7 22:02:50 2017 +0900

    netfilter: nf_reject: remove unused variable
    
    variable oiph is not used.
    
    Signed-off-by: Taehee Yoo <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>

commit efc9b8e33b8b5ef890288758454ce62a1319c94a
Author: Florian Westphal <[email protected]>
Date:   Tue Mar 7 12:45:04 2017 +0100

    netfilter: bridge: remove unneeded rcu_read_lock
    
    as comment says, the function is always called with rcu read lock held.
    
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>

commit 02c9a7dfd9100ee14c88199f630d13e27183511d
Author: Paul E. McKenney <[email protected]>
Date:   Tue Mar 7 07:30:58 2017 -0800

    doc: Update rcu_assign_pointer() definition in whatisRCU.txt
    
    The rcu_assign_pointer() macro has changed over time, and the version
    in Documentation/RCU/whatisRCU.txt has not kept up.  This commit brings
    it into 2017, albeit in a simplified fashion.
    
    Reported-by: Andrea Parri <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit c8691441cc02135662de9144dd486545d08cb560
Author: Paul E. McKenney <[email protected]>
Date:   Tue Mar 7 07:21:23 2017 -0800

    rcu: Add smp_mb__after_atomic() to sync_exp_work_done()
    
    The sync_exp_work_done() function needs to fully order the counter-check
    operation against anything happening after the corresponding grace period.
    This is a theoretical bug, as all current architectures either provide
    full ordering for atomic operation on the one hand or implement,
    however, a little future-proofing is a good thing.  This commit
    therefore adds smp_mb__after_atomic() after the atomic_long_inc()
    in sync_exp_work_done().
    
    Reported-by: Dmitry Vyukov <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 0ce113690e32bcdb3d72e019012f4cd68bac9ae8
Author: Dmitry Vyukov <[email protected]>
Date:   Sun Mar 5 12:17:31 2017 -0800

    rcu: fix warning in rcu_seq_end()
    
    rcu_seq_end() increments seq signifying completion of a grace period,
    after that checks that the seq is even and wakes
    _synchronize_rcu_expedited(). _synchronize_rcu_expedited() uses
    wait_event() to wait for even seq. The problem is that wait_event()
    can return as soon as seq becomes even without waiting for the wakeup.
    In such case the warning in rcu_seq_end() can falsely fire if the next
    expedited grace period starts before the check.
    
    Check that seq has good value before incrementing it.
    
    Signed-off-by: Dmitry Vyukov <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Cc: Paul E. McKenney <[email protected]>
    Cc: Steven Rostedt <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Paul E. McKenney <[email protected]>
    
    ---
    
    syzkaller-triggered warning:
    
    WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
    rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
    CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Workqueue: events wait_rcu_exp_gp
    Call Trace:
     __dump_stack lib/dump_stack.c:15 [inline]
     dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
     panic+0x1fb/0x412 kernel/panic.c:179
     __warn+0x1c4/0x1e0 kernel/panic.c:540
     warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
     rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
     rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
     rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
     rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
     wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
     process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
     worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
     kthread+0x326/0x3f0 kernel/kthread.c:227
     ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
    ---

commit 72c3a03bfb3bab33bc637e07e21115f58fe01a7e
Author: Paul E. McKenney <[email protected]>
Date:   Sat Mar 4 12:33:53 2017 -0800

    rcu: Expedited wakeups need to be fully ordered
    
    Expedited grace periods use workqueue handlers that wake up the requesters,
    but there is no lock mediating this wakeup.  Therefore, memory barriers
    are required to ensure that the handler's memory references are seen by
    all to occur before synchronize_*_expedited() returns to its caller.
    Possibly detected by syzkaller.
    
    Reported-by: Dmitry Vyukov <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit a57de130ccb9bf1a4c63ab15e1256f8038aaeeee
Author: Paul E. McKenney <[email protected]>
Date:   Thu Mar 2 19:42:58 2017 -0800

    srcu: Add tests for mid-boot use of SRCU
    
    This commit adds calls to synchronize_srcu_expedited and to
    synchronize_srcu() to rcu_test_sync_prims(), so that these
    primitives will be tested during the mid-boot dead period
    in kernels built with CONFIG_SRCU=y and CONFIG_PROVE_RCU=y.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit bc138e7ad1f6fece72cf475557c2dda303eb068f
Author: Paul E. McKenney <[email protected]>
Date:   Wed Mar 1 23:48:40 2017 -0800

    srcu: Allow mid-boot use of synchronize_srcu()
    
    This commit adds checks for mid-boot use of synchronize_srcu(), and
    during that time period dispenses with workqueues in favor of making
    the requesting thread drive the grace period.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 52686742802b90b1829e3678b05869f179a180a9
Author: Michael S. Tsirkin <[email protected]>
Date:   Mon Feb 27 21:14:19 2017 +0200

    hlist_add_tail_rcu disable sparse warning
    
    sparse is unhappy about this code in hlist_add_tail_rcu:
    
            struct hlist_node *i, *last = NULL;
    
            for (i = hlist_first_rcu(h); i; i = hlist_next_rcu(i))
                    last = i;
    
    This is because hlist_next_rcu and hlist_next_rcu return
    __rcu pointers.
    
    It's a false positive - it's a write side primitive and so
    does not need to be called in a read side critical section.
    
    The following trivial patch disables the warning
    without changing the behaviour in any way.
    
    Note: __hlist_for_each_rcu would also remove the warning but it would be
    confusing since it calls rcu_derefence and is designed to run in the rcu
    read side critical section.
    
    Signed-off-by: Michael S. Tsirkin <[email protected]>
    Reviewed-by: Steven Rostedt (VMware) <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit f49ada453dec7f7c9491e71fc61f3a15f4389e0f
Author: Paul E. McKenney <[email protected]>
Date:   Mon Feb 20 14:57:17 2017 -0800

    rcu: Move rcu_seq_start() and friends to rcu.h
    
    This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(),
    and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h.
    This will allow SRCU to use these functions, which in turn will
    allow SRCU to move from a single global callback queue to a
    per-CPU callback queue.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 6c66002aee107d6b29b6b286db7888e19cc95602
Author: Paul E. McKenney <[email protected]>
Date:   Wed Feb 15 17:50:50 2017 -0800

    rcu: Add single-element dequeue functions to rcu_segcblist
    
    This commit adds single-element dequeue functions to rcu_segcblist.
    These are less efficient than using the extract and insert functions,
    but allow more precise debugging code.  These functions are thus
    expected to be used only in debug builds, for example, CONFIG_PROVE_RCU.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 059eb0b7a76c36f233e355369001606555cf78a0
Author: Paul E. McKenney <[email protected]>
Date:   Fri Feb 10 14:50:46 2017 -0800

    rcu: Allow early boot use of synchronize_srcu()
    
    This commit checks for pre-scheduler state, and if that early in the
    boot process, synchronize_srcu() and friends are no-ops.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit ad42c0c03e92c76ed1e019a8d86c01e7a02276cc
Author: Paul E. McKenney <[email protected]>
Date:   Fri Feb 10 14:32:54 2017 -0800

    rcu: Allow SRCU to access rcu_scheduler_active
    
    This is primarily a code-movement commit in preparation for allowing
    SRCU to handle early-boot SRCU grace periods.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit ae2c7aab83ebd58b7a2747c1fbdd38af2be0c196
Author: Paul E. McKenney <[email protected]>
Date:   Wed Feb 8 14:58:41 2017 -0800

    rcu: Remove obsolete comment from rcu_future_gp_cleanup() header
    
    The rcu_nocb_gp_cleanup() function is now invoked elsewhere, so this
    commit drags this comment into the year 2017.
    
    Reported-by: Michalis Kokologiannakis <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 08dd2d5d948b59ad9da0d3c886a9098a0063c547
Author: Paul E. McKenney <[email protected]>
Date:   Wed Feb 8 14:49:27 2017 -0800

    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    
    This commit just changes a "the the" to "the" to reduce repetition.
    
    Reported-by: Michalis Kokologiannakis <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 4465db28d5250c7dd53d3ff628d6ba33002b7cf2
Author: Paul E. McKenney <[email protected]>
Date:   Wed Feb 8 14:30:15 2017 -0800

    doc: Update stallwarn.txt to make causes more prominent
    
    This commit rearranges the Documentation/RCU/stallwarn.txt file to
    put the list of issues that can cause RCU CPU stall warnings near
    the beginning of the document.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit f1531c507a92aa99bd763292b31ab2f435208528
Author: Paul E. McKenney <[email protected]>
Date:   Wed Feb 8 12:36:42 2017 -0800

    rcu: Abstract multi-tail callback list handling
    
    RCU has only one multi-tail callback list, which is implemented via
    the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the
    rcu_data structure, and whose operations are open-code throughout the
    Tree RCU implementation.  This has been more or less OK in the past,
    but upcoming callback-list optimizations in SRCU could really use
    a multi-tail callback list there as well.
    
    This commit therefore abstracts the multi-tail callback list handling
    into a new kernel/rcu/rcu_segcblist.h file, and uses this new API.
    The simple head-and-tail pointer callback list is also abstracted and
    applied everywhere except for the NOCB callback-offload lists.  (Yes,
    the plan is to apply them there as well, but this commit is already
    bigger than would be good.)
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit f9c181f732e6b068fcaab1399dd6e11cdf35530b
Author: Paul E. McKenney <[email protected]>
Date:   Fri Feb 3 09:54:05 2017 -0800

    rcu: Default RCU_FANOUT_LEAF to 16 unless explicitly changed
    
    If the RCU_EXPERT Kconfig option is not set (the default), then the
    RCU_FANOUT_LEAF Kconfig option will not be defined, which will cause
    the leaf-level rcu_node tree fanout to default to 32 on 32-bit systems
    and 64 on 64-bit systems.  This can result in excessive lock contention.
    This commit therefore changes the computation of the leaf-level rcu_node
    tree fanout so that the result will be 16 unless an explicit Kconfig or
    kernel-boot setting says otherwise.
    
    Reported-by: Peter Zijlstra <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 41381f71735b4eeb60a5a66c2b635c90d6b1897e
Author: Paul E. McKenney <[email protected]>
Date:   Fri Feb 3 09:27:00 2017 -0800

    rcu: Make RCU_FANOUT_LEAF help text more explicit about skew_tick
    
    If you set RCU_FANOUT_LEAF too high, you can get lock contention
    on the leaf rcu_node, and you should boot with the skew_tick kernel
    parameter set in order to avoid this lock contention.  This commit
    therefore upgrades the RCU_FANOUT_LEAF help text to explicitly state
    this.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 6b88c5d1b480fcc9f64ef37cb44e74dbef8d8e3d
Author: Paul E. McKenney <[email protected]>
Date:   Thu Feb 2 11:40:15 2017 -0800

    types: Update obsolete callback_head comment
    
    The comment header for callback_head (and thus for rcu_head) states that
    the bottom two bits of a pointer to these structures must be zero.  This
    is obsolete:  The new requirement is that only the bottom bit need be
    zero.  This commit therefore updates this comment.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit d898bc54b5d316a77822e7a75d189b84ea9e7d33
Author: Paul E. McKenney <[email protected]>
Date:   Tue Jan 31 07:45:13 2017 -0800

    lockdep: Use "WARNING" tag on lockdep splats
    
    This commit changes lockdep splats to begin lines with "WARNING" and
    to use pr_warn() instead of printk().  This change eases scripted
    analysis of kernel console output.
    
    Reported-by: Dmitry Vyukov <[email protected]>
    Reported-by: Ingo Molnar <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>
    Acked-by: Dmitry Vyukov <[email protected]>

commit 65ebde3170b4e3c55895785aa23d7b0fc3fcd787
Author: Paul E. McKenney <[email protected]>
Date:   Fri Jan 27 14:17:50 2017 -0800

    rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
    
    The rcu_all_qs() and rcu_note_context_switch() do a series of checks,
    taking various actions to supply RCU with quiescent states, depending
    on the outcomes of the various checks.  This is a bit much for scheduling
    fastpaths, so this commit creates a separate ->rcu_urgent_qs field in
    the rcu_dynticks structure that acts as a global guard for these checks.
    Thus, in the common case, rcu_all_qs() and rcu_note_context_switch()
    check the ->rcu_urgent_qs field, find it false, and simply return.
    
    Signed-off-by: Paul E. McKenney <[email protected]>
    Cc: Peter Zijlstra <[email protected]>

commit 84521b695b3f51ffc859eb3864d8eadbf31667fa
Author: Paul E. McKenney <[email protected]>
Date:   Fri Jan 27 13:17:02 2017 -0800

    rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
    
    The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking
    that one of them still needs a quiescent state before doing an expensive
    atomic operation on the ->dynticks counter.  However, this check reduces
    overhead only after a rare race condition, and increases complexity.  This
    commit therefore removes the scan and the mechanism enabling the scan.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 1103a4db6cb25862fb1547d8cbdfdda91d6c3487
Author: Paul E. McKenney <[email protected]>
Date:   Thu Jan 26 16:18:07 2017 -0800

    rcu: Pull rcu_qs_ctr into rcu_dynticks structure
    
    The rcu_qs_ctr variable is yet another isolated per-CPU variable,
    so this commit pulls it into the pre-existing rcu_dynticks per-CPU
    structure.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit bebcd9a45ea3c405f984f6d68fe52e767348917b
Author: Paul E. McKenney <[email protected]>
Date:   Thu Jan 26 13:45:38 2017 -0800

    rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
    
    The rcu_sched_qs_mask variable is yet another isolated per-CPU variable,
    so this commit pulls it into the pre-existing rcu_dynticks per-CPU
    structure.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit e1dde8b7e10174237c13c08f9c35295e9bfe57ca
Author: Paul E. McKenney <[email protected]>
Date:   Thu Jan 26 13:26:00 2017 -0800

    rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
    
    If a CONFIG_RCU_NOCB_CPUS kernel invokes call_rcu() with interrupts
    disabled, wakeups must be deferred in order to avoid self-deadlock in the
    cases where the disabled interrupts are due to scheduler locks being held.
    In this case, a flag is set and is checked on entry to extended quiescent
    states (usermode, idle), on exit from the RCU_SOFTIRQ handler, when the
    CPU in question goes offline, on a subsequent invocation of call_rcu(),
    and from rcu_all_qs().  However, a given CPU could avoid all of those
    states for a considerable length of time.
    
    This commit therefore allows an invocation of rcu_note_context_switch()
    to do the wakeup.  It also makes the wakeup function clear the
    deferred-wakeup flag.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit b2a56c8f852c7c0a18409fef84fee3e03fab261c
Author: Paul E. McKenney <[email protected]>
Date:   Wed Jan 25 15:28:49 2017 -0800

    rcu: Make rcu_all_qs() do deferred NOCB wakeups
    
    If a CONFIG_RCU_NOCB_CPUS kernel invokes call_rcu() with interrupts
    disabled, wakeups must be deferred in order to avoid self-deadlock in
    the cases where the disabled interrupts are due to scheduler locks being
    held.  In this case, a flag is set and is checked on entry to extended
    quiescent states (usermode, idle), on exit from the RCU_SOFTIRQ handler,
    when the CPU in question goes offline, and on a subsequent invocation
    of call_rcu().  However, a given CPU could avoid all of those states
    for a considerable length of time.
    
    This commit therefore allows an invocation of rcu_all_qs() to do the
    wakeup.  It also makes the wakeup function clear the deferred-wakeup flag.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit ba644f517666b5ff1323988c7c7916e987ea95f0
Author: Paul E. McKenney <[email protected]>
Date:   Tue Jan 24 16:17:42 2017 -0800

    rcu: Make call_rcu() do deferred NOCB wakeups
    
    If a CONFIG_RCU_NOCB_CPUS kernel invokes call_rcu() with interrupts
    disabled, wakeups must be deferred in order to avoid self-deadlock in the
    cases where the disabled interrupts are due to scheduler locks being held.
    In this case, a flag is set and is checked on entry to extended quiescent
    states (usermode, idle), on exit from the RCU_SOFTIRQ handler, and when
    the CPU in question goes offline.  However, a given CPU could avoid all
    of those states for a considerable length of time.
    
    This commit therefore allows a subsequent invocation of call_rcu() with
    interrupts enabled to do the wakeup.  It also makes the wakeup function
    clear the deferred-wakeup flag.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 670dbfdc046ac07b13fa5a6460e198c1729f9363
Author: Paul E. McKenney <[email protected]>
Date:   Mon Jan 23 12:04:46 2017 -0800

    rcu: Semicolon inside RCU_TRACE() for tree.c
    
    The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
    where "statement" is a local-variable declaration, as it can leave a
    misplaced ";" in the source code.  This commit therefore converts these
    to "RCU_TRACE(statement;)", which avoids the misplaced ";".
    
    Reported-by: Josh Triplett <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 5d5d02869ad82b738abc5230376d8ef14600ce11
Author: Paul E. McKenney <[email protected]>
Date:   Mon Jan 23 12:02:09 2017 -0800

    rcu: Semicolon inside RCU_TRACE() for Tiny RCU
    
    The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
    where "statement" is a local-variable declaration, as it can leave a
    misplaced ";" in the source code.  This commit therefore converts these
    to "RCU_TRACE(statement;)", which avoids the misplaced ";".
    
    Reported-by: Josh Triplett <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit a03e331a2e260781c7ea1e560db16b780a968796
Author: Paul E. McKenney <[email protected]>
Date:   Mon Jan 23 11:55:43 2017 -0800

    rcu: Semicolon inside RCU_TRACE() for rcu.h
    
    The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
    where "statement" is a local-variable declaration, as it can leave a
    misplaced ";" in the source code.  This commit therefore converts these
    to "RCU_TRACE(statement;)", which avoids the misplaced ";".
    
    Reported-by: Josh Triplett <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit e7d87c282e01eca2c4ccbb0a33bb0bbdc451f880
Author: Paul E. McKenney <[email protected]>
Date:   Thu Jan 19 13:40:09 2017 -0800

    srcu: Check for tardy grace-period activity in cleanup_srcu_struct()
    
    Users of SRCU are obliged to complete all grace-period activity before
    invoking cleanup_srcu_struct().  This means that all calls to either
    synchronize_srcu() or synchronize_srcu_expedited() must have returned,
    and all calls to call_srcu() must have returned, and the last call to
    call_srcu() must have been followed by a call to srcu_barrier().
    Furthermore, the caller must have done something to prevent any
    further calls to synchronize_srcu(), synchronize_srcu_expedited(),
    and call_srcu().
    
    Therefore, if there has ever been an invocation of call_srcu() on
    the srcu_struct in question, the sequence of events must be as
    follows:
    
    1.  Prevent any further calls to call_srcu().
    2.  Wait for any pre-existing call_srcu() invocations to return.
    3.  Invoke srcu_barrier().
    4.  It is now safe to invoke cleanup_srcu_struct().
    
    On the other hand, if there has ever been a call to synchronize_srcu()
    or synchronize_srcu_expedited(), the sequence of events must be as
    follows:
    
    1.  Prevent any further calls to synchronize_srcu() or
        synchronize_srcu_expedited().
    2.  Wait for any pre-existing synchronize_srcu() or
        synchronize_srcu_expedited() invocations to return.
    3.  It is now safe to invoke cleanup_srcu_struct().
    
    If there have been calls to all both types of functions (call_srcu()
    and either of synchronize_srcu() and synchronize_srcu_expedited()), then
    the caller must do the first three steps of the call_srcu() procedure
    above and the first two steps of the synchronize_s*() procedure above,
    and only then invoke cleanup_srcu_struct().
    
    Reported-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 95aaa3b002381ec5f7dfe4e1fb770d95eaf7e856
Author: Paul E. McKenney <[email protected]>
Date:   Thu Jan 19 13:33:17 2017 -0800

    srcu: Consolidate batch checking into rcu_all_batches_empty()
    
    The srcu_reschedule() function invokes rcu_batch_empty() on each of
    the four rcu_batch structures in the srcu_struct in question twice.
    Given that this check will also be needed in cleanup_srcu_struct(), this
    commit consolidates these four checks into a new rcu_all_batches_empty()
    function.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 023998ea4a309150570dcf8ea9dfd9d4a50d973b
Author: Paul E. McKenney <[email protected]>
Date:   Wed Jan 18 02:53:44 2017 -0800

    mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU
    
    A group of Linux kernel hackers reported chasing a bug that resulted
    from their assumption that SLAB_DESTROY_BY_RCU provided an existence
    guarantee, that is, that no block from such a slab would be reallocated
    during an RCU read-side critical section.  Of course, that is not the
    case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
    slab of blocks.
    
    However, there is a phrase for this, namely "type safety".  This commit
    therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
    to avoid future instances of this sort of confusion.
    
    Signed-off-by: Paul E. McKenney <[email protected]>
    Cc: Christoph Lameter <[email protected]>
    Cc: Pekka Enberg <[email protected]>
    Cc: David Rientjes <[email protected]>
    Cc: Joonsoo Kim <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: <[email protected]>
    Acked-by: Johannes Weiner <[email protected]>
    Acked-by: Vlastimil Babka <[email protected]>
    [ paulmck: Add "tombstone" comments as requested by Eric Dumazet. ]

commit 568af6de058cb2b0c5b98d98ffcf37cdc6bc38a7
Author: Pablo Neira Ayuso <[email protected]>
Date:   Sat Mar 4 19:53:47 2017 +0100

    netfilter: nf_tables: set pktinfo->thoff at AH header if found
    
    Phil Sutter reports that IPv6 AH header matching is broken. From
    userspace, nft generates bytecode that expects to find the AH header at
    NFT_PAYLOAD_TRANSPORT_HEADER both for IPv4 and IPv6. However,
    pktinfo->thoff is set to the inner header after the AH header in IPv6,
    while in IPv4 pktinfo->thoff points to the AH header indeed. This
    behaviour is inconsistent. This patch fixes this problem by updating
    ipv6_find_hdr() to get the IP6_FH_F_AUTH flag so this function stops at
    the AH header, so both IPv4 and IPv6 pktinfo->thoff point to the AH
    header.
    
    This is also inconsistent when trying to match encapsulated headers:
    
    1) A packet that looks like IPv4 + AH + TCP dport 22 will *not* match.
    2) A packet that looks like IPv6 + AH + TCP dport 22 will match.
    
    Reported-by: Phil Sutter <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>

commit 5b3db78c4e90ec0ffc70c62819774e42745fa61c
Author: Paul E. McKenney <[email protected]>
Date:   Wed Nov 30 06:24:30 2016 -0800

    sched,rcu: Make cond_resched() provide RCU quiescent state
    
    There is some confusion as to which of cond_resched() or
    cond_resched_rcu_qs() should be added to long in-kernel loops.
    This commit therefore eliminates the decision by adding RCU
    quiescent states to cond_resched().
    
    Warning: This is a prototype.  For example, it does not correctly
    handle Tasks RCU.  Which is OK for the moment, given that no one
    actually uses Tasks RCU yet.
    
    Reported-by: Michal Hocko <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>
    Cc: Peter Zijlstra <[email protected]>

commit 378b6714c24e8eafd706ab89908ee7efc87ad67a
Author: Paul E. McKenney <[email protected]>
Date:   Sun Jan 15 16:12:09 2017 -0800

    doc: Add mid-boot operation to expedited grace periods
    
    This commit adds a description of how expedited grace periods operate
    during the mid-boot "dead zone", which starts when the scheduler spawns
    the first kthread and ends when all of RCU's kthreads have been spawned.
    In short, before mid-boot, synchronous grace periods can be a no-op.
    After the end of mid-boot, workqueues may be used.  During mid-boot,
    the requesting task drivees the expedited grace period.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 9457f1b256b64dc844c9071a79524f610a65bb19
Author: Paul E. McKenney <[email protected]>
Date:   Sun Jan 15 15:18:22 2017 -0800

    doc: Synchronous RCU grace periods are now legal throughout boot
    
    This commit updates the "Early Boot" section of the RCU requirements
    to describe how synchronous RCU grace periods are now legal throughout
    the boot process.
    
    Signed-off-by: Paul E. McKenney <[email protected]>

commit 376d6c63db54fb8d960dd7d1b66a073028f78723
Author: Paul E. McKenney <[email protected]>
Date:   Sat Jan 14 13:32:50 2017 -0800

    rcu: Make arch select smp_mb__after_unlock_lock() strength
    
    The definition of smp_mb__after_unlock_lock() is currently smp_mb()
    for CONFIG_PPC and a no-op otherwise.  It would be better to instead
    provide an architecture-selectable Kconfig option, and select the
    strength of smp_mb__after_unlock_lock() based on that option.  This
    commit therefore creates CONFIG_ARCH_WEAK_RELACQ, has PPC select it,
    and bases the definition of smp_mb__after_unlock_lock() on this new
    CONFIG_ARCH_WEAK_RELACQ Kconfig option.
    
    Reported-by: Ingo Molnar <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: Boqun Feng <[email protected]>
    Cc: Benjamin Herrenschmidt <[email protected]>
    Cc: Paul Mackerras <[email protected]>
    Acked-by: Michael Ellerman <[email protected]>
    Cc: <[email protected]>

commit 8f5fbaac2a39696d0e2945a2ca12135a8fd26ba9
Author: Paul E. McKenney <[email protected]>
Date:   Tue Nov 8 14:25:21 2016 -0800

    rcu: Maintain special bits at bottom of ->dynticks counter
    
    Currently, IPIs are used to force other CPUs to invalidate their TLBs
    in response to a kernel virtual-memory mapping change.  This works, but
    degrades both battery lifetime (for idle CPUs) and real-time response
    (for nohz_full CPUs), and in addition results in unnecessary IPIs due to
    the fact that CPUs executing in usermode are unaffected by stale kernel
    mappings.  It would be better to cause a CPU executing in usermode to
    wait until it is entering kernel mode to do the flush, first to avoid
    interrupting usemode tasks and second to handle multiple flush requests
    with a single flush in the case of a long-running user task.
    
    This commit therefore reserves a bit at the bottom of the ->dynticks
    counter, which is checked upon exit from extended quiescent states.
    If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is
    invoked, which, if not supplied, is an empty single-pass do-while loop.
    If this bottom bit is …
iaguis pushed a commit to kinvolk/linux that referenced this pull request Feb 6, 2018
metux added a commit to metux/linux that referenced this pull request Apr 27, 2019
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
metux added a commit to metux/linux that referenced this pull request Apr 29, 2019
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
metux added a commit to metux/linux that referenced this pull request Apr 30, 2019
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
metux added a commit to metux/linux that referenced this pull request Apr 30, 2019
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
metux added a commit to metux/linux that referenced this pull request Jun 12, 2019
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
metux added a commit to metux/linux that referenced this pull request Jun 27, 2019
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
metux added a commit to metux/linux that referenced this pull request Jul 10, 2019
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
metux added a commit to metux/linux that referenced this pull request Nov 21, 2019
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
metux added a commit to metux/linux that referenced this pull request Jan 10, 2020
Fix checkpatch warnings:

    WARNING: line over 80 characters
    torvalds#9: FILE: drivers/tty/serial/apbuart.c:9:
    + *  Copyright (C) 2006 Daniel Hellstrom <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#11: FILE: drivers/tty/serial/apbuart.c:11:
    + *  Copyright (C) 2009 Kristoffer Glembo <[email protected]>, Aeroflex Gaisler AB

    WARNING: line over 80 characters
    torvalds#16: FILE: drivers/tty/serial/apbuart.c:16:
    +#if defined(CONFIG_SERIAL_GRLIB_GAISLER_APBUART_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

    WARNING: labels should not be indented
    torvalds#122: FILE: drivers/tty/serial/apbuart.c:122:
    +	      ignore_char:

    WARNING: Missing a blank line after declarations
    torvalds#186: FILE: drivers/tty/serial/apbuart.c:186:
    +	unsigned int status = UART_GET_STATUS(port);
    +	return status & UART_STATUS_THE ? TIOCSER_TEMT : 0;

    WARNING: Missing a blank line after declarations
    torvalds#322: FILE: drivers/tty/serial/apbuart.c:322:
    +	int ret = 0;
    +	if (ser->type != PORT_UNKNOWN && ser->type != PORT_APBUART)

    WARNING: Missing a blank line after declarations
    torvalds#427: FILE: drivers/tty/serial/apbuart.c:427:
    +	unsigned int status;
    +	do {

    WARNING: Missing a blank line after declarations
    torvalds#463: FILE: drivers/tty/serial/apbuart.c:463:
    +		unsigned int quot, status;
    +		status = UART_GET_STATUS(port);

    WARNING: line over 80 characters
    torvalds#627: FILE: drivers/tty/serial/apbuart.c:627:
    +		port->membase = ioremap(addr, sizeof(struct grlib_apbuart_regs_map));

    WARNING: line over 80 characters
    torvalds#634: FILE: drivers/tty/serial/apbuart.c:634:
    +		port->fifosize = apbuart_scan_fifo_size((struct uart_port *) port, line);

Signed-off-by: Enrico Weigelt <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Apr 9, 2020
Reading TMC mode register in tmc_read_prepare_etb without
enabling the TMC hardware leads to async exceptions like
the one in the call trace below. This can happen if the
user tries to read the TMC etf data via device node without
setting up source and the sink first which enables the TMC
hardware in the path. So make sure that the TMC is enabled
before we try to read TMC data.

 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 7 PID: 2605 Comm: hexdump Tainted: G S                5.4.30 torvalds#122
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x20/0x2c
  dump_stack+0xdc/0x144
  panic+0x168/0x36c
  panic+0x0/0x36c
  arm64_serror_panic+0x78/0x84
  do_serror+0x130/0x138
  el1_error+0x84/0xf8
  tmc_read_prepare_etb+0x88/0xb8
  tmc_open+0x40/0xd8
  misc_open+0x120/0x158
  chrdev_open+0xb8/0x1a4
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3e8
  __arm64_compat_sys_openat+0x28/0x34
  el0_svc_common+0xa8/0x160
  el0_svc_compat_handler+0x2c/0x38
  el0_svc_compat+0x8/0x10

Fixes: 4525412 ("coresight: tmc: making prepare/unprepare functions generic")
Reported-by: Stephen Boyd <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Apr 16, 2020
On some QCOM platforms like SC7180, SDM845 and SM8150,
reading TMC mode register without proper coresight power
management can lead to async exceptions like the one in
the call trace below in tmc_read_prepare_etb(). This can
happen if the user tries to read the TMC etf data via
device node without setting up source and the sink first.
Fix this by having a check for coresight sysfs mode
before reading TMC mode management register.

 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 7 PID: 2605 Comm: hexdump Tainted: G S                5.4.30 torvalds#122
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x20/0x2c
  dump_stack+0xdc/0x144
  panic+0x168/0x36c
  panic+0x0/0x36c
  arm64_serror_panic+0x78/0x84
  do_serror+0x130/0x138
  el1_error+0x84/0xf8
  tmc_read_prepare_etb+0x88/0xb8
  tmc_open+0x40/0xd8
  misc_open+0x120/0x158
  chrdev_open+0xb8/0x1a4
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3e8
  __arm64_compat_sys_openat+0x28/0x34
  el0_svc_common+0xa8/0x160
  el0_svc_compat_handler+0x2c/0x38
  el0_svc_compat+0x8/0x10

Fixes: 4525412 ("coresight: tmc: making prepare/unprepare functions generic")
Reported-by: Stephen Boyd <[email protected]>
Suggested-by: Mathieu Poirier <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
ruscur pushed a commit to ruscur/linux that referenced this pull request Apr 21, 2020
On some QCOM platforms like SC7180, SDM845 and SM8150,
reading TMC mode register without proper coresight power
management can lead to async exceptions like the one in
the call trace below in tmc_read_prepare_etb(). This can
happen if the user tries to read the TMC etf data via
device node without setting up source and the sink first.
Fix this by having a check for coresight sysfs mode
before reading TMC mode management register.

 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 7 PID: 2605 Comm: hexdump Tainted: G S                5.4.30 torvalds#122
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x20/0x2c
  dump_stack+0xdc/0x144
  panic+0x168/0x36c
  panic+0x0/0x36c
  arm64_serror_panic+0x78/0x84
  do_serror+0x130/0x138
  el1_error+0x84/0xf8
  tmc_read_prepare_etb+0x88/0xb8
  tmc_open+0x40/0xd8
  misc_open+0x120/0x158
  chrdev_open+0xb8/0x1a4
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3e8
  __arm64_compat_sys_openat+0x28/0x34
  el0_svc_common+0xa8/0x160
  el0_svc_compat_handler+0x2c/0x38
  el0_svc_compat+0x8/0x10

Fixes: 4525412 ("coresight: tmc: making prepare/unprepare functions generic")
Reported-by: Stephen Boyd <[email protected]>
Suggested-by: Mathieu Poirier <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
Signed-off-by: Mathieu Poirier <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request May 19, 2020
On some QCOM platforms like SC7180, SDM845 and SM8150,
reading TMC mode register without proper coresight power
management can lead to async exceptions like the one in
the call trace below in tmc_read_prepare_etb(). This can
happen if the user tries to read the TMC etf data via
device node without setting up source and the sink first.
Fix this by having a check for coresight sysfs mode
before reading TMC mode management register.

 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 7 PID: 2605 Comm: hexdump Tainted: G S                5.4.30 torvalds#122
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x20/0x2c
  dump_stack+0xdc/0x144
  panic+0x168/0x36c
  panic+0x0/0x36c
  arm64_serror_panic+0x78/0x84
  do_serror+0x130/0x138
  el1_error+0x84/0xf8
  tmc_read_prepare_etb+0x88/0xb8
  tmc_open+0x40/0xd8
  misc_open+0x120/0x158
  chrdev_open+0xb8/0x1a4
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3e8
  __arm64_compat_sys_openat+0x28/0x34
  el0_svc_common+0xa8/0x160
  el0_svc_compat_handler+0x2c/0x38
  el0_svc_compat+0x8/0x10

Fixes: 4525412 ("coresight: tmc: making prepare/unprepare functions generic")
Reported-by: Stephen Boyd <[email protected]>
Suggested-by: Mathieu Poirier <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
Signed-off-by: Mathieu Poirier <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request May 19, 2020
On some QCOM platforms like SC7180, SDM845 and SM8150,
reading TMC mode register without proper coresight power
management can lead to async exceptions like the one in
the call trace below in tmc_read_prepare_etb(). This can
happen if the user tries to read the TMC etf data via
device node without setting up source and the sink first.
Fix this by having a check for coresight sysfs mode
before reading TMC mode management register.

 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 7 PID: 2605 Comm: hexdump Tainted: G S                5.4.30 torvalds#122
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x20/0x2c
  dump_stack+0xdc/0x144
  panic+0x168/0x36c
  panic+0x0/0x36c
  arm64_serror_panic+0x78/0x84
  do_serror+0x130/0x138
  el1_error+0x84/0xf8
  tmc_read_prepare_etb+0x88/0xb8
  tmc_open+0x40/0xd8
  misc_open+0x120/0x158
  chrdev_open+0xb8/0x1a4
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3e8
  __arm64_compat_sys_openat+0x28/0x34
  el0_svc_common+0xa8/0x160
  el0_svc_compat_handler+0x2c/0x38
  el0_svc_compat+0x8/0x10

Fixes: 4525412 ("coresight: tmc: making prepare/unprepare functions generic")
Reported-by: Stephen Boyd <[email protected]>
Suggested-by: Mathieu Poirier <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
Signed-off-by: Mathieu Poirier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Noltari pushed a commit to Noltari/linux that referenced this pull request Jun 24, 2020
[ Upstream commit 347adb0 ]

On some QCOM platforms like SC7180, SDM845 and SM8150,
reading TMC mode register without proper coresight power
management can lead to async exceptions like the one in
the call trace below in tmc_read_prepare_etb(). This can
happen if the user tries to read the TMC etf data via
device node without setting up source and the sink first.
Fix this by having a check for coresight sysfs mode
before reading TMC mode management register.

 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 7 PID: 2605 Comm: hexdump Tainted: G S                5.4.30 torvalds#122
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x20/0x2c
  dump_stack+0xdc/0x144
  panic+0x168/0x36c
  panic+0x0/0x36c
  arm64_serror_panic+0x78/0x84
  do_serror+0x130/0x138
  el1_error+0x84/0xf8
  tmc_read_prepare_etb+0x88/0xb8
  tmc_open+0x40/0xd8
  misc_open+0x120/0x158
  chrdev_open+0xb8/0x1a4
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3e8
  __arm64_compat_sys_openat+0x28/0x34
  el0_svc_common+0xa8/0x160
  el0_svc_compat_handler+0x2c/0x38
  el0_svc_compat+0x8/0x10

Fixes: 4525412 ("coresight: tmc: making prepare/unprepare functions generic")
Reported-by: Stephen Boyd <[email protected]>
Suggested-by: Mathieu Poirier <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
Signed-off-by: Mathieu Poirier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
snajpa pushed a commit to vpsfreecz/linux that referenced this pull request Jun 24, 2020
[ Upstream commit 347adb0 ]

On some QCOM platforms like SC7180, SDM845 and SM8150,
reading TMC mode register without proper coresight power
management can lead to async exceptions like the one in
the call trace below in tmc_read_prepare_etb(). This can
happen if the user tries to read the TMC etf data via
device node without setting up source and the sink first.
Fix this by having a check for coresight sysfs mode
before reading TMC mode management register.

 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 7 PID: 2605 Comm: hexdump Tainted: G S                5.4.30 torvalds#122
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x20/0x2c
  dump_stack+0xdc/0x144
  panic+0x168/0x36c
  panic+0x0/0x36c
  arm64_serror_panic+0x78/0x84
  do_serror+0x130/0x138
  el1_error+0x84/0xf8
  tmc_read_prepare_etb+0x88/0xb8
  tmc_open+0x40/0xd8
  misc_open+0x120/0x158
  chrdev_open+0xb8/0x1a4
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3e8
  __arm64_compat_sys_openat+0x28/0x34
  el0_svc_common+0xa8/0x160
  el0_svc_compat_handler+0x2c/0x38
  el0_svc_compat+0x8/0x10

Fixes: 4525412 ("coresight: tmc: making prepare/unprepare functions generic")
Reported-by: Stephen Boyd <[email protected]>
Suggested-by: Mathieu Poirier <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
Signed-off-by: Mathieu Poirier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
MingcongBai pushed a commit to AOSC-Tracking/linux that referenced this pull request Dec 24, 2023
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Dec 28, 2023
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
yetist pushed a commit to loongarchlinux/linux that referenced this pull request Jan 9, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 9, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
arinc9 pushed a commit to arinc9/linux that referenced this pull request Jan 10, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Jan 10, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Gelbpunkt pushed a commit to sm8450-mainline/linux that referenced this pull request Jan 11, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 12, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
roxell pushed a commit to roxell/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 18, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 28, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Feb 7, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Feb 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
yetist pushed a commit to loongarchlinux/linux that referenced this pull request Feb 29, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Mar 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Apr 23, 2024
WARNING: braces {} are not necessary for single statement blocks
torvalds#122: FILE: fs/binfmt_elf_fdpic.c:651:
+	if (bprm->have_execfd) {
+		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
+	}

Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Max Filippov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Apr 25, 2024
WARNING: braces {} are not necessary for single statement blocks
torvalds#122: FILE: fs/binfmt_elf_fdpic.c:651:
+	if (bprm->have_execfd) {
+		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
+	}

Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Max Filippov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
arinc9 pushed a commit to arinc9/linux that referenced this pull request Apr 26, 2024
WARNING: braces {} are not necessary for single statement blocks
torvalds#122: FILE: fs/binfmt_elf_fdpic.c:651:
+	if (bprm->have_execfd) {
+		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
+	}

Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Max Filippov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
shipujin pushed a commit to shipujin/linux that referenced this pull request Jul 24, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Aug 7, 2024
With structure layout randomization enabled for 'struct inode' we need
to avoid overlapping any of the RCU-used / initialized-only-once
members, e.g. i_lru or i_sb_list to not corrupt related list traversals
when making use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ torvalds#122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that
the object in question was allocated from 'tracefs_inode_cache' and that
the list's '->next' member is at offset 0. Dumping the layout of the
relevant parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

To avoid the list corruption, just ditch the union as it has no impact
on the memory footprint of 'struct tracefs_inode' at all -- the
premature layout optimization to save space just isn't. There's plenty
of padding space available.

Cc: Al Viro <[email protected]>
Fixes: baa23a8 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Brad Spengler <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
torvalds pushed a commit that referenced this pull request Aug 8, 2024
With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ #122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/[email protected]/

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ajay Kaher <[email protected]>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: baa23a8 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Aug 12, 2024
commit 0b6743b upstream.

With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ torvalds#122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/[email protected]/

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ajay Kaher <[email protected]>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: baa23a8 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Aug 12, 2024
commit 0b6743b upstream.

With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ torvalds#122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/[email protected]/

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ajay Kaher <[email protected]>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: baa23a8 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Aug 14, 2024
commit 0b6743b upstream.

With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ torvalds#122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/[email protected]/

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ajay Kaher <[email protected]>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: baa23a8 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Aug 14, 2024
commit 0b6743b upstream.

With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ torvalds#122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/[email protected]/

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ajay Kaher <[email protected]>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: baa23a8 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
intersectRaven pushed a commit to intersectRaven/linux that referenced this pull request Aug 14, 2024
commit 0b6743b upstream.

With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ torvalds#122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/[email protected]/

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ajay Kaher <[email protected]>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: baa23a8 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Aug 15, 2024
commit 0b6743b upstream.

With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ torvalds#122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/[email protected]/

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ajay Kaher <[email protected]>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: baa23a8 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant