test Github pull request - ignore #6

Closed
wants to merge 1 commit into from
Conversation

LinuxMinion
Member

Test pull request. Ignore.

@LinuxMinion
Member Author

Test additional comment 1

@LinuxMinion
Member Author

Ignore

@LinuxMinion LinuxMinion closed this Dec 6, 2018
gregmarsden pushed a commit that referenced this pull request Dec 14, 2018
[ Upstream commit 46b3722 ]

We occasionally hit the following assert failure in 'perf top' when processing the
/proc info in multiple threads.

  perf: ...include/linux/refcount.h:109: refcount_inc:
        Assertion `!(!refcount_inc_not_zero(r))' failed.

The gdb backtrace looks like this:

  [Switching to Thread 0x7ffff11ba700 (LWP 13749)]
  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
  #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
  #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
  #4  0x0000000000535373 in refcount_inc (r=0x7fffdc009be0)
      at ...include/linux/refcount.h:109
  #5  0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0)
      at util/comm.c:24
  #6  0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:72
  #7  0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:95
  #8  0x000000000053582e in comm__new (str=0x7fffd000b260 ":2",
      timestamp=0, exec=false) at util/comm.c:111
  #9  0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57
  #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38,
      threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457
  #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
  ...

The failing assertion is this one:

  REFCOUNT_WARN(!refcount_inc_not_zero(r), ...

The problem is that we keep a global comm_str_root list, which
is accessed by multiple threads during the 'perf top' startup,
and the following 2 paths can race:

  thread 1:
    ...
    thread__new
      comm__new
        comm_str__findnew
          down_write(&comm_str_lock);
          __comm_str__findnew
            comm_str__get

  thread 2:
    ...
    comm__override or comm__free
      comm_str__put
        refcount_dec_and_test
          down_write(&comm_str_lock);
          rb_erase(&cs->rb_node, &comm_str_root);

Because thread 2 first decrements the refcnt and only then removes the
struct comm_str from the list, thread 1 can find this object on the list
with refcnt equal to 0 and hit the assert.

This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects
that already dropped the refcnt to 0. For the rest of the objects we take the
refcnt before comparing its name and release it afterwards with comm_str__put,
which can also release the object completely.
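
The lookup rule described above can be sketched in user-space C. This is a minimal illustration, not the perf code: struct entry, entry_get_not_zero(), entry_put() and entry_find() are hypothetical names, a flat array stands in for the rb-tree, and C11 atomics stand in for refcount_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct comm_str: a name plus a reference count. */
struct entry {
    const char *name;
    atomic_int refcnt;
};

/* Take a reference only if the count is still non-zero. */
static bool entry_get_not_zero(struct entry *e)
{
    int old = atomic_load(&e->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool entry_put(struct entry *e)
{
    int old = atomic_fetch_sub(&e->refcnt, 1);

    assert(old > 0); /* dropping a reference that was never held is a bug */
    return old == 1;
}

/* Look up name, skipping entries whose count already reached zero. */
static struct entry *entry_find(struct entry *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (!entry_get_not_zero(&tab[i]))
            continue;           /* dying entry: ignore it */
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];     /* caller keeps the reference */
        entry_put(&tab[i]);     /* not a match: release again */
    }
    return NULL;
}
```

The reference is taken before the name is compared, so an entry cannot be freed underneath the comparison, and an entry whose count already hit 0 is never handed out.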

Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Lukasz Odzioba <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Jan 18, 2019
…error

The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown forever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.
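
The consequence can be illustrated with a small user-space model. It is only a sketch under stated assumptions: struct conn_state, its fields and quiesce() are hypothetical and do not reproduce the RDS code or the actual fix. If the two ring pointers above have not wrapped (an assumption), 560 - 256 = 304 receive buffers were still outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified state; not the real struct rds_ib_connection. */
struct conn_state {
    unsigned int posted_recvs;   /* receive buffers still owned by the HCA */
    unsigned int signaled_sends; /* sends still awaiting a completion */
    bool cq_in_error;            /* set once the CQ error event was seen */
};

/*
 * Wait for the rings to drain, reaping up to per_round completions each
 * round. Stop waiting once the CQ is known to be broken, because no
 * further flush completions will arrive in that state.
 * Returns true if the rings drained, false if the wait was abandoned.
 */
static bool quiesce(struct conn_state *cs, unsigned int per_round,
                    unsigned int max_rounds)
{
    assert(cs != NULL);

    for (unsigned int round = 0; round < max_rounds; round++) {
        if (cs->posted_recvs == 0 && cs->signaled_sends == 0)
            return true;
        if (cs->cq_in_error)
            return false;   /* caller has to reclaim the buffers itself */
        if (per_round < cs->posted_recvs)
            cs->posted_recvs -= per_round;
        else
            cs->posted_recvs = 0;
    }
    return cs->posted_recvs == 0 && cs->signaled_sends == 0;
}
```

Without the cq_in_error check, a loop like this keeps waiting for completions that the firmware will never post.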

Orabug: 29180452

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
gregmarsden pushed a commit that referenced this pull request Jan 18, 2019
The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown forever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 28733324

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Feb 27, 2019
…error

The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown forever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 29180514

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
gregmarsden pushed a commit that referenced this pull request Mar 5, 2019
The customer hit this crash a few times.

PID: 31556  TASK: ffff880f823caa00  CPU: 1   COMMAND: "cellsrv"
 #0 [ffff880f823db850] machine_kexec at ffffffff8105d93c
 #1 [ffff880f823db8b0] crash_kexec at ffffffff811103b3
 #2 [ffff880f823db980] oops_end at ffffffff8101a788
 #3 [ffff880f823db9b0] no_context at ffffffff8106b9cf
 #4 [ffff880f823dba20] __bad_area_nosemaphore at ffffffff8106bc9d
 #5 [ffff880f823dba70] bad_area at ffffffff8106be97
 #6 [ffff880f823dbaa0] __do_page_fault at ffffffff8106c71e
 #7 [ffff880f823dbb00] do_page_fault at ffffffff8106c81f
 #8 [ffff880f823dbb40] page_fault at ffffffff816b5a9f
    [exception RIP: rds_ib_inc_copy_to_user+104]
    RIP: ffffffffa04607b8  RSP: ffff880f823dbbf8  RFLAGS: 00010287
    RAX: 0000000000000340  RBX: 0000000000001000  RCX: 0000000000004000
    RDX: 0000000000001000  RSI: ffff88176cea2000  RDI: ffff8817d291f520
    RBP: ffff880f823dbc48   R8: 0000000000001340   R9: 0000000000001000
    R10: 0000000000001200  R11: ffff880f823dc000  R12: ffff880f823dbed0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880f823dbc50] rds_recvmsg at ffffffffa041d837 [rds]

int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to)
...
...
        ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
        frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
        len = be32_to_cpu(inc->i_hdr.h_len);
        sg = frag->f_sg;

        while (iov_iter_count(to) && copied < len) {
                to_copy = min_t(unsigned long, iov_iter_count(to),
                                sg->length - frag_off);
                ...

sg is NULL and it crashes accessing sg->length above.

The cause appears to be ic->i_frag_sz holding an incorrect value:
16KB when 4KB was expected.

                if (copied % ic->i_frag_sz == 0) {
                        frag = list_entry(frag->f_item.next,
                                          struct rds_page_frag, f_item);
                        frag_off = 0;
                        sg = frag->f_sg;
                }

The other end is using 4KB RDS fragsize (Solaris Super Cluster).
This end is UEK4 (4.1.12-94.8.4.el6uek.x86_64).

The message being copied arrived over 4KB RDS frag size connection.
But during the above check ic->i_frag_sz is 16KB.
This can happen during a reconnect at the connection setup phase.
We start off with ic->i_frag_sz as 16KB. Then settle down at 4KB.

Failing this check
  if (copied % ic->i_frag_sz == 0) {
can result in sg not being set correctly.

Say "copied" = 4KB but ic->i_frag_sz is 16KB when it should be 4KB.

During a race with a reconnect, ic->i_frag_sz can be 16KB
even though it settled at 4KB once the connection was set up.
It can change from 4KB to 16KB and back to 4KB during connection setup
due to a reconnect.
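
The effect of the stale value on the modulo check can be shown with plain arithmetic. This is a simplified sketch: frag_advances() is a hypothetical helper that only counts how often a copy loop would move to the next fragment; it does not walk real scatterlists.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how often a copy loop moves to the next fragment when it decides
 * by "copied % frag_sz == 0". chunk is the amount copied per iteration.
 */
static size_t frag_advances(size_t len, size_t chunk, size_t frag_sz)
{
    size_t copied = 0, advances = 0;

    assert(chunk > 0 && frag_sz > 0);
    while (copied < len) {
        size_t step = (len - copied < chunk) ? len - copied : chunk;

        copied += step;
        if (copied < len && copied % frag_sz == 0)
            advances++;     /* frag = next fragment, sg = frag->f_sg */
    }
    return advances;
}
```

For a 16KB message received as four 4KB fragments, frag_advances(16384, 4096, 4096) gives 3, while frag_advances(16384, 4096, 16384) gives 0: with the stale 16KB value the loop never moves off the first fragment.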

We started seeing this crash after bug 26848749.
But prior to that the same scenario could result in data copied to user
from incorrect "sg" resulting in data corruption.

Orabug: 28748049

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
gregmarsden pushed a commit that referenced this pull request Mar 5, 2019
…error

The sequence that leads to this state is as follows.

1) First we see CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown forever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 29180535

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>
gregmarsden pushed a commit that referenced this pull request Mar 22, 2019
[ Upstream commit f5e2848 ]

When enumerating page size definitions to check hardware support,
we construct a constant which is (1U << (def->shift - 10)).

However, the array of page size definitions is only initialised for
various MMU_PAGE_* constants, so it contains a number of 0-initialised
elements with def->shift == 0. This means we end up shifting by a
very large number, which gives the following UBSan splat:

================================================================================
UBSAN: Undefined behaviour in /home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21
shift exponent 4294967286 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6
Call Trace:
[c00000000101bc20] [c000000000a13d54] .dump_stack+0xa8/0xec (unreliable)
[c00000000101bcb0] [c0000000004f20a8] .ubsan_epilogue+0x18/0x64
[c00000000101bd30] [c0000000004f2b10] .__ubsan_handle_shift_out_of_bounds+0x110/0x1a4
[c00000000101be20] [c000000000d21760] .early_init_mmu+0x1b4/0x5a0
[c00000000101bf10] [c000000000d1ba28] .early_setup+0x100/0x130
[c00000000101bf90] [c000000000000528] start_here_multiplatform+0x68/0x80
================================================================================

Fix this by first checking if the element exists (shift != 0) before
constructing the constant.
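
A reduced sketch of that guard, assuming a hypothetical struct psize_def that carries only the shift and a hypothetical supported_kb_mask() helper (the real table and loop in tlb_nohash.c look different):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced page-size table entry; only the shift matters here. */
struct psize_def {
    unsigned int shift; /* log2 of the page size in bytes, 0 if unused */
};

/* Build a mask with one bit per page size in KB, skipping unused entries. */
static unsigned int supported_kb_mask(const struct psize_def *defs, size_t n)
{
    unsigned int mask = 0;

    for (size_t i = 0; i < n; i++) {
        if (defs[i].shift == 0)
            continue;   /* 0 - 10 would wrap to 4294967286 as unsigned */
        assert(defs[i].shift >= 10 && defs[i].shift - 10 < 32);
        mask |= 1U << (defs[i].shift - 10);
    }
    return mask;
}
```

The wrapped value 4294967286 is 2^32 - 10, which matches the shift exponent in the UBSan report.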

Signed-off-by: Daniel Axtens <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Mar 29, 2019
[ Upstream commit ebaf39e ]

The *_frag_reasm() functions are susceptible to miscalculating the byte
count of packet fragments in case the truesize of a head buffer changes.
The truesize member may be changed by the call to skb_unclone(), leaving
the fragment memory limit counter unbalanced even if all fragments are
processed. This miscalculation goes unnoticed as long as the network
namespace which holds the counter is not destroyed.

Should an attempt be made to destroy a network namespace that holds an
unbalanced fragment memory limit counter the cleanup of the namespace
never finishes. The thread handling the cleanup gets stuck in
inet_frags_exit_net() waiting for the percpu counter to reach zero. The
thread is usually in running state with a stacktrace similar to:

 PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
  #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
  #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
  #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
  #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
  #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
 #10 [ffff880621563e38] process_one_work at ffffffff81096f14

It is not possible to create new network namespaces, and processes
that call unshare() end up being stuck in uninterruptible sleep state
waiting to acquire the net_mutex.

The bug was observed in the IPv6 netfilter code by Per Sundstrom.
I thank him for his analysis of the problem. The parts of this patch
that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.
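
One way to keep such a counter balanced is to charge the difference whenever truesize changes. The sketch below is a user-space model under stated assumptions: struct buf, struct frag_counter and the helper names are hypothetical, and the truesize change is simulated rather than caused by skb_unclone().

```c
#include <assert.h>

/* Hypothetical stand-ins for an skb and the per-namespace fragment counter. */
struct buf {
    long truesize;  /* memory charged for this buffer */
};

struct frag_counter {
    long mem;       /* bytes currently charged to fragment queues */
};

static void frag_charge(struct frag_counter *c, long bytes)
{
    c->mem += bytes;
}

static void frag_uncharge(struct frag_counter *c, long bytes)
{
    c->mem -= bytes;
}

/*
 * Model an operation that may change truesize and charge the difference,
 * so that a later uncharge of the new truesize still balances.
 */
static void grow_head_accounted(struct frag_counter *c, struct buf *head,
                                long extra)
{
    long delta = -head->truesize;

    assert(extra >= 0);
    head->truesize += extra;    /* stand-in for the truesize change */
    delta += head->truesize;
    if (delta)
        frag_charge(c, delta);
}
```

If the delta were not charged, uncharging the larger truesize later would leave the counter non-zero, which is the unbalanced state described above.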

Signed-off-by: Jiri Wiesner <[email protected]>
Reported-by: Per Sundstrom <[email protected]>
Acked-by: Peter Oskolkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Mar 29, 2019
[ Upstream commit c5a94f4 ]

It was observed that a process blocked indefinitely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normally prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page() was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared.

There is some uncertainty in this analysis, but it seems to fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

The customer who reported the hang also reports that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
gregmarsden pushed a commit that referenced this pull request Apr 19, 2019
This workaround should be reverted when upstream commit (d8b91dd
Merge branch 'perf-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
is available in uek.

The issue appears to be fixed in upstream tag 4.16.0. Tag 4.15.0 still has this
issue.

The last known commit on the perf topic branch that solves this issue
is this: (c19d084 (tag: perf-core-for-mingo-4.16-20180125)
perf trace beauty flock: Move to separate object file).

Without this commit the perf topic branch has the below issue. With
this commit the branch does not have the issue.

The problem is that the above commit does not fix the issue on top of upstream
tag 4.15.0. So the issue is probably fixed by this commit and some additional
commits on the perf topic branch and/or on the master branch below the point where
the perf branch was branched.

Also this specific commit is not a fix and the only possible relation to this
bug is that it touches the 'flock' code which is used by bash/scripts to
synchronize.

To find the additional commits via git bisect I need to re-order the commits so
that the above commit will be *below* the other commits that solve this issue.
To do that I need to know which is the lowest commit that relates to this fix.

I do not know and have no way to know that.

An attempt to merge the perf topic on top of uek5 produces ~20k commits and tons
of merge conflicts, as uek5 is way behind upstream. So I can't even know whether
the topic branch with its ~270 commits fixes this issue for uek5.

So I chose to work around the issue and wait for the upstream topic merge to
obsolete this commit.

When the issue occurs:

Serial is flooded with messages:

[71266.680745] bondib0: link status up for interface ib0, enabling it in 0 ms
[71266.682740] bondib0: link status up for interface ib0, enabling it in 0 ms
[71266.685738] bondib0: link status up for interface ib0, enabling it in 0 ms

Then a panic occurs:

[71266.695757] INFO: task NetworkManager:5837 blocked for more than 120 seconds.
[71266.695759]       Not tainted 4.14.35-1902.0.6.el7uek.x86_64 #2
[71266.695760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[71266.695761] NetworkManager  D    0  5837      1 0x00000082
[71266.695765] Call Trace:
[71266.695778]  __schedule+0x2bc/0x8da
[71266.695782]  schedule+0x36/0x7c
[71266.695785]  schedule_preempt_disabled+0xe/0x10
[71266.695788]  __mutex_lock.isra.5+0x20c/0x634
[71266.695792]  __mutex_lock_slowpath+0x13/0x15
[71266.695794]  mutex_lock+0x2f/0x3a
[71266.695800]  rtnetlink_rcv_msg+0x1d0/0x289
[71266.695806]  ? __skb_try_recv_datagram+0xca/0x174
[71266.695809]  ? rtnl_calcit.isra.25+0x110/0x103
[71266.695812]  netlink_rcv_skb+0xdf/0x111
[71266.695816]  rtnetlink_rcv+0x15/0x17
[71266.695818]  netlink_unicast+0x18d/0x255
[71266.695820]  netlink_sendmsg+0x2df/0x3cc
[71266.695825]  sock_sendmsg+0x3e/0x4a
[71266.695828]  ___sys_sendmsg+0x2b5/0x2c6
[71266.695832]  ? arch_tlb_finish_mmu+0x1b/0xcb
[71266.695835]  ? tlb_finish_mmu+0x23/0x30
[71266.695838]  ? unmap_region+0xf4/0x12d
[71266.695844]  ? lockref_put_or_lock+0x44/0x72
[71266.695846]  ? __vma_rb_erase+0x10f/0x1f4
[71266.695850]  __sys_sendmsg+0x54/0x8d
[71266.695854]  SyS_sendmsg+0x12/0x1c
[71266.695860]  do_syscall_64+0x79/0x1ae
[71266.695864]  entry_SYSCALL_64_after_hwframe+0x151/0x0
[71266.695866] RIP: 0033:0x7f16f2553c5d
[71266.695867] RSP: 002b:00007ffff7a493f0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[71266.695870] RAX: ffffffffffffffda RBX: 00005570a5026380 RCX: 00007f16f2553c5d
[71266.695874] RDX: 0000000000000000 RSI: 00007ffff7a49420 RDI: 0000000000000007
[71266.695875] RBP: 00007ffff7a49420 R08: 0000000000000001 R09: 0000000000000000
[71266.695876] R10: 0000000000000808 R11: 0000000000000293 R12: 00005570a5026380
[71266.695876] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f16d4004b70

Issue analysis:

The ip process is hung in addrconf_notify while trying to print one of the
messages below to the serial console:
"ADDRCONF(NETDEV_UP): %s: link is not ready\n"
"ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n"
The ip process holds the rtnl_lock while the network-manager process tries to grab
this lock in a 1 msec loop, and every time it fails to grab the lock,
network-manager sends an additional line to the serial log, as seen in dmesg:
"bondib0: link status up for interface ib0, enabling it in 0 ms"
So the bond device floods the serial console while waiting for the rtnl_lock, while ip
holds the rtnl_lock while waiting for the serial console.
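
The text here does not show what the workaround changes. As a general illustration only, one common way to keep a 1 msec retry loop from flooding a slow console is to rate-limit the message. The sketch below is a user-space model with hypothetical names (struct ratelimit, ratelimit_allow(), simulate_retries()); the kernel has its own helpers for this, such as printk_ratelimited().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit {
    unsigned long interval;
    unsigned int burst;
    unsigned long window_start;
    unsigned int printed;
};

/* Returns true if a message may be emitted at time `now`. */
static bool ratelimit_allow(struct ratelimit *rl, unsigned long now)
{
    assert(rl->interval > 0);

    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now; /* start a new window */
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return false;
    rl->printed++;
    return true;
}

/* Simulate a retry loop that fires once per tick; count emitted messages. */
static unsigned int simulate_retries(struct ratelimit *rl, unsigned long ticks)
{
    unsigned int emitted = 0;

    for (unsigned long t = 0; t < ticks; t++)
        if (ratelimit_allow(rl, t))
            emitted++;
    return emitted;
}
```

With one message allowed per 1000 ticks, 5000 retries emit 5 lines instead of 5000, which leaves the console free for the process that holds the lock.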

Offending stack trace from vmcore is this:

PID: 30063  TASK: ffff909c3f675a00  CPU: 7   COMMAND: "ip"
 #0 [fffffe000013ce38] crash_nmi_callback at ffffffff8e059ba7
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/paravirt.h: 99
 #1 [fffffe000013ce48] nmi_handle at ffffffff8e032748
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 137
 #2 [fffffe000013cea0] default_do_nmi at ffffffff8e032c96
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 336
 #3 [fffffe000013cec8] do_nmi at ffffffff8e032e76
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 521
 #4 [fffffe000013cef0] end_repeat_nmi at ffffffff8ea0436f
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 1750
    [exception RIP: delay_tsc+51]
    RIP: ffffffff8e8558f3  RSP: ffff9f63c6c07390  RFLAGS: 00000046
    RAX: 0000000016d23977  RBX: ffffffff903fbc00  RCX: 00009b7616d23038
    RDX: 0000000000009b76  RSI: 0000000000000007  RDI: 000000000000095a
    RBP: ffff9f63c6c07390   R8: 00000000fffffffe   R9: 0000000000000000
    R10: 0000000000000005  R11: 0000000000020503  R12: 000000000000261f
    R13: 0000000000000020  R14: ffffffff8f96de2f  R15: ffffffff903fbc00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193
    <NMI exception stack>
 #5 [ffff9f63c6c07390] delay_tsc at ffffffff8e8558f3
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193
 #6 [ffff9f63c6c07398] __const_udelay at ffffffff8e855838
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/lib/delay.c: 176
 #7 [ffff9f63c6c073a8] wait_for_xmitr at ffffffff8e510dcc
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/nmi.h: 126
 #8 [ffff9f63c6c073d0] serial8250_console_putchar at ffffffff8e510e6c
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/serial_core.h: 265
 #9 [ffff9f63c6c073f0] uart_console_write at ffffffff8e509573
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/serial_core.c: 1886
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_port.c: 3256
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_core.c: 598
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1574
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1766
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1808
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk_safe.c: 402
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1842
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/ipv6/addrconf.c: 3532
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 95
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 402
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1682
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1697
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 6903
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2072
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2624
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4255
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 2433
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4268
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1287
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1877
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 646
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2061
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/file.h: 26
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2102
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/common.c: 295
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 247
    RIP: 00007faf75ccafd0  RSP: 00007ffc710a9368  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000000005c65f66d  RCX: 00007faf75ccafd0
    RDX: 0000000000000000  RSI: 00007ffc710a93b0  RDI: 0000000000000003
    RBP: 00007ffc710a93b0   R8: 0000000000000000   R9: 0000000000000008
    R10: 00007ffc710a8f30  R11: 0000000000000246  R12: 0000000000000000
    R13: 000000000066a440  R14: 00007ffc710a9458  R15: 00007ffc710a9b88
    ORIG_RAX: 000000000000002e  CS: 0033  SS: 002b

Orabug: 29357838

Signed-off-by: Shamir Rabinovitch <[email protected]>
Signed-off-by: Aron Silverton <[email protected]>
Reviewed-by: John Haxby <[email protected]>
gregmarsden pushed a commit that referenced this pull request May 17, 2019
This workaround should be reverted when upstream commit (d8b91dd
Merge branch 'perf-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
is available in uek.

The issue appears to be fixed in upstream tag 4.16.0. Tag 4.15.0 still has
this issue.

The last known commit on the perf topic branch that solves this issue
is: (c19d084 (tag: perf-core-for-mingo-4.16-20180125)
perf trace beauty flock: Move to separate object file).

Without this commit the perf topic branch has the issue below. With
this commit the branch does not have the issue.

The problem is that the above commit does not fix the issue on top of upstream
tag 4.15.0. So the issue is probably fixed by this commit together with some
additional commits on the perf topic branch and/or on the master branch below
the point where the perf branch was branched.

Also, this specific commit is not a fix, and its only possible relation to this
bug is that it touches the 'flock' code which is used by bash/scripts to
synchronize.

To find the additional commits via git bisect I need to re-order the commits so
that the above commit will be *below* the other commits that solve this issue.
To do that I need to know the lowest commit that relates to this fix.

I do not know and have no way to know that.

An attempt to merge the perf topic on top of uek5 produces ~20k commits and
tons of merge conflicts, as uek5 is way behind upstream. So I can't even tell
whether the topic branch with its ~270 commits fixes this issue for uek5.

So I chose to work around the issue and wait for the upstream topic merge to
make this commit obsolete.

When the issue occurs:

Serial is flooded with messages:

[71266.680745] bondib0: link status up for interface ib0, enabling it in 0 ms
[71266.682740] bondib0: link status up for interface ib0, enabling it in 0 ms
[71266.685738] bondib0: link status up for interface ib0, enabling it in 0 ms

Then the panic occurs:

[71266.695757] INFO: task NetworkManager:5837 blocked for more than 120 seconds.
[71266.695759]       Not tainted 4.14.35-1902.0.6.el7uek.x86_64 #2
[71266.695760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[71266.695761] NetworkManager  D    0  5837      1 0x00000082
[71266.695765] Call Trace:
[71266.695778]  __schedule+0x2bc/0x8da
[71266.695782]  schedule+0x36/0x7c
[71266.695785]  schedule_preempt_disabled+0xe/0x10
[71266.695788]  __mutex_lock.isra.5+0x20c/0x634
[71266.695792]  __mutex_lock_slowpath+0x13/0x15
[71266.695794]  mutex_lock+0x2f/0x3a
[71266.695800]  rtnetlink_rcv_msg+0x1d0/0x289
[71266.695806]  ? __skb_try_recv_datagram+0xca/0x174
[71266.695809]  ? rtnl_calcit.isra.25+0x110/0x103
[71266.695812]  netlink_rcv_skb+0xdf/0x111
[71266.695816]  rtnetlink_rcv+0x15/0x17
[71266.695818]  netlink_unicast+0x18d/0x255
[71266.695820]  netlink_sendmsg+0x2df/0x3cc
[71266.695825]  sock_sendmsg+0x3e/0x4a
[71266.695828]  ___sys_sendmsg+0x2b5/0x2c6
[71266.695832]  ? arch_tlb_finish_mmu+0x1b/0xcb
[71266.695835]  ? tlb_finish_mmu+0x23/0x30
[71266.695838]  ? unmap_region+0xf4/0x12d
[71266.695844]  ? lockref_put_or_lock+0x44/0x72
[71266.695846]  ? __vma_rb_erase+0x10f/0x1f4
[71266.695850]  __sys_sendmsg+0x54/0x8d
[71266.695854]  SyS_sendmsg+0x12/0x1c
[71266.695860]  do_syscall_64+0x79/0x1ae
[71266.695864]  entry_SYSCALL_64_after_hwframe+0x151/0x0
[71266.695866] RIP: 0033:0x7f16f2553c5d
[71266.695867] RSP: 002b:00007ffff7a493f0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[71266.695870] RAX: ffffffffffffffda RBX: 00005570a5026380 RCX: 00007f16f2553c5d
[71266.695874] RDX: 0000000000000000 RSI: 00007ffff7a49420 RDI: 0000000000000007
[71266.695875] RBP: 00007ffff7a49420 R08: 0000000000000001 R09: 0000000000000000
[71266.695876] R10: 0000000000000808 R11: 0000000000000293 R12: 00005570a5026380
[71266.695876] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f16d4004b70

Issue analysis:

The ip process is hung in addrconf_notify while trying to print one of the
messages below to the serial console:
"ADDRCONF(NETDEV_UP): %s: link is not ready\n"
"ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n"
The ip process holds the rtnl_lock while the network-manager process tries to
grab this lock in a 1 msec loop. Every time it fails to grab the lock,
network-manager sends an additional line to the serial log, as seen in dmesg:
"bondib0: link status up for interface ib0, enabling it in 0 ms"
So the bond device floods the serial console while waiting for the rtnl_lock,
and ip holds the rtnl_lock while waiting for the serial console.
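
To give a feel for why the serial console cannot keep up with a 1 msec retry
loop, here is a back-of-the-envelope sketch. It is not kernel code and does not
model the locking itself. The function names, the 115200 baud rate, the 8N1
framing (10 bits on the wire per byte) and the 78-byte line length (the
"bondib0: ..." message with its timestamp and newline) are all assumptions made
for illustration; only the 1 msec period comes from the analysis above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds needed to push `len` bytes through a UART running at `baud`,
 * assuming 10 bits per byte on the wire (8N1: start + 8 data + stop).
 * Hypothetical helper, not part of any kernel API.
 */
uint64_t uart_tx_time_us(uint64_t len, uint64_t baud)
{
	assert(baud > 0);
	return (len * 10u * 1000000u) / baud;
}

/*
 * Bytes still waiting to be transmitted after `duration_ms`, if a message of
 * `line_len` bytes is queued every `period_ms` and the UART drains at `baud`.
 * Hypothetical helper, not part of any kernel API.
 */
uint64_t console_backlog_bytes(uint64_t line_len, uint64_t period_ms,
			       uint64_t baud, uint64_t duration_ms)
{
	uint64_t queued, drained;

	assert(period_ms > 0);
	queued = (duration_ms / period_ms) * line_len;
	drained = (baud / 10u) * duration_ms / 1000u;

	return queued > drained ? queued - drained : 0;
}
```

Under these assumed numbers, uart_tx_time_us(78, 115200) comes to several
milliseconds per line, so a writer that emits one line per millisecond queues
data faster than the UART can drain it, and any other printk caller (here, ip
holding the rtnl_lock) ends up waiting behind the flood.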

Offending stack trace from vmcore is this:

PID: 30063  TASK: ffff909c3f675a00  CPU: 7   COMMAND: "ip"
 #0 [fffffe000013ce38] crash_nmi_callback at ffffffff8e059ba7
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/paravirt.h: 99
 #1 [fffffe000013ce48] nmi_handle at ffffffff8e032748
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 137
 #2 [fffffe000013cea0] default_do_nmi at ffffffff8e032c96
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 336
 #3 [fffffe000013cec8] do_nmi at ffffffff8e032e76
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 521
 #4 [fffffe000013cef0] end_repeat_nmi at ffffffff8ea0436f
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 1750
    [exception RIP: delay_tsc+51]
    RIP: ffffffff8e8558f3  RSP: ffff9f63c6c07390  RFLAGS: 00000046
    RAX: 0000000016d23977  RBX: ffffffff903fbc00  RCX: 00009b7616d23038
    RDX: 0000000000009b76  RSI: 0000000000000007  RDI: 000000000000095a
    RBP: ffff9f63c6c07390   R8: 00000000fffffffe   R9: 0000000000000000
    R10: 0000000000000005  R11: 0000000000020503  R12: 000000000000261f
    R13: 0000000000000020  R14: ffffffff8f96de2f  R15: ffffffff903fbc00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193
    <NMI exception stack>
 #5 [ffff9f63c6c07390] delay_tsc at ffffffff8e8558f3
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193
 #6 [ffff9f63c6c07398] __const_udelay at ffffffff8e855838
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/lib/delay.c: 176
 #7 [ffff9f63c6c073a8] wait_for_xmitr at ffffffff8e510dcc
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/nmi.h: 126
 #8 [ffff9f63c6c073d0] serial8250_console_putchar at ffffffff8e510e6c
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/serial_core.h: 265
 #9 [ffff9f63c6c073f0] uart_console_write at ffffffff8e509573
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/serial_core.c: 1886
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_port.c: 3256
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_core.c: 598
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1574
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1766
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1808
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk_safe.c: 402
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1842
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/ipv6/addrconf.c: 3532
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 95
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 402
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1682
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1697
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 6903
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2072
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2624
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4255
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 2433
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4268
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1287
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1877
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 646
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2061
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/file.h: 26
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2102
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/common.c: 295
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 247
    RIP: 00007faf75ccafd0  RSP: 00007ffc710a9368  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000000005c65f66d  RCX: 00007faf75ccafd0
    RDX: 0000000000000000  RSI: 00007ffc710a93b0  RDI: 0000000000000003
    RBP: 00007ffc710a93b0   R8: 0000000000000000   R9: 0000000000000008
    R10: 00007ffc710a8f30  R11: 0000000000000246  R12: 0000000000000000
    R13: 000000000066a440  R14: 00007ffc710a9458  R15: 00007ffc710a9b88
    ORIG_RAX: 000000000000002e  CS: 0033  SS: 002b

Orabug: 29016284

Signed-off-by: Shamir Rabinovitch <[email protected]>
Reviewed-by: John Haxby <[email protected]>
gregmarsden pushed a commit that referenced this pull request May 17, 2019
This workaround should be reverted when upstream commit (d8b91dd
Merge branch 'perf-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
is available in uek.

The issue appears to be fixed in upstream tag 4.16.0. Tag 4.15.0 still has
this issue.

The last known commit on the perf topic branch that solves this issue
is: (c19d084 (tag: perf-core-for-mingo-4.16-20180125)
perf trace beauty flock: Move to separate object file).

Without this commit the perf topic branch has the issue below. With
this commit the branch does not have the issue.

The problem is that the above commit does not fix the issue on top of upstream
tag 4.15.0. So the issue is probably fixed by this commit together with some
additional commits on the perf topic branch and/or on the master branch below
the point where the perf branch was branched.

Also, this specific commit is not a fix, and its only possible relation to this
bug is that it touches the 'flock' code which is used by bash/scripts to
synchronize.

To find the additional commits via git bisect I need to re-order the commits so
that the above commit will be *below* the other commits that solve this issue.
To do that I need to know the lowest commit that relates to this fix.

I do not know and have no way to know that.

An attempt to merge the perf topic on top of uek5 produces ~20k commits and
tons of merge conflicts, as uek5 is way behind upstream. So I can't even tell
whether the topic branch with its ~270 commits fixes this issue for uek5.

So I chose to work around the issue and wait for the upstream topic merge to
make this commit obsolete.

When the issue occurs:

Serial is flooded with messages:

[71266.680745] bondib0: link status up for interface ib0, enabling it in 0 ms
[71266.682740] bondib0: link status up for interface ib0, enabling it in 0 ms
[71266.685738] bondib0: link status up for interface ib0, enabling it in 0 ms

Then the panic occurs:

[71266.695757] INFO: task NetworkManager:5837 blocked for more than 120 seconds.
[71266.695759]       Not tainted 4.14.35-1902.0.6.el7uek.x86_64 #2
[71266.695760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[71266.695761] NetworkManager  D    0  5837      1 0x00000082
[71266.695765] Call Trace:
[71266.695778]  __schedule+0x2bc/0x8da
[71266.695782]  schedule+0x36/0x7c
[71266.695785]  schedule_preempt_disabled+0xe/0x10
[71266.695788]  __mutex_lock.isra.5+0x20c/0x634
[71266.695792]  __mutex_lock_slowpath+0x13/0x15
[71266.695794]  mutex_lock+0x2f/0x3a
[71266.695800]  rtnetlink_rcv_msg+0x1d0/0x289
[71266.695806]  ? __skb_try_recv_datagram+0xca/0x174
[71266.695809]  ? rtnl_calcit.isra.25+0x110/0x103
[71266.695812]  netlink_rcv_skb+0xdf/0x111
[71266.695816]  rtnetlink_rcv+0x15/0x17
[71266.695818]  netlink_unicast+0x18d/0x255
[71266.695820]  netlink_sendmsg+0x2df/0x3cc
[71266.695825]  sock_sendmsg+0x3e/0x4a
[71266.695828]  ___sys_sendmsg+0x2b5/0x2c6
[71266.695832]  ? arch_tlb_finish_mmu+0x1b/0xcb
[71266.695835]  ? tlb_finish_mmu+0x23/0x30
[71266.695838]  ? unmap_region+0xf4/0x12d
[71266.695844]  ? lockref_put_or_lock+0x44/0x72
[71266.695846]  ? __vma_rb_erase+0x10f/0x1f4
[71266.695850]  __sys_sendmsg+0x54/0x8d
[71266.695854]  SyS_sendmsg+0x12/0x1c
[71266.695860]  do_syscall_64+0x79/0x1ae
[71266.695864]  entry_SYSCALL_64_after_hwframe+0x151/0x0
[71266.695866] RIP: 0033:0x7f16f2553c5d
[71266.695867] RSP: 002b:00007ffff7a493f0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[71266.695870] RAX: ffffffffffffffda RBX: 00005570a5026380 RCX: 00007f16f2553c5d
[71266.695874] RDX: 0000000000000000 RSI: 00007ffff7a49420 RDI: 0000000000000007
[71266.695875] RBP: 00007ffff7a49420 R08: 0000000000000001 R09: 0000000000000000
[71266.695876] R10: 0000000000000808 R11: 0000000000000293 R12: 00005570a5026380
[71266.695876] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f16d4004b70

Issue analysis:

The ip process is hung in addrconf_notify while trying to print one of the
messages below to the serial console:
"ADDRCONF(NETDEV_UP): %s: link is not ready\n"
"ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n"
The ip process holds the rtnl_lock while the network-manager process tries to
grab this lock in a 1 msec loop. Every time it fails to grab the lock,
network-manager sends an additional line to the serial log, as seen in dmesg:
"bondib0: link status up for interface ib0, enabling it in 0 ms"
So the bond device floods the serial console while waiting for the rtnl_lock,
and ip holds the rtnl_lock while waiting for the serial console.

Offending stack trace from vmcore is this:

PID: 30063  TASK: ffff909c3f675a00  CPU: 7   COMMAND: "ip"
 #0 [fffffe000013ce38] crash_nmi_callback at ffffffff8e059ba7
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/paravirt.h: 99
 #1 [fffffe000013ce48] nmi_handle at ffffffff8e032748
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 137
 #2 [fffffe000013cea0] default_do_nmi at ffffffff8e032c96
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 336
 #3 [fffffe000013cec8] do_nmi at ffffffff8e032e76
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 521
 #4 [fffffe000013cef0] end_repeat_nmi at ffffffff8ea0436f
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 1750
    [exception RIP: delay_tsc+51]
    RIP: ffffffff8e8558f3  RSP: ffff9f63c6c07390  RFLAGS: 00000046
    RAX: 0000000016d23977  RBX: ffffffff903fbc00  RCX: 00009b7616d23038
    RDX: 0000000000009b76  RSI: 0000000000000007  RDI: 000000000000095a
    RBP: ffff9f63c6c07390   R8: 00000000fffffffe   R9: 0000000000000000
    R10: 0000000000000005  R11: 0000000000020503  R12: 000000000000261f
    R13: 0000000000000020  R14: ffffffff8f96de2f  R15: ffffffff903fbc00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193
    <NMI exception stack>
 #5 [ffff9f63c6c07390] delay_tsc at ffffffff8e8558f3
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193
 #6 [ffff9f63c6c07398] __const_udelay at ffffffff8e855838
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/lib/delay.c: 176
 #7 [ffff9f63c6c073a8] wait_for_xmitr at ffffffff8e510dcc
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/nmi.h: 126
 #8 [ffff9f63c6c073d0] serial8250_console_putchar at ffffffff8e510e6c
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/serial_core.h: 265
 #9 [ffff9f63c6c073f0] uart_console_write at ffffffff8e509573
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/serial_core.c: 1886
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_port.c: 3256
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_core.c: 598
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1574
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1766
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1808
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk_safe.c: 402
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1842
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/ipv6/addrconf.c: 3532
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 95
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 402
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1682
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1697
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 6903
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2072
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2624
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4255
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 2433
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4268
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1287
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1877
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 646
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2061
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/file.h: 26
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2102
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/common.c: 295
    /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 247
    RIP: 00007faf75ccafd0  RSP: 00007ffc710a9368  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000000005c65f66d  RCX: 00007faf75ccafd0
    RDX: 0000000000000000  RSI: 00007ffc710a93b0  RDI: 0000000000000003
    RBP: 00007ffc710a93b0   R8: 0000000000000000   R9: 0000000000000008
    R10: 00007ffc710a8f30  R11: 0000000000000246  R12: 0000000000000000
    R13: 000000000066a440  R14: 00007ffc710a9458  R15: 00007ffc710a9b88
    ORIG_RAX: 000000000000002e  CS: 0033  SS: 002b

Orabug: 29631452

Signed-off-by: Shamir Rabinovitch <[email protected]>
Signed-off-by: Aron Silverton <[email protected]>
Reviewed-by: John Haxby <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Sep 6, 2019
…_map

[ Upstream commit 39df730 ]

Detected via gcc's ASan:

  Direct leak of 2048 byte(s) in 64 object(s) allocated from:
      #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370)
      #1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43
      #2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85
      #3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250
      #4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382
      #5 0x556b0f0e3231 in print_events util/parse-events.c:2514
      #6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
      #7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520
      #11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: 8989605 ("perf tools: Do not put a variable sized type not at the end of a struct")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Sep 6, 2019
[ Upstream commit 54569ba ]

Detected with gcc's ASan:

  Direct leak of 66 byte(s) in 5 object(s) allocated from:
      #0 0x7ff3b1f32070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x560c8761034d in collect_config util/config.c:597
      #2 0x560c8760d9cb in get_value util/config.c:169
      #3 0x560c8760dfd7 in perf_parse_file util/config.c:285
      #4 0x560c8760e0d2 in perf_config_from_file util/config.c:476
      #5 0x560c876108fd in perf_config_set__init util/config.c:661
      #6 0x560c87610c72 in perf_config_set__new util/config.c:709
      #7 0x560c87610d2f in perf_config__init util/config.c:718
      #8 0x560c87610e5d in perf_config util/config.c:730
      #9 0x560c875ddea0 in main /home/changbin/work/linux/tools/perf/perf.c:442
      #10 0x7ff3afb8609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Taeung Song <[email protected]>
Fixes: 20105ca ("perf config: Introduce perf_config_set class")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Sep 6, 2019
[ Upstream commit 8bde851 ]

Detected with gcc's ASan:

  Direct leak of 4356 byte(s) in 120 object(s) allocated from:
      #0 0x7ff1a2b5a070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x55719aef4814 in build_id_cache__origname util/build-id.c:215
      #2 0x55719af649b6 in print_sdt_events util/parse-events.c:2339
      #3 0x55719af66272 in print_events util/parse-events.c:2542
      #4 0x55719ad1ecaa in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
      #5 0x55719aec745d in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #6 0x55719aec7d1a in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #7 0x55719aec8184 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #8 0x55719aeca41a in main /home/changbin/work/linux/tools/perf/perf.c:520
      #9 0x7ff1a07ae09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: 40218da ("perf list: Show SDT and pre-cached events")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Sep 6, 2019
[ Upstream commit 42dfa45 ]

Using gcc's ASan, Changbin reports:

  =================================================================
  ==7494==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 48 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5625e5330a5e in zalloc util/util.h:23
      #2 0x5625e5330a9b in perf_counts__new util/counts.c:10
      #3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      #4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      #5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      #6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      #7 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #8 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #10 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 72 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x5625e532560d in zalloc util/util.h:23
      #2 0x5625e532566b in xyarray__new util/xyarray.c:10
      #3 0x5625e5330aba in perf_counts__new util/counts.c:15
      #4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      #5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      #6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      #7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      #8 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #9 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #11 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

His patch took care of evsel->prev_raw_counts, but the above backtraces
are about evsel->counts, so fix that instead.
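
The two reports above fit a common ownership pattern: the "direct" leak is an
outer object that nothing points to any more, and the "indirect" leak is an
inner allocation reachable only through that leaked outer object. The sketch
below illustrates that pattern and a cleanup that releases both members. It is
a hypothetical stand-in, not the perf code or the actual fix: the toy_* types
and functions are invented for illustration and only loosely mirror
evsel->counts and evsel->prev_raw_counts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real perf definitions. */
struct toy_counts {
	long *values;		/* inner allocation, owned by toy_counts */
};

struct toy_evsel {
	struct toy_counts *counts;
	struct toy_counts *prev_raw_counts;
};

/* Allocate the outer object and the inner array it owns. */
struct toy_counts *toy_counts_new(size_t n)
{
	struct toy_counts *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;

	c->values = calloc(n, sizeof(*c->values));
	if (!c->values) {
		free(c);
		return NULL;
	}
	return c;
}

/* Free inner first, then outer; forgetting this call leaks both. */
void toy_counts_delete(struct toy_counts *c)
{
	if (!c)
		return;
	free(c->values);
	free(c);
}

/* Release both members, not just one of them. */
void toy_evsel_free_counts(struct toy_evsel *e)
{
	assert(e != NULL);
	toy_counts_delete(e->counts);
	e->counts = NULL;
	toy_counts_delete(e->prev_raw_counts);
	e->prev_raw_counts = NULL;
}
```

If only prev_raw_counts were released in this toy model, a leak checker would
still flag the counts object as a direct leak and its values array as an
indirect one, which matches the shape of the two backtraces.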

Reported-by: Changbin Du <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Sep 6, 2019
…_event_on_all_cpus test

[ Upstream commit 93faa52 ]

  =================================================================
  ==7497==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45
      #2 0x5625e5326703 in cpu_map__read util/cpumap.c:103
      #3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120
      #4 0x5625e5326915 in cpu_map__new util/cpumap.c:135
      #5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36
      #6 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #7 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #9 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Sep 6, 2019
[ Upstream commit f97a899 ]

  =================================================================
  ==7506==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 13 byte(s) in 3 object(s) allocated from:
      #0 0x7f03339d6070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      #1 0x5625e53aaef0 in expr__find_other util/expr.y:221
      #2 0x5625e51bcd3f in test__expr tests/expr.c:52
      #3 0x5625e51528e6 in run_test tests/builtin-test.c:358
      #4 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      #5 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      #6 0x5625e515572f in cmd_test tests/builtin-test.c:722
      #7 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #8 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #9 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #10 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #11 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: 0751673 ("perf tools: Add a simple expression parser for JSON")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Sep 6, 2019
[ Upstream commit d982b33 ]

  =================================================================
  ==20875==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 1160 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x55bd50005599 in zalloc util/util.h:23
      #2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
      #3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
      #4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
      #5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
      #6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
      #7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
      #8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
      #9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 19 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)

Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
LinuxMinion pushed a commit that referenced this pull request Sep 6, 2019
[ Upstream commit c34a838 ]

Clients may submit new requests from the completion callback
context. The driver was not prepared to receive a request in this
state because it already held the request queue lock, so a recursive
lock error was triggered.

Now all completions are queued up until we are ready to drop the queue
lock and then delivered.
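
The deferral described above can be sketched in user space. This is a
minimal model under stated assumptions, not the driver code: the names
struct queue, queue_submit() and queue_process() are hypothetical, and an
integer flag stands in for the request queue spinlock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the driver. */
struct queue;

struct request {
    struct request *next;
    void (*complete)(struct request *req);
    struct queue *q;            /* queue the callback may resubmit to */
    struct request *follow_up;  /* request to submit from the callback */
};

struct queue {
    int locked;                 /* stands in for the request queue lock */
    struct request *pending;    /* requests waiting to be processed */
};

static void queue_lock(struct queue *q)
{
    assert(!q->locked);         /* taking the lock twice is the recursion bug */
    q->locked = 1;
}

static void queue_unlock(struct queue *q)
{
    assert(q->locked);
    q->locked = 0;
}

/* Every submitter, including a completion callback, goes through here. */
static void queue_submit(struct queue *q, struct request *req)
{
    queue_lock(q);
    req->next = q->pending;
    q->pending = req;
    queue_unlock(q);
}

/* Completion callback that submits a new request, as a client might. */
static void complete_and_resubmit(struct request *req)
{
    if (req->follow_up)
        queue_submit(req->q, req->follow_up);
}

/*
 * Finished requests are moved to a local list while the lock is held.
 * Their callbacks run only after the lock is dropped, so a callback can
 * call queue_submit() without recursing on the lock.
 */
static int queue_process(struct queue *q)
{
    struct request *done = NULL, *req;
    int count = 0;

    queue_lock(q);
    while ((req = q->pending) != NULL) {
        q->pending = req->next;
        req->next = done;
        done = req;
    }
    queue_unlock(q);

    while ((req = done) != NULL) {
        done = req->next;
        if (req->complete)
            req->complete(req);
        count++;
    }
    return count;
}
```

Calling the callbacks inside the locked loop instead would trip the
assertion in queue_lock(), which is the toy equivalent of the spinlock
recursion report below.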

The fault was triggered by TCP over an IPsec connection in the LTP
test suite:
  LTP: starting tcp4_ipsec02 (tcp_ipsec.sh -p ah -m transport -s "100 1000 65535")
  BUG: spinlock recursion on CPU#1, genload/943
   lock: 0xbf3c3094, .magic: dead4ead, .owner: genload/943, .owner_cpu: 1
  CPU: 1 PID: 943 Comm: genload Tainted: G           O    4.9.62-axis5-devel #6
  Hardware name: Axis ARTPEC-6 Platform
   (unwind_backtrace) from [<8010d134>] (show_stack+0x18/0x1c)
   (show_stack) from [<803a289c>] (dump_stack+0x84/0x98)
   (dump_stack) from [<8016e164>] (do_raw_spin_lock+0x124/0x128)
   (do_raw_spin_lock) from [<804de1a4>] (artpec6_crypto_submit+0x2c/0xa0)
   (artpec6_crypto_submit) from [<804def38>] (artpec6_crypto_prepare_submit_hash+0xd0/0x54c)
   (artpec6_crypto_prepare_submit_hash) from [<7f3165f0>] (ah_output+0x2a4/0x3dc [ah4])
   (ah_output [ah4]) from [<805df9bc>] (xfrm_output_resume+0x178/0x4a4)
   (xfrm_output_resume) from [<805d283c>] (xfrm4_output+0xac/0xbc)
   (xfrm4_output) from [<80587928>] (ip_queue_xmit+0x140/0x3b4)
   (ip_queue_xmit) from [<805a13b4>] (tcp_transmit_skb+0x4c4/0x95c)
   (tcp_transmit_skb) from [<8059f218>] (tcp_rcv_state_process+0xdf4/0xdfc)
   (tcp_rcv_state_process) from [<805a7530>] (tcp_v4_do_rcv+0x64/0x1ac)
   (tcp_v4_do_rcv) from [<805a9724>] (tcp_v4_rcv+0xa34/0xb74)
   (tcp_v4_rcv) from [<80581d34>] (ip_local_deliver_finish+0x78/0x2b0)
   (ip_local_deliver_finish) from [<8058259c>] (ip_local_deliver+0xe4/0x104)
   (ip_local_deliver) from [<805d23ec>] (xfrm4_transport_finish+0xf4/0x144)
   (xfrm4_transport_finish) from [<805df564>] (xfrm_input+0x4f4/0x74c)
   (xfrm_input) from [<804de420>] (artpec6_crypto_task+0x208/0x38c)
   (artpec6_crypto_task) from [<801271b0>] (tasklet_action+0x60/0xec)
   (tasklet_action) from [<801266d4>] (__do_softirq+0xcc/0x3a4)
   (__do_softirq) from [<80126d20>] (irq_exit+0xf4/0x15c)
   (irq_exit) from [<801741e8>] (__handle_domain_irq+0x68/0xbc)
   (__handle_domain_irq) from [<801014f0>] (gic_handle_irq+0x50/0x94)
   (gic_handle_irq) from [<80657370>] (__irq_usr+0x50/0x80)

Signed-off-by: Lars Persson <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 15, 2019
commit 7298e3b upstream.

Currently the calculation of end_pfn can round up the pfn number to more
than the actual maximum number of pfns, causing an Oops.  Fix this by
ensuring end_pfn is never more than max_pfn.
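
As a rough illustration of the clamp, here is a user-space sketch. It is
not the kernel patch: round_up_to() and clamp_end_pfn() are hypothetical
helpers, and only the names end_pfn and max_pfn are taken from the text
above.

```c
#include <assert.h>

/* Hypothetical helper: round value up to the next multiple. */
static unsigned long round_up_to(unsigned long value, unsigned long multiple)
{
    assert(multiple != 0);
    return ((value + multiple - 1) / multiple) * multiple;
}

/*
 * Compute the end of a pfn range that is rounded up to a chunk boundary,
 * then clamp it so it never exceeds max_pfn.
 */
static unsigned long clamp_end_pfn(unsigned long pfn, unsigned long count,
                                   unsigned long chunk, unsigned long max_pfn)
{
    unsigned long end_pfn = round_up_to(pfn + count, chunk);

    if (end_pfn > max_pfn)
        end_pfn = max_pfn;
    return end_pfn;
}
```

With a chunk of 64 and max_pfn of 120, a request covering pfns 0..99
rounds up to 128 and is then clamped back to 120.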

This can be easily triggered on systems where the end_pfn gets
rounded up to more than max_pfn using the idle-page stress-ng stress test:

sudo stress-ng --idle-page 0

  BUG: unable to handle kernel paging request at 00000000000020d8
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  RIP: 0010:page_idle_get_page+0xc8/0x1a0
  Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48
  RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202
  RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f
  RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700
  RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276
  R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080
  R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400
  FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0
  Call Trace:
    page_idle_bitmap_write+0x8c/0x140
    sysfs_kf_bin_write+0x5c/0x70
    kernfs_fop_write+0x12e/0x1b0
    __vfs_write+0x1b/0x40
    vfs_write+0xab/0x1b0
    ksys_write+0x55/0xc0
    __x64_sys_write+0x1a/0x20
    do_syscall_64+0x5a/0x110
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 33c3fc7 ("mm: introduce idle page tracking")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 22, 2019
…erycap

[ Upstream commit 5d2e73a ]

SyzKaller hit the null pointer deref while reading from uninitialized
udev->product in zr364xx_vidioc_querycap().

==================================================================
BUG: KASAN: null-ptr-deref in read_word_at_a_time+0xe/0x20
include/linux/compiler.h:274
Read of size 1 at addr 0000000000000000 by task v4l_id/5287

CPU: 1 PID: 5287 Comm: v4l_id Not tainted 5.1.0-rc3-319004-g43151d6 #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0xe8/0x16e lib/dump_stack.c:113
  kasan_report.cold+0x5/0x3c mm/kasan/report.c:321
  read_word_at_a_time+0xe/0x20 include/linux/compiler.h:274
  strscpy+0x8a/0x280 lib/string.c:207
  zr364xx_vidioc_querycap+0xb5/0x210 drivers/media/usb/zr364xx/zr364xx.c:706
  v4l_querycap+0x12b/0x340 drivers/media/v4l2-core/v4l2-ioctl.c:1062
  __video_do_ioctl+0x5bb/0xb40 drivers/media/v4l2-core/v4l2-ioctl.c:2874
  video_usercopy+0x44e/0xf00 drivers/media/v4l2-core/v4l2-ioctl.c:3056
  v4l2_ioctl+0x14e/0x1a0 drivers/media/v4l2-core/v4l2-dev.c:364
  vfs_ioctl fs/ioctl.c:46 [inline]
  file_ioctl fs/ioctl.c:509 [inline]
  do_vfs_ioctl+0xced/0x12f0 fs/ioctl.c:696
  ksys_ioctl+0xa0/0xc0 fs/ioctl.c:713
  __do_sys_ioctl fs/ioctl.c:720 [inline]
  __se_sys_ioctl fs/ioctl.c:718 [inline]
  __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:718
  do_syscall_64+0xcf/0x4f0 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f3b56d8b347
Code: 90 90 90 48 8b 05 f1 fa 2a 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff
ff c3 90 90 90 90 90 90 90 90 90 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 73 01 c3 48 8b 0d c1 fa 2a 00 31 d2 48 29 c2 64
RSP: 002b:00007ffe005d5d68 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f3b56d8b347
RDX: 00007ffe005d5d70 RSI: 0000000080685600 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400884
R13: 00007ffe005d5ec0 R14: 0000000000000000 R15: 0000000000000000
==================================================================

For this device udev->product is not initialized and accessing it causes a NULL pointer deref.

The fix is to check for NULL before calling strscpy() and to copy an
empty string if product is NULL.
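
A user-space sketch of the same guard follows. It is an assumption-laden
stand-in rather than the driver change: copy_product_name() is a
hypothetical helper, and snprintf() is used in place of the kernel's
strscpy().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the guarded copy. A missing product string is
 * replaced by an empty string, and the result is always NUL-terminated.
 */
static void copy_product_name(char *dst, size_t dst_size, const char *product)
{
    assert(dst != NULL && dst_size > 0);
    snprintf(dst, dst_size, "%s", product ? product : "");
}
```

Passing NULL leaves an empty string in the destination instead of
dereferencing the pointer, and an over-long name is truncated to fit.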

Reported-by: [email protected]
Signed-off-by: Vandana BN <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit dbabee0cac1e3c5502ed0e9298226d81ca71a441)

Orabug: 30532773
CVE: CVE-2019-15217

Signed-off-by: Larry Bassel <[email protected]>
Reviewed-by: John Donnelly <[email protected]>
Signed-off-by: Somasundaram Krishnasamy <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 22, 2019
…erycap

[ Upstream commit 5d2e73a ]

SyzKaller hit the null pointer deref while reading from uninitialized
udev->product in zr364xx_vidioc_querycap().

==================================================================
BUG: KASAN: null-ptr-deref in read_word_at_a_time+0xe/0x20
include/linux/compiler.h:274
Read of size 1 at addr 0000000000000000 by task v4l_id/5287

CPU: 1 PID: 5287 Comm: v4l_id Not tainted 5.1.0-rc3-319004-g43151d6 #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0xe8/0x16e lib/dump_stack.c:113
  kasan_report.cold+0x5/0x3c mm/kasan/report.c:321
  read_word_at_a_time+0xe/0x20 include/linux/compiler.h:274
  strscpy+0x8a/0x280 lib/string.c:207
  zr364xx_vidioc_querycap+0xb5/0x210 drivers/media/usb/zr364xx/zr364xx.c:706
  v4l_querycap+0x12b/0x340 drivers/media/v4l2-core/v4l2-ioctl.c:1062
  __video_do_ioctl+0x5bb/0xb40 drivers/media/v4l2-core/v4l2-ioctl.c:2874
  video_usercopy+0x44e/0xf00 drivers/media/v4l2-core/v4l2-ioctl.c:3056
  v4l2_ioctl+0x14e/0x1a0 drivers/media/v4l2-core/v4l2-dev.c:364
  vfs_ioctl fs/ioctl.c:46 [inline]
  file_ioctl fs/ioctl.c:509 [inline]
  do_vfs_ioctl+0xced/0x12f0 fs/ioctl.c:696
  ksys_ioctl+0xa0/0xc0 fs/ioctl.c:713
  __do_sys_ioctl fs/ioctl.c:720 [inline]
  __se_sys_ioctl fs/ioctl.c:718 [inline]
  __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:718
  do_syscall_64+0xcf/0x4f0 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f3b56d8b347
Code: 90 90 90 48 8b 05 f1 fa 2a 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff
ff c3 90 90 90 90 90 90 90 90 90 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 73 01 c3 48 8b 0d c1 fa 2a 00 31 d2 48 29 c2 64
RSP: 002b:00007ffe005d5d68 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f3b56d8b347
RDX: 00007ffe005d5d70 RSI: 0000000080685600 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400884
R13: 00007ffe005d5ec0 R14: 0000000000000000 R15: 0000000000000000
==================================================================

For this device udev->product is not initialized and accessing it causes a NULL pointer deref.

The fix is to check for NULL before calling strscpy() and to copy an
empty string if product is NULL.

Reported-by: [email protected]
Signed-off-by: Vandana BN <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit dbabee0cac1e3c5502ed0e9298226d81ca71a441)

Orabug: 30532771
CVE: CVE-2019-15217

Signed-off-by: Larry Bassel <[email protected]>
Reviewed-by: John Donnelly <[email protected]>
Signed-off-by: Somasundaram Krishnasamy <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 29, 2019
commit d0a255e upstream.

A deadlock with this stacktrace was observed.

The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio
shrinker and the shrinker depends on I/O completion in the dm-bufio
subsystem.

In order to fix the deadlock (and other similar ones), we set the flag
PF_MEMALLOC_NOIO at loop thread entry.

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
   #0 [ffff88272f5af228] __schedule at ffffffff8173f405
   #1 [ffff88272f5af280] schedule at ffffffff8173fa27
   #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
   #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
   #4 [ffff88272f5af330] mutex_lock at ffffffff81742133
   #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
   #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
   #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
   #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
   #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
  #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
  #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
  #12 [ffff88272f5af760] new_slab at ffffffff811f4523
  #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
  #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
  #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
  #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
  #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
  #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
  #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
  #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
  #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
  #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
  #23 [ffff88272f5afec0] kthread at ffffffff810a8428
  #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 29, 2019
commit cf3591e upstream.

Revert the commit bd293d0. The proper
fix has been made available with commit d0a255e ("loop: set
PF_MEMALLOC_NOIO for the worker thread").

Note that the fix offered by commit bd293d0 doesn't really prevent
the deadlock from occurring - if we look at the stacktrace reported by
Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
i.e. it has already successfully taken the mutex. Changing the mutex
from mutex_lock to mutex_trylock won't help with deadlocks that happen
afterwards.

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Fixes: bd293d0 ("dm bufio: fix deadlock with loop device")
Depends-on: d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread")
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregmarsden pushed a commit that referenced this pull request Nov 29, 2019
commit 314eed3 upstream.

When running on a system with >512MB RAM with a 32-bit kernel built with:

	CONFIG_DEBUG_VIRTUAL=y
	CONFIG_HIGHMEM=y
	CONFIG_HARDENED_USERCOPY=y

all execve()s will fail because argv is copied into kmap()ed pages, and
the usercopy check ultimately calls virt_to_page(), which treats kmap
(highmem) pointers as "bad" when CONFIG_DEBUG_VIRTUAL=y:

 ------------[ cut here ]------------
 kernel BUG at ../arch/x86/mm/physaddr.c:83!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc8 #6
 Hardware name: Dell Inc. Inspiron 1318/0C236D, BIOS A04 01/15/2009
 EIP: __phys_addr+0xaf/0x100
 ...
 Call Trace:
  __check_object_size+0xaf/0x3c0
  ? __might_sleep+0x80/0xa0
  copy_strings+0x1c2/0x370
  copy_strings_kernel+0x2b/0x40
  __do_execve_file+0x4ca/0x810
  ? kmem_cache_alloc+0x1c7/0x370
  do_execve+0x1b/0x20
  ...

The check is from arch/x86/mm/physaddr.c:

	VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn);

Due to the kmap() in fs/exec.c:

		kaddr = kmap(kmapped_page);
	...
	if (copy_from_user(kaddr+offset, str, bytes_to_copy)) ...

Now we can fetch the correct page to avoid the pfn check. In both cases,
hardened usercopy will need to walk the page-span checker (if enabled)
to do sanity checking.

Reported-by: Randy Dunlap <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Fixes: f5509cc ("mm: Hardened usercopy")
Cc: Matthew Wilcox <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/201909171056.7F2FFD17@keescook
Signed-off-by: Greg Kroah-Hartman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Oct 20, 2023
The following call trace shows a deadlock issue due to recursive locking of
mutex "device_mutex". First lock acquire is in target_for_each_device() and
second in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364
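
The shape of the bug in the trace above can be reproduced with a toy,
single-threaded model. All names here are hypothetical, and the toy lock
returns EDEADLK where a real non-recursive mutex would block forever.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of a non-recursive mutex; hypothetical names throughout. */
struct toy_mutex {
    int held;
    int owner;      /* id of the holder, valid only while held */
};

/* Returns 0 on success, EDEADLK if the caller already holds the lock. */
static int toy_mutex_lock(struct toy_mutex *m, int self)
{
    if (m->held && m->owner == self)
        return EDEADLK;     /* a real mutex would hang here */
    assert(!m->held);       /* contention is out of scope for this model */
    m->held = 1;
    m->owner = self;
    return 0;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
    assert(m->held);
    m->held = 0;
}

typedef int (*device_fn)(struct toy_mutex *m, int self);

/* The iterator holds the lock while it runs the callback. */
static int for_each_device(struct toy_mutex *m, int self, device_fn fn)
{
    int ret = toy_mutex_lock(m, self);

    if (ret)
        return ret;
    ret = fn(m, self);
    toy_mutex_unlock(m);
    return ret;
}

/* The release path takes the same lock, as frame #6 does in the trace. */
static int free_device(struct toy_mutex *m, int self)
{
    int ret = toy_mutex_lock(m, self);

    if (ret)
        return ret;
    toy_mutex_unlock(m);
    return 0;
}
```

Calling free_device() on its own succeeds, while calling it through
for_each_device() reports EDEADLK, matching the first and second lock
acquisitions named in the description.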

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
(cherry picked from commit a154f5f)

Orabug: 35761341
Signed-off-by: Junxiao Bi <[email protected]>
Signed-off-by: Sherry Yang <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Nov 10, 2023
[ Upstream commit a154f5f ]

The following call trace shows a deadlock issue due to recursive locking of
mutex "device_mutex". First lock acquire is in target_for_each_device() and
second in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit eae6aabc48eea9b75cf50d98ddf9a19401d87adf)
oraclelinuxkernel pushed a commit that referenced this pull request Jan 19, 2024
[ Upstream commit 97cf796 ]

KASAN reported a UAF bug when I was running xfs/235:

 BUG: KASAN: use-after-free in xlog_recover_process_intents+0xa77/0xae0 [xfs]
 Read of size 8 at addr ffff88804391b360 by task mount/5680

 CPU: 2 PID: 5680 Comm: mount Not tainted 6.0.0-xfsx #6.0.0 77e7b52a4943a975441e5ac90a5ad7748b7867f6
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x34/0x44
  print_report.cold+0x2cc/0x682
  kasan_report+0xa3/0x120
  xlog_recover_process_intents+0xa77/0xae0 [xfs fb841c7180aad3f8359438576e27867f5795667e]
  xlog_recover_finish+0x7d/0x970 [xfs fb841c7180aad3f8359438576e27867f5795667e]
  xfs_log_mount_finish+0x2d7/0x5d0 [xfs fb841c7180aad3f8359438576e27867f5795667e]
  xfs_mountfs+0x11d4/0x1d10 [xfs fb841c7180aad3f8359438576e27867f5795667e]
  xfs_fs_fill_super+0x13d5/0x1a80 [xfs fb841c7180aad3f8359438576e27867f5795667e]
  get_tree_bdev+0x3da/0x6e0
  vfs_get_tree+0x7d/0x240
  path_mount+0xdd3/0x17d0
  __x64_sys_mount+0x1fa/0x270
  do_syscall_64+0x2b/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
 RIP: 0033:0x7ff5bc069eae
 Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0f 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffe433fd448 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff5bc069eae
 RDX: 00005575d7213290 RSI: 00005575d72132d0 RDI: 00005575d72132b0
 RBP: 00005575d7212fd0 R08: 00005575d7213230 R09: 00005575d7213fe0
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
 R13: 00005575d7213290 R14: 00005575d72132b0 R15: 00005575d7212fd0
  </TASK>

 Allocated by task 5680:
  kasan_save_stack+0x1e/0x40
  __kasan_slab_alloc+0x66/0x80
  kmem_cache_alloc+0x152/0x320
  xfs_rui_init+0x17a/0x1b0 [xfs]
  xlog_recover_rui_commit_pass2+0xb9/0x2e0 [xfs]
  xlog_recover_items_pass2+0xe9/0x220 [xfs]
  xlog_recover_commit_trans+0x673/0x900 [xfs]
  xlog_recovery_process_trans+0xbe/0x130 [xfs]
  xlog_recover_process_data+0x103/0x2a0 [xfs]
  xlog_do_recovery_pass+0x548/0xc60 [xfs]
  xlog_do_log_recovery+0x62/0xc0 [xfs]
  xlog_do_recover+0x73/0x480 [xfs]
  xlog_recover+0x229/0x460 [xfs]
  xfs_log_mount+0x284/0x640 [xfs]
  xfs_mountfs+0xf8b/0x1d10 [xfs]
  xfs_fs_fill_super+0x13d5/0x1a80 [xfs]
  get_tree_bdev+0x3da/0x6e0
  vfs_get_tree+0x7d/0x240
  path_mount+0xdd3/0x17d0
  __x64_sys_mount+0x1fa/0x270
  do_syscall_64+0x2b/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

 Freed by task 5680:
  kasan_save_stack+0x1e/0x40
  kasan_set_track+0x21/0x30
  kasan_set_free_info+0x20/0x30
  ____kasan_slab_free+0x144/0x1b0
  slab_free_freelist_hook+0xab/0x180
  kmem_cache_free+0x1f1/0x410
  xfs_rud_item_release+0x33/0x80 [xfs]
  xfs_trans_free_items+0xc3/0x220 [xfs]
  xfs_trans_cancel+0x1fa/0x590 [xfs]
  xfs_rui_item_recover+0x913/0xd60 [xfs]
  xlog_recover_process_intents+0x24e/0xae0 [xfs]
  xlog_recover_finish+0x7d/0x970 [xfs]
  xfs_log_mount_finish+0x2d7/0x5d0 [xfs]
  xfs_mountfs+0x11d4/0x1d10 [xfs]
  xfs_fs_fill_super+0x13d5/0x1a80 [xfs]
  get_tree_bdev+0x3da/0x6e0
  vfs_get_tree+0x7d/0x240
  path_mount+0xdd3/0x17d0
  __x64_sys_mount+0x1fa/0x270
  do_syscall_64+0x2b/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

 The buggy address belongs to the object at ffff88804391b300
  which belongs to the cache xfs_rui_item of size 688
 The buggy address is located 96 bytes inside of
  688-byte region [ffff88804391b300, ffff88804391b5b0)

 The buggy address belongs to the physical page:
 page:ffffea00010e4600 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888043919320 pfn:0x43918
 head:ffffea00010e4600 order:2 compound_mapcount:0 compound_pincount:0
 flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff)
 raw: 04fff80000010200 0000000000000000 dead000000000122 ffff88807f0eadc0
 raw: ffff888043919320 0000000080140010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff88804391b200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88804391b280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >ffff88804391b300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff88804391b380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88804391b400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

The test fuzzes an rmap btree block and starts writer threads to induce
a filesystem shutdown on the corrupt block.  When the filesystem is
remounted, recovery will try to replay the committed rmap intent item,
but the corruption problem causes the recovery transaction to fail.
Cancelling the transaction frees the RUD, which frees the RUI that we
recovered.

When we return to xlog_recover_process_intents, @lip is now a dangling
pointer, and we cannot use it to find the iop_recover method for the
tracepoint.  Hence we must store the item ops before calling
->iop_recover if we want to give it to the tracepoint so that the trace
data will tell us exactly which intent item failed.
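
The pattern can be sketched in user space as follows. This is an
illustration under stated assumptions, not the XFS change: struct item,
struct item_ops and recover_and_name() are hypothetical, and free()
stands in for the transaction cancel that releases the item.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct item;

struct item_ops {
    const char *name;
    int (*recover)(struct item *it);
};

struct item {
    const struct item_ops *ops;
};

/* Simulated failure path: recovery frees the item and reports an error. */
static int failing_recover(struct item *it)
{
    free(it);
    return -1;
}

static const struct item_ops failing_ops = { "failing", failing_recover };

/*
 * Copy the ops pointer before the call. Once recover() returns, `it` may
 * be dangling, but `ops` points at static data and is still safe to read.
 */
static const char *recover_and_name(struct item *it, int *error)
{
    const struct item_ops *ops = it->ops;

    *error = ops->recover(it);
    return ops->name;       /* `it` is not touched after the call */
}
```

Reading it->ops after the call instead would be the use-after-free that
KASAN reported above.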

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Leah Rumancik <[email protected]>
Acked-by: Chandan Babu R <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 5212d586e76f1331caa700b26956bb44a5814557)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jan 19, 2024
[ Upstream commit e3e82fc ]

When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a
cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when
removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be
dereferenced as the wrong struct in irdma_free_pending_cqp_request().

  PID: 3669   TASK: ffff88aef892c000  CPU: 28  COMMAND: "kworker/28:0"
   #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
   #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
   #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
   #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
   #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
      [exception RIP: native_queued_spin_lock_slowpath+1291]
      RIP: ffffffff8127e72b  RSP: ffff88aa841ef778  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: ffff88b01f849700  RCX: ffffffff8127e47e
      RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffffffff83857ec0
      RBP: ffff88afe3e4efc8   R8: ffffed15fc7c9dfa   R9: ffffed15fc7c9dfa
      R10: 0000000000000001  R11: ffffed15fc7c9df9  R12: 0000000000740000
      R13: ffff88b01f849708  R14: 0000000000000003  R15: ffffed1603f092e1
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  -- <NMI exception stack> --
   #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
   #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
   #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
   #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
   #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
   #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
   #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
   #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
   #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
   #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
   #15 [ffff88aa841efb88] device_del at ffffffff82179d23
   #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
   #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
   #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
   #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
   #20 [ffff88aa841eff10] kthread at ffffffff811d87a0
   #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f

Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: "Ismail, Mustafa" <[email protected]>
Signed-off-by: Shifeng Li <[email protected]>
Reviewed-by: Shiraz Saleem <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 0511a9c56e5854958021de15967d879ceac5d821)
Signed-off-by: Jack Vogel <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Feb 2, 2024
…te_call_indirect

commit f5d03da upstream.

kprobe_emulate_call_indirect currently uses int3_emulate_call to emulate
indirect calls. However, int3_emulate_call always assumes the size of
the call to be 5 bytes when calculating the return address. This is
incorrect for register-based indirect calls in x86, which can be either
2 or 3 bytes depending on whether a REX prefix is used. At kprobe runtime,
the incorrect return address causes control flow to land in the wrong
place after return -- possibly not a valid instruction boundary. This
can lead to a panic like the following:

[    7.308204][    C1] BUG: unable to handle page fault for address: 000000000002b4d8
[    7.308883][    C1] #PF: supervisor read access in kernel mode
[    7.309168][    C1] #PF: error_code(0x0000) - not-present page
[    7.309461][    C1] PGD 0 P4D 0
[    7.309652][    C1] Oops: 0000 [#1] SMP
[    7.309929][    C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.0-rc5-trace-for-next #6
[    7.310397][    C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[    7.311068][    C1] RIP: 0010:__common_interrupt+0x52/0xc0
[    7.311349][    C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[    7.312512][    C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[    7.312899][    C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[    7.313334][    C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[    7.313702][    C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[    7.314146][    C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[    7.314509][    C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[    7.314951][    C1] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[    7.315396][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.315691][    C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[    7.316153][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.316508][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    7.316948][    C1] Call Trace:
[    7.317123][    C1]  <IRQ>
[    7.317279][    C1]  ? __die_body+0x64/0xb0
[    7.317482][    C1]  ? page_fault_oops+0x248/0x370
[    7.317712][    C1]  ? __wake_up+0x96/0xb0
[    7.317964][    C1]  ? exc_page_fault+0x62/0x130
[    7.318211][    C1]  ? asm_exc_page_fault+0x22/0x30
[    7.318444][    C1]  ? __cfi_native_send_call_func_single_ipi+0x10/0x10
[    7.318860][    C1]  ? default_idle+0xb/0x10
[    7.319063][    C1]  ? __common_interrupt+0x52/0xc0
[    7.319330][    C1]  common_interrupt+0x78/0x90
[    7.319546][    C1]  </IRQ>
[    7.319679][    C1]  <TASK>
[    7.319854][    C1]  asm_common_interrupt+0x22/0x40
[    7.320082][    C1] RIP: 0010:default_idle+0xb/0x10
[    7.320309][    C1] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 66 90 0f 00 2d 09 b9 3b 00 fb f4 <fa> c3 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 e9
[    7.321449][    C1] RSP: 0018:ffffc9000009bee8 EFLAGS: 00000256
[    7.321808][    C1] RAX: ffff88813bca8b68 RBX: 0000000000000001 RCX: 000000000001ef0c
[    7.322227][    C1] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000001ef0c
[    7.322656][    C1] RBP: ffffc9000009bef8 R08: 8000000000000000 R09: 00000000000008c2
[    7.323083][    C1] R10: 0000000000000000 R11: ffffffff81058e70 R12: 0000000000000000
[    7.323530][    C1] R13: ffff8881002b30c0 R14: 0000000000000000 R15: 0000000000000000
[    7.323948][    C1]  ? __cfi_lapic_next_deadline+0x10/0x10
[    7.324239][    C1]  default_idle_call+0x31/0x50
[    7.324464][    C1]  do_idle+0xd3/0x240
[    7.324690][    C1]  cpu_startup_entry+0x25/0x30
[    7.324983][    C1]  start_secondary+0xb4/0xc0
[    7.325217][    C1]  secondary_startup_64_no_verify+0x179/0x17b
[    7.325498][    C1]  </TASK>
[    7.325641][    C1] Modules linked in:
[    7.325906][    C1] CR2: 000000000002b4d8
[    7.326104][    C1] ---[ end trace 0000000000000000 ]---
[    7.326354][    C1] RIP: 0010:__common_interrupt+0x52/0xc0
[    7.326614][    C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[    7.327570][    C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[    7.327910][    C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[    7.328273][    C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[    7.328632][    C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[    7.329223][    C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[    7.329780][    C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[    7.330193][    C1] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[    7.330632][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.331050][    C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[    7.331454][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.331854][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    7.332236][    C1] Kernel panic - not syncing: Fatal exception in interrupt
[    7.332730][    C1] Kernel Offset: disabled
[    7.333044][    C1] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

The relevant assembly code is (from objdump, faulting address
highlighted):

ffffffff8102ed9d:       41 ff d3                  call   *%r11
ffffffff8102eda0:       65 48 <8b> 05 30 c7 ff    mov    %gs:0x7effc730(%rip),%rax

The emulation incorrectly sets the return address to be ffffffff8102ed9d
+ 0x5 = ffffffff8102eda2, which is the 8b byte in the middle of the next
mov. This in turn causes incorrect subsequent instruction decoding and
eventually triggers the page fault above.

Instead of invoking int3_emulate_call, perform push and jmp emulation
directly in kprobe_emulate_call_indirect. At this point we can obtain
the instruction size from p->ainsn.size so that we can calculate the
correct return address.
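
The arithmetic above can be checked with a small userspace sketch. It is not the kernel code: `struct cpu`, the toy stack and all function names are assumptions made for illustration. It contrasts the fixed 5-byte assumption with a return address derived from the actual instruction length, and then models the call as a push followed by a jmp.

```c
#include <assert.h>
#include <stdint.h>

#define CALL_REL32_SIZE 5   /* direct call: opcode plus 32-bit displacement */
#define STACK_SLOTS 8

/* Hypothetical, much simplified register state. */
struct cpu {
    uint64_t ip;                   /* address of the probed call instruction */
    uint64_t stack[STACK_SLOTS];   /* toy stack, grows upward for simplicity */
    int depth;                     /* number of slots in use */
};

/* The faulty computation: always assume a 5-byte call. */
static uint64_t ret_addr_assuming_5_bytes(uint64_t call_addr)
{
    return call_addr + CALL_REL32_SIZE;
}

/* The corrected computation: use the length of the probed instruction. */
static uint64_t ret_addr_from_insn_size(uint64_t call_addr,
                                        unsigned int insn_size)
{
    return call_addr + insn_size;
}

/* Emulate an indirect call as "push return address, then jmp". */
static void emulate_call_indirect(struct cpu *c, uint64_t target,
                                  unsigned int insn_size)
{
    assert(c->depth < STACK_SLOTS);
    c->stack[c->depth++] = ret_addr_from_insn_size(c->ip, insn_size);
    c->ip = target;
}
```

For the 3-byte `call *%r11` at ffffffff8102ed9d, the first helper yields ffffffff8102eda2 (inside the next mov), while the second yields ffffffff8102eda0, the start of the next instruction.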

Link: https://lore.kernel.org/all/[email protected]/

Fixes: 6256e66 ("x86/kprobes: Use int3 instead of debug trap for single-step")
Cc: [email protected]
Signed-off-by: Jinghao Jia <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 1d5c1617e1e186c3b4fbbed2b8a93787b32df0c0)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Mar 8, 2024
[ Upstream commit fc3a553 ]

An issue occurred while reading an ELF file in libbpf.c during fuzzing:

	Program received signal SIGSEGV, Segmentation fault.
	0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	4206 in libbpf.c
	(gdb) bt
	#0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	#1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706
	#2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437
	#3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497
	#4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16
	#5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one ()
	#6 0x000000000087ad92 in tracing::span::Span::in_scope ()
	#7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir ()
	#8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} ()
	#9 0x00000000005f2601 in main ()
	(gdb)

scn_data was NULL at this line (tools/lib/bpf/src/libbpf.c):

	if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) {

The scn_data is derived from the code above:

	scn = elf_sec_by_idx(obj, sec_idx);
	scn_data = elf_sec_data(obj, scn);

	relo_sec_name = elf_sec_str(obj, shdr->sh_name);
	sec_name = elf_sec_name(obj, scn);
	if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL
		return -EINVAL;

In certain scenarios, such as reading a malformed ELF file, scn_data may
be a null pointer.
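
A minimal sketch of the kind of guard this calls for, assuming hypothetical stand-in types (`struct sec_data`, `struct relo`) rather than the libelf and libbpf ones. The error code the real fix returns may differ; `-EINVAL` is used here only because the surrounding checks return it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_SZ 8   /* one BPF instruction is 8 bytes */

/* Hypothetical stand-in for the section data descriptor. */
struct sec_data {
    size_t d_size;
};

/* Hypothetical stand-in for one relocation entry. */
struct relo {
    uint64_t r_offset;
};

static int check_relo_offset(const struct sec_data *scn_data,
                             const struct relo *rel)
{
    /* 'rel' comes from the caller and must be valid. */
    assert(rel != NULL);

    /* 'scn_data' depends on file contents, so it may be NULL. */
    if (!scn_data)
        return -EINVAL;

    /* Only now is it safe to read scn_data->d_size. */
    if (rel->r_offset % INSN_SZ || rel->r_offset >= scn_data->d_size)
        return -EINVAL;

    return 0;
}
```

With this ordering a malformed file produces an error return instead of a segmentation fault.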

Signed-off-by: Mingyi Zhang <[email protected]>
Signed-off-by: Xin Liu <[email protected]>
Signed-off-by: Changye Wu <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 90dbf4535668042fac0d7201ce9e2c8c770c578a)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 10, 2024
Systems using the igb driver crash while executing the poweroff command,
as shown in the following call stack:

crash> bt -a
PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
 #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
 #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
    [exception RIP: free_msi_irqs+408]
    RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
    RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
    RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
    R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
 #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
 #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
 #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
 #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
 #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
 #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1

This happens because igb_shutdown has not yet freed the allocated irqs, so
free_msi_irqs finds irq_has_action true for the affected MSI irqs, and this
condition triggers the BUG_ON.

Freeing the irqs in igb_clear_interrupt_scheme before proceeding further
fixes this problem.
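
The ordering constraint can be illustrated with a toy model. Everything here is hypothetical (`struct vec`, `free_irqs`, `disable_msix`, `shutdown_fixed_order`) and only mimics the rule that a vector must have no handler attached when MSI-X is torn down. Where the kernel hits BUG_ON, the model returns `-EBUSY`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one MSI-X vector. */
struct vec {
    bool allocated;    /* vector is set up */
    bool has_action;   /* a handler is still attached */
};

/* Detach the handlers; models the driver freeing its irqs. */
static void free_irqs(struct vec *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i].has_action = false;
}

/* Tear down the vectors; refuses if any handler is still attached. */
static int disable_msix(struct vec *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (v[i].has_action)
            return -EBUSY;   /* the kernel triggers BUG_ON here */
    for (size_t i = 0; i < n; i++)
        v[i].allocated = false;
    return 0;
}

/* The fixed ordering: free the irqs first, then disable MSI-X. */
static int shutdown_fixed_order(struct vec *v, size_t n)
{
    free_irqs(v, n);
    return disable_msix(v, n);
}
```

Calling `disable_msix` directly on vectors that still have handlers fails, while `shutdown_fixed_order` succeeds on the same state.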

This issue does not happen in v5.17 or later kernels because commit 9fb9eb4
("PCI/MSI: Let core code free MSI descriptors") explicitly frees MSI-based
irqs and hence indirectly fixes this issue as well. But that change depends
on the framework for runtime extension of MSI-X irqs, and including those
changes would break the kABI because they change core data structures such
as pci_device, device and others.

So kernels prior to v5.17 need this change to fix the issue without
breaking the kABI.

Orabug: 36547249

Signed-off-by: Imran Khan <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 17, 2024
Systems using the igb driver crash while executing the poweroff command,
as shown in the following call stack:

crash> bt -a
PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
 #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
 #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
    [exception RIP: free_msi_irqs+408]
    RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
    RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
    RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
    R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
 #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
 #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
 #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
 #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
 #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
 #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1

This happens because igb_shutdown has not yet freed the allocated irqs, so
free_msi_irqs finds irq_has_action true for the affected MSI irqs, and this
condition triggers the BUG_ON.

Freeing the irqs in igb_clear_interrupt_scheme before proceeding further
fixes this problem.

This issue does not happen in v5.17 or later kernels because commit 9fb9eb4
("PCI/MSI: Let core code free MSI descriptors") explicitly frees MSI-based
irqs and hence indirectly fixes this issue as well. But that change depends
on the framework for runtime extension of MSI-X irqs, and including those
changes would break the kABI because they change core data structures such
as pci_device, device and others.

So kernels prior to v5.17 need this change to fix the issue without
breaking the kABI.

Orabug: 36547250

Signed-off-by: Imran Khan <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
Signed-off-by: Sherry Yang <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 17, 2024
Systems using the igb driver crash while executing the poweroff command,
as shown in the following call stack:

crash> bt -a
PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
 #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
 #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
    [exception RIP: free_msi_irqs+408]
    RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
    RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
    RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
    R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
 #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
 #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
 #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
 #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
 #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
 #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1

This happens because igb_shutdown has not yet freed the allocated irqs, so
free_msi_irqs finds irq_has_action true for the affected MSI irqs, and this
condition triggers the BUG_ON.

Freeing the irqs in igb_clear_interrupt_scheme before proceeding further
fixes this problem.

This issue does not happen in v5.17 or later kernels because commit 9fb9eb4
("PCI/MSI: Let core code free MSI descriptors") explicitly frees MSI-based
irqs and hence indirectly fixes this issue as well. But that change depends
on the framework for runtime extension of MSI-X irqs, and including those
changes would break the kABI because they change core data structures such
as pci_device, device and others.

So kernels prior to v5.17 need this change to fix the issue without
breaking the kABI.

Orabug: 36547251

Signed-off-by: Imran Khan <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
Signed-off-by: Saeed Mirzamohammadi <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request May 24, 2024
[ Upstream commit 1947b92 ]

Parallel testing appears to show a race between allocating and setting
evsel ids. As there is a bounds check on the xyarray it yields a segv
like:

```
AddressSanitizer:DEADLYSIGNAL

=================================================================

==484408==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010

==484408==The signal is caused by a WRITE memory access.

==484408==Hint: address points to the zero page.

    #0 0x55cef5d4eff4 in perf_evlist__id_hash tools/lib/perf/evlist.c:256
    #1 0x55cef5d4f132 in perf_evlist__id_add tools/lib/perf/evlist.c:274
    #2 0x55cef5d4f545 in perf_evlist__id_add_fd tools/lib/perf/evlist.c:315
    #3 0x55cef5a1923f in store_evsel_ids util/evsel.c:3130
    #4 0x55cef5a19400 in evsel__store_ids util/evsel.c:3147
    #5 0x55cef5888204 in __run_perf_stat tools/perf/builtin-stat.c:832
    #6 0x55cef5888c06 in run_perf_stat tools/perf/builtin-stat.c:960
    #7 0x55cef58932db in cmd_stat tools/perf/builtin-stat.c:2878
...
```

Avoid this crash by exiting early from perf_evlist__id_add_fd and
perf_evlist__id_add if the access is out-of-bounds.
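
The early-exit pattern can be sketched as follows. This is an illustration only: `struct grid`, `grid_entry` and `grid_set_id` are hypothetical names that mimic a bounds-checked two-dimensional array, not the xyarray or evlist code. The lookup returns NULL when out of bounds, and the caller returns an error instead of writing through that NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked 2D array of ids. */
struct grid {
    size_t rows;
    size_t cols;
    int64_t *cells;   /* rows * cols entries, row-major */
};

/* Returns the entry, or NULL if (r, c) is out of bounds. */
static int64_t *grid_entry(struct grid *g, size_t r, size_t c)
{
    assert(g != NULL);
    if (r >= g->rows || c >= g->cols)
        return NULL;
    return &g->cells[r * g->cols + c];
}

/* Stores an id; exits early with -1 instead of writing through NULL. */
static int grid_set_id(struct grid *g, size_t r, size_t c, int64_t id)
{
    int64_t *slot = grid_entry(g, r, c);

    if (!slot)
        return -1;
    *slot = id;
    return 0;
}
```

An out-of-bounds index then surfaces as an error return to the caller rather than as a SEGV near the zero page.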

Signed-off-by: Ian Rogers <[email protected]>
Cc: Yang Jihong <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit be113e082b65433135b9e85e442fb8c5f0c87534)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 12, 2024
The cited commit adds a completion to remove the dependency on the rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, the flow is added to the encap
entry's flows list. If a neigh update is running at this time, the
following encaps of the flow can't take the encap_tbl_lock and
sleep. If the neigh update thread is waiting for that flow's init_done,
a deadlock happens.

Fix it by holding the lock outside of the for loop. If a neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove the unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <[email protected]>
Signed-off-by: Qing Huang <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 12, 2024
The cited commit holds encap tbl lock unconditionally when setting
up dests. But it may cause the following deadlock:

 PID: 1063722  TASK: ffffa062ca5d0000  CPU: 13   COMMAND: "handler8"
  #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
  #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
  #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
  #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
  #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
  #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
  #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
  #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
  #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
  #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
 #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
 #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
 #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
 #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
 #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
 #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
 #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0

[1031218.028143]  wait_for_completion+0x24/0x30
[1031218.028589]  mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
[1031218.029256]  mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
[1031218.029885]  process_one_work+0x24e/0x510

Actually there is no need to hold the encap tbl lock if there is no encap
action. Fix it by checking whether an encap action exists before taking
the encap tbl lock.

Fixes: 37c3b9f ("net/mlx5e: Prevent encap offload when neigh update is running")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Orabug: 35622106

(cherry picked from commit 93a3319)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 93a3319

Signed-off-by: Mikhael Goikhman <[email protected]>
Signed-off-by: Qing Huang <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 14, 2024
[ Upstream commit f8bbc07 ]

vhost_worker will call tun callbacks to receive packets. If too many
illegal packets arrive, tun_do_read will keep dumping packet contents.
When the console is enabled, dumping the packets costs much more CPU time
and a soft lockup will be detected.

The net_ratelimit mechanism can be used to limit the dumping rate.
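
The idea behind rate limiting the dump can be sketched with a simple interval and burst counter. This is a userspace illustration, not the kernel's implementation: `struct ratelimit` and `ratelimit_allow` are hypothetical, and the caller passes the current time explicitly so that the behavior is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limiter: at most 'burst' events per 'interval' time units. */
struct ratelimit {
    long interval;   /* window length */
    int burst;       /* events allowed per window */
    long begin;      /* start of the current window */
    int printed;     /* events allowed so far in this window */
    int missed;      /* events suppressed in total */
};

/* Returns 1 if the caller may emit its message now, 0 if it should skip. */
static int ratelimit_allow(struct ratelimit *rl, long now)
{
    assert(rl != NULL && rl->interval > 0);

    /* Start a new window once the old one has expired. */
    if (now - rl->begin >= rl->interval) {
        rl->begin = now;
        rl->printed = 0;
    }

    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }

    rl->missed++;
    return 0;
}
```

A caller would wrap the expensive dump in `if (ratelimit_allow(&rl, now)) { ... }`, so a flood of bad packets yields at most `burst` dumps per window instead of one per packet.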

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit a50dbeca28acf7051dfa92786b85f704c75db6eb)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 14, 2024
[ Upstream commit f8bbc07 ]

vhost_worker will call tun callbacks to receive packets. If too many
illegal packets arrive, tun_do_read will keep dumping packet contents.
When the console is enabled, dumping the packets costs much more CPU time
and a soft lockup will be detected.

The net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 4b0dcae5c4797bf31c63011ed62917210d3fdac3)
Signed-off-by: Sherry Yang <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 21, 2024
commit d3b17c6 upstream.

Using completion_done to determine whether the caller has gone
away only works after a complete call.  Furthermore it's still
possible that the caller has not yet called wait_for_completion,
resulting in another potential UAF.

Fix this by making the caller use cancel_work_sync and then freeing
the memory safely.

Fixes: 7d42e09 ("crypto: qat - resolve race condition during AER recovery")
Cc: <[email protected]> #6.8+
Signed-off-by: Herbert Xu <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
FOF: 0724
Signed-off-by: Alok Tiwari <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 21, 2024
[ Upstream commit f8bbc07 ]

vhost_worker calls tun callbacks to receive packets. If too many
illegal packets arrive, tun_do_read keeps dumping packet contents.
When the console is enabled, dumping each packet costs much more CPU
time, and a soft lockup is detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 68459b8e3ee554ce71878af9eb69659b9462c588)
Signed-off-by: Vegard Nossum <[email protected]>
(cherry picked from commit eaa8c23a83b5a719ac9bc795481595bbfc02fc18)
Signed-off-by: Yifei Liu <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jun 21, 2024
commit d3b17c6 upstream.

Using completion_done to determine whether the caller has gone
away only works after a complete call.  Furthermore it's still
possible that the caller has not yet called wait_for_completion,
resulting in another potential UAF.

Fix this by making the caller use cancel_work_sync and then freeing
the memory safely.

Fixes: 7d42e09 ("crypto: qat - resolve race condition during AER recovery")
Cc: <[email protected]> #6.8+
Signed-off-by: Herbert Xu <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 0ce5964b82f212f4df6a9813f09a0b5de15bd9c8)
FOF: 0724
Signed-off-by: Saeed Mirzamohammadi <[email protected]>
mark-nicholson pushed a commit that referenced this pull request Jul 2, 2024
The cited commit adds a completion to remove the dependency on the rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, the flow is added to the encap
entry's flows list. If a neigh update is running at this time, the
following encaps of the flow can't take the encap_tbl_lock and
sleep. If the neigh update thread is waiting for that flow's init_done,
a deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <[email protected]>
Signed-off-by: Qing Huang <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
mark-nicholson pushed a commit that referenced this pull request Jul 2, 2024
The cited commit holds encap tbl lock unconditionally when setting
up dests. But it may cause the following deadlock:

 PID: 1063722  TASK: ffffa062ca5d0000  CPU: 13   COMMAND: "handler8"
  #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
  #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
  #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
  #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
  #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
  #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
  #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
  #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
  #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
  #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
 #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
 #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
 #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
 #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
 #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
 #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
 #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0

[1031218.028143]  wait_for_completion+0x24/0x30
[1031218.028589]  mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
[1031218.029256]  mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
[1031218.029885]  process_one_work+0x24e/0x510

There is actually no need to hold the encap tbl lock if there is no
encap action. Fix it by checking whether an encap action exists before
taking the encap tbl lock.

Fixes: 37c3b9f ("net/mlx5e: Prevent encap offload when neigh update is running")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Orabug: 35622106

(cherry picked from commit 93a3319)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 93a3319

Signed-off-by: Mikhael Goikhman <[email protected]>
Signed-off-by: Qing Huang <[email protected]>
Reviewed-by: Devesh Sharma <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 5, 2024
commit d3b17c6 upstream.

Using completion_done to determine whether the caller has gone
away only works after a complete call.  Furthermore it's still
possible that the caller has not yet called wait_for_completion,
resulting in another potential UAF.

Fix this by making the caller use cancel_work_sync and then freeing
the memory safely.

Fixes: 7d42e09 ("crypto: qat - resolve race condition during AER recovery")
Cc: <[email protected]> #6.8+
Signed-off-by: Herbert Xu <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 3fb4601e0db10d4fe25e46f3fa308d40d37366bd)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 16, 2024
Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX: ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI: ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8   R9: 0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12: ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_handler is null */
                                      id_priv->id.event_handler

Orabug: 27530931

Signed-off-by: Guanglei Li <[email protected]>
Signed-off-by: Honglei Wang <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
Reviewed-by: Yanjun Zhu <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 2c0aa08)
Reviewed-by: Håkon Bugge <[email protected]>
Signed-off-by: Somasundaram Krishnasamy <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 39e0939)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit 7d342f8)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 16, 2024
The customer hit this crash a few times.

PID: 31556  TASK: ffff880f823caa00  CPU: 1   COMMAND: "cellsrv"
 #0 [ffff880f823db850] machine_kexec at ffffffff8105d93c
 #1 [ffff880f823db8b0] crash_kexec at ffffffff811103b3
 #2 [ffff880f823db980] oops_end at ffffffff8101a788
 #3 [ffff880f823db9b0] no_context at ffffffff8106b9cf
 #4 [ffff880f823dba20] __bad_area_nosemaphore at ffffffff8106bc9d
 #5 [ffff880f823dba70] bad_area at ffffffff8106be97
 #6 [ffff880f823dbaa0] __do_page_fault at ffffffff8106c71e
 #7 [ffff880f823dbb00] do_page_fault at ffffffff8106c81f
 #8 [ffff880f823dbb40] page_fault at ffffffff816b5a9f
    [exception RIP: rds_ib_inc_copy_to_user+104]
    RIP: ffffffffa04607b8  RSP: ffff880f823dbbf8  RFLAGS: 00010287
    RAX: 0000000000000340  RBX: 0000000000001000  RCX: 0000000000004000
    RDX: 0000000000001000  RSI: ffff88176cea2000  RDI: ffff8817d291f520
    RBP: ffff880f823dbc48   R8: 0000000000001340   R9: 0000000000001000
    R10: 0000000000001200  R11: ffff880f823dc000  R12: ffff880f823dbed0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880f823dbc50] rds_recvmsg at ffffffffa041d837 [rds]

int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to)
...
...
        ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
        frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
        len = be32_to_cpu(inc->i_hdr.h_len);
        sg = frag->f_sg;

        while (iov_iter_count(to) && copied < len) {
                to_copy = min_t(unsigned long, iov_iter_count(to),
                                sg->length - frag_off);
                ...

sg is NULL and it crashes accessing sg->length above.

The cause appears to be ic->i_frag_sz holding an incorrect value:
16KB when 4KB was expected.

                if (copied % ic->i_frag_sz == 0) {
                        frag = list_entry(frag->f_item.next,
                                          struct rds_page_frag, f_item);
                        frag_off = 0;
                        sg = frag->f_sg;
                }

The other end is using 4KB RDS fragsize (Solaris Super Cluster).
This end is UEK4 (4.1.12-94.8.4.el6uek.x86_64).

The message being copied arrived over 4KB RDS frag size connection.
But during the above check ic->i_frag_sz is 16KB.
This can happen during a reconnect at the connection setup phase.
We start off with ic->i_frag_sz as 16KB. Then settle down at 4KB.

Failing this check
  if (copied % ic->i_frag_sz == 0) {
can result in sg not getting set correctly.

Say, "copied" = 4KB but ic->i_frag_sz is 16KB when it should be 4KB.

During a race with a reconnect, ic->i_frag_sz can be 16KB
even though it settles down to 4KB once the connection is set up.
It can change from 4KB to 16KB and back to 4KB during connection setup
due to the reconnect.

We started seeing this crash after bug 26848749.
But prior to that the same scenario could result in data copied to user
from incorrect "sg" resulting in data corruption.

Orabug: 28748008

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 14858a3)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit e86878f)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 16, 2024
…error

The sequence that leads to this state is as follows.

1) First we see a CQ error logged.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core
0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2
vendor_error_syndrome=0x0

2) That is followed by the drop of the associated RDS connection.

Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection
<192.168.54.43,192.168.54.1,0> dropped due to 'qp event'

3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that.

4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection.

crash64> bt 62577
PID: 62577  TASK: ffff88143f045400  CPU: 4   COMMAND: "kworker/u224:1"
 #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b
 #1 [ffff8813663bbbb0] schedule at ffffffff816abca7
 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71
 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma]
 #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds]
 #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds]
 #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1
 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b
 #8 [ffff8813663bbec0] kthread at ffffffff810a304b
 #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752
crash64>

It was stuck here in rds_ib_conn_shutdown forever:

                /* quiesce tx and rx completion before tearing down */
                while (!wait_event_timeout(rds_ib_ring_empty_wait,
                                rds_ib_ring_empty(&ic->i_recv_ring) &&
                                (atomic_read(&ic->i_signaled_sends) == 0),
                                msecs_to_jiffies(5000))) {

                        /* Try to reap pending RX completions every 5 secs */
                        if (!rds_ib_ring_empty(&ic->i_recv_ring)) {
                                spin_lock_bh(&ic->i_rx_lock);
                                rds_ib_rx(ic);
                                spin_unlock_bh(&ic->i_rx_lock);
                        }
                }

The recv ring was not empty.
w_alloc_ptr = 560
w_free_ptr  = 256

This is what Mellanox had to say:
When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will
generate Async event to notify this error, also the QPs that tries to access
this CQ will be put to error state but will not be flushed since we must not
post CQEs to a broken CQ. The QP that tries to access will also issue an
Async catas event.

In summary we cannot wait for any more WR_FLUSH_ERRs in that state.

Orabug: 29180452

Reviewed-by: Rama Nichanamatlu <[email protected]>
Signed-off-by: Venkat Venkatsubra <[email protected]>

Orabug: 33590097

UEK6 => UEK7

(cherry picked from commit 964cad6)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>

Orabug: 33590087

UEK7 => LUCI

(cherry picked from commit e40c8e4)
cherry-pick-repo=UEK/production/linux-uek.git

Signed-off-by: Gerd Rausch <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 16, 2024
One of our customers reported the following stack.

crash-7.3.0> bt
PID: 250515  TASK: ffff888189482f80  CPU: 1   COMMAND: "vmbackup"
 #0 [ffffc90025017878] die at ffffffff81033c22
 #1 [ffffc900250178a8] do_trap at ffffffff81030990
 #2 [ffffc900250178f8] do_error_trap at ffffffff810311d7
 #3 [ffffc900250179c0] do_invalid_op at ffffffff81031310
 #4 [ffffc900250179d0] invalid_op at ffffffff81a01f2a
    [exception RIP: ocfs2_truncate_rec+1914]
    RIP: ffffffffc1e73b4a  RSP: ffffc90025017a80  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000000000053a75  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8882d385be08  RDI: ffff8882d385be08
    RBP: ffffc90025017b10   R8: 0000000000000000   R9: 0000000000005900
    R10: 0000000000000001  R11: 0000000000aaaaaa  R12: 0000000000000001
    R13: ffff88829e5a9900  R14: ffffc90025017cf0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: e030  SS: e02b
 #5 [ffffc90025017b18] ocfs2_remove_extent at ffffffffc1e73e6c [ocfs2]
 #6 [ffffc90025017bc8] ocfs2_remove_btree_range at ffffffffc1e745f2 [ocfs2]
 #7 [ffffc90025017c60] ocfs2_commit_truncate at ffffffffc1e75b1f [ocfs2]
 #8 [ffffc90025017d68] __dta_ocfs2_wipe_inode_606 at ffffffffc1e9a3e0 [ocfs2]
 #9 [ffffc90025017dd8] ocfs2_evict_inode at ffffffffc1e9ac10 [ocfs2]
    RIP: 00007f9b26ec8307  RSP: 00007ffc5a193f68  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000ddd0a0  RCX: 00007f9b26ec8307
    RDX: 0000000000000001  RSI: 00007f9b2719e770  RDI: 0000000001010400
    RBP: 0000000001263d80   R8: 0000000000000000   R9: 00000000012146a0
    R10: 000000000000000d  R11: 0000000000000246  R12: 0000000000ddd0a0
    R13: 00007f9b27ba9595  R14: 00007f9b27ca4a50  R15: 00000000ffffffff
    ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b
crash-7.3.0>

This crash resulted from an invalid extent record being selected for truncate.

At the top of the function ocfs2_truncate_rec(), the code checks if the
first extent record at the leaf extent list corresponding to the input
path is still empty. In that case the tree is rotated left to get rid of
the empty extent record but this rotation did not happen.

But the function ocfs2_truncate_rec() assumes that the top level call
to ocfs2_rotate_tree_left() to get rid of the empty extent always
succeeds and hence it decrements the input "index" value. This results
in selection of a wrong record for truncate, which hits a call to
BUG() with the message "Owner %llu: Invalid record truncate: (%u, %u) ".
The stack above is the panic stack caused by hitting BUG().

Though the function ocfs2_rotate_tree_left() was intended to get rid of
the first empty record in the extent block, it did not call the function
ocfs2_rotate_rightmost_leaf_left() as it did not find h_next_leaf_blk
in the extent leaf block to be zero. Instead, it proceeded to call
__ocfs2_rotate_tree_left(). However the input "index" value was indeed
pointing to the last extent record in the leaf block. The macro
path_leaf_bh() was returning the rightmost extent block as per the tree depth,
and the function ocfs2_find_cpos_for_right_leaf() also found that
the extent block in question is indeed the rightmost and hence there is
nothing to rotate at the last extent record pointed to by the input "index"
value. Hence the extent tree in the leaf block was not rotated at all.

Hence, the real reason for the above panic is that the value of the field
h_next_leaf_blk in the rightmost leaf block was non-zero, which caused
the tree not to rotate left, resulting in selection of an invalid record for
truncate.

The reason why h_next_leaf_blk was not cleared for the last extent block
is still not known, and the code changes here are a workaround to avoid
the panic by verifying that the extent block in question is indeed the
rightmost leaf block in the tree and then correcting the invalid
h_next_leaf_blk value. These changes have been verified by the customer
by running the provided rpm in their environment.

Orabug: 34393593

Signed-off-by: Gautham Ananthakrishna <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 26, 2024
Add a check to mlx5e_xmit() for short frames. A corrupted or malformed
packet with a too-short length can eventually cause a system panic further
down the code path. Avoid it by validating the length and dropping the
packet as early as possible.

The following is seen in our environment with a shorter skb->len:

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879156
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <[email protected]>
Signed-off-by: Manjunath Patil <[email protected]>
Reviewed-by: Si-Wei Liu <[email protected]>
Reviewed-by: Jack Vogel <[email protected]>
Signed-off-by: Brian Maly <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 26, 2024
Add a check to mlx5e_xmit() for short frames. A corrupted or malformed
packet with too short a length can eventually cause a system panic further
down the code path. Avoid this by validating the length and dropping the
packet as early as possible.
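
A minimal userspace sketch of this kind of early length check, not the actual driver change. The struct, the function names, and the 14-byte minimum (the size of an Ethernet header) are all assumptions made for illustration. The second helper shows how an unchecked unsigned subtraction on a too-short length wraps around to a huge value; that is consistent with, though not proof of, the RDX value of fffffffffffffffd in the memcpy_erms frame of the trace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed minimum frame size: a 14-byte Ethernet header. */
#define SKETCH_MIN_FRAME_LEN 14u

/* Hypothetical stand-in for the few packet fields the check needs. */
struct sketch_frame {
    const unsigned char *data;
    unsigned int len;
};

/* Hypothetical drop counter. */
struct sketch_stats {
    unsigned long dropped;
};

enum sketch_verdict { SKETCH_XMIT, SKETCH_DROP };

/* Validate the length first; count and drop anything shorter than the minimum. */
static enum sketch_verdict sketch_xmit_check(const struct sketch_frame *f,
                                             struct sketch_stats *stats)
{
    if (f->len < SKETCH_MIN_FRAME_LEN) {
        stats->dropped++;
        return SKETCH_DROP;
    }
    return SKETCH_XMIT;
}

/*
 * What a header-relative length looks like without the check: the
 * unsigned subtraction wraps, so an 11-byte frame yields (size_t)-3.
 */
static size_t sketch_payload_len_unchecked(const struct sketch_frame *f)
{
    return (size_t)f->len - SKETCH_MIN_FRAME_LEN;
}
```

With an 11-byte frame, `sketch_xmit_check()` returns `SKETCH_DROP` and increments the counter, while `sketch_payload_len_unchecked()` returns `(size_t)-3`, a length no copy routine should ever be handed.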

The following is seen in our environment with a short skb->len:

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879157
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <[email protected]>
Signed-off-by: Manjunath Patil <[email protected]>
Reviewed-by: Si-Wei Liu <[email protected]>
Reviewed-by: Jack Vogel <[email protected]>
(cherry picked from commit e7fd2c25dfed19d69e8158ff50d36f90400a7335)
Signed-off-by: Sherry Yang <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 26, 2024
Add a check to mlx5e_xmit() for short frames. A corrupted or malformed
packet with too short a length can eventually cause a system panic further
down the code path. Avoid this by validating the length and dropping the
packet as early as possible.

The following is seen in our environment with a short skb->len:

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879158
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <[email protected]>
Signed-off-by: Manjunath Patil <[email protected]>
Reviewed-by: Si-Wei Liu <[email protected]>
Reviewed-by: Jack Vogel <[email protected]>

Signed-off-by: Saeed Mirzamohammadi <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Jul 26, 2024
Add a check to mlx5e_xmit() for short frames. A corrupted or malformed
packet with too short a length can eventually cause a system panic further
down the code path. Avoid this by validating the length and dropping the
packet as early as possible.

The following is seen in our environment with a short skb->len:

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879159
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <[email protected]>
Signed-off-by: Manjunath Patil <[email protected]>
Reviewed-by: Si-Wei Liu <[email protected]>

In UEK4, stats is not a pointer, so the code that counts the dropped packet is changed accordingly.

Signed-off-by: Jack Vogel <[email protected]>
Signed-off-by: Alok Tiwari <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Aug 9, 2024
commit be346c1 upstream.

The code in ocfs2_dio_end_io_write() estimates the number of necessary
transaction credits using ocfs2_calc_extend_credits().  This, however, does
not take into account that the IO could be arbitrarily large and can
contain an arbitrary number of extents.

Extent tree manipulations often extend the current transaction, but not
in all cases.  For example, if we have only single-block extents in
the tree, ocfs2_mark_extent_written() will end up calling
ocfs2_replace_extent_rec() every time, so we will never extend the
current transaction and will eventually exhaust all the transaction credits if
the IO contains many single-block extents.  Once that happens, a
WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
jbd2_journal_dirty_metadata() and OCFS2 subsequently aborts in response to
this error.  This was actually triggered by one of our customers on a
heavily fragmented OCFS2 filesystem.

To fix the issue, make sure the transaction always has enough credits for
one extent insert before each call of ocfs2_mark_extent_written().
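
The idea can be modeled with a toy userspace sketch. Everything here is hypothetical: the handle only tracks a credit count, the per-insert cost of 3 is an arbitrary assumption, and "ensuring" credits simply tops the counter up, whereas the real code extends or restarts a jbd2 transaction.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a journal handle: only tracks remaining credits. */
struct sketch_handle {
    int credits;
};

/* Assumed cost of one extent insert; the real value depends on the tree. */
#define SKETCH_INSERT_CREDITS 3

/* Top the handle up to at least `need` credits (models extending the transaction). */
static void sketch_ensure_credits(struct sketch_handle *h, int need)
{
    if (h->credits < need)
        h->credits = need;
}

/* Mark one extent written; fails once the handle cannot cover the cost. */
static bool sketch_mark_extent_written(struct sketch_handle *h)
{
    if (h->credits < SKETCH_INSERT_CREDITS)
        return false; /* models running out of credits and aborting */
    h->credits -= SKETCH_INSERT_CREDITS;
    return true;
}

/* Process n extents; returns how many were marked before credits ran out. */
static int sketch_end_io(struct sketch_handle *h, int n_extents, bool ensure_each)
{
    int done = 0;

    for (int i = 0; i < n_extents; i++) {
        if (ensure_each)
            sketch_ensure_credits(h, SKETCH_INSERT_CREDITS);
        if (!sketch_mark_extent_written(h))
            break;
        done++;
    }
    return done;
}
```

Starting from 10 credits and 100 single-block extents, the loop stops after 3 extents when `ensure_each` is false and completes all 100 when it is true, which mirrors why a one-time up-front estimate is insufficient for an IO with many extents.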

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx  TASK: xxxx  CPU: 5  COMMAND: "SubmitThread-CA"
  #0 machine_kexec at ffffffff8c069932
  #1 __crash_kexec at ffffffff8c1338fa
  #2 panic at ffffffff8c1d69b9
  #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
  #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
  #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
  #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
  #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
  #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
  #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
#11 dio_complete at ffffffff8c2b9fa7
#12 do_blockdev_direct_IO at ffffffff8c2bc09f
#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
#14 generic_file_direct_write at ffffffff8c1dcf14
#15 __generic_file_write_iter at ffffffff8c1dd07b
#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
#17 aio_write at ffffffff8c2cc72e
#18 kmem_cache_alloc at ffffffff8c248dde
#19 do_io_submit at ffffffff8c2ccada
#20 do_syscall_64 at ffffffff8c004984
#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reviewed-by: Heming Zhao <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 320273b5649bbcee87f9e65343077189699d2a7a)
Signed-off-by: Vijayendra Suman <[email protected]>
oraclelinuxkernel pushed a commit that referenced this pull request Sep 13, 2024
commit f0d17d696dfce77c9abc830e4ac2d677890a2dad upstream.

The pen ID 0x80842 was not the correct ID for the wacom driver to
handle; it has been corrected to 0x8842.
Also, 0x4200 is not an ID used by any Wacom device, so it has been
removed.

Signed-off-by: Tatsunosuke Tobita <[email protected]>
Signed-off-by: Tatsunosuke Tobita <[email protected]>
Fixes: bfdc750 ("HID: wacom: add three styli to wacom_intuos_get_tool_type")
Cc: [email protected] #6.2
Reviewed-by: Ping Cheng <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 04dcab03e6fe4338e7ec8e6756f7bbd5ba4792a9)
Signed-off-by: Sherry Yang <[email protected]>