Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync up with Linus #75

Merged
merged 78 commits into from
Jun 17, 2015
Merged

Sync up with Linus #75

merged 78 commits into from
Jun 17, 2015

Conversation

dabrace
Copy link
Owner

@dabrace dabrace commented Jun 17, 2015

No description provided.

Peter Zijlstra and others added 30 commits June 7, 2015 15:46
The lock_class iteration of /proc/lock_stat is not serialized against
the lockdep_free_key_range() call from module unload.

Therefore it can happen that we find a class of which ->name/->key are
no longer valid.

There is a further bug in zap_class() that left ->name dangling. Cure
this. Use RCU_INIT_POINTER() because NULL.

Since lockdep_free_key_range() is rcu_sched serialized, we can read
both ->name and ->key under rcu_read_lock_sched() (preempt-disable)
and be assured that if we observe a !NULL value it stays safe to use
for as long as we hold that lock.

If we observe both NULL, skip the entry.

Reported-by: Jerome Marchand <[email protected]>
Tested-by: Jerome Marchand <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
CBOX counters are increased to 48b on HSX.

Correct the MSR address for HSWEP_U_MSR_PMON_CTR0 and
HSWEP_U_MSR_PMON_CTL0.

See specification in:
http://www.intel.com/content/www/us/en/processors/xeon/
xeon-e5-v3-uncore-performance-monitoring.html

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Fixes: 6058bb3 'ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller'
Signed-off-by: Axel Lin <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Carlo Caione <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
This patch adds native DSD support for the XMOS based JLsounds I2SoverUSB board

Signed-off-by: Jurgen Kramer <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
In the commit below, I missed the connector allocation in the function
intel_sdvo_analog_init(), leading to those connectors to have a NULL
state pointer.

commit 08d9bc9
Author: Ander Conselvan de Oliveira <[email protected]>
Date:   Fri Apr 10 10:59:10 2015 +0300

    drm/i915: Allocate connector state together with the connectors

Reported-by: Stefan Lippers-Hollmann <[email protected]>
Tested-by: Stefan Lippers-Hollmann <[email protected]>
Signed-off-by: Ander Conselvan de Oliveira <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Using _bh variant for spin locks causes this kind of warning:
Starting logging: ------------[ cut here ]------------
WARNING: CPU: 0 PID: 3 at /ssd_drive/linux/kernel/softirq.c:151
__local_bh_enable_ip+0xe8/0xf4()
Modules linked in:
CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.1.0-rc2+ #94
Hardware name: Atmel SAMA5
[<c0013c04>] (unwind_backtrace) from [<c00118a4>] (show_stack+0x10/0x14)
[<c00118a4>] (show_stack) from [<c001bbcc>]
(warn_slowpath_common+0x80/0xac)
[<c001bbcc>] (warn_slowpath_common) from [<c001bc14>]
(warn_slowpath_null+0x1c/0x24)
[<c001bc14>] (warn_slowpath_null) from [<c001e28c>]
(__local_bh_enable_ip+0xe8/0xf4)
[<c001e28c>] (__local_bh_enable_ip) from [<c01fdbd0>]
(at_xdmac_device_terminate_all+0xf4/0x100)
[<c01fdbd0>] (at_xdmac_device_terminate_all) from [<c02221a4>]
(atmel_complete_tx_dma+0x34/0xf4)
[<c02221a4>] (atmel_complete_tx_dma) from [<c01fe4ac>]
(at_xdmac_tasklet+0x14c/0x1ac)
[<c01fe4ac>] (at_xdmac_tasklet) from [<c001de58>]
(tasklet_action+0x68/0xb4)
[<c001de58>] (tasklet_action) from [<c001dfdc>]
(__do_softirq+0xfc/0x238)
[<c001dfdc>] (__do_softirq) from [<c001e140>] (run_ksoftirqd+0x28/0x34)
[<c001e140>] (run_ksoftirqd) from [<c0033a3c>]
(smpboot_thread_fn+0x138/0x18c)
[<c0033a3c>] (smpboot_thread_fn) from [<c0030e7c>] (kthread+0xdc/0xf0)
[<c0030e7c>] (kthread) from [<c000f480>] (ret_from_fork+0x14/0x34)
---[ end trace b57b14a99c1d8812 ]---

It comes from the fact that devices can called some code from the DMA
controller with irq disabled. _bh variant is not intended to be used in
this case since it can enable irqs. Switch to irqsave/irqrestore variant to
avoid this situation.

Signed-off-by: Ludovic Desroches <[email protected]>
Cc: [email protected] # 4.0 and later
Signed-off-by: Vinod Koul <[email protected]>
Rework slave configuration part in order to more report wrong errors
about the configuration.
Only maxburst and addr width values are checked when doing the slave
configuration. The validity of the channel configuration is done at
prepare time.

Signed-off-by: Ludovic Desroches <[email protected]>
Cc: [email protected] # 4.0 and later
Signed-off-by: Vinod Koul <[email protected]>
The MW regbase and vbase(s) were not being freed if an error occurred
in the vbase allocation loop.  This is corrected by updating the error
path for the allocation loop to err4.

Reported-by: Julia Lawall <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
The new regmap code seems to cache this, which isn't helpful
for the hotplug dock situation where this gets updated.

Use the uncached query for this.

Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Passive DP->DVI/HDMI dongles on DP++ ports show up to the system as HDMI
devices, as they do not have a sink device in them to respond to any AUX
traffic. When probing these dongles over the DDC, sometimes they will
NAK the first attempt even though the transaction is valid and they
support the DDC protocol. The retry loop inside of
drm_do_probe_ddc_edid() would normally catch this case and try the
transaction again, resulting in success.

That, however, was thwarted by the fix for [1]:

commit 9292f37
Author: Eugeni Dodonov <[email protected]>
Date:   Thu Jan 5 09:34:28 2012 -0200

    drm: give up on edid retries when i2c bus is not responding

This added code to exit immediately if the return code from the
i2c_transfer function was -ENXIO in order to reduce the amount of time
spent in waiting for unresponsive or disconnected devices. That was
possible because the underlying i2c bit banging algorithm had retries of
its own (which, of course, were part of the reason for the bug the
commit fixes).

Since its introduction in

commit f899fc6
Author: Chris Wilson <[email protected]>
Date:   Tue Jul 20 15:44:45 2010 -0700

    drm/i915: use GMBUS to manage i2c links

we've been flipping back and forth enabling the GMBUS transfers, but
we've settled since then. The GMBUS implementation does not do any
retries, however, bailing out of the drm_do_probe_ddc_edid() retry loop
on first encounter of -ENXIO. This, combined with Eugeni's commit, broke
the retry on -ENXIO.

Retry GMBUS once on -ENXIO on first message to mitigate the issues with
passive adapters.

This patch is based on the work, and commit message, by Todd Previte
<[email protected]>.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=41059

v2: Don't retry if using bit banging.

v3: Move retry within gmbux_xfer, retry only on first message.

v4: Initialize GMBUS0 on retry (Ville).

v5: Take index reads into account (Ville).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85924
Cc: Todd Previte <[email protected]>
Cc: [email protected]
Tested-by: Oliver Grafe <[email protected]> (v2)
Tested-by: Jim Bride <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Commit be0c37c ("MIPS: Rearrange PTE bits into fixed positions.")
rearranged the PTE bits into fixed positions in preparation for the XPA
support. However, this patch broke R6 since it only took R2 cores
into consideration for the RI/XI bits leading to boot failures. We fix
this by adding the missing CONFIG_CPU_MIPSR6 definitions

Fixes: be0c37c ("MIPS: Rearrange PTE bits into fixed positions.")
Cc: Steven J. Hill <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/10208/
Signed-off-by: Markos Chandras <[email protected]>
Signed-off-by: Ralf Baechle <[email protected]>
…nitialization"

This reverts commit c05199e.

Vince Weaver reported the following crash while perf fuzzing:

[   79.473121] kernel BUG at mm/vmalloc.c:1335!
[   79.694391] Call Trace:
[   79.696997]  <IRQ>
[   79.699090]  [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50
[   79.705505]  [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
[   79.712414]  [<ffffffff810635e5>] __ioremap_caller+0x195/0x350
[   79.718610]  [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
[   79.725462]  [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0
[   79.732346]  [<ffffffff810637b7>] ioremap_nocache+0x17/0x20
[   79.738283]  [<ffffffff81039f4d>] snb_uncore_imc_init_box+0x6d/0x90
[   79.744945]  [<ffffffff81039cf7>] snb_uncore_imc_event_start+0xb7/0x110
[   79.752020]  [<ffffffff81039d97>] snb_uncore_imc_event_add+0x47/0x60
[   79.758832]  [<ffffffff81162cbb>] event_sched_in.isra.85+0xfb/0x330
[   79.765519]  [<ffffffff81162f5f>] group_sched_in+0x6f/0x1e0
[   79.771481]  [<ffffffff8101df1a>] ? native_sched_clock+0x2a/0x90
[   79.777858]  [<ffffffff811637bc>] __perf_event_enable+0x25c/0x2a0
[   79.784418]  [<ffffffff810f3e69>] ? tick_nohz_irq_exit+0x29/0x30
[   79.790820]  [<ffffffff8115ef30>] ? cpu_clock_event_start+0x40/0x40
[   79.797546]  [<ffffffff8115ef80>] remote_function+0x50/0x60
[   79.803535]  [<ffffffff810f8cd1>] flush_smp_call_function_queue+0x81/0x180
[   79.810840]  [<ffffffff810f9763>] generic_smp_call_function_single_interrupt+0x13/0x60
[   79.819328]  [<ffffffff8104b5e8>] smp_trace_call_function_single_interrupt+0x38/0xc0
[   79.827614]  [<ffffffff816de9be>] trace_call_function_single_interrupt+0x6e/0x80
[   79.835465]  <EOI>
[   79.837543]  [<ffffffff8156e8b5>] ? cpuidle_enter_state+0x65/0x160
[   79.844377]  [<ffffffff8156e8a1>] ? cpuidle_enter_state+0x51/0x160
[   79.851015]  [<ffffffff8156e9e7>] cpuidle_enter+0x17/0x20
[   79.856791]  [<ffffffff810b6e39>] cpu_startup_entry+0x399/0x440
[   79.863165]  [<ffffffff816c9ddb>] rest_init+0xbb/0xd0

The offending commit is clearly confused as it moves heavy initialization
work into IPI context.

Revert it.

Reported-by: Vince Weaver <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yan, Zheng <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
…ister

The existing hardware implementations with PASID support advertised in
bit 28? Forget them. They do not exist. Bit 28 means nothing. When we
have something that works, it'll use bit 40. Do not attempt to infer
anything meaningful from bit 28.

This will be reflected in an updated VT-d spec in the extremely near
future.

Signed-off-by: David Woodhouse <[email protected]>
Dan Carpenter reported missing brackets which resulted in reading a
wrong crystalfreq value. I also noticed that the result of this
function is ignored.

Reported-By: Dan Carpenter <[email protected]>
Signed-off-by: Hauke Mehrtens <[email protected]>
Signed-off-by: Michael Buesch <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/10536/
Signed-off-by: Ralf Baechle <[email protected]>
Until recently, mac80211 overwrote all the statistics it could
provide when getting called, but it now relies on the struct
having been zeroed by the caller. This was always the case in
nl80211, but wext used a static struct which could even cause
values from one device leak to another.

Using a static struct is OK (as even documented in a comment)
since the whole usage of this function and its return value is
always locked under RTNL. Not clearing the struct for calling
the driver has always been wrong though, since drivers were
free to only fill values they could report, so calling this
for one device and then for another would always have leaked
values from one to the other.

Fix this by initializing the structure in question before the
driver method call.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691

Cc: [email protected]
Reported-by: Gerrit Renker <[email protected]>
Reported-by: Alexander Kaltsas <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Now blk_cleanup_queue() can be called before calling
del_gendisk()[1], inside which hctx->ctxs is touched
from blk_mq_unregister_hctx(), but the variable has
been freed by blk_cleanup_queue() at that time.

So this patch moves freeing of hctx->ctxs into queue's
release handler for fixing the oops reported by Stefan.

[1], 6cd18e7 (block: destroy bdi before blockdev is
unregistered)

Reported-by: Stefan Seyfried <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected] (v4.0)
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Along with the transition to regmap for managing the cached parameter
reads, the caps overwrite was also moved to regmap cache.  The cache
change itself works, but it still tries to write the non-existing verb
(the HDA parameter is read-only) wrongly.  It's harmless in most
cases, but some chips are picky and may result in the codec
communication stall.

This patch avoids it just by adding the missing flag check in
reg_write ops.

Signed-off-by: Takashi Iwai <[email protected]>
…odule.

If CONFIG_MTD_PHYSMAP is set to m, the Cobalt mtd.ko module might get
unloaded while the drivers/mtd modules are still loaded resulting in
stale references to the destroyed platform_device instance.

Anyway, platform devices should always be registered indicated what
devices are present, _not_ what drivers have been configured.

Signed-off-by: Ralf Baechle <[email protected]>
If CONFIG_SERIAL_8250 is set to m, the Loongson seria.ko module might get
unloaded while the serial driver modules are still loaded resulting in
stale references to the destroyed platform_device instance.

Anyway, platform devices should always be registered indicated what
devices are present, _not_ what drivers have been configured.

Signed-off-by: Ralf Baechle <[email protected]>
Reported-by: Paul Gortmaker <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/10538/
Due to the slightly odd way that new threads and processes start execution
when scheduled for the very first time they were bypassing the required
disable_msa call.

Signed-off-by: Ralf Baechle <[email protected]>
Fix

include/asm-generic/io.h: In function 'readb':
include/asm-generic/io.h:113:2: error:
	implicit declaration of function 'bfin_read8'
include/asm-generic/io.h: In function 'readw':
include/asm-generic/io.h:121:2: error:
	implicit declaration of function 'bfin_read16'
include/asm-generic/io.h: In function 'readl':
include/asm-generic/io.h:129:2: error:
	implicit declaration of function 'bfin_read32'
include/asm-generic/io.h: In function 'writeb':
include/asm-generic/io.h:147:2: error:
	implicit declaration of function 'bfin_write8'
include/asm-generic/io.h: In function 'writew':
include/asm-generic/io.h:155:2: error:
	implicit declaration of function 'bfin_write16'
include/asm-generic/io.h: In function 'writel':
include/asm-generic/io.h:163:2: error:
	implicit declaration of function 'bfin_write32'

Reported-by: Geert Uytterhoeven <[email protected]>
Fixes: 1a3372b ("blackfin: io: define __raw_readx/writex with
	bfin_readx/writex")
Cc: Steven Miao <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
The latest version of modinfo fails to compile score architecture
targets with the following error.

FATAL: The relocation at __ex_table+0x634 references
section "__ex_table" which is not executable, IOW
the kernel will fault if it ever tries to
jump to it.  Something is seriously wrong
and should be fixed.

The probem is caused by a bad label in an __ex_table entry.

Acked-by: Lennox Wu <[email protected]>
Cc: Quentin Casasnovas <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
This reverts commit 0243508.

It introduces new regressions.

Signed-off-by: David S. Miller <[email protected]>
Izumi found the following oops when hot re-adding a node:

    BUG: unable to handle kernel paging request at ffffc90008963690
    IP: __wake_up_bit+0x20/0x70
    Oops: 0000 [#1] SMP
    CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80
    Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015
    task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000
    RIP: 0010:[<ffffffff810dff80>]  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
    RSP: 0018:ffff880017b97be8  EFLAGS: 00010246
    RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9
    RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648
    RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800
    R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000
    FS:  00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0
    Call Trace:
      unlock_page+0x6d/0x70
      generic_write_end+0x53/0xb0
      xfs_vm_write_end+0x29/0x80 [xfs]
      generic_perform_write+0x10a/0x1e0
      xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs]
      xfs_file_write_iter+0x79/0x120 [xfs]
      __vfs_write+0xd4/0x110
      vfs_write+0xac/0x1c0
      SyS_write+0x58/0xd0
      system_call_fastpath+0x12/0x76
    Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48
    RIP  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
     RSP <ffff880017b97be8>
    CR2: ffffc90008963690

Reproduce method (re-add a node)::
  Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic)

This seems an use-after-free problem, and the root cause is
zone->wait_table was not set to *NULL* after free it in
try_offline_node.

When hot re-add a node, we will reuse the pgdat of it, so does the zone
struct, and when add pages to the target zone, it will init the zone
first (including the wait_table) if the zone is not initialized.  The
judgement of zone initialized is based on zone->wait_table:

	static inline bool zone_is_initialized(struct zone *zone)
	{
		return !!zone->wait_table;
	}

so if we do not set the zone->wait_table to *NULL* after free it, the
memory hotplug routine will skip the init of new zone when hot re-add
the node, and the wait_table still points to the freed memory, then we
will access the invalid address when trying to wake up the waiting
people after the i/o operation with the page is done, such as mentioned
above.

Signed-off-by: Gu Zheng <[email protected]>
Reported-by: Taku Izumi <[email protected]>
Reviewed by: Yasuaki Ishimatsu <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Tang Chen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
When trimming memcg consumption excess (see memory.high), we call
try_to_free_mem_cgroup_pages without checking if we are allowed to sleep
in the current context, which can result in a deadlock.  Fix this.

Fixes: 241994e ("mm: memcontrol: default hierarchy interface for memory")
Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Clear zram disk io accounting when resetting the zram device.  Otherwise
the residual io accounting stat will affect the diskstat in the next
zram active cycle.

Signed-off-by: Weijie Yang <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Commit d5e616f ("checkpatch: add a few more --fix corrections")
broke the GLOBAL_INITIALISERS test with bad parentheses and optional
leading spaces.

Fix it.

Signed-off-by: Joe Perches <[email protected]>
Reported-by: Bandan Das <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
On -rt, the VM_BUG_ON(!irqs_disabled()) triggers inside the memcg
swapout path because the spin_lock_irq(&mapping->tree_lock) in the
caller doesn't actually disable the hardware interrupts - which is fine,
because on -rt the tophalves run in process context and so we are still
safe from preemption while updating the statistics.

Remove the VM_BUG_ON() but keep the comment of what we rely on.

Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Clark Williams <[email protected]>
Cc: Fernando Lopez-Lezcano <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
If zs_create_pool()->create_handle_cache()->kmem_cache_create() or
pool->name allocation fails, zs_create_pool()->destroy_handle_cache()
will dereference the NULL pool->handle_cachep.

Modify destroy_handle_cache() to avoid this.

Signed-off-by: Sergey Senozhatsky <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Jovi Zhangwei reported the following problem

  Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages
  with GFP_COMP flag.

  [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping:          (null) index:0x0
  [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head)
  [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page))
  [Mon May 25 05:29:33 2015] ------------[ cut here ]------------
  [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661!
  [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP

In this case it was triggered by running tcpdump but it's not necessary
reproducible on all systems.

  sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap

Compound pages cannot be migrated and it was not expected that such pages
be marked for NUMA balancing.  This did not take into account that drivers
such as net/packet/af_packet.c may insert compound pages into userspace
with vm_insert_page.  This patch tells the NUMA balancing protection
scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that
compound pages are marked for migration.

Signed-off-by: Mel Gorman <[email protected]>
Reported-by: Jovi Zhangwei <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
shligit and others added 24 commits June 11, 2015 17:33
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f76
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.

alloc_skb_with_frags is the same.

The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Debabrata Banerjee <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Pull drm fixes from Dave Airlie:
 "i915 and radeon fixes:

  i915:
      fix for connector oops regression
      DDC probing fix

  radeon:
      two radeon reverts, along with a freeze workaround and a fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: Make sure radeon_vm_bo_set_addr always unreserves the BO
  Revert "drm/radeon: adjust pll when audio is not enabled"
  Revert "drm/radeon: don't share plls if monitors differ in audio support"
  drm/radeon: fix freeze for laptop with Turks/Thames GPU.
  drm/i915: Fix DDC probe for passive adapters
  drm/i915: Properly initialize SDVO analog connectors
The previous patch tried to continue the probe if i915 binding fails.
For for simplicity reason, we haven't implemented abort even for
controller chips that are dedicated for HDMI/DP on HSW and BDW.
However, Mengdong suggested that this can be dangerous; BIOS may
disable gfx power well although the PCI entry for HD-audio is left,
and this may result in the unexpected behavior, kernel errors, etc.

For avoiding this situation, abort the probe at i915 binding failure
only for HSW/BDW chips selectively.  For other chips, it still
continues.

Fixes: bf06848 ('ALSA: hda - Continue probing even if i915 binding fails')
Reported-by: Mengdong Lin <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Some drivers implement only pause operation (no resuming). Example is
pl330 where pause is needed for getting residuum. pl330 does not support
resume operation, transfer must be stopped after pause.

However for slaves this is exposed always as "pause and resume" which
introduces subtle errors on Odroid U3 board (Exynos4412 with pl330).
After adding pause function to pl330 driver the audio playback
(utilizing DMA) gets choppy after some time (approximately 24 hours).

Fix this by exposing "cmd_pause" if and only if pause and resume are
implemented.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Reported-by: [email protected]
Reported-by: Marek Szyprowski <[email protected]>
Cc: <[email protected]>
Fixes: 88987d2 ("dmaengine: pl330: add DMA_PAUSE feature")
Acked-by: Maxime Ripard <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
Returning zero from a 'store' function is bad.
The return value should be either len length of the string
or an error.

So use 'len' if 'err' is zero.

Fixes: 6791875 ("md: make reconfig_mutex optional for writes to md sysfs files.")
Signed-off-by: NeilBrown <[email protected]>
Cc: [email protected] (v4.0+)
Checking ->sync_thread without holding the mddev_lock()
isn't really safe, even after flushing the workqueue which
ensures md_start_sync() has been run.

While this code is waiting for the lock, md_check_recovery could reap
the thread itself, and then start another thread (e.g. recovery might
finish, then reshape starts).  When this thread gets the lock
md_start_sync() hasn't run so it doesn't get reaped, but
MD_RECOVERY_RUNNING gets cleared.  This allows two threads to start
which leads to confusion.

So don't both if MD_RECOVERY_RUNNING isn't set, but if it is do
the flush and the test and the reap all under the mddev_lock to
avoid any race with md_check_recovery.

Signed-off-by: NeilBrown <[email protected]>
Fixes: 6791875 ("md: make reconfig_mutex optional for writes to md sysfs files.")
Cc: [email protected] (v4.0+)
MD_RECOVERY_DONE is normally cleared by md_check_recovery after a
resync etc finished.  However it is possible for raid5_start_reshape
to race and start a reshape before MD_RECOVERY_DONE is cleared.  This
can lean to multiple reshapes running at the same time, which isn't
good.

To make sure it is cleared before starting a reshape, and also clear
it when reaping a thread, just to be safe.

Signed-off-by: NeilBrown  <[email protected]>
Although the extended tables are theoretically a completely orthogonal
feature to PASID and anything else that *uses* the newly-available bits,
some of the early hardware has problems even when all we do is enable
them and use only the same bits that were in the old context tables.

For now, there's no motivation to support extended tables unless we're
going to use PASID support to do SVM. So just don't use them unless
PASID support is advertised too. Also add a command-line bailout just in
case later chips also have issues.

The equivalent problem for PASID support has already been fixed with the
upcoming VT-d spec update and commit bd00c60 ("iommu/vt-d: Change
PASID support to bit 40 of Extended Capability Register"), because the
problematic platforms use the old definition of the PASID-capable bit,
which is now marked as reserved and meaningless.

So with this change, we'll magically start using ECS again only when we
see the new hardware advertising "hey, we have PASID support and we
actually tested it this time" on bit 40.

The VT-d hardware architect has promised that we are not going to have
any reason to support ECS *without* PASID any time soon, and he'll make
sure he checks with us before changing that.

In the future, if hypothetical new features also use new bits in the
context tables and can be seen on implementations *without* PASID support,
we might need to add their feature bits to the ecs_enabled() macro.

Signed-off-by: David Woodhouse <[email protected]>
Pull VT-d hardware workarounds from David Woodhouse:
 "This contains a workaround for hardware issues which I *thought* were
  never going to be seen on production hardware.  I'm glad I checked
  that before the 4.1 release...

  Firstly, PASID support is so broken on existing chips that we're just
  going to declare the old capability bit 28 as 'reserved' and change
  the VT-d spec to move PASID support to another bit.  So any existing
  hardware doesn't support SVM; it only sets that (now) meaningless bit
  28.

  That patch *wasn't* imperative for 4.1 because we don't have PASID
  support yet.  But *even* the extended context tables are broken — if
  you just enable the wider tables and use none of the new bits in them,
  which is precisely what 4.1 does, you find that translations don't
  work.  It's this problem which I thought was caught in time to be
  fixed before production, but wasn't.

  To avoid triggering this issue, we now *only* enable the extended
  context tables on hardware which also advertises "we have PASID
  support and we actually tested it this time" with the new PASID
  feature bit.

  In addition, I've added an 'intel_iommu=ecs_off' command line
  parameter to allow us to disable it manually if we need to"

* git://git.infradead.org/intel-iommu:
  iommu/vt-d: Only enable extended context tables if PASID is supported
  iommu/vt-d: Change PASID support to bit 40 of Extended Capability Register
Pull three more md fixes from Neil Brown:
 "Hasn't been a good cycle for md has it :-(

  The main issue fixed here is a rare race which can result in two
  reshape threads running at once, which doesn't end well.

  Also a minor issue with a write to a sysfs file returning the wrong
  value.  Backports to 4.0-stable are indicated"

* tag 'md/4.1-rc7-fixes' of git://neil.brown.name/md:
  md: make sure MD_RECOVERY_DONE is clear before starting recovery/resync
  md: Close race when setting 'action' to 'idle'.
  md: don't return 0 from array_state_store
Pull block layer fixes from Jens Axboe:
 "Remember about a week ago when I sent the last pull request for 4.1?
  Well, I lied.  Now, I don't want to shift the blame, but Dan, Ming,
  and Richard made a liar out of me.

  Here are three small patches that should go into 4.1.  More
  specifically, this pull request contains:

   - A Kconfig dependency for the pmem block driver, so it can't be
     selected if HAS_IOMEM isn't availble.  From Richard Weinberger.

   - A fix for genhd, making the ext_devt_lock softirq safe.  This makes
     lockdep happier, since we also end up grabbing this lock on release
     off the softirq path.  From Dan Williams.

   - A blk-mq software queue release fix from Ming Lei.

  Last two are headed to stable, first fixes an issue introduced in this
  cycle"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: pmem: Add dependency on HAS_IOMEM
  block: fix ext_dev_lock lockdep report
  blk-mq: free hctx->ctxs in queue's release handler
Currently, we can ask to authenticate DATA chunks and we can send DATA
chunks on the same packet as COOKIE_ECHO, but if you try to combine
both, the DATA chunk will be sent unauthenticated and peer won't accept
it, leading to a communication failure.

This happens because even though the data was queued after it was
requested to authenticate DATA chunks, it was also queued before we
could know that remote peer can handle authenticating, so
sctp_auth_send_cid() returns false.

The fix is whenever we set up an active key, re-check send queue for
chunks that now should be authenticated. As a result, such packet will
now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.

Reported-by: Liu Wei <[email protected]>
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This patch fix URL (http to https) for wiki.wireshark.org.

Signed-off-by: Masanari Iida <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Pull networking fixes from David Miller:

 1) Fix uninitialized struct station_info in cfg80211_wireless_stats(),
    from Johannes Berg.

 2) Revert commit attempt to fix ipv6 protocol resubmission, it adds
    regressions.

 3) Endless loops can be created in bridge port lists, fix from Nikolay
    Aleksandrov.

 4) Don't WARN_ON() if sk->sk_forward_alloc is non-zero in
    sk_clear_memalloc, it is a legal situation during swap deactivation.
    Fix from Mel Gorman.

 5) Fix order of disabling interrupts and unlocking NAPI in enic driver
    to avoid a race.  From Govindarajulu Varadarajan.

 6) High and low register writes are swapped when programming the start
    of periodic output in igb driver.  From Richard Cochran.

 7) Fix device rename handling in mpls stack, from Robert Shearman.

 8) Do not trigger compaction synchronously when optimistically trying
    to allocate an order 3 page in alloc_skb_with_frags() and
    skb_page_frag_refill().  From Shaohua Li.

 9) Authentication with COOKIE_ECHO is not handled properly in SCTP, fix
    from Marcelo Ricardo Leitner.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt
  sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO
  net: don't wait for order-3 page allocation
  mpls: handle device renames for per-device sysctls
  net: igb: fix the start time for periodic output signals
  enic: fix memory leak in rq_clean
  enic: check return value for stat dump
  enic: unlock napi busy poll before unmasking intr
  net, swap: Remove a warning and clarify why sk_mem_reclaim is required when deactivating swap
  bridge: fix multicast router rlist endless loop
  tipc: disconnect socket directly after probe failure
  Revert "ipv6: Fix protocol resubmission"
  cfg80211: wext: clear sinfo struct before calling driver
The GIC chained handlers use do_IRQ() to call the subhandlers.  This
means that irq_enter() calls get nested, which leads to preempt count
looking like we're in nested interrupts, which in turn leads to all
system time being accounted as IRQ time in account_system_time().

Fix it by using generic_handle_irq().  Since these same functions are
used in some systems (if cpu_has_veic) from a low-level vectored
interrupt handler which does not go throught do_IRQ(), we need to do it
conditionally.

Signed-off-by: Rabin Vincent <[email protected]>
Reviewed-by: Andrew Bresticker <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/10545/
Signed-off-by: Ralf Baechle <[email protected]>
This patch fixes mips compilation error:

lib/mpi/generic_mpih-mul1.c: In function 'mpihelp_mul_1':
lib/mpi/longlong.h:651:2: error: impossible constraint in 'asm'

Signed-off-by: Jaedon Shin <[email protected]>
Cc: Linux-MIPS <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/10546/
Signed-off-by: Ralf Baechle <[email protected]>
…l/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Most of commits are regression fixes for HD-audio: a few corner case
  fixes for regmap transition, and i915 binding issues.

  In addition, a quirk for another USB-audio device supporting DSD"

* tag 'sound-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Abort the probe without i915 binding for HSW/BDW
  ALSA: hda - Re-add the lost fake mute support
  ALSA: hda - Continue probing even if i915 binding fails
  ALSA: hda - Don't actually write registers for caps overwrites
  ALSA: hda - fix number of devices query on hotplug
  ALSA: usb-audio: add native DSD support for JLsounds I2SoverUSB
…linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "A regression fix for a crash, and a Intel HSW uncore PMU driver fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization"
  perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EP
…cm/linux/kernel/git/tip/tip

Pull lockdep fix from Ingo Molnar:
 "A lockdep/modules unload race fix that can oops"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix a race between /proc/lock_stat and module unload
…inux/kernel/git/tip/tip

Pull irqchip fix from Thomas Gleixner:
 "A single fix for an off by one bug in the sunxi irqchip driver"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: sunxi-nmi: Fix off-by-one error in irq iterator
…ream-linus

Pull more MIPS fixes from Ralf Baechle:
 "Another round of 4.1 MIPS fixes, one fix to a MIPS-specific #if
  condition in lib/mpi, one fix to the MIPS GIC irqchip driver and one
  SSB fix.

  Details:
   - fix handling of clock in chipco SSB driver.
   - fix two MIPS-specific #if conditions to correctly work for GCC 5.1.
   - fix damage to R6 pgtable bits done by XPA support.
   - fix possible crash due to unloading modules that contain statically
     defined platform devices.
   - fix disabling of the MSA ASE on context switch to also work
     correctly when a new thread/process has the CPU for the very first
     time.

  This is part of linux-next and has been beaten to death on
  Imagination's test farm.

  While things are not looking too grim this pull request also means the
  rate of fixes for 4.1 remains nearly constant so I'd not be unhappy if
  you'd delay the release"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MPI: MIPS: Fix compilation error with GCC 5.1
  IRQCHIP: mips-gic: Don't nest calls to do_IRQ()
  MIPS: MSA: bugfix - disable MSA correctly for new threads/processes.
  MIPS: Loongson: Do not register 8250 platform device from module.
  MIPS: Cobalt: Do not build MTD platform device registration code as module.
  SSB: Fix handling of ssb_pmu_get_alp_clock()
  MIPS: pgtable-bits: Fix XPA damage to R6 definitions.
Pull NTB fixes from Jon Mason:
 "I apologize for the tardiness of this request.  Here are a couple of
  last minute NTB bug fixes for v4.1:

  NTB bug fixes to address issues in unmapping the MW reg base and
  vbase, and an uninitialized variable on Atom platforms"

* tag 'ntb-4.1' of git://github.com/jonmason/ntb:
  ntb: initialize max_mw for Atom before using it
  ntb: iounmap MW reg and vbase in error path
Pull dmaengine fixes from Vinod Koul:
 "Here are hopefully last set of fixes for 4.1. This time we have:

   - fixing pause capability reporting on both dmaengine pause & resume
     support by Krzysztof

   - locking fix fir at_xdmac by Ludovic

   - slave configuration fix for at_xdmac by Ludovic"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: Fix choppy sound because of unimplemented resume
  dmaengine: at_xdmac: rework slave configuration part
  dmaengine: at_xdmac: lock fixes
dabrace added a commit that referenced this pull request Jun 17, 2015
@dabrace dabrace merged commit 4a3aabf into dabrace:master Jun 17, 2015
dabrace pushed a commit that referenced this pull request Mar 23, 2017
Since the nfct and nfctinfo have been combined, the nf_conn structure
must be at least 8 bytes aligned, as the 3 LSB bits are used for the
nfctinfo. But there's a fake nf_conn structure to denote untracked
connections, which is created by a PER_CPU construct. This does not
guarantee that it will be 8 bytes aligned and can break the logic in
determining the correct nfctinfo.

I triggered this on a 32bit machine with the following error:

BUG: unable to handle kernel NULL pointer dereference at 00000af4
IP: nf_ct_deliver_cached_events+0x1b/0xfb
*pdpt = 0000000031962001 *pde = 0000000000000000

Oops: 0000 [#1] SMP
[Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 crc_ccitt ppdev r8169 parport_pc parport
  OK  ]
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-test+ #75
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
task: c126ec00 task.stack: c1258000
EIP: nf_ct_deliver_cached_events+0x1b/0xfb
EFLAGS: 00010202 CPU: 0
EAX: 0021cd01 EBX: 00000000 ECX: 27b0c767 EDX: 32bcb17a
ESI: f34135c0 EDI: f34135c0 EBP: f2debd60 ESP: f2debd3c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 80050033 CR2: 00000af4 CR3: 309a0440 CR4: 001406f0
Call Trace:
 <SOFTIRQ>
 ? ipv6_skip_exthdr+0xac/0xcb
 ipv6_confirm+0x10c/0x119 [nf_conntrack_ipv6]
 nf_hook_slow+0x22/0xc7
 nf_hook+0x9a/0xad [ipv6]
 ? ip6t_do_table+0x356/0x379 [ip6_tables]
 ? ip6_fragment+0x9e9/0x9e9 [ipv6]
 ip6_output+0xee/0x107 [ipv6]
 ? ip6_fragment+0x9e9/0x9e9 [ipv6]
 dst_output+0x36/0x4d [ipv6]
 NF_HOOK.constprop.37+0xb2/0xba [ipv6]
 ? icmp6_dst_alloc+0x2c/0xfd [ipv6]
 ? local_bh_enable+0x14/0x14 [ipv6]
 mld_sendpack+0x1c5/0x281 [ipv6]
 ? mark_held_locks+0x40/0x5c
 mld_ifc_timer_expire+0x1f6/0x21e [ipv6]
 call_timer_fn+0x135/0x283
 ? detach_if_pending+0x55/0x55
 ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
 __run_timers+0x111/0x14b
 ? mld_dad_timer_expire+0x3e/0x3e [ipv6]
 run_timer_softirq+0x1c/0x36
 __do_softirq+0x185/0x37c
 ? test_ti_thread_flag.constprop.19+0xd/0xd
 do_softirq_own_stack+0x22/0x28
 </SOFTIRQ>
 irq_exit+0x5a/0xa4
 smp_apic_timer_interrupt+0x2a/0x34
 apic_timer_interrupt+0x37/0x3c

By using DEFINE/DECLARE_PER_CPU_ALIGNED we can enforce at least 8 byte
alignment as all cache line sizes are at least 8 bytes or more.

Fixes: a9e419d ("netfilter: merge ctinfo into nfct pointer storage area")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
dabrace pushed a commit that referenced this pull request Sep 16, 2019
Driver copies FW commands to the HW queue as  units of 16 bytes. Some
of the command structures are not exact multiple of 16. So while copying
the data from those structures, the stack out of bounds messages are
reported by KASAN. The following error is reported.

[ 1337.530155] ==================================================================
[ 1337.530277] BUG: KASAN: stack-out-of-bounds in bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530413] Read of size 16 at addr ffff888725477a48 by task rmmod/2785

[ 1337.530540] CPU: 5 PID: 2785 Comm: rmmod Tainted: G           OE     5.2.0-rc6+ #75
[ 1337.530541] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
[ 1337.530542] Call Trace:
[ 1337.530548]  dump_stack+0x5b/0x90
[ 1337.530556]  ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530560]  print_address_description+0x65/0x22e
[ 1337.530568]  ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530575]  ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530577]  __kasan_report.cold.3+0x37/0x77
[ 1337.530581]  ? _raw_write_trylock+0x10/0xe0
[ 1337.530588]  ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530590]  kasan_report+0xe/0x20
[ 1337.530592]  memcpy+0x1f/0x50
[ 1337.530600]  bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530608]  ? bnxt_qplib_creq_irq+0xa0/0xa0 [bnxt_re]
[ 1337.530611]  ? xas_create+0x3aa/0x5f0
[ 1337.530613]  ? xas_start+0x77/0x110
[ 1337.530615]  ? xas_clear_mark+0x34/0xd0
[ 1337.530623]  bnxt_qplib_free_mrw+0x104/0x1a0 [bnxt_re]
[ 1337.530631]  ? bnxt_qplib_destroy_ah+0x110/0x110 [bnxt_re]
[ 1337.530633]  ? bit_wait_io_timeout+0xc0/0xc0
[ 1337.530641]  bnxt_re_dealloc_mw+0x2c/0x60 [bnxt_re]
[ 1337.530648]  bnxt_re_destroy_fence_mr+0x77/0x1d0 [bnxt_re]
[ 1337.530655]  bnxt_re_dealloc_pd+0x25/0x60 [bnxt_re]
[ 1337.530677]  ib_dealloc_pd_user+0xbe/0xe0 [ib_core]
[ 1337.530683]  srpt_remove_one+0x5de/0x690 [ib_srpt]
[ 1337.530689]  ? __srpt_close_all_ch+0xc0/0xc0 [ib_srpt]
[ 1337.530692]  ? xa_load+0x87/0xe0
...
[ 1337.530840]  do_syscall_64+0x6d/0x1f0
[ 1337.530843]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1337.530845] RIP: 0033:0x7ff5b389035b
[ 1337.530848] Code: 73 01 c3 48 8b 0d 2d 0b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 0a 2c 00 f7 d8 64 89 01 48
[ 1337.530849] RSP: 002b:00007fff83425c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1337.530852] RAX: ffffffffffffffda RBX: 00005596443e6750 RCX: 00007ff5b389035b
[ 1337.530853] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005596443e67b8
[ 1337.530854] RBP: 0000000000000000 R08: 00007fff83424ba1 R09: 0000000000000000
[ 1337.530856] R10: 00007ff5b3902960 R11: 0000000000000206 R12: 00007fff83425e50
[ 1337.530857] R13: 00007fff8342673c R14: 00005596443e6260 R15: 00005596443e6750

[ 1337.530885] The buggy address belongs to the page:
[ 1337.530962] page:ffffea001c951dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 1337.530964] flags: 0x57ffffc0000000()
[ 1337.530967] raw: 0057ffffc0000000 0000000000000000 ffffffff1c950101 0000000000000000
[ 1337.530970] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1337.530970] page dumped because: kasan: bad access detected

[ 1337.530996] Memory state around the buggy address:
[ 1337.531072]  ffff888725477900: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f2 f2 f2
[ 1337.531180]  ffff888725477980: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
[ 1337.531288] >ffff888725477a00: 00 f2 f2 f2 f2 f2 f2 00 00 00 f2 00 00 00 00 00
[ 1337.531393]                                                  ^
[ 1337.531478]  ffff888725477a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1337.531585]  ffff888725477b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1337.531691] ==================================================================

Fix this by passing the exact size of each FW command to
bnxt_qplib_rcfw_send_message as req->cmd_size. Before sending
the command to HW, modify the req->cmd_size to number of 16 byte units.

Fixes: 1ac5a40 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Doug Ledford <[email protected]>
dabrace pushed a commit that referenced this pull request Jan 2, 2020
…zones

In case we have to migrate a ballon page to a newpage of another zone, the
managed page count of both zones is wrong. Paired with memory offlining
(which will adjust the managed page count), we can trigger kernel crashes
and all kinds of different symptoms.

One way to reproduce:
1. Start a QEMU guest with 4GB, no NUMA
2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL
3. Inflate the balloon to 1GB
4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it)
5. Observe /proc/zoneinfo
  Node 0, zone   Normal
    pages free     16810
          min      24848885473806
          low      18471592959183339
          high     36918337032892872
          spanned  262144
          present  262144
          managed  18446744073709533486
6. Do anything that requires some memory (e.g., inflate the balloon some
more). The OOM goes crazy and the system crashes
  [  238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00
  [  238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
  [  238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G      D W         5.4.0-next-20191204+ #75
  [  238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
  [  238.341121] Call Trace:
  [  238.341337]  dump_stack+0x8f/0xd0
  [  238.341630]  dump_header+0x61/0x5ea
  [  238.341942]  oom_kill_process.cold+0xb/0x10
  [  238.342299]  out_of_memory+0x24d/0x5a0
  [  238.342625]  __alloc_pages_slowpath+0xd12/0x1020
  [  238.343024]  __alloc_pages_nodemask+0x391/0x410
  [  238.343407]  pagecache_get_page+0xc3/0x3a0
  [  238.343757]  filemap_fault+0x804/0xc30
  [  238.344083]  ? ext4_filemap_fault+0x28/0x42
  [  238.344444]  ext4_filemap_fault+0x30/0x42
  [  238.344789]  __do_fault+0x37/0x1a0
  [  238.345087]  __handle_mm_fault+0x104d/0x1ab0
  [  238.345450]  handle_mm_fault+0x169/0x360
  [  238.345790]  do_user_addr_fault+0x20d/0x490
  [  238.346154]  do_page_fault+0x31/0x210
  [  238.346468]  async_page_fault+0x43/0x50
  [  238.346797] RIP: 0033:0x7f47eba4197e
  [  238.347110] Code: Bad RIP value.
  [  238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293
  [  238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e
  [  238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004
  [  238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033
  [  238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001
  [  238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0
  [  238.350878] Mem-Info:
  [  238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0
  [  238.351085]  active_file:12 inactive_file:7 isolated_file:0
  [  238.351085]  unevictable:0 dirty:0 writeback:0 unstable:0
  [  238.351085]  slab_reclaimable:5565 slab_unreclaimable:10170
  [  238.351085]  mapped:3 shmem:111 pagetables:155 bounce:0
  [  238.351085]  free:720717 free_pcp:2 free_cma:0
  [  238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss
  [  238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB
  [  238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884
  [  238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B
  [  238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865
  [  238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB
  [  238.364765] lowmem_reserve[]: 0 0 0 0 0
  [  238.365101] Node 0 DMA: 7*4kB (U) 5*8kB (UE) 6*16kB (UME) 2*32kB (UM) 1*64kB (U) 2*128kB (UE) 3*256kB (UME) 2*512B
  [  238.366379] Node 0 DMA32: 0*4kB 1*8kB (U) 2*16kB (UM) 2*32kB (UM) 2*64kB (UM) 1*128kB (U) 1*256kB (U) 1*512kB (U)B
  [  238.367654] Node 0 Normal: 1985*4kB (UME) 1321*8kB (UME) 844*16kB (UME) 524*32kB (UME) 300*64kB (UME) 138*128kB (B
  [  238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  [  238.369915] 130 total pagecache pages
  [  238.370241] 0 pages in swap cache
  [  238.370533] Swap cache stats: add 0, delete 0, find 0/0
  [  238.370981] Free swap  = 0kB
  [  238.371239] Total swap = 0kB
  [  238.371488] 1048445 pages RAM
  [  238.371756] 0 pages HighMem/MovableOnly
  [  238.372090] 306992 pages reserved
  [  238.372376] 0 pages cma reserved
  [  238.372661] 0 pages hwpoisoned

In another instance (older kernel), I was able to observe this
(negative page count :/):
  [  180.896971] Offlined Pages 32768
  [  182.667462] Offlined Pages 32768
  [  184.408117] Offlined Pages 32768
  [  186.026321] Offlined Pages 32768
  [  187.684861] Offlined Pages 32768
  [  189.227013] Offlined Pages 32768
  [  190.830303] Offlined Pages 32768
  [  190.833071] Built 1 zonelists, mobility grouping on.  Total pages: -36920272750453009

In another instance (older kernel), I was no longer able to start any
process:
  [root@vm ~]# [  214.348068] Offlined Pages 32768
  [  215.973009] Offlined Pages 32768
  cat /proc/meminfo
  -bash: fork: Cannot allocate memory
  [root@vm ~]# cat /proc/meminfo
  -bash: fork: Cannot allocate memory

Fix it by properly adjusting the managed page count when migrating if
the zone changed. The managed page count of the zones now looks after
unplug of the DIMM (and after deflating the balloon) just like before
inflating the balloon (and plugging+onlining the DIMM).

We'll temporarily modify the totalram page count. If this ever becomes a
problem, we can fine tune by providing helpers that don't touch
the totalram pages (e.g., adjust_zone_managed_page_count()).

Please note that fixing up the managed page count is only necessary when
we adjusted the managed page count when inflating - only if we
don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
managed page count is not touched when inflating/deflating.

Reported-by: Yumei Huang <[email protected]>
Fixes: 3dcc057 ("mm: correctly update zone->managed_pages")
Cc: <[email protected]> # v3.11+
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Igor Mammedov <[email protected]>
Cc: [email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
dabrace pushed a commit that referenced this pull request Feb 1, 2021
…abled

When booting a kernel which has been built with CONFIG_AMD_MEM_ENCRYPT
enabled as a Xen pv guest a warning is issued for each processor:

[    5.964347] ------------[ cut here ]------------
[    5.968314] WARNING: CPU: 0 PID: 1 at /home/gross/linux/head/arch/x86/xen/enlighten_pv.c:660 get_trap_addr+0x59/0x90
[    5.972321] Modules linked in:
[    5.976313] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.11.0-rc5-default #75
[    5.980313] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A05 12/05/2013
[    5.984313] RIP: e030:get_trap_addr+0x59/0x90
[    5.988313] Code: 42 10 83 f0 01 85 f6 74 04 84 c0 75 1d b8 01 00 00 00 c3 48 3d 00 80 83 82 72 08 48 3d 20 81 83 82 72 0c b8 01 00 00 00 eb db <0f> 0b 31 c0 c3 48 2d 00 80 83 82 48 ba 72 1c c7 71 1c c7 71 1c 48
[    5.992313] RSP: e02b:ffffc90040033d38 EFLAGS: 00010202
[    5.996313] RAX: 0000000000000001 RBX: ffffffff82a141d0 RCX: ffffffff8222ec38
[    6.000312] RDX: ffffffff8222ec38 RSI: 0000000000000005 RDI: ffffc90040033d40
[    6.004313] RBP: ffff8881003984a0 R08: 0000000000000007 R09: ffff888100398000
[    6.008312] R10: 0000000000000007 R11: ffffc90040246000 R12: ffff8884082182a8
[    6.012313] R13: 0000000000000100 R14: 000000000000001d R15: ffff8881003982d0
[    6.016316] FS:  0000000000000000(0000) GS:ffff888408200000(0000) knlGS:0000000000000000
[    6.020313] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.024313] CR2: ffffc900020ef000 CR3: 000000000220a000 CR4: 0000000000050660
[    6.028314] Call Trace:
[    6.032313]  cvt_gate_to_trap.part.7+0x3f/0x90
[    6.036313]  ? asm_exc_double_fault+0x30/0x30
[    6.040313]  xen_convert_trap_info+0x87/0xd0
[    6.044313]  xen_pv_cpu_up+0x17a/0x450
[    6.048313]  bringup_cpu+0x2b/0xc0
[    6.052313]  ? cpus_read_trylock+0x50/0x50
[    6.056313]  cpuhp_invoke_callback+0x80/0x4c0
[    6.060313]  _cpu_up+0xa7/0x140
[    6.064313]  cpu_up+0x98/0xd0
[    6.068313]  bringup_nonboot_cpus+0x4f/0x60
[    6.072313]  smp_init+0x26/0x79
[    6.076313]  kernel_init_freeable+0x103/0x258
[    6.080313]  ? rest_init+0xd0/0xd0
[    6.084313]  kernel_init+0xa/0x110
[    6.088313]  ret_from_fork+0x1f/0x30
[    6.092313] ---[ end trace be9ecf17dceeb4f3 ]---

Reason is that there is no Xen pv trap entry for X86_TRAP_VC.

Fix that by adding a generic trap handler for unknown traps and wire all
unknown bare metal handlers to this generic handler, which will just
crash the system in case such a trap will ever happen.

Fixes: 0786138 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Cc: <[email protected]> # v5.10
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Andrew Cooper <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.