I2S interrupts not triggered #161

mhelin · 2012-11-21T12:24:41Z

When the I2S/PCM block is programmed to trigger interrupt 55 (for example, when the transmit FIFO falls below its threshold level), no IRQ 55 is triggered and the interrupt handler is not called. However, armctrl supposedly implements a handler for IRQ 81, which shows some activity in /proc/interrupts. It's not possible to register an interrupt handler for this (high) IRQ number: the Raspberry Pi hangs. This might also be a firmware issue.

philpoole · 2012-12-22T07:52:08Z

Hi,
This issue has been documented in this forum thread http://www.raspberrypi.org/phpBB3/viewtopic.php?f=44&t=8496&start=125 .
It can be reproduced very easily with the gist below.
(If that fails, it should be viewable in https://gist.github.com/4357922, or the original thread - sorry, I'm new to github).
Basically, it's a very simple module that sets up the I2S PCM device with a few register pokes, and then starts filling the FIFO in order to trigger an interrupt. I'm happy the setup is reasonable, as I've had it working with my I2S DAC without using interrupts (with an immense CPU load, but it works).
Initially, it looks like nothing is happening, but /proc/interrupts suggests that interrupts are firing on IRQ 81, and PCM_INTSTC_A suggests that the interrupt line is high from the PCM block.

By placing debug in the kernel, I can see that IRQ 81 is being triggered in handle_level_irq(), but it isn't getting any further (not calling our ISR) because the IRQ is disabled and has no action defined. This is with my test driver only setting up IRQ 55, so obviously some setup for 81 is required. However, doing that seems to make the Pi crash (I assume) when the IRQ is triggered.
I tried hacking the kernel to also trigger the registered IRQ function for 55 (a very simple stub function, just a printk) as well as for 81 when 81 was fired, but again that just crashed. So something seems incorrect.
It's very possible we're setting interrupts up incorrectly. I took initial inspiration from my Linux Device Drivers book (3rd edition, which is rapidly becoming out of date now), and then from bcm2708_gpio.c, but I'm not 100% certain it's set up correctly.
I can see that there is some IRQ remapping in armctrl.c that links 55 to 81 (and undoing that remapping crashes the box too; I've tried all sorts).

Is there anything I can do to progress this further? Perhaps some debug I can perform? Is there a kernel configuration I need to set?

Where do you think this could be going wrong?

ghollingworth · 2012-12-22T15:19:41Z

Are you sure you are handling the interrupt correctly? If you've enabled
it but are not handling the interrupt properly it could be locking up...

Check the dwc_otg code for an example of how to use an interrupt properly
(search the code for MPHI and you should find the relevant bits):

drivers/usb/host/dwc_otg/*

Gordon

philpoole · 2012-12-22T18:40:35Z

Hi Gordon,
It's entirely possible that I'm doing things wrong. Knowing what works and what doesn't is a good start, so thanks for that. I shall have a look...
Regards,
Phil

ghollingworth · 2012-12-22T19:30:04Z

Phil,

Yes even I had quite a bit of trouble getting the FIQ working and I had
access to the verilog to check the wiring of the interrupts!!

Gordon

philpoole · 2012-12-22T21:03:50Z

Hi Gordon,
Glad I'm not the only one!
Looking at dwc_otg_pcd_linux.c for instance, I notice that it uses request_irq() and free_irq() (as opposed to the setup_irq() and remove_irq() that I used). I had originally tried that, but to no avail, and then attempted the latter method after studying bcm2708_gpio.c.
I'll have another look, but I'm afraid I'm sceptical at the moment (although perhaps I can debug the kernel to discover different behaviour - which might help).
Thanks for the help!
Phil

mhelin · 2012-12-22T22:13:20Z

Hi,

I've also played with both IRQ setup methods (request_irq and setup_irq) but with no results; the handler for IRQ 55 was never called.

In dwc_otg_driver.c request_irq is used as well:

    retval = request_irq(devirq, dwc_otg_common_irq,
                         IRQF_SHARED,
                         "dwc_otg", dwc_otg_device);

"dwc_otg_common_irq" is the interrupt handler, struct dwc_otg_device contains some data.

ghollingworth · 2012-12-23T07:38:58Z

So I assume you are also using the same platform initialisation to allocate
the IRQ? i.e. you need to make sure your driver is correctly given the IRQ
resource in bcm2708.c

Have you tried allocating an IRQ that you know definitely works (such as
the MPHI or USB interrupt)? That way you know that you are correctly
allocating the resources etc.

Gordon

philpoole · 2012-12-23T10:18:38Z

OK, I shall have a look (although Christmas is looming, so it might take a while). I didn't make any changes to the kernel in order to implement this (just to debug). I was hoping that was unnecessary.
Trying a working, shared interrupt with a benign handler seems a sensible idea...

Regards,
Phil

philpoole · 2012-12-28T08:43:20Z

Been debugging further, and can actually see interrupts!
Basically, if I set up IRQ 81, then as soon as INTEN_A is set (interrupts enabled on the PCM block), interrupts fire endlessly. I wasn't seeing this initially, just a locked-up Pi.
It's strange, because nothing seems to stop these interrupts except disabling them. Clearing the status bits in INTSTC by writing 0xf does nothing, and neither does configuring the interrupt to occur when full (writing 00 to bits 6 and 5 of CS_A) and then maintaining an empty buffer.

ghollingworth · 2012-12-28T08:48:39Z

It's likely to be the TX FIFO empty interrupt being triggered. Have you tried
masking those interrupts?

Do you know which interrupts are actually being triggered?

Gordon

philpoole · 2012-12-28T10:00:58Z

Hi Gordon,
Yes, well, it's definitely to do with the registers not behaving as I'd expect. Need to have a play...

Phil

philpoole · 2013-01-02T08:22:52Z

I appear to be getting regular interrupts now. I didn't realise, if you don't clear the TXERR flag when it occurs, it seems to just interrupt like crazy (even if you deal with the underflow, as I was).
So I think this issue isn't a problem now. The other issues are related to misuse of registers :)
Thanks for the help.
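
The behaviour described above (a sticky TXERR flag keeping a level-triggered interrupt asserted until it is explicitly cleared) can be illustrated with a small userspace model. This is only a sketch, not driver code: `struct fake_pcm`, both handlers, `run_until_quiet()` and the bit positions are hypothetical stand-ins for the real CS_A and INTEN_A registers, and the real layout should be checked against the BCM2835 peripherals documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for illustration only; verify against the datasheet. */
#define CS_TXERR    (1u << 15)  /* sticky TX FIFO error flag (assumed position) */
#define INTEN_TXERR (1u << 2)   /* TX error interrupt enable (assumed position) */

/* Hypothetical in-memory stand-in for the PCM block. */
struct fake_pcm {
    uint32_t cs;          /* models CS_A */
    uint32_t inten;       /* models INTEN_A */
    unsigned fifo_level;  /* number of samples queued */
};

/* Level-triggered line: high for as long as an enabled error flag is latched. */
static bool irq_line_high(const struct fake_pcm *pcm)
{
    return (pcm->inten & INTEN_TXERR) && (pcm->cs & CS_TXERR);
}

/* Deals with the underflow by refilling, but leaves TXERR latched. */
static void isr_refill_only(struct fake_pcm *pcm)
{
    pcm->fifo_level = 32;
}

/* Refills and also clears the latched flag. On real hardware this would be
 * a register write; here the model simply drops the bit. */
static void isr_refill_and_clear(struct fake_pcm *pcm)
{
    pcm->fifo_level = 32;
    pcm->cs &= ~CS_TXERR;
}

/* Calls the handler while the line stays high; returns the number of calls,
 * capped at max_calls so a storm terminates in the model. */
static unsigned run_until_quiet(struct fake_pcm *pcm,
                                void (*isr)(struct fake_pcm *),
                                unsigned max_calls)
{
    unsigned calls = 0;

    while (irq_line_high(pcm) && calls < max_calls) {
        isr(pcm);
        calls++;
    }
    return calls;
}
```

In this model, a handler that only refills the FIFO is re-entered until the cap is reached, while the handler that also clears the flag runs once and the line drops. That matches the symptom reported in the thread, although the actual hardware behaviour may differ in detail.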

mhelin · 2013-01-02T09:01:52Z

So the problem when using IRQ 81 was actually the interrupt storm? What are the other register issues? Please post your code to the I2S thread on the forums when you get it working better. Can we now close this issue? I'm still wondering about the IRQ number, because the documentation specifies only IRQ 55.

mhelin · 2013-01-04T09:30:57Z

After implementing the changes described, I also got the I2S FIFO interrupt (undocumented IRQ 81) working, so I'm closing this issue now. Thanks to everyone for helping.

mhelin closed this as completed Jan 4, 2013
