dwc_otg: switching to device mode at run time hangs the kernel #884

ali1234 · 2015-03-11T20:04:38Z

Leave USB_OTGID floating if you have a Compute Module or, on the A/A+, write a 1 into bit 30 of gusbcfg to force device-only mode.
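
As a rough illustration of the register change described above, the following sketch computes the new gusbcfg value with bit 30 set. It is only bit arithmetic on a plain integer and does not touch hardware. The names `force_device_mode` and `is_device_mode_forced` and the sample value are hypothetical and are not taken from the dwc_otg driver.

```python
# Bit position quoted in the report: gusbcfg bit 30 forces device-only mode.
GUSBCFG_FORCE_DEVICE_BIT = 30
GUSBCFG_FORCE_DEVICE_MASK = 1 << GUSBCFG_FORCE_DEVICE_BIT  # 0x40000000


def force_device_mode(gusbcfg: int) -> int:
    """Return the 32-bit register value with bit 30 set; other bits unchanged."""
    return (gusbcfg | GUSBCFG_FORCE_DEVICE_MASK) & 0xFFFFFFFF


def is_device_mode_forced(gusbcfg: int) -> bool:
    """Report whether bit 30 is set in the given register value."""
    return bool(gusbcfg & GUSBCFG_FORCE_DEVICE_MASK)


if __name__ == "__main__":
    current = 0x00001400  # made-up example value, not read from real hardware
    updated = force_device_mode(current)
    print(f"{current:#010x} -> {updated:#010x}")
```

On a real board the current value would have to be read from, and the result written back to, the controller's register. How that is done depends on the setup and is not shown here.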

Expected result: the Pi acts as a USB device.

Actual result: the kernel goes into an endless interrupt loop:

[   23.151467] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode
[   23.151467] 
[   23.169536] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode
[   23.169536] 
[   23.187584] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode
[   23.187584] 
[   23.205596] INFO: rcu_preempt self-detected stall on CPU { 0}  (t=2101 jiffies g=-287 c=-288 q=1)
[   23.219239] Task dump for CPU 0:
[   23.224791] swapper         R running      0     1      0 0x00000000
[   23.233605] [<c00149b0>] (unwind_backtrace) from [<c0012140>] (show_stack+0x20/0x24)
[   23.246049] [<c0012140>] (show_stack) from [<c0047490>] (sched_show_task+0xac/0x108)
[   23.258562] [<c0047490>] (sched_show_task) from [<c0048450>] (dump_cpu_task+0x2c/0x38)
[   23.271373] [<c0048450>] (dump_cpu_task) from [<c0062854>] (rcu_dump_cpu_stacks+0xb0/0x130)
[   23.284699] [<c0062854>] (rcu_dump_cpu_stacks) from [<c0065dcc>] (rcu_check_callbacks+0x494/0x8b0)
[   23.298730] [<c0065dcc>] (rcu_check_callbacks) from [<c006aebc>] (update_process_times+0x44/0x64)
[   23.312794] [<c006aebc>] (update_process_times) from [<c007a8c4>] (tick_sched_handle+0x58/0x64)
[   23.326761] [<c007a8c4>] (tick_sched_handle) from [<c007aac4>] (tick_sched_timer+0x50/0x94)
[   23.340460] [<c007aac4>] (tick_sched_timer) from [<c006b9d4>] (__run_hrtimer+0x94/0x29c)
[   23.354011] [<c006b9d4>] (__run_hrtimer) from [<c006c3f4>] (hrtimer_interrupt+0x10c/0x2ec)
[   23.367807] [<c006c3f4>] (hrtimer_interrupt) from [<c001cc6c>] (bcm2708_timer_interrupt+0x38/0x48)
[   23.382384] [<c001cc6c>] (bcm2708_timer_interrupt) from [<c005bf68>] (handle_irq_event_percpu+0x5c/0x278)
[   23.397701] [<c005bf68>] (handle_irq_event_percpu) from [<c005c1f4>] (handle_irq_event+0x70/0x8c)
[   23.412358] [<c005c1f4>] (handle_irq_event) from [<c005edf0>] (handle_level_irq+0xb0/0x150)
[   23.426625] [<c005edf0>] (handle_level_irq) from [<c005b83c>] (__handle_domain_irq+0x7c/0xd0)
[   23.441083] [<c005b83c>] (__handle_domain_irq) from [<c000f474>] (handle_IRQ+0x2c/0x30)
[   23.454997] [<c000f474>] (handle_IRQ) from [<c0008510>] (asm_do_IRQ+0x18/0x1c)
[   23.468115] [<c0008510>] (asm_do_IRQ) from [<c053b678>] (__irq_svc+0x38/0xd0)
[   23.478265] Exception stack(0xc703bc38 to 0xc703bc80)
[   23.486278] bc20:                                                       00000000 0000000a
[   23.500196] bc40: c082fbc0 c07e0f48 c0025d94 c703a000 00000000 00000202 00000000 c081a09c
[   23.514042] bc60: c07e7138 c703bccc c703bc80 c703bc80 c00257d0 c00257d4 60000113 ffffffff
[   23.527934] [<c053b678>] (__irq_svc) from [<c00257d4>] (__do_softirq+0xb8/0x338)
[   23.541055] [<c00257d4>] (__do_softirq) from [<c0025d94>] (irq_exit+0xb4/0x108)
[   23.554091] [<c0025d94>] (irq_exit) from [<c005b844>] (__handle_domain_irq+0x84/0xd0)
[   23.567657] [<c005b844>] (__handle_domain_irq) from [<c000f474>] (handle_IRQ+0x2c/0x30)
[   23.581399] [<c000f474>] (handle_IRQ) from [<c0008510>] (asm_do_IRQ+0x18/0x1c)
[   23.594326] [<c0008510>] (asm_do_IRQ) from [<c053b678>] (__irq_svc+0x38/0xd0)
[   23.604386] Exception stack(0xc703bd30 to 0xc703bd78)
[   23.612316] bd20:                                     f2980008 00000031 00000001 00000000
[   23.626112] bd40: 00000000 00000000 c71a7000 c0819fb8 c724a8a0 c081a09c c07e7138 c703bdac
[   23.639879] bd60: 00000030 c703bd78 c03c6848 c03c45cc 60000113 ffffffff
[   23.649361] [<c053b678>] (__irq_svc) from [<c03c45cc>] (dwc_otg_driver_probe+0x114/0x7b0)
[   23.663140] [<c03c45cc>] (dwc_otg_driver_probe) from [<c0358b18>] (platform_drv_probe+0x3c/0x6c)
[   23.677559] [<c0358b18>] (platform_drv_probe) from [<c0357500>] (driver_probe_device+0x100/0x224)
[   23.692187] [<c0357500>] (driver_probe_device) from [<c03576c0>] (__driver_attach+0x9c/0xa0)
[   23.706439] [<c03576c0>] (__driver_attach) from [<c0355a0c>] (bus_for_each_dev+0x64/0x98)
[   23.720405] [<c0355a0c>] (bus_for_each_dev) from [<c0356fc4>] (driver_attach+0x28/0x30)
[   23.734175] [<c0356fc4>] (driver_attach) from [<c0356c3c>] (bus_add_driver+0xe8/0x1e8)
[   23.747873] [<c0356c3c>] (bus_add_driver) from [<c0357bd0>] (driver_register+0x88/0x104)
[   23.761748] [<c0357bd0>] (driver_register) from [<c0358a04>] (__platform_driver_register+0x58/0x6c)
[   23.776604] [<c0358a04>] (__platform_driver_register) from [<c07afce8>] (dwc_otg_driver_init+0x5c/0x118)
[   23.791887] [<c07afce8>] (dwc_otg_driver_init) from [<c0008730>] (do_one_initcall+0x90/0x1dc)
[   23.806216] [<c0008730>] (do_one_initcall) from [<c0783e04>] (kernel_init_freeable+0xf0/0x1bc)
[   23.820696] [<c0783e04>] (kernel_init_freeable) from [<c0531cfc>] (kernel_init+0x18/0xfc)
[   23.834769] [<c0531cfc>] (kernel_init) from [<c000eb48>] (ret_from_fork+0x14/0x20)
[   23.848324] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode
[   23.848324] 
[   23.868236] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode

P33M · 2015-03-11T20:48:34Z

"If it is not tested, it does not work".

Device mode using the dwc_otg driver stack hasn't been tried since day 0 of release - I am not surprised that there are serious bugs with the implementation. In particular the FIQ implementation is certain to mess up the handling of device mode interrupts, as certain critical functions/protections have not been implemented on the PCD driver.

There are a number of ways this can be resolved, all of which require significant engineering time:

- Bugfix the PCD driver. This is made an order of magnitude more complicated by the inclusion of the FIQ code.
- Use the upstream dwc2 driver in gadget mode. In theory this "should work", but it hasn't really been tested in device mode. The hardware is similar to the type underneath the s3c_hsotg driver in upstream, and there is an ongoing effort to consolidate both codebases. This would be the ideal solution if you just require device mode.
- Port the FIQ code to the upstream dwc2 driver and nobble it in gadget mode. This again requires engineering hours, but would result in functional device and host modes.

nanosonde · 2015-04-19T11:42:18Z

I have seen a lot of changes to the DWC2 gadget driver in branch rpi-4.0.y.
Is the FIQ implementation still an issue?

Ruffio · 2016-08-02T18:55:32Z

The error/warning still occurs on the latest Jessie Lite 4.4 when working with the onboard WiFi module on the RPi3; see raspberrypi/firmware#630.

Ruffio · 2016-08-14T10:08:25Z

@ali1234, has your issue been resolved? If so, please close this issue.

P33M · 2017-05-04T15:23:19Z

Gadget (aka device mode) is primarily supported by the upstream dwc2 driver. dwc_otg should only be used for host mode.

hh mentioned this issue Dec 2, 2015

dwc_otg can't coexist with other USB drivers #881

Closed

lufeig mentioned this issue May 11, 2016

Bluetooth and Dual Shock 3 Playstation Remote will crash System. #1360

Closed

FishTest mentioned this issue May 11, 2016

i2c-gpio module has the problem of slow response #1467

Closed

ali1234 mentioned this issue May 16, 2016

RPi 3 WiFi - kernel NULL pointer dereference #1442

Closed

frasersdev mentioned this issue May 23, 2016

Aux SPI (SPI1) + Console on Aux UART (ttyS0) = Lockup #1484

Closed

olllivier mentioned this issue May 24, 2016

Rpi2 is hungs up after hot plug redrat usb cabel ( Kernel Panic Unable to handle kernel paging request at virtual address) #1481

Closed

P33M closed this as completed May 4, 2017
