dwc_otg: switching to device mode at run time hangs the kernel #884

Closed
ali1234 opened this issue Mar 11, 2015 · 5 comments

Comments

@ali1234
Contributor

ali1234 commented Mar 11, 2015

On a Compute Module, leave USB_OTGID floating; on the A/A+, write a 1 into gusbcfg bit 30 to force device-only mode.
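As a rough illustration of the register write described above, here is a minimal sketch of the bit manipulation only. It operates on a plain 32-bit value rather than real memory-mapped I/O. The macro and function names are hypothetical, and treating bit 29 as the force-host-mode bit is an assumption taken from the upstream dwc2 register layout, not something stated in this report.

```c
#include <stdint.h>

/* Bit 30 is the force-device-mode bit named in this report. */
#define GUSBCFG_FORCE_DEV_MODE  (UINT32_C(1) << 30)
/* Assumption: bit 29 forces host mode, as in the upstream dwc2 headers. */
#define GUSBCFG_FORCE_HOST_MODE (UINT32_C(1) << 29)

/*
 * Hypothetical helper: given the current GUSBCFG value, return the value
 * to write back so the core is forced into device-only mode.
 * All other bits are left unchanged.
 */
static uint32_t gusbcfg_force_device(uint32_t gusbcfg)
{
    gusbcfg &= ~GUSBCFG_FORCE_HOST_MODE; /* do not force both modes at once */
    gusbcfg |= GUSBCFG_FORCE_DEV_MODE;   /* set bit 30 */
    return gusbcfg;
}
```

On real hardware this would be a read-modify-write of the memory-mapped register; how that access is performed (driver code, /dev/mem, or a debug tool) is outside what this sketch covers.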

Expected result: the pi acts as a USB device.

Actual result: the kernel goes into an endless interrupt loop:

[   23.151467] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode
[   23.151467] 
[   23.169536] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode
[   23.169536] 
[   23.187584] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode
[   23.187584] 
[   23.205596] INFO: rcu_preempt self-detected stall on CPU { 0}  (t=2101 jiffies g=-287 c=-288 q=1)
[   23.219239] Task dump for CPU 0:
[   23.224791] swapper         R running      0     1      0 0x00000000
[   23.233605] [<c00149b0>] (unwind_backtrace) from [<c0012140>] (show_stack+0x20/0x24)
[   23.246049] [<c0012140>] (show_stack) from [<c0047490>] (sched_show_task+0xac/0x108)
[   23.258562] [<c0047490>] (sched_show_task) from [<c0048450>] (dump_cpu_task+0x2c/0x38)
[   23.271373] [<c0048450>] (dump_cpu_task) from [<c0062854>] (rcu_dump_cpu_stacks+0xb0/0x130)
[   23.284699] [<c0062854>] (rcu_dump_cpu_stacks) from [<c0065dcc>] (rcu_check_callbacks+0x494/0x8b0)
[   23.298730] [<c0065dcc>] (rcu_check_callbacks) from [<c006aebc>] (update_process_times+0x44/0x64)
[   23.312794] [<c006aebc>] (update_process_times) from [<c007a8c4>] (tick_sched_handle+0x58/0x64)
[   23.326761] [<c007a8c4>] (tick_sched_handle) from [<c007aac4>] (tick_sched_timer+0x50/0x94)
[   23.340460] [<c007aac4>] (tick_sched_timer) from [<c006b9d4>] (__run_hrtimer+0x94/0x29c)
[   23.354011] [<c006b9d4>] (__run_hrtimer) from [<c006c3f4>] (hrtimer_interrupt+0x10c/0x2ec)
[   23.367807] [<c006c3f4>] (hrtimer_interrupt) from [<c001cc6c>] (bcm2708_timer_interrupt+0x38/0x48)
[   23.382384] [<c001cc6c>] (bcm2708_timer_interrupt) from [<c005bf68>] (handle_irq_event_percpu+0x5c/0x278)
[   23.397701] [<c005bf68>] (handle_irq_event_percpu) from [<c005c1f4>] (handle_irq_event+0x70/0x8c)
[   23.412358] [<c005c1f4>] (handle_irq_event) from [<c005edf0>] (handle_level_irq+0xb0/0x150)
[   23.426625] [<c005edf0>] (handle_level_irq) from [<c005b83c>] (__handle_domain_irq+0x7c/0xd0)
[   23.441083] [<c005b83c>] (__handle_domain_irq) from [<c000f474>] (handle_IRQ+0x2c/0x30)
[   23.454997] [<c000f474>] (handle_IRQ) from [<c0008510>] (asm_do_IRQ+0x18/0x1c)
[   23.468115] [<c0008510>] (asm_do_IRQ) from [<c053b678>] (__irq_svc+0x38/0xd0)
[   23.478265] Exception stack(0xc703bc38 to 0xc703bc80)
[   23.486278] bc20:                                                       00000000 0000000a
[   23.500196] bc40: c082fbc0 c07e0f48 c0025d94 c703a000 00000000 00000202 00000000 c081a09c
[   23.514042] bc60: c07e7138 c703bccc c703bc80 c703bc80 c00257d0 c00257d4 60000113 ffffffff
[   23.527934] [<c053b678>] (__irq_svc) from [<c00257d4>] (__do_softirq+0xb8/0x338)
[   23.541055] [<c00257d4>] (__do_softirq) from [<c0025d94>] (irq_exit+0xb4/0x108)
[   23.554091] [<c0025d94>] (irq_exit) from [<c005b844>] (__handle_domain_irq+0x84/0xd0)
[   23.567657] [<c005b844>] (__handle_domain_irq) from [<c000f474>] (handle_IRQ+0x2c/0x30)
[   23.581399] [<c000f474>] (handle_IRQ) from [<c0008510>] (asm_do_IRQ+0x18/0x1c)
[   23.594326] [<c0008510>] (asm_do_IRQ) from [<c053b678>] (__irq_svc+0x38/0xd0)
[   23.604386] Exception stack(0xc703bd30 to 0xc703bd78)
[   23.612316] bd20:                                     f2980008 00000031 00000001 00000000
[   23.626112] bd40: 00000000 00000000 c71a7000 c0819fb8 c724a8a0 c081a09c c07e7138 c703bdac
[   23.639879] bd60: 00000030 c703bd78 c03c6848 c03c45cc 60000113 ffffffff
[   23.649361] [<c053b678>] (__irq_svc) from [<c03c45cc>] (dwc_otg_driver_probe+0x114/0x7b0)
[   23.663140] [<c03c45cc>] (dwc_otg_driver_probe) from [<c0358b18>] (platform_drv_probe+0x3c/0x6c)
[   23.677559] [<c0358b18>] (platform_drv_probe) from [<c0357500>] (driver_probe_device+0x100/0x224)
[   23.692187] [<c0357500>] (driver_probe_device) from [<c03576c0>] (__driver_attach+0x9c/0xa0)
[   23.706439] [<c03576c0>] (__driver_attach) from [<c0355a0c>] (bus_for_each_dev+0x64/0x98)
[   23.720405] [<c0355a0c>] (bus_for_each_dev) from [<c0356fc4>] (driver_attach+0x28/0x30)
[   23.734175] [<c0356fc4>] (driver_attach) from [<c0356c3c>] (bus_add_driver+0xe8/0x1e8)
[   23.747873] [<c0356c3c>] (bus_add_driver) from [<c0357bd0>] (driver_register+0x88/0x104)
[   23.761748] [<c0357bd0>] (driver_register) from [<c0358a04>] (__platform_driver_register+0x58/0x6c)
[   23.776604] [<c0358a04>] (__platform_driver_register) from [<c07afce8>] (dwc_otg_driver_init+0x5c/0x118)
[   23.791887] [<c07afce8>] (dwc_otg_driver_init) from [<c0008730>] (do_one_initcall+0x90/0x1dc)
[   23.806216] [<c0008730>] (do_one_initcall) from [<c0783e04>] (kernel_init_freeable+0xf0/0x1bc)
[   23.820696] [<c0783e04>] (kernel_init_freeable) from [<c0531cfc>] (kernel_init+0x18/0xfc)
[   23.834769] [<c0531cfc>] (kernel_init) from [<c000eb48>] (ret_from_fork+0x14/0x20)
[   23.848324] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode
[   23.848324] 
[   23.868236] WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Device mode

@P33M
Contributor

P33M commented Mar 11, 2015

"If it is not tested, it does not work".

Device mode using the dwc_otg driver stack hasn't been tried since day 0 of release - I am not surprised that there are serious bugs with the implementation. In particular the FIQ implementation is certain to mess up the handling of device mode interrupts, as certain critical functions/protections have not been implemented on the PCD driver.

There are a number of ways that this can be resolved, all of which require significant engineering time.

  1. Bugfix the PCD driver. Made an order of magnitude more complicated due to the inclusion of the FIQ code.
  2. Use the upstream dwc2 driver in gadget mode - this in theory "should work" but hasn't really been tested in device mode. The hardware is similar to the type underneath the s3c_hsotg driver in upstream and there is an ongoing effort to consolidate both codebases. This would be the ideal solution if you just require device mode.
  3. Port the FIQ code to the dwc2 upstream driver and nobble it in gadget mode - again, requiring engineering hours. This would result in functional device and host modes.

@nanosonde

I have seen a lot of changes to the DWC2 gadget driver in branch rpi-4.0.y.
Is the FIQ implementation still an issue?

@Ruffio

Ruffio commented Aug 2, 2016

The error/warning still occurs on the latest Jessie Lite (4.4) when working with the onboard wifi module on the RPi3; see raspberrypi/firmware#630

@Ruffio

Ruffio commented Aug 14, 2016

@ali1234 has your issue been resolved? If yes, then please close this issue.

@P33M
Contributor

P33M commented May 4, 2017

Gadget (aka device mode) is primarily supported by the upstream dwc2 driver. dwc_otg should only be used for host mode.

@P33M P33M closed this as completed May 4, 2017
anholt pushed a commit to anholt/linux that referenced this issue Feb 20, 2019
Our frontbuffer tracking improved over the years + the WA raspberrypi#884
helped us keep PSR2 enabled while triggering screen updates when
necessary so this FIXME is not valid anymore.

Acked-by: Dhinakaran Pandiyan <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
popcornmix pushed a commit that referenced this issue Jun 17, 2022
[ Upstream commit 09dadb5 ]

In our tests, "qemu-nbd" triggers an I/O hang:

INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
      Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:qemu-nbd        state:D stack:    0 pid:11445 ppid:     1 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x480/0x1050
 ? _raw_spin_lock_irqsave+0x3e/0xb0
 schedule+0x9c/0x1b0
 blk_mq_freeze_queue_wait+0x9d/0xf0
 ? ipi_rseq+0x70/0x70
 blk_mq_freeze_queue+0x2b/0x40
 nbd_add_socket+0x6b/0x270 [nbd]
 nbd_ioctl+0x383/0x510 [nbd]
 blkdev_ioctl+0x18e/0x3e0
 __x64_sys_ioctl+0xac/0x120
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8ff706577
RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0

"qemu-nbd -d" will call ioctl 'NBD_DISCONNECT' first; however, the following
message was found:

block nbd0: Send disconnect failed -32

which indicates that something is wrong with the server. Then,
"qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK'; however, the ioctl can't clear
requests after commit 2516ab1("nbd: only clear the queue on device
teardown"). In the meantime, a request can't complete through timeout
because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which
means such a request will never be completed in this situation.

Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't
complete multiple times, switch back to call nbd_clear_sock() in
nbd_clear_sock_ioctl(), so that inflight requests can be cleared.

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
ajs124 pushed a commit to helsinki-systems/linux that referenced this issue Jun 21, 2022
herrnst pushed a commit to herrnst/linux-raspberrypi that referenced this issue Jun 21, 2022
matthiakl pushed a commit to matthiakl/linux that referenced this issue Jun 24, 2022
sigmaris pushed a commit to sigmaris/linux that referenced this issue Jun 24, 2022