Realtek RTL8195AM - CMSIS-RTOS error: ISR Queue overflow (status: 0x2, task ID: 0x0, object ID: 0x30051484) #5640

JanneKiiskila · 2017-12-02T09:19:21Z

Description

Type: Bug
Priority: Blocker for releasing support for Mbed Cloud Client / mbed-os-example-client

Bug

Target
REALTEK_RTL8195AM

Toolchain:
GCC_ARM

Toolchain version:
mbed cli Windows installed toolchain 0.43
gcc_arm - same with Linux as well.

mbed-cli version:
(mbed --version)
1.2.2

mbed-os sha:
(git log -n1 --oneline)

2e1c2a1 (HEAD -> master, origin/master, origin/feature-lorawan, origin/HEAD) Merge pull request #5538 from geky/littlefs-staging
41591eb Merge pull request #5602 from artokin/nanostack_release_v704

DAPLink version:

241 - this one is also a bit old
It would be nice if Realtek could update the official DAPLINK you can download via their website.

=========================================================

ROM Version: 0.3

Build ToolChain Version: gcc version 4.8.3 (Realtek ASDK-4.8.3p1 Build 2003)

=========================================================
Check boot type form eFuse
SPI Initial
Image1 length: 0x3308, Image Addr: 0x10000bc8
Image1 Validate OK, Going jump to Image1

Expected behavior

mbed-os-example-client should run for a very long time without crashing.
For example, reference testing was done with K64F, where it works fine.

Actual behavior

...
simulate button_click, new value of counter is 280
simulate button_click, new value of counter is 281
simulate button_click, new value of counter is 282
simulate button_click, new value of counter is 283
CMSIS-RTOS error: ISR Queue overflow (status: 0x2, task ID: 0x0, object ID: 0x30051484)

[mbed_die]  0x0 die here

Steps to reproduce

git clone mbed-os-example-client
modify mbed_app.json with a valid SSID/WIFI-passphrase.
set connectivity method to `WIFI_RTW`
mbed compile -m REALTEK_RTL8195AM -t GCC_ARM

(Have not tried other compilers, though).


JanneKiiskila · 2017-12-02T09:19:53Z

@tung7970 @Archcady

[Mirrored to Jira]

samchuarm · 2017-12-04T07:52:02Z

@ARMmbed/team-realtek
[Mirrored to Jira]

Archcady · 2017-12-04T11:42:46Z

Looking into this issue; will update ASAP.
[Mirrored to Jira]

samchuarm · 2017-12-13T07:29:47Z

@Archcady
[Mirrored to Jira]

Archcady · 2017-12-20T03:57:28Z

I am trying to test and debug this issue, but it takes a long time to reproduce. Is there a way to make the handle_timer_click trigger faster so that I can test and debug the point where it fails?
[Mirrored to Jira]

tung7970 · 2017-12-20T04:12:31Z

@Archcady You can try to reduce the wait period in main.cpp. Currently it's 25 secs per loop.

updates.wait(25000);

[Mirrored to Jira]

Archcady · 2017-12-20T09:10:20Z

I changed this to 1000, but it still takes a long time for the issue to trigger.

[Mirrored to Jira]

JanneKiiskila · 2017-12-20T10:08:35Z

We are talking about hours, not days, even with the default value.

[Mirrored to Jira]

Archcady · 2018-01-03T08:07:38Z

I have debugged this issue. The ISR queue overflow happens as a result of the call to "release()" in Semaphore.cpp, which in turn calls "osSemaphoreRelease". I checked our driver on the Realtek side, and we are not calling "release()" anywhere in our code. It would be great if anyone from the ARM side could help me understand this issue. These are my findings so far.

P.S. We are unable to debug with pyOCD because it is giving an error, so debugging is taking an extended amount of time.
[Mirrored to Jira]

JanneKiiskila · 2018-01-24T12:46:06Z

Hi,

Are you 100% sure you are not using semaphores anywhere else? I can see at least these calls using git grep under TARGET_Realtek:

TARGET_AMEBA/sdk/os/rtx2/rtx2_service.c:                osStatus_t status = osSemaphoreRelease(p_sem->id);
TARGET_AMEBA/sdk/os/rtx2/rtx2_service.c:                osStatus_t status = osSemaphoreRelease(p_sem->id);
T

Any imbalance between acquiring and releasing those semaphores could potentially cause an issue, right?

[Mirrored to Jira]

MarceloSalazar · 2018-02-01T23:15:10Z

@Archcady is there any update on this?
[Mirrored to Jira]

samchuarm · 2018-02-02T06:49:10Z

Hi Marcelo, Realtek team is still trying to narrow down where the semaphore mismatch might come from.
[Mirrored to Jira]

bkht · 2018-02-07T19:40:27Z

Hi, I have reproduced the same problem.
Using the online compiler, I have successfully run "Getting started with mbed Client on mbed OS" on a NUCLEO-F746ZG: https://os.mbed.com/teams/mbed-os-examples/code/mbed-os-example-client/
More info:
https://os.mbed.com/questions/80121/mbed-Client-on-mbed-OS-CMSIS-RTOS-error-/

I found that using a library (for a temperature sensor) to get some real-world data causes this problem as soon as that library is called, e.g. mcp9808.readTemp().
That library works fine in a simple program.

[Mirrored to Jira]

samchuarm · 2018-02-08T18:24:39Z

So does this mean the ISR queue overflow is not platform dependent? @JanneKiiskila
@Archcady any progress on this issue?
[Mirrored to Jira]

prashantrar · 2018-02-09T08:54:18Z

From the Realtek side, we are still debugging the issue, but I'll share some of my findings here in case they provide some pointers.

The ISR queue overflow always happens after a fixed amount of time: it takes approximately 62-63 minutes to occur, every single time.
The issue originates in the function "osRtxPostProcess" in rtx_system.c, where "osRtxErrorNotify" is called and the program terminates.

    void osRtxPostProcess (os_object_t *object) {
      if (isr_queue_put(object) != 0U) {
        if (osRtxInfo.kernel.blocked == 0U) {
          SetPendSV();
        } else {
          osRtxInfo.kernel.pendSV = 1U;
        }
      } else {
        osRtxErrorNotify(osRtxErrorISRQueueOverflow, object);
      }
    }

The reason "osRtxErrorNotify" is called is that, just before the crash, inside the function "" the "if" condition is executed 16 times.

    if (isr_queue_put(object) != 0U) {
      if (osRtxInfo.kernel.blocked == 0U) {
        SetPendSV();
      }

The "kernel.blocked" check fails, so the same condition is hit 16 times. The ISR queue is defined with a size of 16, and hence the queue overflows.
Surprisingly, if I comment out the call to "osRtxErrorNotify" in "osRtxPostProcess", the program runs forever without issues.
Also, if I modify the example code so that the semaphore release is done with a software timer rather than the ticker, this issue doesn't happen. The issue is reproducible only when the ticker is used to release the semaphore.

P.S. I am still debugging the issue; these are just my findings so far. If anyone from the Arm team can get some pointers from them or highlight anything I am missing, please help out.
[Mirrored to Jira]
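
The overflow condition described in the comment above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not RTX code: `IsrQueueModel`, `drain` and `first_overflow` are hypothetical names, the capacity of 16 is the figure reported in this thread, and `put` follows the same convention as the `isr_queue_put` call quoted above (non-zero on success, zero when full).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of a fixed-capacity ISR post-processing queue.
// All names here are illustrative; none of them are RTX APIs.
class IsrQueueModel {
public:
    static constexpr std::size_t kCapacity = 16;  // queue size reported in the thread

    // Same convention as the quoted isr_queue_put() check:
    // returns 1 if the object was queued, 0 if the queue is full.
    std::uint32_t put(const void *object) {
        if (count_ == kCapacity) {
            return 0U;  // full: this is where the real kernel reports the overflow
        }
        slots_[(head_ + count_) % kCapacity] = object;
        ++count_;
        return 1U;
    }

    // Stands in for the deferred processing that empties the queue once
    // execution leaves interrupt context. Returns how many entries were pending.
    std::size_t drain() {
        const std::size_t processed = count_;
        head_ = 0;
        count_ = 0;
        return processed;
    }

    std::size_t size() const { return count_; }

private:
    std::array<const void *, kCapacity> slots_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// Posts `posts` objects back to back without draining in between.
// Returns the zero-based index of the first post that failed, or -1 if all fit.
int first_overflow(IsrQueueModel &queue, int posts) {
    static int dummy_object;
    for (int i = 0; i < posts; ++i) {
        if (queue.put(&dummy_object) == 0U) {
            return i;
        }
    }
    return -1;
}
```

In this model, sixteen consecutive posts fit and the seventeenth fails unless `drain()` runs in between. This matches the observation that the error only appears when the deferred processing does not get a chance to run.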

JanneKiiskila · 2018-02-09T14:18:19Z

There is something that's board specific (or driver specific) - K64F does not have this issue.

But, clearly it's now something that's impacting more than one board, if this happens also with NUCLEO-F746ZG.

[Mirrored to Jira]

samchuarm · 2018-02-13T16:04:19Z

Hi @JanneKiiskila, do you think commenting out osRtxErrorNotify in osRtxPostProcess, or switching to a software timer for the semaphore release, could be the fix?
[Mirrored to Jira]

JanneKiiskila · 2018-02-13T19:33:49Z

I will admit my own limited knowledge at this stage and say I don't know. @kjbracey-arm, @geky, @sg- , or other Mbed OS team members would know better.

[Mirrored to Jira]

kjbracey · 2018-02-14T07:41:26Z

This is quite easy to hit in RTX when using any RTOS operations from interrupt context. The RTOS work is always deferred onto this queue, so if you do 16 consecutive RTOS operations from interrupt before returning to thread context, it overflows.

I've raised one issue here for RTX suggesting how this could be improved, at least for flags. Not sure if the same logic could apply to semaphores. Maybe?

ARM-software/CMSIS_5#283

Pending any RTX improvement, it's usually best to work around the issue by including logic to make sure you don't signal multiple consecutive times from interrupt: some sort of "pending" flag, which is cleared by whoever is waiting on the semaphore.

Do we really have no information about where the interrupt-context semaphore release that triggers this is coming from? No backtrace?

[Mirrored to Jira]
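
A minimal sketch of the "pending" flag workaround suggested above, written as host-side C++ so it can be tried without hardware. Everything named here is an assumption for illustration: `CoalescedNotifier`, `notify_from_isr`, `acknowledge`, `fake_release` and `g_release_count` are hypothetical. On a real target the function pointer would wrap the actual semaphore release, and the flag would need to be whatever interrupt-safe atomic the platform provides.

```cpp
#include <atomic>

// Hypothetical helper: lets an interrupt handler signal at most once until
// the waiting thread acknowledges, so repeated interrupts cannot pile up
// entries in the RTOS ISR queue.
class CoalescedNotifier {
public:
    using Signal = void (*)();

    explicit CoalescedNotifier(Signal signal) : signal_(signal) {}

    // Call from interrupt context. Only the first call since the last
    // acknowledge() invokes the signal; later calls are absorbed by the flag.
    // Returns true if the signal was actually sent.
    bool notify_from_isr() {
        if (pending_.exchange(true)) {
            return false;  // a wake-up is already outstanding
        }
        signal_();
        return true;
    }

    // Call from the waiting thread right after it wakes up and before it
    // handles the work, so an event arriving during handling signals again.
    void acknowledge() { pending_.store(false); }

private:
    Signal signal_;
    std::atomic<bool> pending_{false};
};

// Stand-in for a real semaphore release, used only to count calls.
int g_release_count = 0;
void fake_release() { ++g_release_count; }
```

Because notifications are coalesced, the waiting thread has to treat each wake-up as "one or more events happened" rather than counting releases. That fits a binary-style signal but not a counting semaphore whose count carries meaning.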

samchuarm · 2018-02-26T10:13:20Z

@prashantrar @ARMmbed/team-realtek
[Mirrored to Jira]

prashantrar · 2018-02-26T11:42:43Z

We are having difficulty taking backtraces because the stack is corrupt the moment the crash happens. The crash always originates from a semaphore release, but beyond that the backtrace cannot point to specific functions and usually just shows " ?? ()". I will try to get proper backtraces again tomorrow and update this ticket.
[Mirrored to Jira]

prashantrar · 2018-03-02T03:14:45Z

@kjbracey-arm Here is the latest backtrace, taken with all the latest mbed-os components.

#0  osRtxErrorNotify () at .\mbed-os\rtos\TARGET_CORTEX\mbed_rtx_handlers.c  
#1  0x3001bb74 in isrRtxSemaphoreRelease ()  
at .\mbed-os\rtos\TARGET_CORTEX\rtx5\RTX\Source\rtx_semaphore.c:414  
#2  osSemaphoreRelease ()  
at .\mbed-os\rtos\TARGET_CORTEX\rtx5\RTX\Source\rtx_semaphore.c:461  
#3  0x300193f2 in ticker_irq_handler () at .\mbed-os\hal\mbed_ticker_api.c:  
#4  0x30022ff4 in HalTimerIrq2To7Handle_Patch (Data=<optimized out>)  
at ../../TARGET_Realtek/TARGET_AMEBA/TARGET_RTL8195A/device/rtl8195a_ti  
:45  
#5  0x000035de in ?? ()  
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

[Mirrored to Jira]

0xc0170 · 2018-07-25T12:51:15Z

@ARMmbed/team-realtek @JanneKiiskila Is this still a blocker? Has the issue still not been fixed?
[Mirrored to Jira]

samchuarm · 2018-09-20T08:30:35Z

@M-ichae-l , can you confirm if this issue has been addressed?
[Mirrored to Jira]

adbridge · 2018-10-04T12:22:43Z

Internal Jira reference: https://jira.arm.com/browse/IOTPART-5928

MarceloSalazar · 2020-04-14T10:30:12Z

Closing as target won't be supported in Mbed 6 - #12775

JanneKiiskila changed the title ~~Realtek RTL8195AM -~~ Realtek RTL8195AM - CMSIS-RTOS error: ISR Queue overflow (status: 0x2, task ID: 0x0, object ID: 0x30051484) Dec 2, 2017

JanneKiiskila mentioned this issue Dec 3, 2017

General instability on REALTEK_RTL8195AM (GCC_ARM) ARMmbed/mbed-os-example-client#306

Closed

0xc0170 added the devices: realtek label Dec 4, 2017

MarceloSalazar mentioned this issue Dec 8, 2017

Realtek RTL8195AM support ARMmbed/mbed-os-example-client#343

Merged

ciarmcom added the mirrored label Jun 1, 2018

ARMmbed deleted a comment from ciarmcom Oct 2, 2018

adbridge added the Jira status: OPEN label Oct 2, 2018

0xc0170 added the type: bug label Oct 24, 2019

MarceloSalazar closed this as completed Apr 14, 2020

ciarmcom added Jira status: CLOSED and removed Jira status: OPEN labels Apr 14, 2020
