SMP trace status #412

yf13 · 2024-09-30T09:46:04Z

I am debugging a SMP port and noticed the sample works fine w/o trace. However when trace is enabled they are some strange issues.

@billlamiework and @goldscott, can you teach status of trace APIs in SMP ThreadX? do they work with any other SMP ports?

I noticed that core id is embedded in most sigificant byte of event id, but not sure if there are any more SMP events yet to be added? Also from TraceX, there is no special reports on SMP trace file as well.

billlamiework · 2024-10-14T17:21:21Z

It would be good to understand better the strange issues that occur when trace is enabled and used, e.g., are the issues related to the operation of ThreadX or problems with the trace buffer? From: yf13 ***@***.***> Sent: Sunday, October 13, 2024 5:10 AM To: eclipse-threadx/threadx ***@***.***> Cc: Bill Lamie ***@***.***>; Mention ***@***.***> Subject: [eclipse-threadx/threadx] SMP trace status (Issue #412) I am debugging a SMP port and noticed the sample works fine w/o trace. However when trace is enabled they are some strange issues. @billlamiework<https://github.com/billlamiework> and @goldscott<https://github.com/goldscott>, I am wondering status of trace APIs in SMP ThreadX? does it work with other SMP ports? I noticed that core id is embedded in most sigificant byte of event id, but not sure if there are any more SMP events yet to be added? Also from TraceX, there is no special reports on SMP trace file as well. — Reply to this email directly, view it on GitHub<#412>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BDUEV3EJLWA27E7R4X6RUVDZ3JPKNAVCNFSM6AAAAABPC4KQDKVHI2DSMVQWIX3LMV43ASLTON2WKOZSGU2TMMJSGYZDMMA>. You are receiving this because you were mentioned.Message ID: ***@***.******@***.***>>

yf13 · 2024-10-14T23:05:01Z

It would be good to understand better the strange issues that occur when trace is enabled and used, e.g., are the issues related to the operation of ThreadX or problems with the trace buffer?

@billamiework, thanks for asking. I think it more relates to ThreadX operations of my port.

My sample rotates two trace buffers upon buffer full and a thread_8 is added to record full buffer to files. The sample quits when thread_0_counter reaches 1000 (with 1 tick sleep), terminates other app threads, print out statistics and quit QEMU.

So console log of a successful run looks like below, with 97 trace buffer rotates:

Copyright (c) 2024 Eclipse Foundation. * ThreadX SMP/RISC-V/GNU Version 6.4.x *
**** ThreadX RiscV Demonstration **** (c) 1996-2024 Eclipse Foundation

           thread 0 events sent:          1000
           thread 1 messages sent:        500
           thread 2 messages received:    501
           thread 3 obtained semaphore:   251
           thread 4 obtained semaphore:   251
           thread 5 events received:      1001
           thread 6 mutex obtained:       251
           thread 7 mutex obtained:       251
           thread 8 messages received:    97

Stack depths: 824, 776, 764, 692, 776, 884, 816, 820, 484, 644
SMP inter-core interrupts: 641, 1006

Below is that for typical failure case with a weird exception:

Thread_2.0 error: got 0.525322, expect -0.432178(46)

Instruction access fault(1) #0 epc: 0 tva: 0 mstatus: 1880 lvl:1 thr:thread 2, cnt:94 sp: 80081c20(800816ac 80081ca7) ra:0

I tried skipping the data recording, but the issues still occur, so it seems not like trace buffer contention from the recording thread.

If trace have been used with other SMP ports before then it is likely an issue specific to my port.

yf13 · 2024-10-15T11:36:18Z

@billamiework, I noticed that trace feature has been used with other upstream SMP ports but they don't rotate trace buffers.

So I commented out buffer full notifier registration in my sample and it works as good as trace disabled situation.

So my buffer full handler is more suspicious:

/* Trace buffer full handler, called within critical section */
void trace_buffer_rotate(void *buffer)
{
void *new;

    tx_trace_disable();

    /* Switch buffer */
    if (buffer == &trace_buffer0) {
        new = &trace_buffer1;
    } else {
        new = &trace_buffer0;
    }
    trace_buffer_setup(new);

    /* Send messag to notify the logging thread  */
    tx_queue_send(&queue_1, &buffer, TX_NO_WAIT);
}

What is the right way for buffer rotation?

billlamiework · 2024-10-15T17:49:38Z

It is good that you have narrowed it down to this callback handler… It would be good to check the thread or ISR stack availability where this callback occurs. Memory corruption from stack overflow is an easy way to cause a system failure. It might be interesting to know if there is any consistency on where the callback is being called from when the crash occurs. Does it work sometimes, or does it crash every time this is called? Also, what is going on inside of “trace_buffer_setup”? From: yf13 ***@***.***> Sent: Tuesday, October 15, 2024 4:37 AM To: eclipse-threadx/threadx ***@***.***> Cc: Bill Lamie ***@***.***>; Mention ***@***.***> Subject: Re: [eclipse-threadx/threadx] SMP trace status (Issue #412) @billamiework, I noticed that trace feature has been used with other upstream SMP ports but they don't rotate trace buffers. So I commented out buffer full notifier registration in my sample and it works as good as trace disabled situation. So it seems that buffer rotation might be more suspicious: /* Trace buffer full handler, called within critical section */ void trace_buffer_rotate(void *buffer) { void *new; tx_trace_disable(); /* Switch buffer */ if (buffer == &trace_buffer0) { new = &trace_buffer1; } else { new = &trace_buffer0; } trace_buffer_setup(new); /* Send messag to notify the logging thread */ tx_queue_send(&queue_1, &buffer, TX_NO_WAIT); } Is this the right use of buffer rotation? — Reply to this email directly, view it on GitHub<#412 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BDUEV3B5BNHIM7QWXINS3VLZ3T445AVCNFSM6AAAAABPC4KQDKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJTGY2DKOJXHA>. You are receiving this because you were mentioned.Message ID: ***@***.******@***.***>>

yf13 · 2024-10-15T21:09:31Z

@billlamiework, thanks again for continuing this.

It would be good to check the thread or ISR stack availability where this
callback occurs.

For rv32, stacks for app and timers threads are of 1536 bytes, as per the logs from program finish, they look fine so far. The system stacks are of 2048 bytes, I need see how to track the system stack usage, I might need add some code to track its depth for sure.

For rv64, stacks for app and timer threads are 2048 bytes, the system stacks are of 4096 bytes. here is one log of rv64 success case:

Copyright (c) 2024 Eclipse Foundation. * ThreadX SMP/RISC-V/GNU Version 6.4.x *
**** ThreadX RiscV Demonstration **** (c) 1996-2024 Eclipse Foundation

           thread 0 events sent:          1000
           thread 1 messages sent:        503
           thread 2 messages received:    504
           thread 3 obtained semaphore:   253
           thread 4 obtained semaphore:   252
           thread 5 events received:      1001
           thread 6 mutex obtained:       251
           thread 7 mutex obtained:       251
           thread 8 messages received:    97

Stack depths: 1048, 1184, 1184, 1184, 1184, 1172, 1192, 1204, 768, 1096
SMP inter-core interrupts: 1424, 554

It might be interesting to know if there is any consistency on where the callback is being called from when the crash occurs.

The log shows that exception happened to different app threads. Is there a simple way to find where the callback is called?

Does it work sometimes, or does it crash every time this is called?

The demo works sometimes, but often fails with two typical failure cases:

one is above mentioned exception;
the other is stalled QEMU which I have to quit forcedly.

what is going on inside of “trace_buffer_setup"?

Here it is:

void trace_buffer_setup(void *buffer)
{
UINT ret;
    ret = tx_trace_enable(buffer, DEMO_TRACE_SIZE, DEMO_TRACE_OBJECTS);
    if (ret != TX_SUCCESS){
        printf("tx_trace_enable failed: %d", ret);
        exit(1);
    }
    ret = tx_trace_buffer_full_notify(trace_buffer_rotate);
    if (ret != TX_SUCCESS) {
        printf("failed to install buffer full handler: %d", ret);
        exit(2);
    }
}

I haven't found the API to set event buffer only and reuse existing header and object registry, so this is what I have now.

billlamiework · 2024-10-16T17:32:51Z

The trace_buffer_setup processing is non-trivial and, as implemented, could be called from ISR processing. Instead, I suggest using the buffer full callback notification to wake up a thread, which will then perform the trace buffer swap. I don’t know if this is the actual problem… but it’s hard to tell without digging in with the debugger. From: yf13 ***@***.***> Sent: Tuesday, October 15, 2024 2:10 PM To: eclipse-threadx/threadx ***@***.***> Cc: Bill Lamie ***@***.***>; Mention ***@***.***> Subject: Re: [eclipse-threadx/threadx] SMP trace status (Issue #412) @billlamiework<https://github.com/billlamiework>, thanks again for continuing this. It would be good to check the thread or ISR stack availability where this callback occurs. For rv32, stacks for app and timers threads are of 1536 bytes, as per the logs from program finish, they look fine so far. The system stacks are of 2048 bytes, I need see how to track the system stack usage, I might need add some code to track its depth for sure. For rv64, stacks for app and timer threads are 2048 bytes, the system stacks are of 4096 bytes. here is one log of rv64 success case: Copyright (c) 2024 Eclipse Foundation. * ThreadX SMP/RISC-V/GNU Version 6.4.x * **** ThreadX RiscV Demonstration **** (c) 1996-2024 Eclipse Foundation thread 0 events sent: 1000 thread 1 messages sent: 503 thread 2 messages received: 504 thread 3 obtained semaphore: 253 thread 4 obtained semaphore: 252 thread 5 events received: 1001 thread 6 mutex obtained: 251 thread 7 mutex obtained: 251 thread 8 messages received: 97 Stack depths: 1048, 1184, 1184, 1184, 1184, 1172, 1192, 1204, 768, 1096 SMP inter-core interrupts: 1424, 554 It might be interesting to know if there is any consistency on where the callback is being called from when the crash occurs. The log shows that exception happened to different app threads. Is there a simple way to find where the callback is called? Does it work sometimes, or does it crash every time this is called? The demo works sometimes, but often fails. Typical failure behaviours are two: one is above mentioned exception, the other is stalled QEMU which I have to quit forcedly. what is going on inside of “trace_buffer_setup"? Here it is: void trace_buffer_setup(void *buffer) { UINT ret; ret = tx_trace_enable(buffer, DEMO_TRACE_SIZE, DEMO_TRACE_OBJECTS); if (ret != TX_SUCCESS){ printf("tx_trace_enable failed: %d", ret); exit(1); } ret = tx_trace_buffer_full_notify(trace_buffer_rotate); if (ret != TX_SUCCESS) { printf("failed to install buffer full handler: %d", ret); exit(2); } } I haven't find an API to just set the event buffer and keep reusing existing header and object registry, so this is what I have now. — Reply to this email directly, view it on GitHub<#412 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BDUEV3EW5F52VHOIKRB7Q5TZ3WACDAVCNFSM6AAAAABPC4KQDKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJVGE3TCMZWGE>. You are receiving this because you were mentioned.Message ID: ***@***.******@***.***>>

yf13 · 2024-10-16T22:29:18Z

Instead, I suggest using the buffer full callback notification to wake up a thread, which will then perform the trace buffer swap.

Thank you @billlamiework, I guess you mean to keep tx_queue_send() only in the callback? Will give this a try later as I am using a bigger buffer currently.

As a side topic, I am wondering if we shall enrich trace design later so that more SMP related events are supported?

billlamiework · 2024-10-17T18:00:20Z

Sounds good. Please let me know how it goes. I’m happy to discuss other issues/improvements at ***@***.******@***.***> From: yf13 ***@***.***> Sent: Wednesday, October 16, 2024 3:30 PM To: eclipse-threadx/threadx ***@***.***> Cc: Bill Lamie ***@***.***>; Mention ***@***.***> Subject: Re: [eclipse-threadx/threadx] SMP trace status (Issue #412) Instead, I suggest using the buffer full callback notification to wake up a thread, which will then perform the trace buffer swap. Thank you @billlamiework<https://github.com/billlamiework>, I guess you mean to keep tx_queue_send() only in the callback? Will give this a try later as I am using a bigger buffer currently. As a side topic, I am wondering if we shall enrich trace design later so that more SMP related events are supported? — Reply to this email directly, view it on GitHub<#412 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BDUEV3B6KDBOFOTU3KQ4CLDZ33SFJAVCNFSM6AAAAABPC4KQDKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJYGA3TMOBUGY>. You are receiving this because you were mentioned.Message ID: ***@***.******@***.***>>

yf13 · 2024-10-17T22:05:25Z

@billamiework, I tried following buffer full callback and off-load rotation to a thread:

/* The buffer full callback */
void trace_buffer_full(void *buffer)
{
    tx_trace_disable();
    tx_queue_send(&queue_1, &buffer, TX_NO_WAIT);
}

/* Trace rotation for thread #8 */
void trace_buffer_rotate(void *buffer)
{
TX_INTERRUPT_SAVE_AREA
void *new;

    TX_DISABLE
    if (buffer == &trace_buffer0) {
        new = &trace_buffer1;
    } else {
        new = &trace_buffer0;
    }
    trace_buffer_setup(new);
    TX_RESTORE
}

/* Enable trace with given buffer */
void trace_buffer_setup(void *buffer)
{
UINT ret;
    ret = tx_trace_enable(buffer, DEMO_TRACE_SIZE, DEMO_TRACE_OBJECTS);
    if (ret != TX_SUCCESS){
        printf("tx_trace_enable failed: %d", ret);
        exit(1);
    }
    ret = tx_trace_buffer_full_notify(trace_buffer_full);
    if (ret != TX_SUCCESS) {
        printf("failed to install trace buffer fully handler: %d", ret);
        exit(2);
    }
}

Unfortunately it doesn't help yet, there are more frequent stalls and errors.
I am wondering if trace buffer rotation works on other SMP ports?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SMP trace status #412

SMP trace status #412

yf13 commented Sep 30, 2024 •

edited

Loading

billlamiework commented Oct 14, 2024 via email

yf13 commented Oct 14, 2024 •

edited

Loading

yf13 commented Oct 15, 2024 •

edited

Loading

billlamiework commented Oct 15, 2024 via email

yf13 commented Oct 15, 2024 •

edited

Loading

billlamiework commented Oct 16, 2024 via email

yf13 commented Oct 16, 2024

billlamiework commented Oct 17, 2024 via email

yf13 commented Oct 17, 2024 •

edited

Loading

SMP trace status #412

SMP trace status #412

Comments

yf13 commented Sep 30, 2024 • edited Loading

billlamiework commented Oct 14, 2024 via email

yf13 commented Oct 14, 2024 • edited Loading

yf13 commented Oct 15, 2024 • edited Loading

billlamiework commented Oct 15, 2024 via email

yf13 commented Oct 15, 2024 • edited Loading

billlamiework commented Oct 16, 2024 via email

yf13 commented Oct 16, 2024

billlamiework commented Oct 17, 2024 via email

yf13 commented Oct 17, 2024 • edited Loading

yf13 commented Sep 30, 2024 •

edited

Loading

yf13 commented Oct 14, 2024 •

edited

Loading

yf13 commented Oct 15, 2024 •

edited

Loading

yf13 commented Oct 15, 2024 •

edited

Loading

yf13 commented Oct 17, 2024 •

edited

Loading