Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMP trace status #412

Open
yf13 opened this issue Sep 30, 2024 · 9 comments
Open

SMP trace status #412

yf13 opened this issue Sep 30, 2024 · 9 comments

Comments

@yf13
Copy link

yf13 commented Sep 30, 2024

I am debugging a SMP port and noticed the sample works fine w/o trace. However when trace is enabled they are some strange issues.

@billlamiework and @goldscott, can you teach status of trace APIs in SMP ThreadX? do they work with any other SMP ports?

I noticed that core id is embedded in most sigificant byte of event id, but not sure if there are any more SMP events yet to be added? Also from TraceX, there is no special reports on SMP trace file as well.

@billlamiework
Copy link

billlamiework commented Oct 14, 2024 via email

@yf13
Copy link
Author

yf13 commented Oct 14, 2024

It would be good to understand better the strange issues that occur when trace is enabled and used, e.g., are the issues related to the operation of ThreadX or problems with the trace buffer?

@billamiework, thanks for asking. I think it more relates to ThreadX operations of my port.

My sample rotates two trace buffers upon buffer full and a thread_8 is added to record full buffer to files. The sample quits when thread_0_counter reaches 1000 (with 1 tick sleep), terminates other app threads, print out statistics and quit QEMU.

So console log of a successful run looks like below, with 97 trace buffer rotates:

Copyright (c) 2024 Eclipse Foundation. * ThreadX SMP/RISC-V/GNU Version 6.4.x *
**** ThreadX RiscV Demonstration **** (c) 1996-2024 Eclipse Foundation

           thread 0 events sent:          1000
           thread 1 messages sent:        500
           thread 2 messages received:    501
           thread 3 obtained semaphore:   251
           thread 4 obtained semaphore:   251
           thread 5 events received:      1001
           thread 6 mutex obtained:       251
           thread 7 mutex obtained:       251
           thread 8 messages received:    97

Stack depths: 824, 776, 764, 692, 776, 884, 816, 820, 484, 644
SMP inter-core interrupts: 641, 1006

Below is that for typical failure case with a weird exception:

Thread_2.0 error: got 0.525322, expect -0.432178(46)

Instruction access fault(1) #0 epc: 0 tva: 0 mstatus: 1880 lvl:1 thr:thread 2, cnt:94 sp: 80081c20(800816ac 80081ca7) ra:0

I tried skipping the data recording, but the issues still occur, so it seems not like trace buffer contention from the recording thread.

If trace have been used with other SMP ports before then it is likely an issue specific to my port.

@yf13
Copy link
Author

yf13 commented Oct 15, 2024

@billamiework, I noticed that trace feature has been used with other upstream SMP ports but they don't rotate trace buffers.

So I commented out buffer full notifier registration in my sample and it works as good as trace disabled situation.

So my buffer full handler is more suspicious:

/* Trace buffer full handler, called within critical section */
void trace_buffer_rotate(void *buffer)
{
void *new;

    tx_trace_disable();

    /* Switch buffer */
    if (buffer == &trace_buffer0) {
        new = &trace_buffer1;
    } else {
        new = &trace_buffer0;
    }
    trace_buffer_setup(new);

    /* Send messag to notify the logging thread  */
    tx_queue_send(&queue_1, &buffer, TX_NO_WAIT);
}

What is the right way for buffer rotation?

@billlamiework
Copy link

billlamiework commented Oct 15, 2024 via email

@yf13
Copy link
Author

yf13 commented Oct 15, 2024

@billlamiework, thanks again for continuing this.

It would be good to check the thread or ISR stack availability where this
callback occurs.

For rv32, stacks for app and timers threads are of 1536 bytes, as per the logs from program finish, they look fine so far. The system stacks are of 2048 bytes, I need see how to track the system stack usage, I might need add some code to track its depth for sure.

For rv64, stacks for app and timer threads are 2048 bytes, the system stacks are of 4096 bytes. here is one log of rv64 success case:

Copyright (c) 2024 Eclipse Foundation. * ThreadX SMP/RISC-V/GNU Version 6.4.x *
**** ThreadX RiscV Demonstration **** (c) 1996-2024 Eclipse Foundation

           thread 0 events sent:          1000
           thread 1 messages sent:        503
           thread 2 messages received:    504
           thread 3 obtained semaphore:   253
           thread 4 obtained semaphore:   252
           thread 5 events received:      1001
           thread 6 mutex obtained:       251
           thread 7 mutex obtained:       251
           thread 8 messages received:    97

Stack depths: 1048, 1184, 1184, 1184, 1184, 1172, 1192, 1204, 768, 1096
SMP inter-core interrupts: 1424, 554

It might be interesting to know if there is any consistency on where the callback is being called from when the crash occurs.

The log shows that exception happened to different app threads. Is there a simple way to find where the callback is called?

Does it work sometimes, or does it crash every time this is called?

The demo works sometimes, but often fails with two typical failure cases:

  • one is above mentioned exception;
  • the other is stalled QEMU which I have to quit forcedly.

what is going on inside of “trace_buffer_setup"?

Here it is:

void trace_buffer_setup(void *buffer)
{
UINT ret;
    ret = tx_trace_enable(buffer, DEMO_TRACE_SIZE, DEMO_TRACE_OBJECTS);
    if (ret != TX_SUCCESS){
        printf("tx_trace_enable failed: %d", ret);
        exit(1);
    }
    ret = tx_trace_buffer_full_notify(trace_buffer_rotate);
    if (ret != TX_SUCCESS) {
        printf("failed to install buffer full handler: %d", ret);
        exit(2);
    }
}

I haven't found the API to set event buffer only and reuse existing header and object registry, so this is what I have now.

@billlamiework
Copy link

billlamiework commented Oct 16, 2024 via email

@yf13
Copy link
Author

yf13 commented Oct 16, 2024

Instead, I suggest using the buffer full callback notification to wake up a thread, which will then perform the trace buffer swap.

Thank you @billlamiework, I guess you mean to keep tx_queue_send() only in the callback? Will give this a try later as I am using a bigger buffer currently.

As a side topic, I am wondering if we shall enrich trace design later so that more SMP related events are supported?

@billlamiework
Copy link

billlamiework commented Oct 17, 2024 via email

@yf13
Copy link
Author

yf13 commented Oct 17, 2024

@billamiework, I tried following buffer full callback and off-load rotation to a thread:

/* The buffer full callback */
void trace_buffer_full(void *buffer)
{
    tx_trace_disable();
    tx_queue_send(&queue_1, &buffer, TX_NO_WAIT);
}

/* Trace rotation for thread #8 */
void trace_buffer_rotate(void *buffer)
{
TX_INTERRUPT_SAVE_AREA
void *new;

    TX_DISABLE
    if (buffer == &trace_buffer0) {
        new = &trace_buffer1;
    } else {
        new = &trace_buffer0;
    }
    trace_buffer_setup(new);
    TX_RESTORE
}

/* Enable trace with given buffer */
void trace_buffer_setup(void *buffer)
{
UINT ret;
    ret = tx_trace_enable(buffer, DEMO_TRACE_SIZE, DEMO_TRACE_OBJECTS);
    if (ret != TX_SUCCESS){
        printf("tx_trace_enable failed: %d", ret);
        exit(1);
    }
    ret = tx_trace_buffer_full_notify(trace_buffer_full);
    if (ret != TX_SUCCESS) {
        printf("failed to install trace buffer fully handler: %d", ret);
        exit(2);
    }
}

Unfortunately it doesn't help yet, there are more frequent stalls and errors.
I am wondering if trace buffer rotation works on other SMP ports?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants