-
Notifications
You must be signed in to change notification settings - Fork 7.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serial output using ESP32-S3 usb_serial_jtag controller is unreliable (IDFGH-7781) #9318
Comments
i'm not an expert on this but did you try latest master? there was an issue with usb-cdc lines getting eaten, which was fixed about 2 weeks ago. |
The version listed is the latest master (even still as of today - 09 July 2022). |
ok, i have a hunch. can you apply this patch? it might solve the problem. remove the .txt extension, place it in
and run this command (in that folder):
|
@moefear85 I'm afraid the file you are suggesting to modify is related to the software USB CDC driver for the USB_OTG peripheral. The reported issue is related to the USB_SERIAL_JTAG peripheral which has a hardware CDC implementation. (I have modified the issue title to avoid confusion.) |
oh ok, i don't have any experience with esp32-s3 yet. I have some modules lying around, but haven't broken them out yet (haven't found the time). |
To reduce variables, instead of your application, can you try the standard "basic console" example project? (At That has its own "help" output, which gives me:
It doesn't seem corrupt, but I haven't done a line by line check. |
good idea. although that wouldn't catch any hidden variables that manifest themselves under stress, (which may be at play here, otherwise everybody would have faced the issue), it can rule out systematic problems, such as bad connectors/cables or buggy drivers. |
Don't be so sure it would have come up already! (Though somehow the bootloader works...?) I've reported and worked through more broken things in widely-used software. People put up and work around a surprising amount of brokenness.
[edit: as sweetly pointed out, I was confused about who I was talking to] |
it's premature to speak of silicon errata. The chance of you catching one here is small. Something like this would be reported by many users, this isn't something one can easily workaround, especially given all the tiny things people rush to the issues sections to complain about. hate to break/dump it on you, but obviously you're confused, since I'm not the one who reported the bug. read closely. and repeating the obvious doesn't help much. you may think you're a hotshot somewhere else, but that has little relevance to answering the question. stick to the topic instead. |
The small example I mentioned in the main ticket was indeed the basic console example from the IDF. I have not been able to reproduce this in a small test case. Note that in the app I have, I am not only dropping lines from the 'help' command -- many operations will emit several lines of text (usually via ESP_LOGx), and that output is also irregular (subsequent examples will show differing results, sometimes even "correctly"). The 'help' output is just the easiest and most obvious example. @moefear85 : Just to be able to rule out certain systemic issues, I have seen this on all of my ESP32-S3 devices -- DevKitC, UM Feather-S3, Adafruit ESP32-S3 Feather. I've used a variety of cables (as the DevKit C are USB-micro, and the other two are USB-C) using multiple upstream host ports (USB-C & USB-A) so I am fairly confident in ruling out a specific connector or cable failure. It's also the case that any given line that does get output seems to always be correct. The errors entail entire lines being dropped (some of which are just blank lines). There are no garbled or missing characters within a line. As far as drivers go, when I use my Pi4 as the development and USB host, I see the same problems as well. |
Ah, bummer. I know you posted a binary, is it possible to share source that reproduces the problem? Or is that too proprietary? (Mostly I'm trying to rule out that something else in your app is messing with, say, interrupt handling in a way that's causing lost data.) |
@davidzuhn We have managed to reproduce a very similar looking issue with an ESP32-S3 development board and with USB_SERIAL_JTAG. However the issue only happens with Could you please give |
Using minicom and miniterm, I see similar results as when using 'idf.py monitor'. As for sharing the source to the app, I cannot distribute it in any public fashion. |
I suspect that this is a problem in the monitor code. In particular, if the monitor reads "\r\n" from the (virtual) serial port, then it outputs "\r\n". However, if it reads "\r" and then "\n", it only outputs "\r". I think that the root cause of this is https://github.com/espressif/esp-idf/blob/release/v4.4/tools/idf_monitor_base/serial_handler.py#L85 However, getting rid of that line causes extra newlines in the output, but does fix the missing newlines. I'm going to try and figure out what is going on..... |
I think that this is the fix:
|
While that may indeed be an issue, the problem that I see does not lie solely in the monitor.py program. I see the same unusual results when using minicom as when I use monitor.py. I will also note that I have updated my development tree to ESP-IDF v5.0-dev-4547-gfde4afc67a and I see the same issues When I have your patch applied (to that version of the esp-idf), I see different flawed output. Using the same example, which has seen some changes since the initial report, but nothing in the area of stdio & console config, I now see some lines merged. In the initial problem, lines (including blank lines) are sometimes dropped. Now I see output like this:
Compared to the earlier "good text", the lines about the 'battery' command are no longer expected, but the 'wifi support' text is immediately followed by the 'tasks' command header. |
Are you using the driver to write to the serial jtag or not? I.e. do you have
or
It appears that you can overrun the output buffer if you are not using the driver and this can cause dropped characters. |
My code calls |
I'm not sure that I can help then. My debugging approach was to run the Good luck tracking it down! |
I hadn't thought about strace. I ran minicom under strace (
Here are some of the corresponding lines from the strace output:
Notice the lack of a \n reading the line "throttle \r". This is a line generated within the console help routine. This is the missing header before the " throttle support" text. I ran this cycle several times, and here's the output from three log files:
Notice that 'event ' header lines are inconsistent. Sometimes it has \r\n, and sometimes it's just the \r. When the \n is missing, the text doesn't display properly (pretty sure the first line is being output, but then is overwritten because of the \r without the \n). But the key thing is that the data coming in from the ESP32-S3 is inconsistent, even though the device has not been refreshed (or even restarted) between these runs. |
The interesting thing about losing The inconsistency makes me think that it is related to the timing of the linux box polling the cdc device and precisely when characters get added to the output buffer. I imagine that the USB polling causes interrupts which happen at different times on each run. The other thing to try is that Wireshark supports tracing USB transactions via the |
When I disable the I've done a trace under Wireshark, but I don't know how to read USB CDC traffic yet, so I haven't any further information on that front. Capture file attached in case you already know what you're looking at. This is a capture of connecting to the device, running several 'help' commands (during which time some of the lines were lost), and then disconnecting. There are no other USB devices on this computer, so the capture contains only the ESP32-S3 traffic. |
This PR might fix the hang. I'd need to look deeper to be sure. I'd first try to remove the conditional and always flush.
or yes, ideally never force a flush and let the flush happen naturally when the USB Host polls. I dont understand what this flush would even do since the bus is Host controlled. |
I am also hitting the same issue (or similar) when using python's serial console for reading. Characters seem out of order v.s. uart. I need to look deeper tho. I think entire lines are being swallowed. Have we come to any conclusions? |
Id also try changing this: to |
And would also try increasing tx buffer size:
|
I have a smiliar problem, and upon examining the code, I noticed that the usbjtag_tx_char_via_driver function contains a comparison that checks whether an immediate send operation succeeds. However, the return value from usb_serial_jtag_write_bytes is always 1, which prevents the intended logic to try blocking for 50 mS, but also a return that always skyp the bloking try. Proposed Solution: Modify the usb_serial_jtag_write_bytes function to return -1 on failure and 0 on success. int usb_serial_jtag_write_bytes(const void* src, size_t size, TickType_t ticks_to_wait)
{
ESP_RETURN_ON_FALSE(size != 0, ESP_ERR_INVALID_ARG, USB_SERIAL_JTAG_TAG, "size should be larger than 0");
ESP_RETURN_ON_FALSE(src != NULL, ESP_ERR_INVALID_ARG, USB_SERIAL_JTAG_TAG, "Invalid buffer pointer.");
ESP_RETURN_ON_FALSE(p_usb_serial_jtag_obj != NULL, ESP_ERR_INVALID_ARG, USB_SERIAL_JTAG_TAG, "The driver hasn't been initialized");
const uint8_t *buff = (const uint8_t *)src;
// Blocking method, Sending data to ringbuffer, and handle the data in ISR.
BaseType_t result = xRingbufferSend(p_usb_serial_jtag_obj->tx_ring_buf, (void*) (buff), size, ticks_to_wait);
// Now trigger the ISR to read data from the ring buffer.
usb_serial_jtag_ll_ena_intr_mask(USB_SERIAL_JTAG_INTR_SERIAL_IN_EMPTY);
// Return -1 on failure, 0 on success.
return (result == pdFALSE) ? -1 : 0;
} Update the usbjtag_tx_char_via_driver function to accommodate the adjusted return values and logic. static void usbjtag_tx_char_via_driver(int fd, int c)
{
char ch = (char) c;
TickType_t ticks = (TX_FLUSH_TIMEOUT_US / 1000) / portTICK_PERIOD_MS;
// Attempt to send the character immediately without blocking.
if (usb_serial_jtag_write_bytes(&ch, 1, 0) != 0) {
s_ctx.tx_tried_blocking = false;
} else {
return;
}
// If immediate send fails, try blocking with a timeout.
if (s_ctx.tx_tried_blocking == false) {
if (usb_serial_jtag_write_bytes(&ch, 1, ticks) != 0) {
return;
} else {
s_ctx.tx_tried_blocking = true;
}
}
} |
@jjsch-dev thats is a great observation! Can you create a separate issue and open a Pr? |
@chipweinberger thanks for the comment, and I will do a new issue and open the PR. |
I find that this problem is not fixed, and in fact is worse now with the latest version of the IDF (git commit 62ee413). This is what I get when I run my help command (which should look like the content shown earlier in this bug report).
Instead of losing lines, I lose large chunks of the text in arbitrary places. |
Hi @davidzuhn, I think the master (as of 4th Sep) doesn't have this fix. Does your code use printf in more than one thread? |
The tree I'm using has the fix that was claimed to close this issue (commit c926153 of Aug 23). My code uses a log of ESP_LOG across many threads, and a little bit of printf (which is almost all from the thread that handles the command line processing). |
Sorry with the fix date time, I made a mistake with the version. Although ESP_LOG is thread safe, printf is not, perhaps you can create a function that uses a mutex to serialize printf and if possible disable ESP_LOG. |
@jjsch-dev perhaps we could add a warning? something like: "printf is not thread safe. you should only call printf from a single thread" which would be logged if you call printf from multiple threads? I have been using ESP-IDF many years and never knew printf was not thread safe! |
@chipweinberger, You have right because while printf() is reentrant, meaning it can safely be called from multiple threads without corrupting its internal data structures, it doesn't inherently guarantee that output will be serialized correctly. This can result in interleaved or mixed output when multiple threads use printf() simultaneously. I might be mistaken, but in these scenarios, I believe the issue of mixed output occurs when using a console to receive user commands. For instance, when utilizing linenoise(), which relies on write() and read() functions. If the formatted response is generated using printf() from a different thread than the one handling linenoise(), this problem can arise. It becomes even more challenging if you have ESP_LOG macros enabled. To resolve this issue, you would need to make changes to the IDF to establish a mutex that can be shared for write(), printf(), and ESP_LOG. The straightforward solution is to disable the ESP_LOG macros and ensure that printf is called from the same thread that runs linenoise. |
In my scenario, the printf output is not being interleaved between data from multiple threads, nor is there any interleave of output from ESP_LOG. The block of text with the missing/corrupted data is generated and printed by the help command that comes from esp_console_register_help_command(). I completely understand that there is the chance of interleave between portions of the help text and log messages from ESP_LOG. But that's not happening. What's happening is that the blob of text from help is missing characters when being read from the host. Earlier it was missing the carriage return, allowing the terminal to overwrite one line with the next giving the appearance of missing lines of text. With the latest ESP-IDF, much more than that is going missing. |
@davidzuhn thanks for the firmware explanation, is the host program in python? |
I get similar results when I use 'idf.py monitor' or when I use minicom. |
ok, can you share the source code? I have some time and maybe I can help you find the problem. |
I could repro it with simple code, when I tried.
|
@chipweinberger to run this test, you disabled the WDT or put vTaskDelay(1); inside the for()? |
I recall disabling WDT |
@chipweinberger I tested your code on an esp32-s3 with WDT disabled at 160Mhz with channel for console output = USB Serial/JTAG Controller and can't reproduce the issue. I use the ESP-IDF v5.2-dev-2319-gcbbc7ff9df-dirty I create the app with the idf.py create-project I use this main. #include <stdio.h>
#include <stdbool.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
void app_main(void)
{
int v = 0;
while (true) {
for(int i = 0; i < 1024 * 10; i++) {
printf("%i,%i\n", v, i);
}
v++;
vTaskDelay(1);
}
} To test I used @igrr's procedure with the monitor Ctr+T Ctr+L and the python script. To check if python works I modified some values, you can see the pictures with the report. |
that's great! I typically use 240mhz, btw. I do recall thinking it might have been a bug in macOS. @davidzuhn maybe you could write a reproducing minimal example? |
…e data randomly, Closes espressif#12119, Closes espressif#11344, Closes espressif#9318 Closes espressif#11192
Environment
git describe --tags
to find it): v5.0-dev-4001-g495d35949dProblem Description
My application uses the console component, and provides a number of commands. When I run 'help' at the console when using the UART for the console (via the CP2102N), the output is completely repeatable and formatted as expected. When I use the USB Serial/JTAG controller for the console (as I expect to do when my "real" hardware arrives), the output is janky [that's the technical term :)]. Some newlines seem to be eaten, and some lines get omitted altogether.
Expected Behavior
When I use the UART console (connected via /dev/cu.SLAB_USBtoUART), this is what I get every time I run 'help'
Actual Behavior
When I run using the USB Serial/JTAG interface (via /dev/cu.usbmodem1101), and run the command again (and again), I see results like this:
Note these are not identical results for each time, and none of them match the "correct" text as seen on the UART.
Steps to reproduce
Code to reproduce this issue
Debug Logs
Other items if possible
build
folderbug-report.zip
The text was updated successfully, but these errors were encountered: