Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added TX Interrupt/DMA based serial writing #2840

Merged
merged 21 commits into from
Oct 31, 2023

Conversation

digant73
Copy link
Contributor

@digant73 digant73 commented Sep 22, 2023

IMPROVEMENTS:

  • Added TX Interrupt/DMA based serial writing based on excellent work and discussion shared by @rondln in [FR] Improved serial data processing (smarter reading, faster writing) #2835: TX serial tasks (e.g. transmission of gcodes to mainboard or ACK messages forwarding to ESP3D) is now decoupled from main thread and handled by dedicated HW (interrupt or DMA mode selectable at compile time). It allowed to drastically improve responsiveness of TFT maintaining a perfect load towards mainboard and even supplementary serial ports (e.g. ESP3D).
    Attached pictures show a comparison with current implementation. The current implementation struggles to reach and maintain the target TX slots (only 1 pending gcode towards the mainboard instead of the 31 available) having a very low scan rate (number of times the main loopProcess() function is invoked per second) while the proposed implementation maintains the target TX slots (a constant 31 pending gcodes towards the mainboard) with a huge difference on scan rate
  • The implemented TX module also supports a blocking transmission in case the message to be transmitted is longer than the TX cache size or there is not enough free space on the TX cache. Chunks of the message will be sent once space on the TX buffer becomes available.
    By the way, TX cache size is set to 256 bytes for mainboard and supplementary serial ports. It is bigger more than enough to buffer any gcodes (max 100 bytes) or even a long ACK message from M43 output (even 200 bytes) avoiding any need of blocking transmission
  • Improved RX serial reading: the TFT is now able to manage contiguous (not separated by IDLE on serial line) and very long ACK messages (e.g. M43 output) from mainboard without any loss as discussed in [FR] Improved serial data processing (smarter reading, faster writing) #2835. It allowed a big increase in reading responsiveness
  • Added priorities to back end and front end tasks to speed up more critical tasks (e.g. printing) as suggested by @rondln in [FR] General TFT Speed Improvements #2836
  • Extended baudrates range for serial ports. Values range for param serial_port in config.ini has been extended providing other standard baudrates such as 230400 and 921600 (e.g. supported by ESP3D)
  • Minor cleanup/bug fixes

NOTES:

  • TX mode (DMA or Interrupt mode) is selectable at compile time at the beginning of Serial.h. By default, TX DMA mode is configured to be used. Both DMA and Interrupt modes provide similar performance
  • TX and RX improvements are companion improvements of ADVANCED OK feature recently merged. Putting all together, they allow to improve print quality and to increase serial connection speed (for mainboard and supplementary devices) if not limited by EMI or by other limits of the connected devices. Basically the TFT is no more the main bottleneck in the chain

BUGFIXES:

resolves #2835
fixes #2858

PR STATE: ready for merge

Current solution:

other_3

Proposed solution:

other_3

@digant73 digant73 marked this pull request as draft September 22, 2023 21:47
@rondlh
Copy link

rondlh commented Sep 25, 2023

Well done, thanks!

I got the BTT TFT35V3.0.1(GD32F207 @ 120MHz), it's about 35% faster than the BTT TFTV3.0 (STM32F207 @ 120MHz).
The MKS TFT35V1.0 (STM32F407 @ 168MHz) is only about 20% faster than the BTT TFT35V3.0.
Tested with the scan rate benchmark.

@rondlh
Copy link

rondlh commented Sep 26, 2023

I noticed that you already started to deal with the priorities in loopProcess and it's friends.

I think that a lot of tasks in loopBackEnd are actually frontend tasks, like . Almost everything under the "dividing line" seems to be front end:

  1. loopCheckHeater, loopFan, loopSpeed are just for screen updates and requesting info updates.
  2. loopBuzzer can be removed with the event-triggering approach: buzzer.zip
  3. The encoder needs a bit more attention, be careful with that one. Perhaps it should even be above the priority dividing line.
  4. Mode switching, loopScreenShot, loopCheckBackPress, LCD_CheckDimming, LED_CheckEvent and rrfStatusQuery are also front end.

Usually front ends are given a constant guaranteed amount of attention by guaranteeing a number of updates per second.
Anything above twice the screen refresh rate feels response. I'm not sure what refresh rates the TFTs use, I assume 60Hz or lower. So 120 updates/s is responsive, but we can be generous and make it 3x the refresh rate (200/s, 5ms intervals).

@digant73
Copy link
Contributor Author

I noticed that you already started to deal with the priorities in loopProcess and it's friends.

I think that a lot of tasks in loopBackEnd are actually frontend tasks, like . Almost everything under the "dividing line" seems to be front end:

  1. loopCheckHeater, loopFan, loopSpeed are just for screen updates and requesting info updates.
  2. loopBuzzer can be removed with the event-triggering approach: buzzer.zip
  3. The encoder needs a bit more attention, be careful with that one. Perhaps it should even be above the priority dividing line.
  4. Mode switching, loopScreenShot, loopCheckBackPress, LCD_CheckDimming, LED_CheckEvent and rrfStatusQuery are also front end.

Usually front ends are given a constant guaranteed amount of attention by guaranteeing a number of updates per second. Anything above twice the screen refresh rate feels response. I'm not sure what refresh rates the TFTs use, I assume 60Hz or lower. So 120 updates/s is responsive, but we can be generous and make it 3x the refresh rate (200/s, 5ms intervals).

Some features are split in a back end task and front end task. I agree with you to provide different priorities to some back end task. Currently I would avoid to make (rework) too much other things. I would simply integrate the improvements on buzzer.

@rondlh
Copy link

rondlh commented Sep 28, 2023

I would recommend to make the changes of this PR as narrow as possible to reduce the risks. Possibly the writing mode should be user configurable so there is no risk of running into issues because there is a safe fallback. This would impact FW size a bit. I believe the smallest available flash size is 224KB, currently we are around 193KB, so we have some headroom for a useful feature like this.


Did you try the buzzer implementation? If so, don't forget to disable loopBuzzer in loopFrontEnd.
I checked if there are more processes that can use event triggering. There are quite a few, like notifications and such could be launched by the notification arriving. The problem is how to end the notification, for that loopProcess is still needed. This works well for the buzzer because it's interrupt based, so when the sound is done, you are still in control to make decisions. So it seems the easiest solution is just to give lower priority to these tasks.


My ESP3D 2.1.x is a bit modified (for the M43 response and general faster response times):

  1. Increased RX buffer from 512 to 8KB: Config.h #define SERIAL_RX_BUFFER_SIZE 512 * 16
  2. syncwebserver.cpp reduced response time from 2000ms to 200ms.
//pickup the list
while ((millis() - timeout < 200) && !done) { // IRON, was 2000, delays response

@digant73
Copy link
Contributor Author

digant73 commented Sep 28, 2023

I would recommend to make the changes of this PR as narrow as possible to reduce the risks. Possibly the writing mode should be user configurable so there is no risk of running into issues because there is a safe fallback. This would impact FW size a bit. I believe the smallest available flash size is 224KB, currently we are around 193KB, so we have some headroom for a useful feature like this.

Do you mean to make it configurable at runtime (configured in config.ini and/or in Feature menu? Currently I make it configurable only at compile time. I would avoid to make it configurable at runtime (it means to check the mode at every transmission request).

Did you try the buzzer implementation? If so, don't forget to disable loopBuzzer in loopFrontEnd. I checked if there are more processes that can use event triggering. There are quite a few, like notifications and such could be launched by the notification arriving. The problem is how to end the notification, for that loopProcess is still needed. This works well for the buzzer because it's interrupt based, so when the sound is done, you are still in control to make decisions. So it seems the easiest solution is just to give lower priority to these tasks.

Not yet tried your buzzer implementation. I would possibly provide it in the PR related to other improvements.

My ESP3D 2.1.x is a bit modified (for the M43 response and general faster response times):

  1. Increased RX buffer from 512 to 8KB: Config.h #define SERIAL_RX_BUFFER_SIZE 512 * 16
  2. syncwebserver.cpp reduced response time from 2000ms to 200ms.
//pickup the list
while ((millis() - timeout < 200) && !done) { // IRON, was 2000, delays response

My ESP3D is now perfectly working. It is able to receive the entire M43 output (more than 14K) with no lost at all. This night I will make some testing reducing the RX cache size to 2048 instead of 4096.

@rondlh
Copy link

rondlh commented Sep 28, 2023

Do you mean to make it configurable at runtime (configured in config.ini and/or in Feature menu? Currently I make it configurable only at compile time. I would avoid to make it configurable at runtime (it means to check the mode at every transmission request).

Agree, that not very efficient, but it's a safe way to get started. And don't forget, it's only 1 check per message, not much overhead.

Did you try the buzzer implementation? If so, don't forget to disable loopBuzzer in loopFrontEnd. I checked if there are more processes that can use event triggering. There are quite a few, like notifications and such could be launched by the notification arriving. The problem is how to end the notification, for that loopProcess is still needed. This works well for the buzzer because it's interrupt based, so when the sound is done, you are still in control to make decisions. So it seems the easiest solution is just to give lower priority to these tasks.

Not yet tried your buzzer implementation. I would possibly provide it in the PR related to other improvements.

Please let me do it... This could be my first PR... our friend made a competing PR already based on my code, he's just a troll.

My ESP3D 2.1.x is a bit modified (for the M43 response and general faster response times):

  1. Increased RX buffer from 512 to 8KB: Config.h #define SERIAL_RX_BUFFER_SIZE 512 * 16
  2. syncwebserver.cpp reduced response time from 2000ms to 200ms.
//pickup the list
while ((millis() - timeout < 200) && !done) { // IRON, was 2000, delays response

My ESP3D is now perfectly working. It is able to receive the entire M43 output (more than 14K) with no lost at all. This night I will make some testing reducing the RX cache size to 2048 instead of 4096.

Great, that's good to see. Working on ESP3D integration could be something nice to work on in the future. I get the time from the ESP3D, and give a estimated ETA when you start a print. And of course TFT 921600 baud support would be nice to have.

@digant73
Copy link
Contributor Author

Do you mean to make it configurable at runtime (configured in config.ini and/or in Feature menu? Currently I make it configurable only at compile time. I would avoid to make it configurable at runtime (it means to check the mode at every transmission request).

Agree, that not very efficient, but it's a safe way to get started. And don't forget, it's only 1 check per message, not much overhead.

Did you try the buzzer implementation? If so, don't forget to disable loopBuzzer in loopFrontEnd. I checked if there are more processes that can use event triggering. There are quite a few, like notifications and such could be launched by the notification arriving. The problem is how to end the notification, for that loopProcess is still needed. This works well for the buzzer because it's interrupt based, so when the sound is done, you are still in control to make decisions. So it seems the easiest solution is just to give lower priority to these tasks.

Not yet tried your buzzer implementation. I would possibly provide it in the PR related to other improvements.

Please let me do it... This could be my first PR... our friend made a competing PR already based on my code, he's just a troll.

sure

My ESP3D 2.1.x is a bit modified (for the M43 response and general faster response times):

  1. Increased RX buffer from 512 to 8KB: Config.h #define SERIAL_RX_BUFFER_SIZE 512 * 16
  2. syncwebserver.cpp reduced response time from 2000ms to 200ms.
//pickup the list
while ((millis() - timeout < 200) && !done) { // IRON, was 2000, delays response

My ESP3D is now perfectly working. It is able to receive the entire M43 output (more than 14K) with no lost at all. This night I will make some testing reducing the RX cache size to 2048 instead of 4096.

Great, that's good to see. Working on ESP3D integration could be something nice to work on in the future. I get the time from the ESP3D, and give a estimated ETA when you start a print. And of course TFT 921600 baud support would be nice to have.

Yes, I will probably add also the support to 921600 baud (in the past I also asked the author of ESP3D to provide 1000000 baud)

@rondlh
Copy link

rondlh commented Oct 2, 2023

  1. In loopBackEnd, it's probably safer to move the LCD_ENCODER_SUPPORT section above the priority divider.
    You can try to use the encoder under high stress to confirm.

  2. These are all frontEnd processes in my view:

  // Temperature monitor
  loopCheckHeater();

  // Fan speed monitor
  loopFan();

  // Speed & flow monitor
  loopSpeed();

  // Buzzer handling
  #ifdef BUZZER_PIN
    loopBuzzer();
  #endif
  1. I don't think there is a need for the function: Serial_GetReadingIndexRX, it's only used in 1 location in the code, but defined in 3 locations (for different hardware platforms).

  2. In Serial_Get

rIndex = 0;
while (rIndex <= flag)
{
  *(buf++) = cache[rIndex++];
}

I believe this is the same as:

while (rIndex <= flag)
{
  *(buf++) = *(cache++);
}
  1. You mention the line below in Serial_NewDataAvailable and Serial_Get

// NOTE: used 32 bit variables for performance reasons (in particular for data copy)

But in Serial_NewDataAvailable there is no data copy, and in Serial_Get the data copy is done in 8-bit cycles.

In general I found that using and passing 32-bit variables is faster, that is probably because in the background 32-bit variables are always used, and bools and 8/16-bit variables are emulated, which takes extra steps. The official recommendation is to not blindly use the smallest datatype possible, but use 32-bit variables.

  1. If the default is DMA mode, then perhaps the comment should be worded differently... "Comment out this line to..."
    // uncomment this line to use TX DMA based serial writing. Otherwise, TX interrupt based serial writing will be used
    #define TX_DMA_WRITE

@digant73
Copy link
Contributor Author

digant73 commented Oct 2, 2023

  1. In loopBackEnd, it's probably safer to move the LCD_ENCODER_SUPPORT section above the priority divider.
    You can try to use the encoder under high stress to confirm.

I tested also under stress and it works

  1. These are all frontEnd processes in my view:
  // Temperature monitor
  loopCheckHeater();

  // Fan speed monitor
  loopFan();

  // Speed & flow monitor
  loopSpeed();

  // Buzzer handling
  #ifdef BUZZER_PIN
    loopBuzzer();
  #endif
  1. I don't think there is a need for the function: Serial_GetReadingIndexRX, it's only used in 1 location in the code, but defined in 3 locations (for different hardware platforms).

No, but it is a function related to Serial.c/h as Serial_GetWritingIndexRX so I preferred to move in Serial.h.

  1. In Serial_Get
rIndex = 0;
while (rIndex <= flag)
{
  *(buf++) = cache[rIndex++];
}

I believe this is the same as:

while (rIndex <= flag)
{
  *(buf++) = *(cache++);
}

It doesn't work. I causes an infinite loop (rIndex is never updated)

  1. You mention the line below in Serial_NewDataAvailable and Serial_Get

// NOTE: used 32 bit variables for performance reasons (in particular for data copy)

But in Serial_NewDataAvailable there is no data copy, and in Serial_Get the data copy is done in 8-bit cycles.

yes cut&paste of comment. I will change it

In general I found that using and passing 32-bit variables is faster, that is probably because in the background 32-bit variables are always used, and bools and 8/16-bit variables are emulated, which takes extra steps. The official recommendation is to not blindly use the smallest datatype possible, but use 32-bit variables.

I made some testing using 32bit in functions but there is no difference.

  1. If the default is DMA mode, then perhaps the comment should be worded differently... "Comment out this line to..."
    // uncomment this line to use TX DMA based serial writing. Otherwise, TX interrupt based serial writing will be used
    #define TX_DMA_WRITE

Yes it is better. I will change it

@rondlh
Copy link

rondlh commented Oct 2, 2023

All clear, one more thing...

The new baudrates do not seem to be in Configuration.h yet

@digant73
Copy link
Contributor Author

digant73 commented Oct 2, 2023

All clear, one more thing...

The new baudrates do not seem to be in Configuration.h yet

yes, right thanks. updated also Configuration.h. Please, don't forget to test GD version

@rondlh
Copy link

rondlh commented Oct 2, 2023

yes, right thanks. updated also Configuration.h. Please, don't forget to test GD version

Sure, I will test GD and F4, both write modes.

@digant73
Copy link
Contributor Author

digant73 commented Oct 2, 2023

yes, right thanks. updated also Configuration.h. Please, don't forget to test GD version

Sure, I will test GD and F4, both write modes.

yes thanks. I tested F2 and F1 (both modes). Working fine

@rondlh
Copy link

rondlh commented Oct 3, 2023

Confirmed: GD32F2 and STM32F4 working fine, both modes tested, well done!
Hardware: BTT TFT35V3.0.1 / MKS TFT35 V1.0

@digant73
Copy link
Contributor Author

digant73 commented Oct 3, 2023

Confirmed: GD32F2 and STM32F4 working fine, both modes tested, well done! Hardware: BTT TFT35V3.0.1 / MKS TFT35 V1.0

thanks for the testing. PR ready for merge

@digant73 digant73 marked this pull request as ready for review October 3, 2023 12:03
@ETE-Design
Copy link

My ESP3D is now perfectly working. It is able to receive the entire M43 output (more than 14K) with no lost at all. This night I will make some testing reducing the RX cache size to 2048 instead of 4096.

Great, that's good to see. Working on ESP3D integration could be something nice to work on in the future. I get the time from the ESP3D, and give a estimated ETA when you start a print. And of course TFT 921600 baud support would be nice to have.

@rondlh - I remember that MKS offered that feature for both their TFT and LCD so it was possible to to add Wireless Network direct from TFT / LCD, but only worked on their Hardware, cause some of their board offered ESP8266 as an addon for their Board and and also for their TFT
Would be a really great feature to add to this TFT also :-)

@bigtreetech bigtreetech merged commit a19a095 into bigtreetech:master Oct 31, 2023
1 check passed
@F14V
Copy link

F14V commented Nov 13, 2023

Amazing improvement! I love things like this.

One note, it seems the config file in the "Copy to SD" directory was not updated. Is it supposed to be updated differently?

@digant73
Copy link
Contributor Author

@F14V Only BTT can update that folder. In the meanwhile you (everyone) have to compile the fw

@F14V
Copy link

F14V commented Nov 14, 2023

@F14V Only BTT can update that folder. In the meanwhile you (everyone) have to compile the fw

Oh, good to know, sorry if it was stated somewhere and I missed it.

Indeed, I had to compile since the automatic build didn't run yet. But it's working great! (didn't see any pauses yet, they were occurring in the past)

@digant73
Copy link
Contributor Author

digant73 commented Nov 14, 2023

Yes, the automatic build populates that folder (config files, binary files). That task is executed by BTT only

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants