Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpiManager breaks SPI communication on SPI3 of ESP32-S2 #2343

Open
4 tasks done
LennartF22 opened this issue Oct 9, 2024 · 7 comments
Open
4 tasks done

SpiManager breaks SPI communication on SPI3 of ESP32-S2 #2343

LennartF22 opened this issue Oct 9, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@LennartF22
Copy link
Contributor

LennartF22 commented Oct 9, 2024

What happened?

On ESP32-S2, the CMT2300A and W5500 are not working when they run on SPI3.

To Reproduce Bug

Configure a CMT and/or W5500 on an ESP32-S2 running OpenDTU version 24.9.27 or later, but do not configure a nRF (this will cause SPI3 to be used as a shared SPI bus).

Install Method

Self-Compiled

What git-hash/version of OpenDTU?

v24.10.6

What firmware variant (PIO Environment) are you using?

lolin_s2_mini

Anything else?

This only affects the ESP32-S2. What sets it apart from the ESP32, ESP32-S3 and ESP32-C3 is that the DMA channel for SPI3 is shared with the ADC/DAC, while on the ESP32, there are dedicated DMA channels for SPI, and on the ESP32-S3 and ESP32-C3, there is are a handful of general-purpose DMA channels, which can be assign to arbitrary peripherals.

Currently, DMA is automatically used when allocating shared buses in the SpiManager, because DMA is required for the usage of the W5500. Simultaneously, ADC/DAC DMA is probably used somewhere (maybe in the Arduino core?), causing the issue.

As a first hotfix, we should use SPI_DMA_DISABLED here for SPI3 on the ESP32-S2 platform:

ESP_ERROR_CHECK(spi_bus_initialize(host_device, &bus_config, SPI_DMA_CH_AUTO));

This means, that the W5500 would still not work on the ESP32-S2 if it ends up on SPI3 (this depends on initialization order and, for example, whether a nRF, which currently still claims a dedicated bus, is configured), but at least the CMT will work again.

Subsequently, we should add more logic so that for dedicated and shared SPI buses managed via the SpiManager

  1. DMA is only used when it is required for one of the devices on the shared bus (DMA is less performant for small transaction anyways),
  2. SPI3 is prefered over SPI2 for dedicated/shared SPI buses that do not need DMA on the ESP32-S2 and
  3. only SPI2 is used for dedicated/shared SPI buses that need DMA on the ESP32-S2.

Please confirm the following

  • I believe this issue is a bug that affects all users of OpenDTU, not something specific to my installation.
  • I have already searched for relevant existing issues and discussions before opening this report.
  • I have updated the title field above with a concise description.
  • I have double checked that my inverter does not contain a W in the model name (like HMS-xxxW) as they are not supported.
@LennartF22 LennartF22 added the bug Something isn't working label Oct 9, 2024
LennartF22 added a commit to LennartF22/OpenDTU that referenced this issue Oct 10, 2024
@LennartF22
Copy link
Contributor Author

@tbnobody I already created a PR with a hotfix. I will also work on a better long-term solution, as mentioned above.

@tbnobody
Copy link
Owner

The new crash might be somehow related to this

@stefan123t
Copy link

stefan123t commented Oct 17, 2024

@tbnobody thanks for the reference, this is a long and rewarding read. My TLDR; resume is that the temp sensor may run into an endless loop by some neglicence to turn it back on in case it was on in the wifi code. There is a fix in the wifi code since v5.1.2+.

I think we have introduced mcu temp reading some time ago and therefor may trigger the loop / watch dog timeout inadvertently.

If we are on pre v5.1.2 there is a temp workaround which never got merged in the esp-idf.
Are we using that v5.1.2+ or a newer version now ?

@tbnobody
Copy link
Owner

Are we using that v5.1.2+ or a newer version now ?

Arduino core with platformio uses 4.x as default... when using version 5.x it requieres ~10% additinal flash (currently the image requires 86% flash). It will not make sense to upgrade 4MB devices to core 5. Therefor we have to find the reason for the problem and why it occours right now.

@stefan123t
Copy link

Can’t we take the same approach as NoNullPtr in the esp-idf thread, ie espressif/esp-idf#8088 (comment)
either for older ESP32’s with 4MB or until we switch to ESP-IDF 5.x ?

As NoNullptr, softhack007 and MartinPatarinski describe it the register needs to be set back to prevent staying in the loop and ultimately triggering the watch dog.

We do not call this so often to determine the ESP temperature, do we ?

@LennartF22
Copy link
Contributor Author

@stefan123t No, not easily at least, because it requires a change to ESP-IDF, and an already built version of that is used with the Arduino core. Presumably we could patch the binary, but that's probably too much of a hack... Alternatively, one might pause the WiFi task while reading the temperature (which takes 100-200us), but even if that works, it's likely not a good idea either.

The rate at which it is called is the MQTT publish interval, and it was added in v24.9.22. Calling it at a lower rate would not fix the issue either, unfortunately.

I can't speak for @tbnobody of course, but I think the only thing we could do right now is to disable the temperature sensor completely (in device info and over MQTT) for the ESP32-S2. It does not pay off to put a considerable amount of work into fixing a feature that presumably is not needed by many users and is more like a "nice to have", on a platform (ESP-S2) that is not used by many users either.

@stefan123t
Copy link

@LennartF22 thanks for sharing your thoughts.

I read that they were waiting for both ESP-IDF changes and the changes to the WiFi binary which is included as a part of the ESP-IDF core hardware in binary only. Apparently they had to wait for Espressif to update the latter binaries before the change to ESP-IDF would be possible.

Yes, disabling the WiFi for some time is usually a bad idea because it may switch context into the WiFi core which would probably trigger the watch dog.

But disabling the ESP temperature altogether would be another fix besides reducing the interval with which we call the temperature sensor andor fencing the temperature sensor reading with some register (p)reset code.

I understood however that this effects also ESP32S3 maybe to a lesser extent / frequency ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants