ESP32 uart handling #23
Comments
Good question. I had a prototype on ESP32 with the UART code written in ESP-IDF and core C last year but stopped to focus solely on ESP8266 as this was more cost effective for the gateway boards. Honestly I'd prefer the ESP32 (or ESP32-S) as it has so much more flash memory and bandwidth. My version 2.0 of EMS-ESP has a core that is fully compatible with the ESP32 apart from the UART library, which needs to be re-coded and tested. When I release it for ESP8266 I'll see if I can get the ESP32 up and running too. If you're handy with programming let me know as I could do with some help in testing. |
With current master I already have it compiling for ESP32 (and Win32). As I said, this is the platform I have to go with (other tasks are also running on it and ESP8266 isn't an option). The only part missing is UART. Unfortunately my ESP32 (ESP32-Gateway from Olimex) doesn't allow for JTAG debugging. So I am not sure if I really want to develop UART stuff with printf style debugging... |
I will take you up on your offer and prepare something on my side, and would be happy if you could test it (assuming you have a device with JTAG debug possibility)? |
It'll be great if you can get the ESP32 uart driver working. The tricky part is the timing and detecting the BRK signals. Also it may be easier to work on my EMS-ESP version 2.0 branch which builds both ESP8266 and ESP32 and the UART code is isolated. It also builds and runs standalone without an ESP microcontroller which I use for most of the coding and testing, saving me a ton of time. It's still in 'alpha' stage at the moment and doesn't have all the devices like 1.9.x does. The web UI is still in development too which I'm adding very soon. Up to you, whatever is easiest. I do have an ESP32-prog board that has the JTAG interface so can debug in Visual Studio Code on my Win10 box if needed. |
There is a nice example for ESP32 using events. If this works as described, it should be very easy |
Thanks @MichaelDvP for the ESP32 code. I remember looking at that a while ago and my first version is based loosely off the same design and in version 2.0 I'll take a look at your tx_mode 4. You're right in that the tx_mode 1 logic is quite complicated (with checking for timeouts and breaks) while the code for EMS+ and HT3 is very basic but works equally well. And yes, the master does not echo, it's just the Rx reading back the same data we sent along the Tx. |
Yes all three modes working well, rx and tx was never lost. |
I did some analysis of emsuart.cpp to understand how it works.
By doing that I detected a potential problem and I am wondering if my conclusion is true. In principle everything seems to happen inside the context of the receive task (even sending) --> single-threading --> good. But functions like ems_setWarmWaterOnetime are called from another task context --> multithreading, with CircularBuffer as the interface, which isn't thread-safe... |
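To make the concern concrete, here is a minimal sketch of a hand-off that would be safe between two tasks, assuming a mutex-guarded `std::deque`. The class name `SafeCommandQueue` and its methods are hypothetical and are not taken from EMS-ESP; it only illustrates the general technique of taking a lock around every access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical command queue shared between tasks (not EMS-ESP code).
// Every access takes the mutex, so a producer task and the receive task
// never touch the deque at the same time.
class SafeCommandQueue {
  public:
    explicit SafeCommandQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Called from any task, for example a web or MQTT handler.
    // Returns false when the queue is full and the telegram was dropped.
    bool push(std::vector<uint8_t> telegram) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= capacity_) {
            return false;
        }
        queue_.push_back(std::move(telegram));
        return true;
    }

    // Called from the receive task when it is allowed to send.
    // Returns false when nothing is queued; otherwise fills `out`.
    bool pop(std::vector<uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) {
            return false;
        }
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

  private:
    std::size_t capacity_;
    std::mutex mutex_;
    std::deque<std::vector<uint8_t>> queue_;
};
```

With something like this, a setter called from another task would only `push()` a telegram, and the receive task would remain the only place that actually writes to the UART.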
Sending is only allowed in reply to a poll or request (within 20ms).
Called from |
Thanks for clarification regarding sending... |
Afair CircularBuffer isn't used in v2. But only @proddy can answer that correctly; for now I'm not so familiar with v2. |
emsuart has its own simple buffer queue storing the complete telegrams. When one is filled it is sent to the core emsesp. CircularBuffer is not used in Tx. But now in v2 there are two queues (std::deque), one for Rx and one for Tx, both asynchronous using std::atomic to prevent data race conditions. A telegram is sent to Tx after a poll is received on the Rx line: the Rx is disabled and the whole data is sent as one block. |
Also, @nomis pointed me to his safe buffer implementation (https://github.com/nomis/EvohomeWirelessFW/blob/master/lib/InterruptSafeBuffer/InterruptSafeBuffer.h) which I'd like to try at some point too. |
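The poll-gated sending described above can be sketched roughly as follows. This is an illustration under assumptions, not the v2 code: `PollGatedTx`, `enqueue()` and `on_poll()` are hypothetical names, the UART write is replaced by a callback, and re-enabling Rx immediately after the write is an assumption the thread does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

using Telegram = std::vector<uint8_t>;

// Hypothetical poll-gated transmitter (illustration only, not EMS-ESP code).
class PollGatedTx {
  public:
    // write_block stands in for the UART write; it receives a whole telegram.
    explicit PollGatedTx(std::function<void(const Telegram&)> write_block)
        : write_block_(std::move(write_block)) {
        assert(write_block_ != nullptr);
    }

    // Queue a read/write command; nothing is sent until the next poll.
    void enqueue(Telegram t) { tx_queue_.push_back(std::move(t)); }

    // Call when a poll addressed to us arrives on the Rx line.
    // Returns true if a telegram was sent.
    bool on_poll() {
        if (tx_queue_.empty()) {
            return false;  // nothing queued for this poll
        }
        rx_enabled_ = false;              // stop receiving while we transmit
        write_block_(tx_queue_.front());  // send the telegram as one block
        tx_queue_.pop_front();
        rx_enabled_ = true;               // assumption: Rx resumes right after
        return true;
    }

    bool rx_enabled() const { return rx_enabled_; }
    std::size_t pending() const { return tx_queue_.size(); }

  private:
    std::function<void(const Telegram&)> write_block_;
    std::deque<Telegram> tx_queue_;
    bool rx_enabled_ = true;
};
```

Each poll releases at most one queued telegram, which matches the rule stated earlier that sending is only allowed in reply to a poll or request.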
Something that was frustrating about the Rx process is that it has to wait until the break finishes before it receives the message (which could delay Tx). Ideally it should use a timer to identify when there is nothing else being received and process the message sooner. It may be possible to use the UART timer to do this but it's character sized not bit sized. I never got Tx working on my boiler so I don't know if any of my changes to Rx would break Tx. |
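The timer idea can be sketched without touching any hardware, assuming each received byte can be timestamped in microseconds. `IdleFrameDetector` is a hypothetical name and the idle threshold is a free parameter that would have to be tuned against the bus timing; none of this comes from the existing Rx code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical idle-gap frame detector (illustration only, not EMS-ESP code).
// A frame counts as complete once no byte has arrived for idle_gap_us.
class IdleFrameDetector {
  public:
    explicit IdleFrameDetector(uint32_t idle_gap_us) : idle_gap_us_(idle_gap_us) {
        assert(idle_gap_us > 0);
    }

    // Record a received byte together with its arrival time in microseconds.
    void on_byte(uint8_t byte, uint32_t now_us) {
        buffer_.push_back(byte);
        last_byte_us_ = now_us;
    }

    // Call periodically, for example from a timer. If the line has been idle
    // long enough, moves the buffered frame into `out` and returns true.
    bool poll(uint32_t now_us, std::vector<uint8_t>& out) {
        if (buffer_.empty()) {
            return false;
        }
        // Unsigned subtraction stays correct across timer wrap-around.
        if (now_us - last_byte_us_ < idle_gap_us_) {
            return false;
        }
        out.swap(buffer_);
        buffer_.clear();
        return true;
    }

  private:
    uint32_t idle_gap_us_;
    uint32_t last_byte_us_ = 0;
    std::vector<uint8_t> buffer_;
};
```

The trade-off is the one raised above: a shorter gap hands the message over sooner, but a gap that is too short could split one telegram in two.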
emsbus.zip |
@ArwedL I've added the code to the v2 repo. Builds fine but causes a reset when the uart port is opened so need to debug further. It's probably easier if you work off my latest code base. |
The latest commit which I can see on v2 branch (https://github.com/proddy/EMS-ESP/tree/v2) is from 16.1.2020 - is this the latest codebase? |
I thought that branch was empty. I deleted it. No, my EMS-ESP2 is in a private repo and not quite ready for the real world. I'll grant you access so you can familiarize yourself with how the modules work. |
I made a snapshot of v2 in https://github.com/proddy/EMS-ESP/tree/v2 Note it's not backward compatible for v1. So first wipe the flash on the ESP then upload the new firmware, connect to USB/Serial with 115200 speed/baud, type system to go into the system menu and use the set commands to change the wifi settings. Use help if you get lost and remember to read the README file. |
Hi @ArwedL did you get any further debugging the ESP32 UART code? |
To be honest I hoped for your side to make debugging progress... Any info so far which you can share? I had to shift focus to other open points in my ESP32 project. Will see if I can spend some time. |
no worries, I'll have a go at fixing it up after I've finished merging in the web code |
I spent some time and found the issue. For me it works now (only tested receiving) - see the attached updated emsuart.cpp |
nice! trying it out now... |
I could get a wemos-d1-mini32 for testing. It fits in the BBQKees board (but not in the housing). The communication problem is a bug in the driver. On a break interrupt the default interrupt handler only sets the queue flag, but does not read the FIFO into the buffer (see here). The buffer is only filled in line 808. On a break interrupt most of the telegram is still in the FIFO and stays there until the FIFO-full interrupt. The solution is either to set the FIFO-full threshold to 1, which causes a lot of IRQ calls, or to use our own IRQ routine. |
nice Michael! I had a quick try but couldn't see any Rx come in. It could be a fault on my side (broken wires). I'll have another go later. |
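The "own IRQ routine" option boils down to draining the FIFO when the break arrives. The sketch below simulates that with a plain `std::deque` standing in for the hardware FIFO. `FakeRxFifo` and `on_break_drain_fifo` are hypothetical names, and no real ESP-IDF register or driver call is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Stand-in for the hardware Rx FIFO; a real handler would read UART registers.
struct FakeRxFifo {
    std::deque<uint8_t> bytes;
};

// Hypothetical break handler: first move everything still sitting in the
// FIFO into the telegram buffer, then report whether a telegram is ready.
bool on_break_drain_fifo(FakeRxFifo& fifo, std::vector<uint8_t>& telegram) {
    while (!fifo.bytes.empty()) {
        telegram.push_back(fifo.bytes.front());
        fifo.bytes.pop_front();
    }
    assert(fifo.bytes.empty());  // nothing may be left behind for the next frame
    return !telegram.empty();
}
```

A handler that only raised a flag on the break would leave those bytes in the FIFO until the FIFO-full threshold was reached, which is the behaviour described above.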
I have changed the rx/tx pins for my module, you have to change back. |
it was a faulty cable. I'm getting Rx in but Tx causes a crash. So I think you're on the right track! nice work btw. |
Also thanks from my side for improving the code. Obviously I don't find the time for making the solution bullet-proof... |
For me all modes are working fine, tested with change mode, wait a minute, show emsbus, change mode, etc., starting with mode 4: With the mixer I meant something like that: Is it possible to set some device flags for the boilers? The 0xE3 .. 0xE9 are not supported by my boiler. I think they are used by newer boilers or only heatpumps/condensors/compressors? Also for the thermostat it should be fine if only the active hcs are requested. |
On both ESP8266 and ESP32 when the Tx is sent, there is no acknowledgment, just more Rx. If it works on your setup it must work on mine too so let me experiment a little more using your latest code. Worst case I'll bring out the scope and see what is being transmitted over the EMS line.
Here I wanted a way to detect if the Tx was not working and thought if the Tx queue is full (max 20) this must be a good sign. With a poll happening every 1-2 seconds and doing the queue check every minute it should be pretty fail-safe. Note the poll-acks are not stored as Tx messages so these are only the real read/write commands. But you're right, there will be times when there are a lot of messages in the queue (like after a 'refresh' command) so I'll need to find a better way. Any ideas?
correct. I had fixed that in an earlier build and modified the whole logic.
thanks, corrected it!
This is a very good point and I had noticed it too. Same with the boiler (your 0xE3-E9 example). Maybe by using device flags like we did with the thermostat is the right approach. And this will also save on flooding the Tx queue with bogus messages that won't be answered anyway.
good idea. I'll work on this too.
Ok. I will test again. I just noticed after each send I would get a CRC error. it's really nice all the help you're providing, much appreciated Michael. |
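On the open question above of how to detect a dead Tx line without looking at the queue length, one option is to count consecutive Tx attempts that got no reply. This is only a sketch of that idea under assumptions: `TxHealthMonitor` and its threshold are hypothetical, and it is not the check that ended up in the code base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Tx health check (illustration only, not EMS-ESP code).
// The line is considered broken after max_failures unanswered sends in a row,
// no matter how many telegrams are waiting in the Tx queue.
class TxHealthMonitor {
  public:
    explicit TxHealthMonitor(uint8_t max_failures) : max_failures_(max_failures) {
        assert(max_failures > 0);
    }

    // Call when a sent telegram was answered or acknowledged.
    void on_tx_success() { consecutive_failures_ = 0; }

    // Call when a sent telegram timed out; saturates instead of overflowing.
    void on_tx_failure() {
        if (consecutive_failures_ < UINT8_MAX) {
            ++consecutive_failures_;
        }
    }

    bool tx_line_ok() const { return consecutive_failures_ < max_failures_; }

  private:
    uint8_t max_failures_;
    uint8_t consecutive_failures_ = 0;
};
```

Because the counter resets on any successful send, a burst of queued messages after a 'refresh' command would not trip the check as long as replies keep arriving.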
Yes, I know what you mean. I've added some more log messages to monitor tx and polls, and some answers and acks are missing. missing_answer_and_ack.log (some messages are logged from both telegram and emsesp; they are received/sent once but appear twice in the log).
I'll send you a working uart as soon as I understand what's going wrong. edit: BTW: the test with different modes was invalid, I didn't realize that I have to reboot before tx_mode is active. |
ok. I'm working on improving the wifi and mqtt calls (I still get some dropouts). I've added the changes to the mixer and also adjusted the thermostat to only fetch the active heating circuits. And I made the check for 'Tx line' more robust. Just need to add device_types to the Boilers and I think I've covered all your comments. It'll be version a10. |
@MichaelDvP when you query a heating circuit on your thermostat that's not active, what does it come back with? Eg. |
Here is the uart, working on 8266 in mode 4. On esp32 I'm getting a few CRC errors at start and very rarely a "TX read failed", but the device answers, so it's in the echo readback. @proddy I can't read 2a8 since I have no ems+. But these are the RC35 circuits, only hc2 is active: I've also found some smaller bugs and typos, should I describe them in an issue or make a PR? (I've put a fork on github with the fixes (and a few changes for me like sensor-mqtt)). |
For the esp32 add in the start-routine the line |
@MichaelDvP thanks for the updates. I've merged them into a11 in the v2 branch. I've also made you a contributor so feel free to hack away directly in the project branch or push PRs. A few comments on the changes
|
@proddy Thank you for the invitation.
Strange, i've tested with the esp32 and a noname wemos-8266-clone (to have the original wemos with 1.9.6 as backup reference), both without tx issues now. Btw: I found, that esp32 resets the tx-break-bit by hardware after sending, no need to do it in the code, but in 8266 we have to clear it in the code.
the intention was mainly the sensor formatting. I have 7 sensors and sometimes change them (add one in a new place, or disconnect one that is useless), then the numbering changes and my iobroker scripts catch the wrong sensor. With json {id:temp,...} the sensors are fixed.
No mqtt errors, the intention was a 1.9.5 compatible json, where every mixing device can add its own nest and keep the other nests. But now I think it's better to have a json for every device. But we have to use the device-id (which also represents the setting on the switch on the MM100) for numbering, otherwise |
using the latest v2 I still don't get Tx working. After each send I get a timeout error. I think I need to hook up the scope.
that's a valid use case for people with more than 1 sensor so happy with any changes you commit that make it better.
yes, the ArduinoJson library uses only the copy constructor when they are char * or consts so there may be conflicts when building the json object. I'll improve this code.
I've been battling to find why the telnet is always not very responsive and I think it's due to the Dallas one-wire library interfering with the wifi. Have you ever experienced this? I may re-write that piece of code too. |
Oh, I thought the issues were with tx_mode 4. Now I tested mode 1 a bit longer and can reproduce these errors, but only one error in a few minutes. There are timeouts and break-interruptions and collisions with the next telegram, resulting in bad CRC. Something seems to interrupt and delay the tx routine for a very long time, is there something time consuming (like dallas) in another thread? |
I'll do some checking this afternoon. Also re-writing the sensor code as it seems to also block the wifi/lwip. There are a couple of delay() calls in the onewire library |
Yes, the onewire is burning a lot of time. Another thing: the |
@MichaelDvP I added the RX_LOOP_WAIT because I thought it was slowing down the telnet, but in the end the culprit was the 1-wire library. So happy to remove it. Except, why should we process an Rx telegram as fast as possible? They're queued up and can be processed every 300ms without affecting any Tx |
Oops, you're right, i confused |
that is odd. Tx should happen immediately after a poll. What you could try is to comment out the |
i thought about that, but we have only a few messages per second and if there is no message the function also returns. I see no benefit. To give telnet/wifi more time i think it's better to add a delay in the main loop. Since the tx-reaction is complete in the |
@MichaelDvP I found and fixed the issue that was causing Tx to fail with the old logic (tx_mode 1). The value of the timeout was too short. Should be 1760. I suspect a typo in the macro when it was copied over from 1.9.5. I still can't get the newer Tx code to work (tx_mode 4). Giving me the same errors. Perhaps also a timing error? |
I don't believe it is the timeout value. I changed it while I was working on uart, the 10 seemed to me to be a typo. The error messages indicate that there is no response from the destination, right? |
I logged with syslog to see the tx-errors and there is another strange thing. I get reboots every 2 hours (mark is set to 2h, can it be that?) and always after 4 complete tx-errors and the first error of the 5th. But the time between retries is very long, it seems the counter isn't cleared in between. Another thing: [system] and [network] log with local time (mest), [emseesp] (and also [boiler], [thermostat]) log with utc. |
You're right, 1760ms is a long time within the loop. I'd rather just forget the "tx_mode 1-3" and work on your new and improved Tx logic and figure out why it doesn't work on my setup. I'll also ask BBQKees if he's willing to try out a few things on his boiler. Is there anything specific with your environment? There is a difference with timings between EMS+ and EMS1.0 and I'm on EMS1.0. |
I'll create a separate issue for this and track it there. |
I've logged the time from rx-intr to send and found that it's always this check: |
I see a responsetime of 0 and sometimes 1 with "tx_mode 1". I just can't get tx_mode 4 working, even by adding some delay after each bit write. I'm so busy with the web interface I don't really have time to get the scope out to see how the timings are off on my EMS 1.0 system. |
Closing this. Covered in emsesp/EMS-ESP#398 |
Question
Do you have an implementation for ESP32 uart handling? Currently I am only interested in receiving. Unfortunately I have to use the ESP32 because it also serves other purposes.
Additional context
I have seen another question in Q&A (May 2019) where you stated that you aren't happy with your current ESP32 implementation - I hope now for some new status (especially if receiving only is relevant)...