ESP32 uart handling #23

Closed
ArwedL opened this issue May 2, 2020 · 61 comments
Labels
enhancement New feature or request

Comments

@ArwedL

ArwedL commented May 2, 2020

Question
Do you have an implementation for ESP32 UART handling? Currently I am only interested in receiving. Unfortunately I have to use an ESP32 because it also serves other purposes.

Additional context
I have seen another question in Q&A (May 2019) where you stated that you aren't happy with your current ESP32 implementation - I hope now for some new status (especially if receiving only is relevant)...

@proddy
Contributor

proddy commented May 2, 2020

Good question. I had a prototype on ESP32 with the UART code written in ESP-IDF and core C last year but stopped to focus solely on ESP8266 as this was more cost effective for the gateway boards. Honestly I'd prefer the ESP32 (or ESP32-S) as it has so much more flash memory and bandwidth. My version 2.0 of EMS-ESP has a core that is fully compatible with the ESP32 apart from the UART library, which needs to be re-coded and tested. When I release it for ESP8266 I'll see if I can get the ESP32 up and running too. If you're handy with programming let me know, as I could do with some help in testing.

@ArwedL
Author

ArwedL commented May 4, 2020

With current master I already have it compiling for ESP32 (and Win32). As I said, this is the platform I have to go with (other tasks are also running on it and the ESP8266 isn't an option). The only part missing is the UART. Unfortunately my ESP32 (ESP32-Gateway from Olimex) doesn't allow for JTAG debugging, so I am not sure I really want to develop the UART code with printf-style debugging...
Because JTAG is not available, I decided to do all debugging on the Win32 platform. This approach works quite well (e.g. using an FTDI 3V3 TTL UART-to-USB converter). But of course it fails for low-level components like UART handling (which is ESP32-specific and can't be emulated on Win32)...

@ArwedL
Author

ArwedL commented May 4, 2020

I will take up your offer and prepare something on my side, and would be happy if you could test it (assuming you have a device with JTAG debug possibility)?
In general on ESP32 platform at least sending should be quite comfortable as Espressif SDK offers uart_write_bytes_with_break() function...

@proddy
Contributor

proddy commented May 4, 2020

It'll be great if you can get the ESP32 uart driver working. The tricky part is the timing and detecting the BRK signals.

Also it may be easier to work on my EMS-ESP version 2.0 branch which builds both ESP8266 and ESP32 and the UART code is isolated. It also builds and runs standalone without an ESP microcontroller which I use for most of the coding and testing, saving me a ton of time. It's still in 'alpha' stage at the moment and doesn't have all the devices like 1.9.x does. The web UI is still in development too which I'm adding very soon. Up to you, what ever is easiest.

I do have a ESP32-prog board that has the JTAG interface so can debug in Visual Studio Code on my Win10 box if needed.

@MichaelDvP
Contributor

There is a nice example for ESP32 using events. If this works as described, it should be very easy: sending with the uart_write_bytes_with_break() function, and receiving on the UART_BREAK event and reading the complete string into pEMSRxBuf.
Sadly this does not work on ESP8266, as the event and the tx function are missing. But as I had some problems when sending a lot (I simulate an RC20 with ID 0x19 and answer every poll with an ack), I've tested a new tx_mode that does not wait for the whole telegram to be sent out.
@proddy take a look at tx_mode 4 in this emsuart. Sending is mainly done by the rx_intr. This works very well and shows that timing isn't as critical as supposed.
I think most documentation misunderstands the bus: people think the telegram is echoed by the master, but we have a differential 2-wire bus like RS485, so everything we send is also on the rx line. We always receive our own transmissions simultaneously; the master does not echo. I was confused by oscillograms showing a delay between tx and echo, especially where rx and tx overlap, which is impossible on a half-duplex bus.
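
The break-delimited receive idea in this comment can be tried on a desktop without any ESP hardware. The following is a minimal host-side sketch, not the ESP-IDF API: `RxEvent`, `RxEventType` and `TelegramAssembler` are hypothetical names standing in for the driver's "data" and "break" notifications. It only shows the framing rule, i.e. bytes accumulate until a break arrives and the break closes the telegram.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event kinds, standing in for "data byte arrived" and
// "break detected" notifications from a UART driver.
enum class RxEventType { Data, Break };

struct RxEvent {
    RxEventType type;
    uint8_t     byte; // only meaningful for Data events
};

// Collects bytes until a break arrives, then stores one complete telegram.
class TelegramAssembler {
  public:
    // Returns true when this event completed a telegram.
    bool feed(const RxEvent & ev) {
        if (ev.type == RxEventType::Data) {
            current_.push_back(ev.byte);
            return false;
        }
        // Break: everything received so far forms one telegram.
        if (current_.empty()) {
            return false; // break without data, nothing to deliver
        }
        telegrams_.push_back(current_);
        current_.clear();
        return true;
    }

    const std::vector<std::vector<uint8_t>> & telegrams() const {
        return telegrams_;
    }

  private:
    std::vector<uint8_t>              current_;
    std::vector<std::vector<uint8_t>> telegrams_;
};
```

Because the bus is half-duplex, a model like this would see our own transmitted bytes as ordinary `Data` events too, so the caller still has to compare a completed telegram against what was just sent.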

@proddy
Contributor

proddy commented May 5, 2020

Thanks @MichaelDvP for the ESP32 code. I remember looking at that a while ago and my first version is based loosely off the same design and in version 2.0

I'll take a look at your tx_mode 4. You're right that the tx_mode 1 logic is quite complicated (with checking for timeouts and breaks) while the code for EMS+ and HT3 is very basic but works equally well. And yes, the master doesn't echo; it's just the Rx reading back the same data we sent along the Tx.

@MichaelDvP
Contributor

Yes, all three modes are working well; rx and tx were never lost.
My problem only occurred when sending much more:
[APP] Uptime: 0 days 0 hours 1 minute 58 seconds
[SYSTEM] Last reset reason: Software Watchdog
[SYSTEM] Last reset info: Fatal exception:4 flag:3 (SOFT_WDT) epc1:0x4000dd38 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000 depc:0x00000000
These resets were related to how much I'm sending, so it has something to do with tx. I found nothing in the code which can cause the exception (only EMS_TX_TO_COUNT was initialized a factor of 10 too high, but that does not matter) and it happens in all 3 modes.
All the modes wait until tx is finished, and that sometimes seems to take too long. I think other things also take a long time, and sometimes it all comes together and the watchdog resets. Mode 4 sends the first byte out and returns to the calling function; that saves time and I haven't had the watchdog resets again (running for 2 days now with an ack on every poll on 0x0B and 0x19).

@ArwedL
Author

ArwedL commented May 6, 2020

I did some analysis of emsuart.cpp to understand how it works.
`How does it work currently:

  • emsuart_init installs emsuart_recvTask with
    • system_os_task(emsuart_recvTask, EMSUART_recvTaskPrio, recvTaskQueue, EMSUART_recvTaskQueueLen)
  • Reception in interrupt handler = emsuart_rx_intr_handler
    1. After detection of break pEMSRxBuf->buffer is filled
    2. system_os_post(EMSUART_recvTaskPrio, 0, 0); // call emsuart_recvTask()
      --> this calls finally parseTelegram in context of receive task (noisy data with length between 3 and 4 bytes (including break byte) is ignored)
  • Sending when?
    • sending by emsuart_tx_buffer function (called by ems.cpp and ems_utils.cpp at several places)
    • can send directly
  • Other functions
    • emsuart_stop & emsuart_start (public but not called)
    • emsuart_tx_brk (internal only - not in header - no external call --> good)
  • All transmit is triggered because telegrams are received
    • send functions like ems_setWarmWaterOnetime only push to EMS_TxQueue.push(EMS_TxTelegram)
      --> Single threaded but functions like ems_setWarmWaterOnetime are called from another thread
      --> Potential problem that CircularBuffer isn't multi-thread protected?`

By doing that I detected a potential problem and I am wondering if my conclusion is true. In principle everything seems to happen inside the context of the receive task (even sending) --> single-threaded --> good. But functions like ems_setWarmWaterOnetime are called from another task context --> multithreading, with CircularBuffer as the interface, which isn't thread-safe...
Is this a real problem or is my analysis wrong?
Just wondering...
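
To make the concern concrete: if one task pushes telegrams while another pops them, the container needs some form of protection. Below is a minimal sketch in portable C++ of a lock-protected queue. `LockedTxQueue` and `Telegram` are hypothetical names for illustration only, and this is not how EMS-ESP's CircularBuffer or EMS_TxQueue is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

using Telegram = std::vector<uint8_t>;

// Hypothetical Tx queue shared between a producer task (e.g. a command
// handler) and a consumer task (the one that transmits on the bus).
class LockedTxQueue {
  public:
    void push(Telegram t) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(t));
    }

    // Moves the oldest telegram into `out`; returns false if the queue is empty.
    bool pop(Telegram & out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) {
            return false;
        }
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

  private:
    mutable std::mutex   mutex_;
    std::deque<Telegram> queue_;
};
```

A mutex is only suitable between tasks. If one side is an interrupt handler, a different mechanism (for example an interrupt-safe buffer or a briefly disabled interrupt) would be needed instead.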

@MichaelDvP
Contributor

Sending when?

Sending is only allowed in reply to a poll or request (within 20ms). emsuart_tx_buffer is (at least) only called from parseTelegram. All functions push their messages to the queue, and parseTelegram picks them from the queue and sends them to emsuart_tx_buffer.

emsuart_stop & emsuart_start (public but not called)

Called from ems-esp.cpp only for OTA-Updates.

@ArwedL
Author

ArwedL commented May 6, 2020

Thanks for clarification regarding sending...
How do you see my suspicion of a potential multi-threading problem in CircularBuffer?

@MichaelDvP
Contributor

AFAIR CircularBuffer isn't used in v2. But only @proddy can answer that correctly; for now I'm not so familiar with v2.

@proddy
Contributor

proddy commented May 6, 2020

emsuart has its own simple buffer queue storing the complete telegrams. When one is filled it is sent to the core emsesp. CircularBuffer is not used in Tx.

But now in v2 there are two queues (std::deque), one for Rx and one for Tx, both asynchronous and using std::atomic to prevent data races. A telegram is sent to Tx after a poll is received on the Rx line: the Rx is disabled and the whole data is sent as one block.
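
The "queue now, transmit on poll" flow can be sketched as follows. This is a simplified illustration under stated assumptions, not the v2 code: `PollDrivenTx` is a hypothetical class, the `send` callback stands in for the UART transmit routine, and sending at most one telegram per poll is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

using Telegram = std::vector<uint8_t>;

// Hypothetical dispatcher: commands only enqueue, and the actual transmit
// happens when the bus master polls us.
class PollDrivenTx {
  public:
    // `send` stands in for the UART transmit routine (assumption).
    explicit PollDrivenTx(std::function<void(const Telegram &)> send)
        : send_(std::move(send)) {
    }

    // Called by anything that wants to send; never touches the UART.
    void enqueue(Telegram t) {
        queue_.push_back(std::move(t));
    }

    // Called by the receive path when a poll addressed to us arrives.
    // Sends at most one queued telegram; returns true if one was sent.
    bool on_poll() {
        if (queue_.empty()) {
            return false; // nothing pending; the caller would just acknowledge
        }
        send_(queue_.front());
        queue_.pop_front();
        return true;
    }

    std::size_t pending() const {
        return queue_.size();
    }

  private:
    std::function<void(const Telegram &)> send_;
    std::deque<Telegram>                  queue_;
};
```

For example, constructing it with a lambda that appends to a vector makes it easy to check on a desktop that nothing is transmitted until `on_poll()` is called.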

@proddy
Contributor

proddy commented May 6, 2020

also @nomis pointed me to his safe buffer implementation (https://github.com/nomis/EvohomeWirelessFW/blob/master/lib/InterruptSafeBuffer/InterruptSafeBuffer.h) which I'd like to try at some point too.

@nomis

nomis commented May 6, 2020

Something that was frustrating about the Rx process is that it has to wait until the break finishes before it receives the message (which could delay Tx). Ideally it should use a timer to identify when there is nothing else being received and process the message sooner. It may be possible to use the UART timer to do this but it's character sized not bit sized.

I never got Tx working on my boiler so I don't know if any of my changes to Rx would break Tx.
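
The timer idea above, closing a telegram as soon as the line has been quiet for a while instead of waiting for the break to finish, could look roughly like this. It is a host-side sketch with hypothetical names (`IdleGapFramer`), the gap value is left to the caller, and it ignores how the break itself is reported by the hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical receiver that closes a telegram when no byte has arrived for
// at least `idle_gap_us`, instead of waiting for the end of the break.
class IdleGapFramer {
  public:
    explicit IdleGapFramer(uint32_t idle_gap_us)
        : idle_gap_us_(idle_gap_us) {
    }

    // Record a byte together with the time it finished arriving.
    void on_byte(uint8_t byte, uint32_t now_us) {
        buffer_.push_back(byte);
        last_byte_us_ = now_us;
    }

    // Call periodically (e.g. from a timer). Returns true and fills `out`
    // once the idle gap has elapsed since the last byte.
    bool poll(uint32_t now_us, std::vector<uint8_t> & out) {
        if (buffer_.empty()) {
            return false;
        }
        // Unsigned subtraction stays correct across counter wrap-around.
        if (static_cast<uint32_t>(now_us - last_byte_us_) < idle_gap_us_) {
            return false;
        }
        out.swap(buffer_);
        buffer_.clear();
        return true;
    }

  private:
    uint32_t             idle_gap_us_;
    uint32_t             last_byte_us_ = 0;
    std::vector<uint8_t> buffer_;
};
```

As noted in the comment, a hardware UART timeout counts in character times rather than bit times, so a real implementation would be coarser than this microsecond-based model.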

@ArwedL
Author

ArwedL commented May 7, 2020

emsbus.zip
I added a first version of emsuart.cpp and .h with ESP32 support. It compiles for me - I didn't yet test it.
It is ESP32 only (dropped all ESP8266 stuff)!

@proddy
Contributor

proddy commented May 7, 2020

@ArwedL I've added the code to the v2 repo. Builds fine but causes a reset when the uart port is opened so need to debug further. It's probably easier if you work off my latest code base.

@ArwedL
Author

ArwedL commented May 7, 2020

The latest commit which I can see on v2 branch (https://github.com/proddy/EMS-ESP/tree/v2) is from 16.1.2020 - is this the latest codebase?

@proddy
Contributor

proddy commented May 7, 2020

I thought that branch was empty. I deleted it. No my EMS-ESP2 is in a private repo and not quite ready for the real world. I'll grant you access so can familiarize yourself with how the modules work.

@proddy
Contributor

proddy commented May 9, 2020

I made a snapshot of v2 in https://github.com/proddy/EMS-ESP/tree/v2

Note it's not backward compatible for v1. So first wipe the flash on the ESP then upload the new firmware, connect to USB/Serial with 115200 speed/baud, type system to go into the system menu and use the set commands to change the wifi settings. Use help if you get lost and remember to read the README file.

@proddy
Contributor

proddy commented May 18, 2020

Hi @ArwedL did you get any further debugging the ESP32 UART code?

@ArwedL
Author

ArwedL commented May 18, 2020

To be honest I hoped for your side to make debugging progress... Any info so far which you can share? I had to shift focus to other open points in my ESP32 project. Will see if I can spend some time.

@proddy
Contributor

proddy commented May 18, 2020

no worries, I'll have a go at fixing it up after I've finished merging in the web code

@ArwedL
Author

ArwedL commented May 18, 2020

I spent some time and found the issue. For me it works now (only tested receiving) - see the attached updated emsuart.cpp
emsbus.zip
The problem was in the line "buf_handle = xRingbufferCreate(512, RINGBUF_TYPE_NOSPLIT);", which has to be called before the recvTask and parseTask are created. I didn't have any other problems. uart_driver_install worked for me.

@proddy
Contributor

proddy commented May 18, 2020

nice! trying it out now...

@proddy
Contributor

proddy commented May 18, 2020

yes, it kinda works. Data comes in (Rx) but also a lot of misfires. I'll take a deeper look later this week.

[Screenshot: Annotation 2020-05-18 222237]

@MichaelDvP
Contributor

I could get a wemos-d1-mini32 for testing. It fits in the BBQKees board (but not in the housing).
Settings for this module with this pinout are:
sensors.h:
SENSOR_GPIO = 18; // Wemos D1-32 for compatibility D5
system.h:
LED_GPIO = 2; // on Wemos D1-32
LED_ON = HIGH;
emsuart_esp32.h:
EMSUART_RXPIN 23 // Wemos D1-32 RX pin for compatibility D7
EMSUART_TXPIN 5 // Wemos D1-32 TX pin for compatibility D8

The communication problem is a bug in the driver. On a break interrupt, the default interrupt handler only posts the queue flag but does not read the FIFO into the buffer (see here). The buffer is only filled in line 808. On a break interrupt the telegram is mostly still in the FIFO and stays there until the FIFO-full interrupt. The solution is either to set the FIFO-full threshold to 1, at the cost of a lot of IRQ calls, or to use our own IRQ routine.
Here is a small interrupt routine; rx is working but I cannot see tx in the log.
uart.zip
terminal
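
The behaviour described in this comment can be illustrated with a toy model. This is not the ESP-IDF driver and makes no claim about its internals beyond what the comment states: `UartModel` is a hypothetical class with a hardware FIFO, a driver-side buffer, a FIFO-full threshold, and a switch for whether the break handler drains the FIFO. The threshold values in the usage note are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model of a UART with a hardware FIFO and a driver-side buffer.
class UartModel {
  public:
    UartModel(std::size_t fifo_full_threshold, bool drain_on_break)
        : threshold_(fifo_full_threshold)
        , drain_on_break_(drain_on_break) {
    }

    // A byte arrives from the wire into the hardware FIFO.
    void rx_byte(uint8_t b) {
        fifo_.push_back(b);
        if (fifo_.size() >= threshold_) {
            drain(); // models the FIFO-full interrupt
        }
    }

    // A break is detected. Returns what the application can read from the
    // driver buffer at the moment it is notified of the break.
    std::vector<uint8_t> rx_break() {
        if (drain_on_break_) {
            drain(); // custom handler: empty the FIFO before notifying
        }
        std::vector<uint8_t> visible;
        visible.swap(buffer_);
        return visible;
    }

    std::size_t bytes_stuck_in_fifo() const {
        return fifo_.size();
    }

  private:
    // Move everything from the hardware FIFO into the driver buffer.
    void drain() {
        buffer_.insert(buffer_.end(), fifo_.begin(), fifo_.end());
        fifo_.clear();
    }

    std::size_t          threshold_;
    bool                 drain_on_break_;
    std::deque<uint8_t>  fifo_;
    std::vector<uint8_t> buffer_;
};
```

With a large threshold and `drain_on_break` set to false, a short telegram is still sitting in the FIFO when the break is reported. Setting the threshold to 1, or draining in the break handler, makes the whole telegram visible, which corresponds to the two workarounds proposed above.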

@proddy
Contributor

proddy commented May 24, 2020

nice Michael! I had a quick try but couldn't see any Rx come in. It could be a fault on my side (broken wires). I'll have another go later.

@MichaelDvP
Contributor

I have changed the rx/tx pins for my module, so you have to change them back.
But I think I know what's wrong with tx: I have to set conf0.txd_brk after the tx buffer is filled and clear it in the IRQ. It also seems to work with the ESP8266. I'll test and let you know, stay tuned.

@proddy
Contributor

proddy commented May 24, 2020

it was a faulty cable. I'm getting Rx in but Tx causes a crash. So I think you're on the right track! nice work btw.

@ArwedL
Author

ArwedL commented May 25, 2020

Also thanks from my side for improving the code. Obviously I don't find the time for making the solution bullet-proof...
And of course I assumed correct implementation from Espressif SDK (which seems not to be the case).

@MichaelDvP
Contributor

For me all modes are working fine. I tested by changing the mode, waiting a minute, running show emsbus, changing the mode again, and so on, starting with mode 4:
EMS Bus protocol: Buderus, #telegrams received: 252, #Read requests sent: 87, #Write requests sent: 0, #CRC errors: 0 (0%)
Tx mode = 2
EMS Bus protocol: Buderus, #telegrams received: 348, #Read requests sent: 118, #Write requests sent: 0, #CRC errors: 0 (0%)
Tx mode = 1
EMS Bus protocol: Buderus, #telegrams received: 411, #Read requests sent: 129, #Write requests sent: 0, #CRC errors: 0 (0%)
Tx mode = 3
EMS Bus protocol: Buderus, #telegrams received: 448, #Read requests sent: 140, #Write requests sent: 0, #CRC errors: 0 (0%)
Tx mode = 4
EMS Bus protocol: Buderus, #telegrams received: 536, #Read requests sent: 172, #Write requests sent: 0, #CRC errors: 0 (0%)
I've used this routine:
emsuart_esp8266.cpp.txt

With the mixer i meant something like that:
mixing.cpp.txt

Is it possible to set some device flags for the boilers? The 0xE3 .. 0xE9 telegrams are not supported by my boiler. I think they are used by newer boilers or only heat pumps/condensers/compressors?

Also for the thermostat it should be fine if only the active hcs would be requested.

@proddy
Contributor

proddy commented May 27, 2020

What issues do you have with tx? Only on 8266 or both?

On both ESP8266 and ESP32 when the Tx is sent, there is no acknowledgment, just more Rx. If it works on your setup it must work on mine too so let me experiment a little more using your latest code. Worst case I'll bring out the scope and see what is being transmitted over the EMS line.

I see only "[telegram] Tx buffer full. Looks like Tx is not working?"

Here I wanted a way to detect if the Tx was not working and thought if the Tx queue is full (max 20) this must be a good sign. With a poll happening every 1-2 seconds and doing the queue check every minute it should be pretty fail-safe. Note the poll-acks are not stored as Tx messages so these are only the real read/write commands. But you're right, there will be times when there are a lot of messages in the queue (like after a 'refresh' command) so I'll need to find a better way. Any ideas?

I have modified telegram.cpp, line 272 to:
if (data[2] < 0xF0 || length < 6) { to avoid reading outside the telegram)

correct. I had fixed that in an earlier build and modified the whole logic.

btw, mixing.cpp, line 156 should be hc_ = device_id() - 0x20 + 1;

thanks, corrected it!

I think it is better to request only the telegrams implemented in the device. So for MM10 request only 0xAB, for MM50,100... request 0x01D7 only for device 0x20, 0x01D8 only for device 0x21, and so on.

This is a very good point and I had noticed it too. Same with the boiler (your 0xE3-E9 example). Maybe by using device flags like we did with the thermostat is the right approach. And this will also save on flooding the Tx queue with bogus messages that won't be answered anyway.
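
The two mappings mentioned above (device ID to heating circuit, and device ID to the one telegram type worth requesting) can be written down as small helpers. This is only a sketch: the function and constant names are hypothetical, and it assumes the "and so on" pattern continues linearly from the two examples given (0x20 → 0x01D7, 0x21 → 0x01D8). The MM10 case (0xAB) is not covered.

```cpp
#include <cassert>
#include <cstdint>

// From the thread: mixing modules start at device ID 0x20.
constexpr uint8_t MIXING_BASE_DEVICE_ID = 0x20;
// Assumption: telegram type IDs run consecutively from 0x01D7 upward.
constexpr uint16_t MIXING_BASE_TYPE_ID = 0x01D7;

// Heating circuit number (1-based) for a mixing module's device ID,
// i.e. hc = device_id - 0x20 + 1.
constexpr uint8_t heating_circuit(uint8_t device_id) {
    return static_cast<uint8_t>(device_id - MIXING_BASE_DEVICE_ID + 1);
}

// The single telegram type to request from this module (MM50/MM100 style).
constexpr uint16_t mixing_type_id(uint8_t device_id) {
    return static_cast<uint16_t>(MIXING_BASE_TYPE_ID + (device_id - MIXING_BASE_DEVICE_ID));
}

static_assert(heating_circuit(0x20) == 1, "device 0x20 is hc1");
static_assert(mixing_type_id(0x21) == 0x01D8, "device 0x21 uses type 0x01D8");
```

Requesting only `mixing_type_id(device_id)` per module, rather than every type from every module, is what keeps unanswered requests out of the Tx queue.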

Also for the thermostat it should be fine if only the active hcs would be requested.

good idea. I'll work on this too.

Regarding the corrupt telegrams when stopping the uart: The first telegram after re-enabling is always corrupt (too long, or first bytes missing if the buffers were cleared); we can set a flag to drop this first telegram.

Ok. I will test again. I just noticed after each send I would get a CRC error.

it's really nice all the help you're providing, much appreciated Michael.

@MichaelDvP
Contributor

MichaelDvP commented May 28, 2020

What issues do you have with tx? Only on 8266 or both?

On both ESP8266 and ESP32 when the Tx is sent, there is no acknowledgment, just more Rx.

Yes, I know what you mean. I've added some more log messages to monitor tx and polls, and some answers and acks are missing: missing_answer_and_ack.log (some messages are logged from both telegram and emsesp; they are received/sent once but appear twice in the log).
To monitor the acks better I added a log for acks and tested another version with the same tx, which is now working: working.log.
But I have not figured out yet what caused the problem.

Here I wanted a way to detect if the Tx was not working and thought if the Tx queue is full (max 20)
I've changed to 30 and with the mentioned changes in Mixing::Mixing the message does not show up again.

I'll send you a working uart as soon as I understand what's going wrong.

edit: BTW: the test with different modes was invalid; I didn't realize that I have to reboot before the tx_mode becomes active.

@proddy
Contributor

proddy commented May 28, 2020

ok. I'm working on improving the wifi and mqtt calls (I still get some dropouts). I've added the changes to the mixer and also adjusted the thermostat to only fetch the active heating circuits. And I made the check for 'Tx line' more robust. Just need to add device_types to the Boilers and I think I've covered all your comments. It'll be version a10.

@proddy
Contributor

proddy commented May 29, 2020

Also for the thermostat it should be fine if only the active hcs would be requested.

@MichaelDvP when you query a heating circuit on your thermostat that's not active, what does it come back with? Eg. read 2a8 for HC4 ?

@MichaelDvP
Contributor

Here is the uart, working on the 8266 in mode 4. On the ESP32 I'm getting a few CRC errors at start and very rarely a "TX read failed", but the device answers, so the problem is in the echo readback.
uart.zip

@proddy i can't read 2a8 since i have no ems+. But these are the RC35 circuits, only hc2 is active:
Thermostat_hc1-4.log

I've also found some smaller bugs and typos. Should I describe them in an issue or make a PR? (I've put a fork on GitHub with the fixes, and a few changes for me like sensor-mqtt.)

@MichaelDvP
Contributor

For the ESP32, add the line EMS_UART.idle_conf.rx_idle_thrhd = 12; to the start routine.
Sometimes the break is recognized before the line goes high; the receiver then takes the end of the break as a start bit and reads a 0xFF at the start of the next telegram.

@proddy
Contributor

proddy commented May 30, 2020

@MichaelDvP thanks for the updates. I've merged them into a11 in the v2 branch. I've also made you a contributor so feel free to hack away directly in the project branch or push PRs.

A few comments on the changes

  • I still can't get Tx working on the ESP8266. I get Tx errors so will need to look into what is causing that on my system
  • I added an MQTT format called 'CUSTOM' to replace your 'MY'
  • the only change I didn't make is the MQTT changes in mixing.cpp. I don't like the static heap allocation. Was there a reason for this? If you're getting mqtt errors try qos 1

@MichaelDvP
Contributor

@proddy Thank you for the invitation.

I get Tx errors

Strange, i've tested with the esp32 and a noname wemos-8266-clone (to have the original wemos with 1.9.6 as backup reference), both without tx issues now. Btw: I found, that esp32 resets the tx-break-bit by hardware after sending, no need to do it in the code, but in 8266 we have to clear it in the code.

MQTT format called 'CUSTOM'

The intention was mainly the sensor formatting. I have 7 sensors and sometimes change them (add one in a new place, or disconnect one that is useless). Then the numbering changes and my iobroker scripts pick up the wrong sensor. With JSON {id:temp,...} the sensors are fixed.

MQTT changes in mixing.cpp. I don't like the static heap

No MQTT errors; the intention was a 1.9.5-compatible JSON, where every mixing device can add its own nest and keep the other nests. But now I think it's better to have a JSON for every device. But we have to use the device ID (which also represents the switch setting on the MM100) for numbering, otherwise mixing_data1 with hc1 will be overwritten by wwc1. With the device ID, wwc1 (switch position 8) will become mixing_data8.

@proddy
Contributor

proddy commented Jun 1, 2020

TxErrors

using the latest v2 I still don't get Tx working. After each send I get a timeout error. I think I need to hook up the scope.

[Screenshot: Annotation 2020-06-01 141944]

sensors & MQTT

that's a valid use case for people with more than 1 sensor so happy with any changes you commit that make it better.

MQTT & mixing

yes, the ArduinoJson library uses only the copy constructor when they are char * or consts so there may be conflicts when building the json object. I'll improve this code.

telnet performance

I've been battling to find out why telnet is never very responsive, and I think it's due to the Dallas one-wire library interfering with the wifi. Have you ever experienced this? I may re-write that piece of code too.

@MichaelDvP
Contributor

Oh, I thought the issues were with tx_mode 4. Now I've tested mode 1 a bit longer and can reproduce these errors, but only one error every few minutes. There are timeouts, break interruptions and collisions with the next telegram, resulting in bad CRCs. Something seems to interrupt and delay the tx routine for a very long time; is there something time-consuming (like Dallas) in another thread?
Do you also have the errors with mode 4?
This disables all interrupts while sending; no errors in the first 8 minutes now.
emsuart_esp8266 .cpp.txt

@proddy
Contributor

proddy commented Jun 2, 2020

I'll do some checking this afternoon. Also re-writing the sensor code as it seems to also block the wifi/lwip. There are a couple of delay() calls in the onewire library

@MichaelDvP
Contributor

Yes, the onewire is burning a lot of time. Another thing: the RxService::loop() should process a telegram as fast as possible, but is delayed (RX_LOOP_WAIT), skip this.

@proddy
Contributor

proddy commented Jun 2, 2020

@MichaelDvP I added the RX_LOOP_WAIT because I thought it was slowing down the telnet, but in the end the culprit was the 1-wire library. So I'm happy to remove it. Except, why should we process an Rx telegram as fast as possible? They're queued up and can be processed every 300ms without affecting any Tx.

@MichaelDvP
Contributor

Oops, you're right, I confused process_telegram and incoming_telegram. The short reaction time is needed from master poll to tx. I added code to check the time from the receive interrupt to the tx transmit call, and sometimes this takes more than 20 ms. I wonder what happens in this time?

@proddy
Contributor

proddy commented Jun 3, 2020

that is odd. Tx should happen immediately after a poll. What you could try is to comment out the sensor_.start() and sensors_.loop() in emsesp.cpp. This made my setup run a zillion times faster.

@MichaelDvP
Contributor

MichaelDvP commented Jun 3, 2020

@proddy

i added the RX_LOOP_WAIT because I thought it was slowing down the telnet

I thought about that, but we have only a few messages per second, and if there is no message the function returns anyway. I see no benefit. To give telnet/wifi more time I think it's better to add a delay in the main loop. Since the tx reaction is completed within emsuart_recvTask, this task should have enough time to complete. I'm now trying a delay(MYESP_DELAY) as in 1.9 in EMSESP::loop() and it seems to help. The terminal also seems more responsive.

@proddy
Contributor

proddy commented Jun 4, 2020

@MichaelDvP I found and fixed the issue that was causing Tx to fail with the old logic (tx_mode 1). The value of the timeout was too short. Should be 1760. I suspect a typo in the macro when it was copied over from 1.9.5.

I still can't get the newer Tx code to work (tx_mode 4). Giving me the same errors. Perhaps also a timing error?

[Screenshot: Capture]

@MichaelDvP
Contributor

MichaelDvP commented Jun 5, 2020

I don't believe it is the timeout value. I changed it while I was working on the uart; the 10 seemed to me to be a typo.
The timeout counts loops, and each loop is EMSUART_BUSY_WAIT long, which is 1/8 of a bit time. With EMS_TX_TO_COUNT set to 22 * 8 * 10 you get a timeout of 22 * 8 * 10 * bittime / 8, i.e. 220 bit times or 22000 µs. If you wait that long for one byte, you'll get a collision with the next master poll in the second byte. With 22 * 8 the timeout is 22 bit times, 2200 µs, a bit more than the EMS+ fixed wait.

The error messages indicate that there is no response from the destination, right?
So first question: did we send? Can we receive the echo or is it missing?
If there is an echo, what do we receive next? What message is received that triggers the rx but does not match the tx_waiting? We should log the raw telegrams including the break directly in recvTask to see what comes in. For timing it can be useful to increase the priority of the recvTask.
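
The timeout arithmetic in this comment can be checked at compile time. The sketch below assumes a bit time of 100 µs, which is the round figure implied by the numbers quoted above (220 bit times = 22000 µs); `tx_timeout_us` and `BIT_TIME_US` are hypothetical names, not identifiers from the EMS-ESP source.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a bit time of 100 µs, the round figure implied by the
// numbers in the comment above (220 bit times = 22000 µs).
constexpr uint32_t BIT_TIME_US = 100;

// Each busy-wait loop lasts 1/8 of a bit time, so the total timeout is
// loop_count * bit_time / 8.
constexpr uint32_t tx_timeout_us(uint32_t loop_count, uint32_t bit_time_us = BIT_TIME_US) {
    return loop_count * bit_time_us / 8;
}

// EMS_TX_TO_COUNT = 22 * 8 * 10 -> 220 bit times
static_assert(tx_timeout_us(22 * 8 * 10) == 22000, "220 bit times");
// EMS_TX_TO_COUNT = 22 * 8 -> 22 bit times
static_assert(tx_timeout_us(22 * 8) == 2200, "22 bit times");
```

The factor of 10 therefore stretches the per-byte timeout from about 2 ms to about 22 ms, which is the difference the comment is pointing at.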

@MichaelDvP
Contributor

I logged with syslog to see the tx errors and there is another strange thing. I get reboots every 2 hours (mark is set to 2h, can it be that?) and always after 4 complete tx errors and the first error of the 5th. But the time between retries is very long; it seems the counter isn't cleared in between.

Another thing: [system] and [network] log with local time (MEST), while [emsesp] (and also [boiler], [thermostat]) log with UTC.
syslog.txt

@proddy
Contributor

proddy commented Jun 5, 2020

I don't believe it is the timeout value. I changed it while I was working on the uart; the 10 seemed to me to be a typo.
The timeout counts loops, and each loop is EMSUART_BUSY_WAIT long, which is 1/8 of a bit time. With EMS_TX_TO_COUNT set to 22 * 8 * 10 you get a timeout of 22 * 8 * 10 * bittime / 8, i.e. 220 bit times or 22000 µs. If you wait that long for one byte, you'll get a collision with the next master poll in the second byte. With 22 * 8 the timeout is 22 bit times, 2200 µs, a bit more than the EMS+ fixed wait.

The error messages indicate that there is no response from the destination, right?
So first question: did we send? Can we receive the echo or is it missing?
If there is an echo, what do we receive next? What message is received that triggers the rx but does not match the tx_waiting? We should log the raw telegrams including the break directly in recvTask to see what comes in. For timing it can be useful to increase the priority of the recvTask.

You're right, 1760ms is a long time within the loop. I'd rather just forget the "tx_mode 1-3" and work on your new and improved Tx logic and figure out why it doesn't work on my setup. I'll also ask BBQKees if he's willing to try out a few things on his boiler. Is there anything specific with your environment? There is a difference with timings between EMS+ and EMS1.0 and I'm on EMS1.0.

@proddy
Contributor

proddy commented Jun 5, 2020

I logged with syslog to see the tx errors and there is another strange thing. I get reboots every 2 hours (mark is set to 2h, can it be that?) and always after 4 complete tx errors and the first error of the 5th. But the time between retries is very long; it seems the counter isn't cleared in between.

Another thing: [system] and [network] log with local time (MEST), while [emsesp] (and also [boiler], [thermostat]) log with UTC.
syslog.txt

I'll create a separate issue for this and track it there.

@MichaelDvP
Contributor

I've logged the time from rx-intr to send and found that it's always this check:
if (millis() > (emsRxTime + EMS_RX_TO_TX_TIMEOUT)) { // send allowed within 20 ms
return EMS_TX_WTD_TIMEOUT;
that causes the error. I replaced this code with
LOG_DEBUG(F("Responsetime: %d"), uuid::get_uptime() - emsRxTime);
and it's mostly 0 or 1 but sometimes 29 ms (no other values), yet this does not cause a collision with the next telegram. We should skip this check completely.
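
If the response time is to be measured or logged rather than used as a hard rejection, a small helper keeps the arithmetic in one place. This is a sketch with hypothetical names (`elapsed_ms`, `reply_window_exceeded`), not code from the project. It computes the elapsed time by unsigned subtraction, which stays correct when a 32-bit millisecond counter wraps around, unlike a comparison of the form `now > start + timeout`.

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds elapsed between two readings of a free-running 32-bit
// millisecond counter. Unsigned subtraction gives the right answer even if
// the counter wrapped between the two readings.
constexpr uint32_t elapsed_ms(uint32_t now_ms, uint32_t earlier_ms) {
    return static_cast<uint32_t>(now_ms - earlier_ms);
}

// Hypothetical helper: reports whether the reply window was exceeded, so the
// caller can decide to log the event instead of aborting the transmit.
constexpr bool reply_window_exceeded(uint32_t now_ms, uint32_t rx_time_ms, uint32_t window_ms) {
    return elapsed_ms(now_ms, rx_time_ms) > window_ms;
}
```

With a 20 ms window, the occasional 29 ms response time reported above would be flagged by `reply_window_exceeded`, while the usual 0 or 1 ms would not.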

@proddy
Contributor

proddy commented Jun 7, 2020

I see a responsetime of 0 and sometimes 1 with "tx_mode 1". I just can't get tx_mode 4 working, even by adding some delay after each bit write. I'm so busy with the web interface I don't really have time to get the scope out to see how the timings are off on my EMS 1.0 system.

@proddy
Contributor

proddy commented Jul 6, 2020

Closing this. Covered in emsesp/EMS-ESP#398
