This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
EMS-ESP status disconnected #859
Comments
Have a look at the BUS status and uptime to see if your EMS-ESP is losing network connectivity (your RSSI is very low at -86, see https://www.metageek.com/training/resources/understanding-rssi/) or your EMS-ESP is restarting due to out-of-memory (your available free memory at 83 and available block size are also very low). |
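One way to tell a restart apart from a plain network dropout is to check whether the reported uptime ever goes backwards. The following is only a minimal sketch of that check: the function name and the sample values are made up for illustration, and it assumes you can export (timestamp, uptime) pairs in seconds from whatever tool records them.

```python
from typing import List, Tuple


def find_restarts(samples: List[Tuple[int, int]]) -> List[int]:
    """Return the timestamps at which the reported uptime went backwards.

    samples: (timestamp_seconds, uptime_seconds) pairs in chronological order.
    """
    restarts = []
    # Compare each sample with the one before it.
    for (_, prev_uptime), (ts, uptime) in zip(samples, samples[1:]):
        if uptime < prev_uptime:
            restarts.append(ts)
    return restarts


# Hypothetical samples, one per minute; the uptime resets before t=180.
samples = [(0, 3600), (60, 3660), (120, 3720), (180, 15), (240, 75)]
print(find_restarts(samples))
```

A gap in the data with an uptime that keeps increasing afterwards points to connectivity; an uptime that drops points to a restart.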
I do not understand why the RSSI is that low, there is a WIFI mesh module nearby. But that might be related to the fact that I can't see the EMS-ESP as a device connected to the network... Working on that. The question is why it is restarting that often. It looks like the restart frequency has increased since I installed EMS-ESP-3_5_0b12-ESP32 and higher. In the last two graphs you see the uptime; strange that I have 2 different uptime fields from EMS-ESP, one before and one after the upgrade. Looks like the UoM has changed... When I look at the Free Memory graph I get the impression there is enough free memory. But maybe the system did not get enough time to report the low memory. |
I have done a factory reset and configured all again, now with fixed IP. Right now the RSSI is -47 so it connected to the nearby DECO M5. But I doubt that this RSSI was causing the restart of the device each time. |
And again the EMS-ESP restarted. |
Keeps restarting. Is there a way to capture the reason for the restart? E.g. the reset reason is "Software reset CPU", so maybe a more detailed reason can be given for the root cause. Or maybe a dump can be stored somewhere with the needed details. I'm using this version: https://bbqkees-electronics.nl/product/gateway-e32-ethernet-edition/ "System Info": { |
Hmm, it could be memory related. Your free mem and max alloc are very low (mine is 133/75, yours 83/39). The first thing I would do is change the Log Buffer max size. I updated the wiki with some other things to try out: https://emsesp.github.io/docs/Troubleshooting/#ems-esp-sometimes-crashes-and-restarts Sorry about this. I know how annoying it can be. It's hard to simulate everyone's setup but we're working on it. |
In the settings I disabled all the options (in my case Telnet Console and Analog Sensors). Question: why do we need a Buffer Size? If we keep the screen open we keep all the messages. Would Buffer Size = 0 be an option or will that not save much memory? Shortly after restart I have: A few minutes later when all the entities are found: With NTP and MQTT disabled: With MQTT enabled and NTP and OTA disabled: So not much saving by disabling services. Any suggestion how to free up memory? 000+00:00:00.000 I 0: [emsesp] Last system reset reason Core0: Software reset CPU, Core1: Software reset CPU |
It's so you can go to the Log page and always see the last logs, which is useful when EMS-ESP is booting up. I'm going to try and simulate your setup by adding dummy devices so I can trace what is happening. |
That's good but equally strange since Telnet/Analog/OTA don't really use a lot of heap memory. 100/41 is just on the edge. Michael and I are going to do some memory optimizations in 3.6.0 as soon as the current 3.5.x is out. Out of interest, are you compiling yourself or taking the firmware .bin files from the repo? The reason I'm asking is that there was a pio library update a few days ago that uses espressif32 5.3.0 by default, which uses a lot more memory. |
No, I'm using your dev build from https://github.com/emsesp/EMS-ESP32/releases. I have installed the one of Dec 28th. I see there is one of yesterday, will install that one. Didn't help, 106 KB / 40 KB. |
@MichaelDvP what do you think? 100+ KB free heap seems normal for 3 devices (boiler and 2 thermostats) with a total of 180 entities. My system has 78 so it's hard to compare. I'm planning to make some changes to test.cpp so we can simulate adding devices on a standalone ESP32 and watch how EMS-ESP handles the memory. |
Yes, in #857 the system is E32, boiler, heatsource, thermostat, 2x mixer, solar and also ~100k heap and 38k block. The heap is imo not critical, it's the fragmentation. I tried to malloc(40k) in emsesp-start, and in dashboard, customization and api I free the buffer, alloc the response and after send realloc the 40k buffer. Seems to work well. The worst that could happen is that the realloc of 40k fails, then it's normal emsesp again. Now I'm testing to malloc another extra 20k just to reduce free heap. Also unlimited 100 message log buffer. |
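To illustrate why fragmentation matters more than the total free heap, here is a toy model of the idea described above. It is not EMS-ESP code: the block sizes are invented to match the rough figures in this thread (about 100 KB free, 38 KB largest block), and the "reserve" simply stands in for the 40k buffer that is allocated at start and released before building a large response.

```python
def heap_stats(free_blocks_kb):
    """Return (total free, largest contiguous block) for a list of free block sizes in KB."""
    return sum(free_blocks_kb), max(free_blocks_kb, default=0)


def can_allocate(free_blocks_kb, request_kb):
    """A request only succeeds if one single free block is large enough."""
    return any(block >= request_kb for block in free_blocks_kb)


# Hypothetical fragmented heap: 100 KB free in total, largest block 38 KB.
fragmented = [38, 22, 16, 12, 8, 4]
free_kb, max_alloc_kb = heap_stats(fragmented)
print(free_kb, max_alloc_kb)

# A 40 KB response buffer cannot be allocated even though 100 KB is free.
print(can_allocate(fragmented, 40))

# With a 40 KB block reserved at startup and released just before the
# response is built, the same request fits into the released block.
RESERVE_KB = 40
with_reserve_released = fragmented + [RESERVE_KB]
print(can_allocate(with_reserve_released, 40))
```

In this model the "free" figure says little on its own; the "max alloc" figure is what decides whether a large single allocation succeeds.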
I was too optimistic... Restarted twice today, at 04:00 and 10:00. |
Another thing to try is using the customization web page and disabling about 30-50 of the entities, or things you're not interested in seeing (from the 180 you have). And then use http://ems-esp.local/api/system/info to see the "free mem" and "max alloc" |
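If you want to check those two values repeatedly, a small script can do it. This is a sketch under assumptions: the URL and the key names "free mem" and "max alloc" are taken from the comment above, but the exact nesting of the JSON is not shown in this thread, so the helper searches for the keys at any depth. The sample payload below is hypothetical in shape and only reuses the 83/39 figures mentioned earlier.

```python
import json
from urllib.request import urlopen


def fetch_system_info(url="http://ems-esp.local/api/system/info", timeout=5):
    """Download and decode the system info JSON (needs network access to the device)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_key(node, key):
    """Return the first value stored under `key` anywhere in nested dicts, else None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        for value in node.values():
            found = find_key(value, key)
            if found is not None:
                return found
    return None


def heap_figures(info):
    """Return (free mem, max alloc) from a decoded payload; key names are assumed."""
    return find_key(info, "free mem"), find_key(info, "max alloc")


# Hypothetical payload shape, used here instead of a live request.
sample = {"System Info": {"free mem": 83, "max alloc": 39}}
print(heap_figures(sample))
```

On a live system you would call `heap_figures(fetch_system_info())` instead of using the sample.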
I reduced to 4, did a restart, and there was no impact on the Heap.
|
ok, I'll make some code changes to simulate your devices so I can debug locally |
also, I realized that it's a heap memory issue and it crashes when you're not doing anything manually (in console, in web etc). When you disable entities in the customization page it still loads all the details into memory, just sets flags etc. So sorry I led you on a bit of a goose chase. I'll try something else and send over a custom build for you to try. |
As a test I temporarily installed EMS-ESP-3_4_4-ESP32. EMS-ESP-3_4_4-ESP32 EMS-ESP-3_5_0-dev_14-ESP32 - Jan 2nd |
Just thinking... If people use MQTT to send the data to HA or any other tool, then the WebUI is most likely only used for making configuration changes. Can we think of a new option in the Settings menu to enable / disable some WebUI functionality to reduce the memory usage for those using MQTT? E.g. for me the Dashboard can be disabled or limited when all is running fine as I will look at the values in HA. And when I need to make some changes I can then temporarily enable them again. But I'm not sure if that will save a lot of memory. |
Heap (Free / Max Alloc) This sounds as if there is a lot of free memory, so why does the EMS-ESP then restart? Is there a process that suddenly requires a lot of memory? A big backlog of MQTT messages? And how does the following sum up to 103 KB? |
Do you have mqtt qos set? The mqtt queue is 300 messages, needed for HA autoconfig. If qos fail it could be that the queue is filled with large data messages, consuming a lot of ram. (on mqtt disconnect output is stopped).
It does not. The output of system info takes a 16k buffer from heap, 87 + 16 ->103. |
QoS = 0. So if I understand correctly, then if messages fail this should not result in a big queue of messages. |
Thinking about something else... I can connect to the EMS-ESP. Checking the RSSI it's something like -77 although there is one DECO M5 (mesh network) a few meters away, meaning that it doesn't connect to that nearby one. I just disabled MESH for the EMS-ESP and fixed it to the nearby DECO M5. The RSSI is now -55. So what if the connection is/was so bad that it takes too long to send the messages and due to that the queue is getting full? Update: |
Could be, there is a limit on the mqtt buffer obviously and nothing more is added. But there might also be a nasty memory leak in the code somewhere which is creeping up. The best way to monitor this is in HA and see whether the free mem and alloc mem are falling. You can also disable mDNS in the Network settings page. This will give you an additional 5 KB of heap which may be just enough to keep it alive. |
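A falling trend is easier to judge with a number than by eye. The sketch below fits a straight line through (hours, free memory) samples and reports the slope in KB per hour. The function name and the sample readings are invented for illustration; real data would come from an export of the HA history.

```python
def slope_per_hour(samples):
    """Least-squares slope of value against time for (hours, value) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("all samples have the same timestamp")
    return numerator / denominator


# Hypothetical hourly free-memory readings in KB.
readings = [(0, 106), (1, 104), (2, 102), (3, 100)]
print(slope_per_hour(readings))
```

A slope that stays clearly negative between restarts would support the leak theory; a flat line followed by a sudden restart would point at a one-off large allocation instead.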
In HA there is only the "Free memory" for me, not the "Alloc mem". After the next restart I will disable the mDNS. What about the Dallas sensor service? Is that using a lot of memory? I can't disable it but I also do not use it.
|
mDNS disabled Heap (Free / Max Alloc) |
Ok, let us know if it still crashes. We're doing some memory optimizations in the background (#869) |
See #894 Maybe something to add to the It may be memory related in I will check if these extra KB are enough to prevent the restarts. Heap (Free / Max Alloc) disabled OTA disabled ethernet |
Thanks Hans for all the time you put into benchmarking the scenarios. It's been really helpful. #869 will help us figure out how much heap and max-alloc we can safely live with before the ESP32 implodes. I'm working on some optimizations in a separate branch, one that includes this max-alloc in the heartbeat MQTT topic so I can see if it starts getting smaller over time. |
Do you want to keep this one open or shall we close it? We have #869 in place to improve the heap memory, and on top of that we have #891, which allows excluding entities completely if they are not relevant for someone's configuration. After #869 I can try to enable functionality again. |
I'd prefer we keep it open as a reference and reminder. I'm making progress on #869 - using that build I've seen an increase in the max alloc buffer by 24% on my live environment and heap by 8%. It's still not enough though, loading 200+ entities crashes and I'm almost 100% sure it's related to MQTT publishing. I'll have more time in a few weeks to pick this up again as I'm traveling extensively for work this month |
Do I understand that (on high level);
If the above is true, then is the problem related to the speed of sending the MQTT messages or is it related to freeing up space in the buffer? On top of that, if the setting is QoS0 then there is a risk that messages will get lost. If someone doesn't want that then QoS1 or QoS2 should be used. Don't get me wrong, finding and solving the root cause is better. But in the meantime we can try to avoid the restarts by using such tricks. |
This is what I will need to investigate, now that I can reproduce the crash. The fix to prevent the restarts is to prevent over-allocation of buffers, and report loss of messages/data as a log ERR. Then we can optimize the memory allocation. |
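As a rough illustration of "prevent over-allocation and report the loss", here is a bounded queue that refuses new messages once a message-count or byte budget is reached and logs an error for each drop. This is a sketch only, not the EMS-ESP implementation: the class name, the limits and the log wording are all hypothetical.

```python
import logging
from collections import deque

log = logging.getLogger("queue_sketch")


class BoundedQueue:
    """FIFO queue with a cap on both message count and total payload bytes."""

    def __init__(self, max_messages, max_bytes):
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self.dropped = 0
        self._bytes = 0
        self._items = deque()

    def push(self, topic, payload):
        """Queue a message, or drop it and log an error if a limit would be exceeded."""
        too_many = len(self._items) >= self.max_messages
        too_big = self._bytes + len(payload) > self.max_bytes
        if too_many or too_big:
            self.dropped += 1
            log.error("queue full, dropping message for topic %s", topic)
            return False
        self._items.append((topic, payload))
        self._bytes += len(payload)
        return True

    def pop(self):
        """Remove and return the oldest (topic, payload), or None if the queue is empty."""
        if not self._items:
            return None
        topic, payload = self._items.popleft()
        self._bytes -= len(payload)
        return topic, payload

    def used_bytes(self):
        return self._bytes
```

The point of the byte budget is that a queue limited only by message count can still grow large when the individual payloads are big.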
I have disconnection issues, the s32 drops to AP mode and disconnects from the network. I don't know whether it's related to this issue, but @proddy asked me to mention it here. I will attach my config and the screenshot of the HA Free memory and EMS Status around the time of the last disconnect. |
I just manually rebooted HA, and it disconnected MQTT and it won't reconnect.
|
You're hosting a Mosquitto broker on HA, restarting HA every night, and EMS-ESP doesn't reconnect - is that correct? If so please create a new issue and include the support info. I don't think it's related to this one, I was wrong, sorry |
I disabled the automation, but yes, that's what happened last time. MQTT Broker is also running on HA. I will open an issue later. |
Closing this for now as we know it's memory related and workarounds have been published. |
Question
I sometimes see some "gaps" in some measurement in HA. Checking the EMS-ESP status then I see that around those times the status was disconnected. However, when checking the logs I do not see anything strange on the EMS-ESP.
Not sure yet where is the problem. Can we, based on the attached log file, exclude that the issue is in EMS-ESP?
Screenshots
Device information
emsesp_info.txt
Additional context
log 20221228_153827 to 20221229_072137.zip