EMS-ESP 3.6.0 crashes after hours/days - no response from ems-esp #1264
Strange, I haven't seen this behaviour or any other reports of it. Did you build it yourself or take the .bin from the GH release page? Any other info on your setup? If it's memory, you can try capturing the data and seeing what happens over time.
Proddy, see the difference in "free RAM" on the sysinfo page.
Now I have EMS-ESP frozen in this state; if you need further information, let me know. log.txt
For information:
3.6.0: release .bin from GH installed on 15.08.2023 (19:40), so it ran for ~2 days until the freeze.
3.5.x: Never had such a freeze in previous years with earlier versions like 3.5.x.
3.7.x: Haven't installed 3.7.x/dev so far.
I will observe whether it occurs again...
Thanks for reporting. I'll compare the old 3.6.0-dev17 with the latest 3.6.0 to try and see what is different.
Update: I can't see any noticeable changes in the code between the two. Are there any specific actions you are performing when it 'freezes'? For example, is the Web UI open and logging everything to the System Log window? Or does it crash when no Web UI or telnet session is present?
Hi, I have the very same issue with 3.6.0.
Damn. Is there anything particular about your setups? For example, are you using the API a lot (e.g. via iobroker), exporting data from the web, keeping the Web UI open on the web logs, or logging with syslog? I need to track down why it's not working for you guys. In the meantime, can you try out these two builds:
@proddy Actually, nothing really special. I am just using it with Home Assistant via MQTT. I would be really happy to help, but I will not have physical access to the device for a few days. If the issue hasn't been found by then, I will get back to this on Thursday. Thanks a lot!
Since the last restart, 3.6.0 has been running without a freeze for 2 days now...
Edit 21.08.2023:
Edit 23.08.2023 ~11:30:
Edit 25.08.2023 ~6:50:
Edit 29.08.2023 15:30:
Edit 29.08.2023 18:45:
There should be no difference between dev-17, dev-18 and the main 3.6.0 other than some updated web libraries, and those shouldn't cause a crash, which is strange.
I updated a second device across the street: dev-17 is working, while dev-18 as well as 3.7.0-dev-0 freeze the system over time.
OK! Thanks for testing this, it's super helpful. So something sneaked in between dev-17 and dev-18; I'll compare the source of those two again and try to figure out what is causing this.
Same issue on my device, too: ESP32, WiFi. Updated from the latest official 3.5.x to the official 3.6.0. I have 8 digital thermometers attached via 1-Wire. I only use the Web GUI for OTA updates or customizations; apart from that, I use Home Assistant via MQTT.
Sorry @JokerGermany for screwing up your system. There is something fishy with the 3.6.0 main build, but I'm not sure what it is. Could you try one of the dev17 or dev18 builds from the earlier posts?
Will do when I get it back up. At the moment the web interface isn't reachable at all.
Edit:
Edit2:
Edit3:
Proddy,
MQTT settings with freeze:
MQTT settings without freeze:
I did the same on my parents' device; both have been running since I changed the MQTT settings.
I have the same issue using Home Assistant via MQTT on WiFi with a Buderus Logamax plus GB192i.
@proddy Before leaving for vacation, I reverted to dev18. It has been stable since then (5 days), so I believe the issue was introduced afterwards.
@proddy, for me dev18 also froze after ~2 days (#1264 (comment)).
I changed the MQTT username and password; see the post six comments above.
I think the freezes are memory related. If MQTT disconnects, the queue fills up, consuming memory until a minimum is reached.
Maybe it's better to change both values.
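A minimal C++ sketch of the failure mode described above and of one way to bound it, assuming a simple FIFO of pending publishes: while the broker is unreachable, messages keep being buffered, so without a cap memory use grows with the length of the outage. The names `Message` and `PublishQueue` and the count/byte limits are hypothetical illustrations, not EMS-ESP's actual MQTT code.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical sketch, not EMS-ESP's actual MQTT code: a publish queue that
// keeps buffering while the broker is unreachable, but evicts the oldest
// messages once a count or byte limit is exceeded, so memory stays bounded.
struct Message {
    std::string topic;
    std::string payload;
};

class PublishQueue {
  public:
    PublishQueue(std::size_t max_messages, std::size_t max_bytes)
        : max_messages_(max_messages), max_bytes_(max_bytes) {}

    // Buffer a message, then drop the oldest entries until both limits hold.
    void push(Message msg) {
        bytes_ += footprint(msg);
        queue_.push_back(std::move(msg));
        while (!queue_.empty() &&
               (queue_.size() > max_messages_ || bytes_ > max_bytes_)) {
            bytes_ -= footprint(queue_.front());
            queue_.pop_front();
            ++dropped_;
        }
    }

    // Called once the connection is back: hand out messages oldest first.
    bool pop(Message& out) {
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        bytes_ -= footprint(out);
        return true;
    }

    std::size_t size() const { return queue_.size(); }
    std::size_t bytes() const { return bytes_; }
    std::size_t dropped() const { return dropped_; }

  private:
    // Approximate payload memory; ignores allocator and container overhead.
    static std::size_t footprint(const Message& m) {
        return m.topic.size() + m.payload.size();
    }

    std::deque<Message> queue_;
    std::size_t max_messages_;
    std::size_t max_bytes_;
    std::size_t bytes_ = 0;
    std::size_t dropped_ = 0;
};
```

Whether to cap by message count, by bytes, or by remaining free heap is a design choice. A check against free heap would be closest to "until a minimum is reached", but it needs platform-specific calls and is deliberately left out of this portable sketch.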
Releasing 3.6.2. Please report back if these freeze-type scenarios return.
Restarted it and updated to 3.6.2.
I also have issues: my S32 (standard WiFi, v2.0) crashed a few times with 3.6.0 last week. I'm using it with Home Assistant, all on the latest software releases (2023.9.3, Mosquitto MQTT 6.3.1), connected to a Nefit/Bosch heat pump.
I was running 3.6.2-dev.2 for over 9 days with a hardcoded BSSID without a crash or reboot (a recent record) until this morning, when our mesh WiFi system performed a maintenance reboot. The ESP then hung until I power-cycled it. However (and this must be a separate issue), when it came back up it had reverted to 3.6.1. I just uploaded 3.6.2, verified it was running 3.6.2, rebooted, and now I'm back on 3.6.1 again...?
Did you use the official 3.6.2? I've seen this before in one of my first tests for writing power entities to NVS. I think ota_data was corrupted, so it always booted to the first partition. But in later versions I can't reproduce it (I think it was an nvsRead/Write before nvsBegin in the first version). I have now tested with the official 3.6.2 and still cannot reproduce it; all reboots go to the right partition.
Yes, it was the official 3.6.2. I tried something different this morning: I flashed 3.6.2 while running 3.6.2. This made the change to 3.6.2 persist through a manual reboot. Previously, I had flashed 3.6.2 (the devs) after it had rebooted back to 3.6.1...
It seems so. The ESP has two app partitions, app0 and app1, and one of them is active. An update is written to the other partition, and then that partition is set active. If you have the same software in both partitions, you won't notice a fallback to the wrong partition, because both contain the same software. Try flashing 3.6.3-dev0; it is the same as 3.6.2 but with a different version number, so you can see whether it persists across a reset.
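To illustrate why identical images in both slots hide the fallback, here is a simplified C++ model of the two-slot scheme described above. It assumes, as suggested earlier in the thread, that a corrupted ota_data makes later reboots fall back to the first partition (app0), and that the restart straight after an OTA lands in the new image (as observed in the report). `OtaSlots`, `flash` and `reboot` are hypothetical names, not ESP-IDF APIs.

```cpp
#include <array>
#include <string>
#include <utility>

// Simplified, hypothetical model of the ESP32 app0/app1 OTA scheme.
// It is not ESP-IDF code; it only mirrors the behaviour described above.
struct OtaSlots {
    std::array<std::string, 2> image;  // firmware version stored in app0/app1
    int active = 0;                    // slot currently selected
    bool ota_data_valid = true;        // false models a corrupted ota_data

    OtaSlots(std::string app0, std::string app1)
        : image{{std::move(app0), std::move(app1)}} {}

    // OTA update: write to the inactive slot, mark it active and restart
    // into it. Returns the version that is now running.
    std::string flash(const std::string& version) {
        const int target = 1 - active;
        image[target] = version;
        active = target;
        return image[active];
    }

    // Later reboot: with corrupted ota_data, boot falls back to app0.
    std::string reboot() {
        if (!ota_data_valid) active = 0;
        return image[active];
    }
};
```

In this model, starting from 3.6.1 in app0 with a corrupted ota_data, flashing 3.6.2 runs 3.6.2 until the next reboot, which returns to 3.6.1. Flashing 3.6.2 twice in a row fills both slots with 3.6.2, so the fallback becomes invisible. A build with a different version number, such as 3.6.3-dev0, makes a fallback show up again as a version change after reboot.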
Flashed 3.6.3-dev.0 and it persisted after a system restart, so I believe the boot partition numbering issue has gone away for me.
My main esp32 is now:
I don't think it is a software problem. @JokerGermany, you have a lot of Rx fails, pointing to a weak power supply or an EMC problem. Check with a different supply.
I'll create a new issue for this. This issue is about EMS-ESP crashing, which has been fixed (it was an MQTT memory issue on network loss).
@proddy I am not 100% convinced it is fixed; unfortunately, I had another crash on 3.6.2 after many days of running fine. After restarting, I set the BSSID to the nearest AP. If I ever observe another crash, I will let you know in the other issue. Many thanks for your help and for an excellent project!
I categorize an EMS-ESP crash as a shutdown with a stack dump followed by an automatic restart, usually related to running out of memory. I think in your case it just becomes unresponsive.
I don't think it's crashing; rather, EMS-ESP is going into an infinite reconnect loop after the MQTT connection is dropped. There's another issue for this: #1321
The EMS bus does not provide enough power for an ESP32; you have to use the service jack or an external supply. Low power could explain the high Rx fails and the unstable network.
hide-LED only disables the permanent LED, not the blinking.
I've now had another WiFi disconnect as well. For my two-week vacation, I installed the special version 3_6_1-dev_0f-ESP32_S3 from @MichaelDvP, with energy entities but without the WiFi changes. This version was stable for 16 days. After returning, I installed the latest version, 3.6.3-dev.1, for the ESP32-S3. The router log shows 3 disconnects within 10 seconds but no new connection, and nothing afterwards. It looks like there were 2 reconnect attempts after the first connection loss, but without success. Open questions:
1. Why was the connection lost?
2. Why is the reconnect not working?
I can see a high number of disconnect/reconnect events in my router's log for most devices.
Have you tried setting the BSSID? Running a network scan and clicking the AP sets it automatically.
Yes, I am using the service jack. Please forget my report; I will change my messages. I can't tell exactly whether someone tore it down... Sorry for the false report.
@MichaelDvP Mesh steering is not active for EMS-ESP. But there are dynamic bandwidth changes from 40 to 20 MHz, which might trigger a reconnect. Shall we continue in #1321? And shall I try one of the builds?
I have updated to dev 3.7.0 as a test.
After a certain time (1-3 hours), EMS-ESP is no longer available: no response, no PuTTY, no ping.
The whole thing can be reproduced; if I switch back to dev 3.6.17, this behavior is gone. Is this problem known?
BR Alex