
device regularly becomes unresponsive for several minutes #246

Open
marcone opened this issue Mar 18, 2024 · 12 comments

Comments

@marcone

marcone commented Mar 18, 2024

I'm using the ratgdo 2.53i standalone, not integrated with any home automation setup. I flashed it with the ESPHome firmware and have a script that polls the REST API every few seconds to get the current state and automatically close the door when it's been accidentally left open.

What I'm noticing is that the ratgdo frequently stops responding, sometimes only briefly, sometimes for minutes at a time.
Pinging the device shows the same behavior: it regularly stops responding to pings for up to a few minutes, then resumes, and these interruptions coincide with the REST API becoming unresponsive.
I can ping the GDO itself as well as other devices on the same Wi-Fi access point (I have a dedicated AP in the garage) without issue.

(As I was typing the above, it stopped responding again and stayed unresponsive for 6 minutes.)
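
For anyone trying to reproduce this setup, here is a minimal sketch of the kind of polling script described at the top of this comment. It is not the actual script. It assumes the firmware's web_server component exposes the cover at `/cover/<id>` (GET for state, POST to `/cover/<id>/close`) and that the JSON reply has a `state` field of `OPEN` or `CLOSED`. The hostname `ratgdo.local`, the cover id `door`, and the 10-minute limit are placeholders, not values from this thread.

```python
import json
import time
import urllib.request

# Assumptions, not taken from the thread: adjust to match your device.
BASE_URL = "http://ratgdo.local"   # hypothetical hostname
COVER_ID = "door"                  # hypothetical cover object id
MAX_OPEN_SECONDS = 600             # close after the door has been open this long
POLL_SECONDS = 5


def should_close(state, opened_at, now, max_open=MAX_OPEN_SECONDS):
    """Decide whether to close the door.

    Returns (close_now, opened_at), where opened_at is the time the door
    was first seen open, or None while it is closed.
    """
    if state != "OPEN":
        return False, None
    if opened_at is None:
        return False, now
    return (now - opened_at) >= max_open, opened_at


def get_state(timeout=3):
    """Fetch the cover state from the device's REST API."""
    url = f"{BASE_URL}/cover/{COVER_ID}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("state")


def close_door(timeout=3):
    """Ask the device to close the door."""
    url = f"{BASE_URL}/cover/{COVER_ID}/close"
    req = urllib.request.Request(url, data=b"", method="POST")
    urllib.request.urlopen(req, timeout=timeout).close()


def main():
    opened_at = None
    while True:
        try:
            state = get_state()
            close_now, opened_at = should_close(state, opened_at, time.monotonic())
            if close_now:
                close_door()
                opened_at = None
        except (OSError, ValueError) as exc:
            # Connection errors and timeouts are OSError; bad JSON is ValueError.
            print(f"{time.ctime()}: unreachable ({exc})")
        time.sleep(POLL_SECONDS)
```

Calling `main()` starts the loop. The short request timeout matters here: without it, a poll made while the device is unresponsive could block for a long time instead of being logged as a failure.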

@rlowens
Contributor

rlowens commented Mar 21, 2024

How often does this happen? Can you plug in and capture the serial logs?

Also, have you disabled the Home Assistant "api:" on your firmware, since you aren't using Home Assistant? The device will automatically reboot if it cannot connect to Home Assistant after 15 minutes by default. https://esphome.io/components/api.html

To disable the api, you will need to compile your own firmware with ESPHome. You'll need to install ESPHome itself somewhere and then create a device .yaml for the ratgdo and compile and flash that new firmware.

Here's what the device .yaml could look like:

substitutions:
  name: ratgdo
  friendly_name: Garage
packages:
  ratgdo.esphome: github://ratgdo/esphome-ratgdo/v25iboard.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

#custom modifications using https://esphome.io/guides/configuration-types.html#remove
api: !remove

@marcone
Author

marcone commented Mar 21, 2024

I did not edit any yaml files or build my own firmware; I just flashed it from https://ratgdo.github.io/esphome-ratgdo/ (choosing "security + 2.0" and "ratgdo v2.5x").

I do see messages about it rebooting in the log on the web UI; however, the nonresponsive periods are much more frequent than that. For example, just in the last half hour it was unresponsive from 8:29-8:31, 8:33-8:40, 8:43-8:44, 8:46-8:48, 8:50-8:51 and 8:54-8:55.

I'll see about getting serial logs. Might have to go buy a really long USB cable first.

@PaulWieland
Contributor

This is probably the ESPHome firmware rebooting because it's not connected to HA.
Look at reboot_timeout on https://esphome.io/components/api.html

@marcone
Author

marcone commented Mar 21, 2024

But doesn't that reboot only happen every 15 minutes? The nonresponsive periods happen way more frequently than that, and their duration varies a lot too.

@marcone
Author

marcone commented Mar 24, 2024

I attached the ratgdo to a Raspberry Pi so I could capture serial logs while the ratgdo was attached to the opener.
I've attached two logs:
"ping.log" is the log of a script that pings the ratgdo every second. When it receives a response it logs "alive", and when it doesn't receive a ping response it logs "unreachable".
"serial.log" is the serial log, with each line prefixed by the timestamp of the time it was read, so it can be correlated with the ping log.
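
For reference, a rough sketch of a ping logger along the lines described above, plus a helper that collapses the samples into outage windows for comparison against the serial log. This is not the script that produced the attached logs. The hostname `ratgdo.local` and the output format are assumptions, and the `ping` flags are the Linux (iputils) ones, as found on a Raspberry Pi.

```python
import subprocess
import time
from datetime import datetime

HOST = "ratgdo.local"  # hypothetical hostname, adjust to your device


def ping_once(host, timeout_s=1):
    """Send one ICMP echo with the system ping (Linux flags) and report success."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_windows(samples):
    """Collapse (timestamp, is_alive) samples into (start, end) outage windows.

    start is the first failed sample of an outage; end is the first
    successful sample after it, or the last sample seen if the outage
    is still ongoing when the data ends.
    """
    windows = []
    start = None
    last = None
    for ts, alive in samples:
        if not alive and start is None:
            start = ts
        elif alive and start is not None:
            windows.append((start, ts))
            start = None
        last = ts
    if start is not None:
        windows.append((start, last))
    return windows


def log_forever(host=HOST, interval_s=1):
    """Print one timestamped 'alive' or 'unreachable' line per interval."""
    while True:
        status = "alive" if ping_once(host) else "unreachable"
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp}: {status}", flush=True)
        time.sleep(interval_s)
```

Running `log_forever()` with output redirected to a file gives a log in the spirit of ping.log. `outage_windows` accepts any comparable timestamps (plain numbers or `datetime` objects).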

Some things that stood out to me:

  • In the web interface I've seen the occasional "No client connected to API. Rebooting..." message, but I haven't seen those in the serial log.
  • In the serial log, there are a lot of messages like this:
Sat 23 Mar 19:58:54 PDT 2024: ^[[1;31m[E][json:041]: Could not allocate memory for JSON document! Requested 128 bytes, largest free heap block: 128 bytes^[[0m^M
Sat 23 Mar 19:58:54 PDT 2024:
Sat 23 Mar 19:58:54 PDT 2024: --------------- CUT HERE FOR EXCEPTION DECODER ---------------
Sat 23 Mar 19:58:54 PDT 2024: ^M
Sat 23 Mar 19:58:54 PDT 2024: Exception (29):^M
Sat 23 Mar 19:58:54 PDT 2024: epc1=0x4000df64 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000^M
Sat 23 Mar 19:58:54 PDT 2024: ^M
Sat 23 Mar 19:58:54 PDT 2024: >>>stack>>>^M
Sat 23 Mar 19:58:54 PDT 2024: ^M
Sat 23 Mar 19:58:54 PDT 2024: ctx: sys^M
Sat 23 Mar 19:58:54 PDT 2024: sp: 3fffec10 end: 3fffffb0 offset: 0190^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffeda0:  00000016 000000d4 00000020 40101530  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffedb0:  00000005 00000000 00000002 4010179c  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffedc0:  4025d6e7 3ffef3f0 00000002 4025d67c  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffedd0:  00000002 4025d623 00000002 4025c778  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffede0:  4025c7a1 3fffee90 3ffef3f0 00000016  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffedf0:  4025a204 3fffee90 3ffef298 3ffeec60  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee00:  3ffeb820 3fffee90 3fffee90 40101506  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee10:  61726147 10006567 40103c1e 00000100  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee20:  3ffeb18c 7fffffff 00002200 00000001  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee30:  4025655d 00000080 3fffaff4 40239aea  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee40:  ffffffe1 3ffeedac 3ffeb830 3ffef3f0  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee50:  3ffee390 00000041 00000000 4025aeff  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee60:  00000000 3fff493c ffffffe1 00000000  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee70:  00000000 3ffef3f0 00000014 0000000f  ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee80:  40000f58 00000000 3281b0bb 00000000  ^M
...

which correspond to times that the ratgdo becomes unresponsive to pings.
The duration of the unresponsiveness after a "Could not allocate memory for JSON document" exception varies. It is often pretty short (like the one above, where the unresponsiveness lasted about 10 seconds), but sometimes quite long (like the one starting at "Sat 23 Mar 20:20:40", which lasted over 5 minutes).
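
To line these errors up against the ping log, something like the following sketch could pull the allocation failures out of a timestamped serial log. It assumes each line has the form `<timestamp>: <serial output>` as in the excerpt above; the function name is made up for illustration. The timestamp is kept as text rather than parsed, since parsing zone abbreviations such as PDT depends on the local environment.

```python
import re

# Matches the ESPHome error text exactly as it appears in the serial log excerpt.
ALLOC_RE = re.compile(
    r"Could not allocate memory for JSON document! "
    r"Requested (\d+) bytes, largest free heap block: (\d+) bytes"
)


def find_alloc_failures(lines):
    """Return (timestamp, requested_bytes, largest_free_block) per matching line.

    timestamp is the text before the first ": " on the line, or None when
    the line carries no prefix before the error message.
    """
    failures = []
    for line in lines:
        match = ALLOC_RE.search(line)
        if match is None:
            continue
        prefix = line[:match.start()]
        timestamp = prefix.split(": ", 1)[0] if ": " in prefix else None
        failures.append((timestamp, int(match.group(1)), int(match.group(2))))
    return failures
```

For example, `find_alloc_failures(open("serial.txt", errors="replace"))` would list every failure in the attached log, which can then be compared with the start of each unreachable period.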

serial.txt
ping.txt

@chriscrowe

I have this issue too... Curious if you've found any resolution. For now I've dialed my reboot_timeout values way down to try to force the ratgdo to restart itself when it becomes unresponsive.

@marcone
Author

marcone commented May 25, 2024

Haven't found a solution/workaround. Since it happens so frequently, starting shortly after boot, rebooting often wouldn't really help me much either, since then I'd just be waiting for it to finish its reboot and reconnect to the network.

@marcone
Author

marcone commented May 25, 2024

On a whim I tried updating the firmware again (from 2024.4.2 to 2024.5.0) and it is so much worse now. The web interface isn't even usable anymore: every time I reload the page it takes over a minute to load just a partial mostly-empty page, another minute or more for the actual information to load, and the ratgdo is unresponsive to pings for most of this time.

@jgstroud

jgstroud commented Jun 4, 2024

This sounds similar to an issue I'm seeing with a couple of devices running the HomeKit firmware. I'm curious whether the device becomes responsive again if you disconnect it from the GDO. In those cases, just disconnecting from the GDO made it suddenly become fully responsive again.

@calisro

calisro commented Jun 10, 2024

Yeah, I've been having this issue too. It constantly disconnects, whereas it used to be really stable. It's not rebooting. In the logs I see a disconnect and then a reconnect a short time later.

@pdbennett

I've also been troubleshooting this issue. Glad to find this thread, as I've been pulling my hair out. Has anyone made any progress resolving it?

@WillCodeForCats

WillCodeForCats commented Jul 14, 2024

I just set up a ratgdo today and was seeing the JSON error message followed by a reboot. It was crashing so often that it warned it was going into safe mode. I was testing it with combinations of remotes and Home Assistant commands, and it would crash after almost every open/close cycle.

Could not allocate memory for JSON document! Requested 504 bytes, largest free heap block: 504 bytes

Removing web_server: from config helped mine to stop crashing.


8 participants