Stalled server causes ProxySQL watchdog to miss heartbeats on all threads #2481

Closed
banpei-dbart opened this issue Jan 13, 2020 · 2 comments

@banpei-dbart

banpei-dbart commented Jan 13, 2020

We're using ProxySQL version 1.4.14-percona-1.1 (on Debian) and today we encountered a crash on one of our ProxySQL hosts. Further investigation revealed that the crash happened because the watchdog missed 10 heartbeats on one of our clusters.

2020-01-13 11:12:24 MySQL_Monitor.cpp:1437:monitor_ping(): [ERROR] Server <some.borked.host>:3306 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.
2020-01-13 11:12:24 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:27 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:30 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:33 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:36 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:39 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:42 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:45 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:48 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:51 main.cpp:1091:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2020-01-13 11:12:51 main.cpp:1095:main(): [ERROR] Watchdog: reached 10 missed heartbeats. Aborting!
proxysql: main.cpp:1096: int main(int, const char**): Assertion '0' failed.
Error: signal 6:
proxysql(_Z13crash_handleri+0x1a)[0x560b95850eda]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7f825ae30060]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7f825ae2ffff]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f825ae3142a]
/lib/x86_64-linux-gnu/libc.so.6(+0x2be67)[0x7f825ae28e67]
/lib/x86_64-linux-gnu/libc.so.6(+0x2bf12)[0x7f825ae28f12]
proxysql(main+0x749)[0x560b9584ece9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f825ae1d2e1]
proxysql(_start+0x2a)[0x560b9584f03a]

I see we're not the only ones struggling with it:
#1685
#1701
#2103
#2217
#2286
#2361

I find the issue by Ben Mildren (#1685) the most interesting one, as it's exactly what we encounter, with one slight difference: it seems that the shunning of the host in question (the log line just before the watchdog starts failing) is the cause of all four threads locking up.
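
A quick way to check that ordering in a log is to pull out the shun and watchdog lines and compare their timestamps. Below is a minimal sketch in Python, assuming the log format quoted above. The helper name `summarize_watchdog` and the regular expressions are illustrative only and are not part of ProxySQL; other versions may format these messages differently.

```python
import re
from datetime import datetime

# Patterns are derived only from the log lines quoted in this issue.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
SHUN_RE = re.compile(
    TS + r" .*\[ERROR\] Server (\S+) missed (\d+) heartbeats, shunning it"
)
MISS_RE = re.compile(
    TS + r" .*\[ERROR\] Watchdog: (\d+) threads missed a heartbeat"
)
ABORT_RE = re.compile(
    TS + r" .*\[ERROR\] Watchdog: reached (\d+) missed heartbeats"
)


def parse_ts(text):
    """Parse the 'YYYY-MM-DD HH:MM:SS' prefix used in the log."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def summarize_watchdog(lines):
    """Hypothetical helper: collect shun events, watchdog misses and the abort.

    Returns a dict with:
      shunned: list of (timestamp, "host:port")
      misses:  list of (timestamp, number_of_threads)
      abort:   timestamp of the abort line, or None if there is none
    """
    summary = {"shunned": [], "misses": [], "abort": None}
    for line in lines:
        m = SHUN_RE.match(line)
        if m:
            summary["shunned"].append((parse_ts(m.group(1)), m.group(2)))
            continue
        m = MISS_RE.match(line)
        if m:
            summary["misses"].append((parse_ts(m.group(1)), int(m.group(2))))
            continue
        m = ABORT_RE.match(line)
        if m:
            summary["abort"] = parse_ts(m.group(1))
    return summary
```

Run against the excerpt above, this should report one shun event at 11:12:24, ten watchdog misses of 4 threads each at 3-second intervals starting in that same second, and the abort 27 seconds after the shun.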

We have a core dump of proxysql that we can look into with you and we can also share our configuration with you.

@renecannao
Contributor

renecannao commented Jan 13, 2020

Hi @banpei-dbart.

Yes, sure, we can have a look at the core dump, but our backlog of community bugs is currently quite long, as paying customers get priority.
Speaking of priority, the issue reported by Ben (#1685) is fixed in a hotfix for 1.4.10 (commit e5226f9) and in 2.0.5 (bd9c88e). It never made it into 1.4.x.
Note that the hotfix is dated May 2019, while the latest 1.4.x (1.4.15) was released in Feb 2019.

In short, my first suggestion is to upgrade, or to apply the hotfix to 1.4.14 (we can do that for you).

@renecannao
Contributor

Fixed in 1.4.16 and 2.0.5
