ProxySQL fatal crash loop due to missing heartbeat #2103

Closed
carsonip opened this issue Jun 20, 2019 · 2 comments

@carsonip
Contributor

ProxySQL version: 1.4.13 (with backport fix from #1952 and #1953)
ProxySQL config: started with --idle-threads

Environment description:
An AWS m5.xlarge host that is quite busy: it runs ~10 application services, and CPU usage usually starts at 100% and then stabilizes at ~50%. The VM has 4 vCPUs and ProxySQL is configured to run with 4 threads. (I should probably reduce the number of ProxySQL threads, but I don't think that is relevant here.) There is a lot of network traffic.

Problem description:
This is a fatal issue. Many hosts in my application cluster crash reproducibly, very probably in connection with ProxySQL. The ProxySQL log shows a crash loop caused by missed heartbeats. It usually coincides with an OOM, but which one causes the other is unclear; my guess is that ProxySQL is the cause. In the log, the crash loop restarts ProxySQL every few minutes, interleaved with some ping timeouts and connect timeouts.

Log:

2019-06-20 03:21:09 [INFO] New mysql_group_replication_hostgroups table
Standard Query Processor rev. 0.2.0902 -- Query_Processor.cpp -- Wed Mar 20 09:22:57 2019
In memory Standard Query Cache (SQC) rev. 1.2.0905 -- Query_Cache.cpp -- Wed Mar 20 09:22:57 2019
Standard MySQL Monitor (StdMyMon) rev. 1.2.0723 -- MySQL_Monitor.cpp -- Wed Mar 20 09:22:57 2019
2019-06-20 03:21:50 main.cpp:1091:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2019-06-20 03:21:59 main.cpp:1091:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2019-06-20 03:22:03 main.cpp:1091:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2019-06-20 03:22:06 main.cpp:1091:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2019-06-20 03:22:09 main.cpp:1091:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2019-06-20 03:22:12 main.cpp:1091:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2019-06-20 03:22:15 main.cpp:1091:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2019-06-20 03:22:18 main.cpp:1091:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2019-06-20 03:22:21 main.cpp:1091:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2019-06-20 03:22:24 main.cpp:1091:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2019-06-20 03:22:27 main.cpp:1091:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2019-06-20 03:22:27 main.cpp:1095:main(): [ERROR] Watchdog: reached 10 missed heartbeats. Aborting!
proxysql: main.cpp:1096: int main(int, const char**): Assertion `0' failed.
Error: signal 6:
proxysql(_Z13crash_handleri+0x2d)[0x44a17d]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f12041e04b0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f12041e0428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f12041e202a]
/lib/x86_64-linux-gnu/libc.so.6(+0x2dbd7)[0x7f12041d8bd7]
/lib/x86_64-linux-gnu/libc.so.6(+0x2dc82)[0x7f12041d8c82]
proxysql(main+0x853)[0x4478c3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f12041cb830]
proxysql(_start+0x29)[0x447c79]
2019-06-20 03:22:41 main.cpp:910:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2019-06-20 03:22:41 [INFO] Angel process started ProxySQL process 7066
Standard ProxySQL Cluster rev. 0.4.0906 -- ProxySQL_Cluster.cpp -- Wed Mar 20 09:22:57 2019
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Wed Mar 20 09:22:57 2019
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Wed Mar 20 09:22:57 2019
2019-06-20 03:23:53 ProxySQL_Admin.cpp:3945:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable auto_increment_delay_multiplex with value "5". Deleting. If the variable name is correct, this version doesn't support it
Standard ProxySQL Admin rev. 0.2.0902 -- ProxySQL_Admin.cpp -- Wed Mar 20 09:22:57 2019
Standard MySQL Threads Handler rev. 0.2.0902 -- MySQL_Thread.cpp -- Wed Mar 20 09:22:57 2019
Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Wed Mar 20 09:22:57 2019
2019-06-20 03:23:53 [INFO] Dumping mysql_servers_incoming

I have the ProxySQL log and a core dump on hand. If you need them, I can send them to you by email.
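
For anyone triaging a similar log, here is a minimal sketch that tallies the watchdog warnings, watchdog aborts, and angel-process restarts. The regular expressions are copied from the log lines quoted above; the function name `summarize_watchdog` is hypothetical, and the message wording may differ in other ProxySQL versions.

```python
import re

# Patterns taken from the log lines quoted in this report; other ProxySQL
# versions may word these messages differently (assumption).
MISSED = re.compile(r"Watchdog: (\d+) threads missed a heartbeat")
ABORT = re.compile(r"Watchdog: reached (\d+) missed heartbeats\. Aborting!")
RESTART = re.compile(r"ProxySQL crashed\. Restarting!")


def summarize_watchdog(lines):
    """Count watchdog warnings, watchdog aborts, and restarts in log lines."""
    summary = {
        "missed_heartbeat_warnings": 0,  # "N threads missed a heartbeat" lines
        "max_threads_missing": 0,        # largest N seen in those lines
        "watchdog_aborts": 0,            # "reached N missed heartbeats" lines
        "restarts": 0,                   # "ProxySQL crashed. Restarting!" lines
    }
    for line in lines:
        match = MISSED.search(line)
        if match:
            summary["missed_heartbeat_warnings"] += 1
            summary["max_threads_missing"] = max(
                summary["max_threads_missing"], int(match.group(1))
            )
        elif ABORT.search(line):
            summary["watchdog_aborts"] += 1
        elif RESTART.search(line):
            summary["restarts"] += 1
    return summary
```

The function accepts any iterable of strings, so an open file handle for the ProxySQL log can be passed to it directly.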

Questions:

  1. Is this a known bug that was fixed after v1.4.13?
  2. Is this related to the epoll thread?
  3. How can I debug this issue?

Thanks.

@carsonip
Contributor Author

Again, on the production systems I cannot easily isolate the problem or identify the root cause, so I cannot tell whether this is a ProxySQL issue. I also noticed high iowait and exhaustion of the AWS system disk's burst credits, which makes I/O extremely slow.
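
One rough way to put a number on the iowait is to compare two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); the helper names are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two 'cpu' line samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [new - old for old, new in zip(first, second)]
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight counters are summed for the total.
    total = sum(deltas[:8])
    # Index 4 is the iowait counter in the assumed field order.
    return deltas[4] / total if total > 0 else 0.0
```

Taking the two samples a few seconds apart while the host is struggling gives the share of CPU time spent waiting on I/O over that window.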

@carsonip
Contributor Author

I would say OOM is the root cause. Closing for now.
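
To confirm an OOM on a given host, the kernel log can be scanned for OOM-killer entries. This is a minimal sketch that assumes the kernel writes a `Killed process <pid> (<name>)` line when the OOM killer acts; the exact wording varies between kernel versions, so the pattern may need adjusting. The function name `find_oom_kills` is hypothetical.

```python
import re

# Assumed kernel message shape: "... Killed process <pid> (<name>) ...".
# The wording differs between kernel versions; adjust the pattern as needed.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_lines):
    """Return a (pid, process_name) tuple for each 'Killed process' line."""
    kills = []
    for line in kernel_log_lines:
        match = KILLED.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Feeding it the output of `dmesg` or the system's kernel log file, one line per element, shows which processes were killed; comparing those timestamps with the ProxySQL restarts may help establish which event came first.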
