Bug with multiple servers in a group with mixed compressed/uncompressed settings #1493

Open
Reiner030 opened this issue May 4, 2018 · 12 comments

Comments

@Reiner030

Hi,

I re-set up a new PHP + DB slave server on Debian Stretch.
Before, everything was running fine with weighted settings and a local ProxySQL:
external master + external slave + local slave - all with uncompressed connections.

Since the compression flag, as mentioned in another issue, was working nicely for another database, I also added the compression setting to the new ProxySQL configuration to save traffic - but only for the external master + external slave. For local connections it would be neither helpful nor performant to also compress the data.
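
Roughly, the relevant part looks like this in the admin interface (a sketch only - the hostnames here are placeholders, not my real config); the compression column of mysql_servers is set per backend, so the external servers got 1 and the local slave kept 0:

-- sketch with placeholder hostnames; compression is a per-server column in mysql_servers
UPDATE mysql_servers SET compression=1 WHERE hostname IN ('external-master', 'external-slave');
UPDATE mysql_servers SET compression=0 WHERE hostname = '127.0.0.1';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;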

The first days everything worked nicely, but from time to time, when the backend server had to run many PHP tasks, exceptions were thrown like SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60238ms without any apparent reason - all servers are up with few connections, and especially while these errors occurred the connection count on the localhost slave went to zero.

  • we have 60s timeout for backend services
  • 501 is our reader group
  • The user/server had fewer than 1/3 of the user/server max_connections in use
  • These connections run with multiplexing switched off
  • Tested with 1.4.8
  • Tested with 1.4.6 because the old, working servers run this version

and nothing in the logs gave a good hint - the only complaints were about the master (with weight=1), when a server was mentioned at all:

2018-05-04 07:05:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-04 07:15:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-04 07:15:13 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:16 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:19 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:21 mysql_connection.cpp:640:handler(): [ERROR] Connect timeout on master:3306 : exceeded by 29251us
2018-05-04 07:15:21 mysql_connection.cpp:604:handler(): [ERROR] Failed to mysql_real_connect() on master:3306 , FD (Conn:1578 , MyDS:1578) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-04 07:15:21 mysql_connection.cpp:604:handler(): [ERROR] Failed to mysql_real_connect() on master:3306 , FD (Conn:1574 , MyDS:1574) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-04 07:15:21 MySQL_HostGroups_Manager.cpp:301:connect_error(): [ERROR] Shunning server master:3306 with 5 errors/sec. Shunning for 10 seconds
2018-05-04 07:20:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
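
While such a timeout happens, the state of the affected hostgroup can at least be checked from the admin interface; a sketch of the query I would use (not taken from my actual session, 501 being our reader group):

-- per-server status, used/free connections and connect errors for the reader hostgroup
SELECT hostgroup, srv_host, status, ConnUsed, ConnFree, ConnOK, ConnERR
FROM stats_mysql_connection_pool
WHERE hostgroup = 501;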

It would be great to get some more hints in the wiki on how the process works and how debugging can be done (=> #969), because I was lost searching for reasons/solutions until I found these two issues via the error message... The second one is the issue also mentioned in the changelog:

Automatically processing of uncompressed packets even if wrongly marked as compressed #1410

which reminded me to check my previous compression settings on the connections (which were off); I had added them manually to the proxysql configuration some months ago, before the ProxySQL reload function was available.

=> After setting ALL connections to compressed, or all to uncompressed, the errors are gone.
=> After setting mixed compressed/uncompressed values on the connections, the errors come back.

==> this must be an "extended" variant of bug #1410 above?
It would be best to fix this for the upcoming 2.0.0 release... 😊

@renecannao
Contributor

This issue is interesting.
To clarify, an important question: is your client (PHP) using compression, or not?
From what I read, I would assume that the answer is no, but please correct me if I am wrong.

Based on my assumption that the client is not using compression, ProxySQL's code for handling compression is not relevant at all (so forget about #1410).
If you are using compression for backends, all ProxySQL does is set the CLIENT_COMPRESS flag, nothing more.
Therefore, if a problem exists, it is most likely either in MySQL server or in the embedded MariaDB Client Library.
IF the problem is indeed compression, my main suspect right now is the MariaDB Client Library, because this is what ProxySQL uses to communicate with backends.

Your error log indeed gives us some leads for further investigation. Is it possible to have the whole error log, or a few hundred lines?
What I find relevant in your error log is:

  • at 07:15:10 : one thread missed a heartbeat: either locked, or very busy
  • from 07:15:13 to 07:15:19 : 40 threads missed a heartbeat: it is unlikely that 40 threads are all very busy, so it is more likely they are locked (maybe the first blocked thread is holding a mutex, thus blocking all the others)
  • 07:15:21 : errors about master connections are logged
  • the errors about missed heartbeats disappear: is this a coincidence or not?

At this point, I would investigate what causes an apparent lock.
It is possible that ProxySQL's worker threads get locked, and they report connection timeout just because they weren't able to process the connection itself. If that's the case, compression is potentially unrelated to this issue.
From the few lines I have of your error log, it looks like heartbeats may be missed at regular intervals (07:05:16, 07:15:10 and 07:20:10): having the whole error log will help us find a pattern.
Are you using mysql-threads=40 ? Unless you have an extremely beefy machine where ProxySQL is allowed to use 40 CPU cores, you should lower this value, especially if you are running ProxySQL local to your application. How much CPU does ProxySQL normally use?

Action items I would suggest right now:
a) drastically reduce mysql-threads based on its CPU utilization
b) provide the full error log, or a large portion of it

Thanks

@Reiner030
Author

Reiner030 commented May 5, 2018

Hi,
thanks for the quick answer 😄 Yesterday I set all my connections to compressed, so it is already working fine again, but this could be a common problem wherever mixed connections are needed.

  • I have no further logging from ProxySQL, which was the problem when trying to find the cause.
    That's why I asked for a how-to in the wiki pages for such cases as well.
    Have I overlooked such entries, or do I need to install the debug version of ProxySQL?
    I only found the profiling wiki pages for memory leaks.

  • As you assumed, I use no compression on the client side, only in the backend server configuration for ProxySQL.
    I thought the backend connection between ProxySQL and the MySQL/MariaDB/Percona server is transparent to the client, so independent of whether the backend connection is compressed or not, the (un)compressed connection type "only" has to be negotiated between client and ProxySQL?

  • Here are my current package versions for Debian Stretch/9 (all up to date), with MariaDB repository packages and sury.org PHP packages:

# apt-cache policy php7.2-mysql libmariadb3 libmariadbclient18 libmysqlclient18 mariadb-client mariadb-client-10.2 mariadb-client-core-10.2 mariadb-common | grep -e ^[a-z] -e Installed:
php7.2-mysql:
  Installed: 7.2.4-1+0~20180405085422.20+stretch~1.gbpbff9f0
libmariadb3:
  Installed: 10.2.14+maria~stretch
libmariadbclient18:
  Installed: 10.2.14+maria~stretch
libmysqlclient18:
  Installed: 10.2.14+maria~stretch
mariadb-client:
  Installed: 10.2.14+maria~stretch
mariadb-client-10.2:
  Installed: 10.2.14+maria~stretch
mariadb-client-core-10.2:
  Installed: 10.2.14+maria~stretch
mariadb-common:
  Installed: 10.2.14+maria~stretch

(EDITED because image links were not working)

@renecannao
Contributor

Hi.

You are welcome! :)
I still think that having a look at the error log is the best starting point for troubleshooting this issue.
Is it possible to have it?

About compression on clients:
that is correct: clients can use compression or not, connections to backends can use compression or not, and they do not have to match.
For example, a client can use no compression while proxysql uses compression to the backend, and vice versa.
My point in the previous comment is that proxysql's code only handles compression with clients: compression with backends is handled by the mariadb client library.
On the mariadb client library: proxysql uses an embedded, modified version, thus the version you have installed is not relevant.

About mysql-threads: ProxySQL worker threads use async calls, they do not block on network operations.
That means that a single thread can handle tens of thousands of connections.
That said, the way to tune mysql-threads is not based on number of connections, but on CPU usage.
In your setup it seems that mysql-threads=1 would actually be enough, but you can use the default mysql-threads=4. In any case, please stay away from mysql-threads=40.
(we need to make the wiki clearer on this).

Finally, the links you posted do seem to work.

@Reiner030
Author

Hi,

  • I already copied the "full log", from the logrotate line ~1 hour before to the stop/restart lines hours later.
    If you need more debugging/logging, I need to know how to create it. Thanks.
  • OK, then it is now clear which MariaDB library you were referring to.
  • I will then switch back to the default of 4 threads for ProxySQL on Monday.
  • The GitHub image links only give me a "Non-Image content-type returned" response from GitHub (already weeks ago, when writing in another issue), so to be safe I added the Google Drive links directly below them.

@Reiner030
Author

First the easy parts / found problems - it might be better to split these into separate issues, but I didn't know what should be separated, so feel free to reuse them if it makes sense:

  • I tried to reduce the threads as suggested, but proxysql (the 1.4.6 package) rejects it:
2018-05-07 07:51:31 [INFO] Received LOAD MYSQL VARIABLES TO RUNTIME command
2018-05-07 07:51:31 ProxySQL_Admin.cpp:3875:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set variable threads with value "4". Resetting to current "100".
2018-05-07 07:51:44 [INFO] Received SAVE MYSQL VARIABLES TO DISK command

On another instance with Debian Jessie I got the same rejection (1.4.6 package), and there the package default of 20 is set:

2018-05-09 08:05:41 [INFO] Received LOAD MYSQL VARIABLES TO RUNTIME command
2018-05-09 08:05:41 ProxySQL_Admin.cpp:3858:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set variable threads with value "4". Resetting to current "20".

It is also not even possible to lower the value from 100 to 20:

2018-05-24 19:24:35 ProxySQL_Admin.cpp:1384:admin_handler_command_load_or_save(): [INFO] Received LOAD MYSQL VARIABLES TO RUNTIME command
2018-05-24 19:24:35 ProxySQL_Admin.cpp:3875:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set variable threads with value "20". Resetting to current "100".
2018-05-24 19:24:42 ProxySQL_Admin.cpp:1368:admin_handler_command_load_or_save(): [INFO] Received SAVE MYSQL VARIABLES TO DISK command

Even after a ProxySQL restart, the "saved to disk" value is (20 or) 100 again ...
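
If I understand the warning correctly, mysql-threads is one of the few variables that only takes effect at process start, so LOAD MYSQL VARIABLES TO RUNTIME can never change it; the value has to be persisted and the service restarted. A sketch of what I assume is the intended sequence (what eventually worked for me, see further below, was setting threads directly in /etc/proxysql.cnf and restarting):

-- mysql-threads cannot be changed at runtime; persist it, then restart the proxysql service
UPDATE global_variables SET variable_value='4' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;
-- a LOAD MYSQL VARIABLES TO RUNTIME afterwards will still report the old value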


  • It would be great, if not too difficult, to get more information, e.g. against which servers the connecting threads were failing.
2018-05-07 23:30:18 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-07 23:30:19 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on staging-master:3306 : exceeded by 25713us

In this case it is clear, but if many threads fail in parallel, more than one backend server could be affected, like here, where no server is mentioned explicitly in any other log line:

2018-05-21 06:25:02 ProxySQL_Admin.cpp:7658:flush_error_log(): [INFO] ProxySQL version 1.4.8-32-g669c149_DEBUG
2018-05-21 06:25:02 ProxySQL_Admin.cpp:7661:flush_error_log(): [INFO] Detected OS: Linux gearman-worker-fs1dc12-01 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64
2018-05-21 15:47:13 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 15:47:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 15:52:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 19:17:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 19:27:07 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 23:30:08 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:00:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:00:18 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:28:34 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:35:13 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:35:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:05:20 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:15:18 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:15:21 main.cpp:1086:main(): [ERROR] Watchdog: 23 threads missed a heartbeat
2018-05-22 03:15:33 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:15:36 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:30:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:30:18 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:30:21 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 06:10:16 main.cpp:1086:main(): [ERROR] Watchdog: 15 threads missed a heartbeat
2018-05-22 06:10:19 main.cpp:1086:main(): [ERROR] Watchdog: 15 threads missed a heartbeat
2018-05-22 06:10:22 main.cpp:1086:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2018-05-22 06:25:02 ProxySQL_Admin.cpp:657:admin_handler_command_proxysql(): [INFO] Received PROXYSQL FLUSH LOGS command

  • This error happened a few times while checking the stats page; as found in the code, the error message comes from the htauth handler, which seems to have a buffer/array for only 1 session, and we were checking the stats page with 2 people:
Stale nonce received.  If this happens a lot, you should probably increase the size of the nonce array.

  • It would be great to get better formatted messages, so it is clear who is throwing them:
    • the client when trying to connect to ProxySQL
    • ProxySQL while trying to reach a backend
    • ...
      like this one:
Unable to execute SELECT statement [SELECT ... ] [wrapped: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60063ms] in ...

The question is: are these others all related only to the client <-> ProxySQL side?

Unable to open PDO connection [wrapped: SQLSTATE[HY000] [2002] Connection refused]' in ...
PropelException' with message 'Unable to open PDO connection [wrapped: SQLSTATE[HY000] [1040] Too many connections]' in ...
Unable to execute SELECT statement [SELECT ...] [wrapped: PDOStatement::execute(): MySQL server has gone away] in ...
Lost connection to MySQL server during query

=> which server is gone (if the message came from ProxySQL)?


  • In the first crash after the mixed-compression bug I hopefully found, I got this crash logged:
2018-05-09 12:01:34 main.cpp:1086:main(): [ERROR] Watchdog: 5 threads missed a heartbeat
2018-05-09 12:01:34 main.cpp:1090:main(): [ERROR] Watchdog: reached 10 missed heartbeats. Aborting!
proxysql: main.cpp:1091: int main(int, const char**): Assertion `0' failed.
Error: signal 6:
proxysql(_Z13crash_handleri+0x25)[0x4cf485]
/lib/x86_64-linux-gnu/libc.so.6(+0x350e0)[0x7fe9cec4c0e0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fe9cec4c067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fe9cec4d448]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7fe9cec45266]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7fe9cec45312]
proxysql(main+0x70d)[0x4cc2c4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fe9cec38b45]
proxysql[0x4c99b4]
2018-05-09 12:01:35 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-09 12:01:35 main.cpp:877:ProxySQL_daemonize_phase3(): [INFO] Angel process started ProxySQL process 3852
Standard ProxySQL Cluster rev. 0.1.0702_DEBUG -- ProxySQL_Cluster.cpp -- Thu Feb  1 02:43:15 2018
Standard ProxySQL Statistics rev. 1.4.1027_DEBUG -- ProxySQL_Statistics.cpp -- Thu Feb  1 02:43:15 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031_DEBUG -- ProxySQL_HTTP_Server.cpp -- Thu Feb  1 02:43:15 2018
Admin initialized in 0.213958 secs.
  • In further logs I also found these:
Error: signal 11:
proxysql(_Z13crash_handleri+0x1a)[0x55be8b035ffa]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7f4b2416d060]
proxysql(_ZN12MySQL_Thread3runEv+0x1460)[0x55be8b0632c0]
proxysql(_Z24mysql_worker_thread_funcPv+0x6c)[0x55be8b03435c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494)[0x7f4b25aa2494]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4b24222acf]
2018-05-05 01:25:11 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-05 01:25:11 [INFO] Angel process started ProxySQL process 30164
Standard ProxySQL Cluster rev. 0.1.0702 -- ProxySQL_Cluster.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Wed Apr  4 13:21:29 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Admin rev. 0.2.0902 -- ProxySQL_Admin.cpp -- Tue Apr 10 12:18:22 2018
Standard MySQL Threads Handler rev. 0.2.0902 -- MySQL_Thread.cpp -- Tue Apr  3 06:28:45 2018
Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Tue Apr  3 06:28:45 2018
...
Standard MySQL Monitor (StdMyMon) rev. 1.2.0723 -- MySQL_Monitor.cpp -- Wed Apr  4 13:21:29 2018
Error: signal 11:
proxysql(_Z13crash_handleri+0x1a)[0x55be8b035ffa]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7f4b2416d060]
proxysql(free_root+0x5b)[0x55be8b21afbb]
proxysql(mysql_free_result+0x36)[0x55be8b215676]
proxysql(_ZN16MySQL_Connection7handlerEs+0x3fc)[0x55be8b14870c]
proxysql(_ZN16MySQL_Connection11async_queryEsPcmPP13st_mysql_stmtP23stmt_execute_metadata_t+0xa6)[0x55be8b1494d6]
proxysql(_ZN13MySQL_Session7handlerEv+0x1ba0)[0x55be8b0723b0]
proxysql(_ZN12MySQL_Thread20process_all_sessionsEv+0x2df)[0x55be8b05b00f]
proxysql(_ZN12MySQL_Thread3runEv+0x1740)[0x55be8b0635a0]
proxysql(_Z24mysql_worker_thread_funcPv+0x6c)[0x55be8b03435c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494)[0x7f4b25aa2494]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4b24222acf]
2018-05-05 01:30:09 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-05 01:30:09 [INFO] Angel process started ProxySQL process 1481
Standard ProxySQL Cluster rev. 0.1.0702 -- ProxySQL_Cluster.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Wed Apr  4 13:21:29 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Admin rev. 0.2.0902 -- ProxySQL_Admin.cpp -- Tue Apr 10 12:18:22 2018
Standard MySQL Threads Handler rev. 0.2.0902 -- MySQL_Thread.cpp -- Tue Apr  3 06:28:45 2018
Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Tue Apr  3 06:28:45 2018

Since logging is not easy to expand even with the debug version, due to the missing wiki documentation, I have had 2 more crashes since then (and the instance running only for staging doesn't get enough traffic to reproduce the problem).

Judging from the latest issues it could be similar to #1522, but I get no real crashes.

Here is the log from around the last "attempted" production run failure after 8:00 UTC:

2018-05-17 06:50:17 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 06:50:20 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:15:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:15:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:21 main.cpp:1086:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2018-05-17 07:20:27 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:30 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:33 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:36 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:38 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 24629us
2018-05-17 07:50:19 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-17 07:50:34 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:50:37 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:00:22 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:15:10 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:13 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:16 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:19 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-slave-02:3306 , FD (Conn:670 , MyDS:670) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 115.
2018-05-17 08:15:21 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 29554us
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-slave-02:3306 , FD (Conn:667 , MyDS:667) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-master:3306 , FD (Conn:669 , MyDS:669) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-17 08:15:22 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-17 08:15:25 main.cpp:1086:main(): [ERROR] Watchdog: 15 threads missed a heartbeat
2018-05-17 08:15:34 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:13 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:19 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:22 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:23 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 30230us
2018-05-17 08:45:08 main.cpp:1086:main(): [ERROR] Watchdog: 19 threads missed a heartbeat
2018-05-17 09:30:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 09:45:15 main.cpp:1086:main(): [ERROR] Watchdog: 84 threads missed a heartbeat
2018-05-17 09:45:18 main.cpp:1086:main(): [ERROR] Watchdog: 84 threads missed a heartbeat
2018-05-17 09:45:21 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 09:57:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 11:35:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 11:50:06 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat

As we can see, the ProxySQL service "actively" disconnects from all MySQL backend servers - even the localhost one - which are then left without clients, and the client only gets exceptions like:

[wrapped: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60075ms]' 

My idea is that perhaps the deactivated multiplexing could cause such problems (amplified by mixed compressed/uncompressed backend connections)?
If this could be the problem: would it be possible to make multiplexing=on the default, and have only the connections with a different INCREMENT_OFFSET use multiplexing=off?
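
In case it helps: what I have in mind is roughly something like this (only a sketch - the rule_id and the match pattern are made up), keeping mysql-multiplexing='true' globally and disabling multiplexing just for the statements that set the different offset, via the multiplex column of mysql_query_rules:

-- hypothetical rule: switch multiplexing off only for sessions that set a non-default offset
INSERT INTO mysql_query_rules (rule_id, active, match_digest, multiplex, apply)
VALUES (50, 1, '^SET .*auto_increment_offset', 0, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;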


How is debugging done? ... the big question, which is not explained in the wiki and which I cannot work out from the code:

ProxySQL> UPDATE debug_levels SET verbosity=3 WHERE module in ('debug_mysql_server', 'debug_mysql_connection', 'debug_mysql_connpool');
Query OK, 3 rows affected (0.00 sec)

ProxySQL> SELECT * FROM debug_levels;
+-----------------------------+-----------+
| module                      | verbosity |
+-----------------------------+-----------+
| debug_generic               | 0         |
| debug_net                   | 0         |
| debug_pkt_array             | 0         |
| debug_poll                  | 0         |
| debug_mysql_com             | 0         |
| debug_mysql_server          | 3         |
| debug_mysql_connection      | 3         |
| debug_mysql_connpool        | 3         |
| debug_mysql_rw_split        | 0         |
| debug_mysql_auth            | 0         |
| debug_mysql_protocol        | 0         |
| debug_mysql_query_processor | 0         |
| debug_memory                | 0         |
| debug_admin                 | 0         |
| debug_sqlite                | 0         |
| debug_ipc                   | 0         |
| debug_query_cache           | 0         |
| debug_query_statistics      | 0         |
+-----------------------------+-----------+
18 rows in set (0.00 sec)
ProxySQL> LOAD DEBUG LEVELS TO RUNTIME;
ERROR 1045 (#2800): near "LOAD": syntax error
ProxySQL> SAVE DEBUG LEVELS TO DISK;
ERROR 1045 (#2800): near "SAVE": syntax error
ProxySQL>

This is not working as expected, and there are no log entries for these attempts...

@Reiner030
Author

New version, new chance ... but sadly no luck:

# wget https://github.com/sysown/proxysql/releases/download/v1.4.9/proxysql_1.4.9-dbg-debian9_amd64.deb
# dpkg -i proxysql_1.4.9-dbg-debian9_amd64.deb

threads set to 20 in /etc/proxysql.cnf

# /etc/init.d/proxysql stop && /etc/init.d/proxysql reload

threads checked: now 20 instead of 100

For the first hour, without clients, it was looking fine so far.

But after touching all the data so it would be processed around 18:30 UTC / 20:30 MEST, it took only 7 minutes to bring the new ProxySQL 1.4.9 down 😭

=> Could this be a problem caused by the deactivated multiplexing?
If so, would it be unproblematic to activate multiplexing for all client connections which have the same INCREMENT_OFFSET set?
Then we would only need to deactivate it for the 2 countries where the offset is different.

The PHP errors all look like this exception:

LOG: [COUNTRY] [2018-05-30 20:37:21] executing: ExportProductsSolr [id: 80d426cc-6437-11e8-adc4-4ccc6afcffd9]
<h1>Exception</h1><div style="background: #dadada; padding: 5px; font-weight: bold">PropelException - Unable to execute SELECT statement [SELECT gearman_synchronization.ID_GEARMAN_SYNCHRONIZATION, gearman_synchronization.UNIQUE_KEY, gearman_synchronization.ID_GROUP, gearman_synchronization.STATUS, gearman_synchronization.JOB_NAME, gearman_synchronization.USER, gearman_synchronization.EXCEPTION_MESSAGE, gearman_synchronization.CREATED_AT, gearman_synchronization.UPDATED_AT FROM `gearman_synchronization` WHERE gearman_synchronization.UNIQUE_KEY=:p1 LIMIT 1] [wrapped: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 401 after 60305ms]</div><br />
in /data/www/staging/releases/20180530-050346/vendor/propel/propel1/runtime/lib/query/ModelCriteria.php (line: 1328)<br />

An interesting bug I found while writing this - ProxySQL is shunning (flapping between ONLINE/SHUNNED) only the master, which is the fallback in the read group:

ProxySQL> SELECT hostgroup_id,hostname, status, weight, compression, max_connections, max_replication_lag, use_ssl, max_latency_ms FROM runtime_mysql_servers WHERE hostgroup_id IN (400,401);
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
| hostgroup_id | hostname       | status | weight  | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
| 400          | staging-master | ONLINE | 1       | 1           | 10000           | 0                   | 0       | 0              |
| 401          | 172.16.30.1    | ONLINE | 9999000 | 1           | 10000           | 10                  | 0       | 0              |
| 401          | staging-master | ONLINE | 1       | 1           | 10000           | 0                   | 0       | 0              |
| 401          | staging-slave  | ONLINE | 9999    | 1           | 10000           | 10                  | 0       | 0              |
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
4 rows in set (0.00 sec)

Ah, and compression is on again ... (=> next test: without compression on all backends).
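
To watch this flapping I can only think of polling the admin interface; a sketch of the checks I would run (assuming the Monitor module is enabled):

-- current runtime status of the read hostgroup (ONLINE / SHUNNED)
SELECT hostgroup_id, hostname, status, weight FROM runtime_mysql_servers WHERE hostgroup_id = 401;
-- recent connect attempts recorded by the Monitor module, newest first
SELECT hostname, port, time_start_us, connect_success_time_us, connect_error
FROM monitor.mysql_server_connect_log ORDER BY time_start_us DESC LIMIT 10;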

Here is the full start log, plus the first lines after the incident - since I still haven't found information on how to increase debugging, I cannot offer more yet:

2018-05-30 17:39:03 ProxySQL_Admin.cpp:7694:flush_error_log(): [INFO] ProxySQL version v1.4.9-0-gd9fd599_DEBUG
2018-05-30 17:39:03 ProxySQL_Admin.cpp:7697:flush_error_log(): [INFO] Detected OS: Linux gearman-worker-fs1dc12-01 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64
2018-05-30 17:39:03 main.cpp:829:ProxySQL_daemonize_phase2(): [INFO] Starting ProxySQL
2018-05-30 17:39:03 main.cpp:830:ProxySQL_daemonize_phase2(): [INFO] Sucessfully started
Main daemonize phase1 completed in 0.000378 secs.
2018-05-30 17:39:03 main.cpp:877:ProxySQL_daemonize_phase3(): [INFO] Angel process started ProxySQL process 16234
Standard ProxySQL Cluster rev. 0.1.0702_DEBUG -- ProxySQL_Cluster.cpp -- Wed May 30 12:20:00 2018
Standard ProxySQL Statistics rev. 1.4.1027_DEBUG -- ProxySQL_Statistics.cpp -- Wed May 30 12:20:00 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031_DEBUG -- ProxySQL_HTTP_Server.cpp -- Wed May 30 12:20:00 2018
2018-05-30 17:39:03 ProxySQL_Admin.cpp:7651:Read_ProxySQL_Servers_from_configfile(): [INFO] Cluster: Adding ProxySQL Servers gearman-worker-fs1dc12-01:6032 from config file
Admin initialized in 0.040146 secs.
Standard ProxySQL Admin rev. 0.2.0902_DEBUG -- ProxySQL_Admin.cpp -- Wed May 30 12:20:00 2018
Standard MySQL Threads Handler rev. 0.2.0902_DEBUG -- MySQL_Thread.cpp -- Wed May 30 12:20:00 2018
Main phase3 : GloMyLogger initialized in 3e-06 secs.
Standard MySQL Authentication rev. 0.2.0902_DEBUG -- MySQL_Authentication.cpp -- Wed May 30 12:20:00 2018
Main init phase2 completed in 0.042859 secs.
Main phase3 : GloMyLogger initialized in 2e-06 secs.
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:918:generate_mysql_servers_table(): [INFO] Dumping current MySQL Servers structures for hostgroup ALL
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
| hostgroup_id | hostname | port | weight | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment | mem_pointer |
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:587:commit(): [INFO] Dumping mysql_servers_incoming
+--------------+-------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------------------------------------------------+
| hostgroup_id | hostname          | port | weight  | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                           |
+--------------+-------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------------------------------------------------+
| 500          | production-master | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: master                                |
| 501          | production-master | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: slave                                 |
| 501          | 172.16.30.1       | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | production: local slave gearman-worker-fs1dc12-01 |
| 501          | production-slave  | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | production: slave                                 |
| 400          | staging-master    | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: master                                   |
| 401          | staging-master    | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: slave                                    |
| 401          | 172.16.30.1       | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: local slave gearman-worker-fs1dc12-01    |
| 401          | staging-slave     | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: slave                                    |
+--------------+-------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------------------------------------------------+
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1026:generate_mysql_replication_hostgroups_table(): [INFO] New mysql_replication_hostgroups table
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1065:generate_mysql_group_replication_hostgroups_table(): [INFO] New mysql_group_replication_hostgroups table
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:918:generate_mysql_servers_table(): [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 500 , address: production-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: production: master
HID: 501 , address: production-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: production: slave
HID: 501 , address: 172.16.30.1 , port: 3306 , weight: 9999000 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: production: local slave gearman-worker-fs1dc12-01
HID: 501 , address: production-slave , port: 3306 , weight: 9999 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: production: slave
HID: 400 , address: staging-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: master
HID: 401 , address: staging-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: slave
HID: 401 , address: 172.16.30.1 , port: 3306 , weight: 9999000 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: local slave gearman-worker-fs1dc12-01
HID: 401 , address: staging-slave , port: 3306 , weight: 9999 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: slave
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1014:generate_mysql_servers_table(): [INFO] Dumping mysql_servers
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
| hostgroup_id | hostname                             | port | weight  | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                               | mem_pointer     |
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
| 500          | production-master       | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: master                                | 140221738338944 |
| 401          | staging-slave    | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: slave                                    | 140221738338048 |
| 401          | 172.16.30.1                          | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: local slave gearman-worker-fs1dc12-01    | 140221738338176 |
| 401          | staging-master          | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: slave                                    | 140221738338304 |
| 400          | staging-master          | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: master                                   | 140221738338432 |
| 501          | production-slave | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | production: slave                                 | 140221738338560 |
| 501          | 172.16.30.1                          | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | production: local slave gearman-worker-fs1dc12-01 | 140221738338688 |
| 501          | production-master       | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: slave                                 | 140221738338816 |
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:247:ProxySQL_Node_Entry(): [INFO] Created new Cluster Node Entry for host gearman-worker-fs1dc12-01:6032
Main phase3 : GloAdmin initialized in 0.003388 secs.
Standard Query Processor rev. 0.2.0902_DEBUG -- Query_Processor.cpp -- Wed May 30 12:20:00 2018
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:60:ProxySQL_Cluster_Monitor_thread(): [INFO] Cluster: starting thread for peer gearman-worker-fs1dc12-01:6032
Main phase3 : Query Processor initialized in 0.000514 secs.
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
Main phase3 : MySQL Threads Handler initialized in 0.001844 secs.
In memory Standard Query Cache (SQC) rev. 1.2.0905_DEBUG -- Query_Cache.cpp -- Wed May 30 12:20:00 2018
Main phase3 : Query Cache initialized in 0.000174 secs.
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:334:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_query_rules from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x92405DB72E48C230 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:336:set_checksums(): [INFO] Cluster: checksum for mysql_query_rules from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x92405DB72E48C230 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:354:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_servers from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x1128419592D44BE4 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:356:set_checksums(): [INFO] Cluster: checksum for mysql_servers from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x1128419592D44BE4 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:374:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_users from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x1F26AE2C9089A939 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:376:set_checksums(): [INFO] Cluster: checksum for mysql_users from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x1F26AE2C9089A939 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:410:set_checksums(): [INFO] Cluster: detected a new checksum for proxysql_servers from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x85B2851AFCC5981D . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:412:set_checksums(): [INFO] Cluster: checksum for proxysql_servers from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x85B2851AFCC5981D , we won't sync.
Main phase3 : MySQL Threads Handler listeners started in 4.00432 secs.
Standard MySQL Monitor (StdMyMon) rev. 1.2.0723_DEBUG -- MySQL_Monitor.cpp -- Wed May 30 12:20:00 2018
Main phase3 : MySQL Monitor initialized in 7.8e-05 secs.
Main init phase3 completed in 4.01297 secs.
2018-05-30 18:07:08 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:07:11 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:30:12 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-30 18:30:15 main.cpp:1086:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2018-05-30 18:31:09 main.cpp:1086:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2018-05-30 18:31:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:31:15 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-30 18:32:09 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:32:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:34:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5410 , MyDS:5410) , 1040: Too many connections.
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5411 , MyDS:5411) , 1040: Too many connections.
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5410 , MyDS:5410) , 1040: Too many connections.
2018-05-30 18:37:17 MySQL_HostGroups_Manager.cpp:302:connect_error(): [ERROR] Shunning server staging-master:3306 with 5 errors/sec. Shunning for 10 seconds
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 MySQL_HostGroups_Manager.cpp:302:connect_error(): [ERROR] Shunning server staging-master:3306 with 5 errors/sec. Shunning for 10 seconds

@Reiner030
Author

Sadly, the cause has still not been found. I tried many possible changes, each time testing the behavior 3x with 30-45 minutes runtime each ... I guess you can imagine how much fun I had these last weeks, because there is no documentation on how to debug ProxySQL properly.

Because many tests were failing, I took the packaged config (with threads=4) and added only the urgently needed parts - servers, users, rules, and the maxsize/timeout values - from our config to the default one.

In my tests (with multiplexing deactivated because of the INCREMENT_OFFSET=... initialization) it behaved like this last Thursday/Friday:

  • system worked fine when I used directly the master only
  • system worked fine when I used master only over ProxySQL without compression
  • system worked fine when I used master with external slave over ProxySQL without compression
  • system worked fine when I used master with external and localhost slave over ProxySQL without compression
  • system worked badly when I used master with external and localhost slave over ProxySQL with a mixed compression setup (master and external slave compressed, but not the localhost slave)
  • system worked badly when I used master with external and localhost slave over ProxySQL with compression set up on all servers

From these tests it looks like the compression generally breaks our pools.

But today the production run failed again, even with the previously successful tested settings, and so did staging, which means that no "real" reproducible "experimental" set-up is possible. 😭

We can modify the setup/routine for this special case, but we have to keep in mind that connections running over ProxySQL could fail without a recognizable cause in many other high-throughput situations, so it seems ProxySQL would have to be replaced somehow, which I have tried to avoid; it is also nicely included in solutions like those from Severalnines, which could be a later enhancement.

Here is the failed staging run from today with the complete ProxySQL stats page:
https://drive.google.com/open?id=1c1CpSDVR2mE-CksEwt9UdGsz5V3h7nfl
Here with the server/client connections ignored, showing the other values:
https://drive.google.com/open?id=1bIdkkRNQGNp3GX63YUQYEwxg_HWt_Gv1
The other values are always 0.

I also tried it with free_connections_pct=0 instead of the default 10, but it failed as well.
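
For completeness, this is how I changed it (assuming mysql-free_connections_pct is the right variable name):

UPDATE global_variables SET variable_value='0' WHERE variable_name='mysql-free_connections_pct';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;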

If you are interested, I could also mail/share all the other test images, but since this seems to be non-reproducible even over several days, I guess it doesn't make much sense.

@joelhock

I have also run across an issue with mixed compressed/uncompressed clients, with an easy way to reproduce it. If an uncompressed client's backend response gets cached, the cached copy can be used correctly by either a future uncompressed or compressed client. However, if a compressed client's request to the backend gets cached, then when a future uncompressed client sends the same query, compressed cached data seems to get sent to it, and it drops the connection quickly.
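
For context, my repro relies on the query cache being enabled for the statement via a cache_ttl rule, roughly like this (rule_id and pattern are just examples); the same SELECT is then sent once from a compressed client and once from an uncompressed one:

-- cache matching SELECTs for 60s so the second client is served from the query cache
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (200, 1, '^SELECT ', 60000, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;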

@renecannao
Contributor

Thank you @joelhock, perhaps you have provided the missing information to get this worked on!

@joelhock

One more piece of information: in addition to my prior comment about compressed cached results not being usable for uncompressed clients, it also seems that if the client that triggers a new connection to a database backend is a compressed client, then that backend connection stays compressed forever, and future requests from uncompressed clients that are fulfilled by that backend connection will cause the uncompressed client to disconnect.

renecannao added a commit that referenced this issue May 17, 2020
Fixes #2793 and #1493: Connection issues in compressed connections and in mixed compressed/uncompressed connections.
@renecannao
Contributor

Reopening, this isn't fixed in 2.1.0

@renecannao renecannao reopened this May 17, 2020
JavierJF added a commit to JavierJF/proxysql that referenced this issue May 18, 2020
renecannao added a commit that referenced this issue May 20, 2020
Fixes #2793 and #1493: Connection issues in compressed connections and in mixed compressed/uncompressed connections.
@joelhock

My test cases now work with 2.0.13. Thank you!!
