Bug with multiple servers in a group with mixed compressed/uncompressed settings #1493

Open
Reiner030 opened this issue May 4, 2018 · 12 comments

Comments

@Reiner030

Hi,

I re-set up a new PHP + DB slave server on Debian Stretch.
Before, everything was running fine with weighted settings and a local ProxySQL:
external master + external slave + local slave - all with uncompressed connections.

Since the compression flag, as mentioned in another issue, was working nicely for another database, I also added the compression setting to the new ProxySQL configuration to save traffic - but only for the external master + external slave. For local connections it would be neither helpful nor performant to also compress the data.
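
Roughly, the relevant part looks like this in the admin interface (a sketch only - the hostnames here are placeholders, not my real config); the compression column of mysql_servers is set per backend, so the external servers got 1 and the local slave kept 0:

-- sketch with placeholder hostnames; compression is a per-server column in mysql_servers
UPDATE mysql_servers SET compression=1 WHERE hostname IN ('external-master', 'external-slave');
UPDATE mysql_servers SET compression=0 WHERE hostname = '127.0.0.1';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;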

The first days everything worked nicely, but from time to time, when the backend server had to run many PHP tasks, exceptions were thrown like SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60238ms without any apparent reason - all servers are up with few connections, and especially while these errors occurred the connection count on the localhost slave went to zero.

  • we have 60s timeout for backend services
  • 501 is our reader group
  • The user/server had fewer than 1/3 of the user/server max_connections in use
  • These connections run with multiplexing switched off
  • Tested with 1.4.8
  • Tested with 1.4.6 because the old, working servers run this version

and nothing in the logs gave a good hint - the only complaints were about the master (with weight=1), when a server was mentioned at all:

2018-05-04 07:05:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-04 07:15:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-04 07:15:13 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:16 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:19 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:21 mysql_connection.cpp:640:handler(): [ERROR] Connect timeout on master:3306 : exceeded by 29251us
2018-05-04 07:15:21 mysql_connection.cpp:604:handler(): [ERROR] Failed to mysql_real_connect() on master:3306 , FD (Conn:1578 , MyDS:1578) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-04 07:15:21 mysql_connection.cpp:604:handler(): [ERROR] Failed to mysql_real_connect() on master:3306 , FD (Conn:1574 , MyDS:1574) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-04 07:15:21 MySQL_HostGroups_Manager.cpp:301:connect_error(): [ERROR] Shunning server master:3306 with 5 errors/sec. Shunning for 10 seconds
2018-05-04 07:20:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
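
While such a timeout happens, the state of the affected hostgroup can at least be checked from the admin interface; a sketch of the query I would use (not taken from my actual session, 501 being our reader group):

-- per-server status, used/free connections and connect errors for the reader hostgroup
SELECT hostgroup, srv_host, status, ConnUsed, ConnFree, ConnOK, ConnERR
FROM stats_mysql_connection_pool
WHERE hostgroup = 501;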

It would be great to get some more hints in the wiki on how the process works and how debugging can be done (=> #969), because I was lost searching for reasons/solutions until I found these two issues via the error message... The second one is the issue also mentioned in the changelog:

Automatically processing of uncompressed packets even if wrongly marked as compressed #1410

which reminded me to check my previous compression settings on the connections (which were off); I had added them manually to the proxysql configuration some months ago, before the ProxySQL reload function was available.

=> After setting ALL connections to compressed, or all to uncompressed, the errors are gone.
=> After setting mixed compressed/uncompressed values on the connections, the errors come back.

==> this must be an "extended" variant of bug #1410 above?
It would be best to fix this for the upcoming 2.0.0 release... 😊

@renecannao
Contributor

This issue is interesting.
To clarify, an important question: is your client (PHP) using compression, or not?
From what I read, I would assume that the answer is no, but please correct me if I am wrong.

Based on my assumption that the client is not using compression, ProxySQL's code for handling compression is not relevant at all (so forget about #1410).
If you are using compression for backends, all ProxySQL does is set the CLIENT_COMPRESS flag, nothing more.
Therefore, if a problem exists, it is most likely either in MySQL server or in the embedded MariaDB Client Library.
IF the problem is indeed compression, my main suspect right now is the MariaDB Client Library, because this is what ProxySQL uses to communicate with backends.

Your error log indeed gives us some leads for further investigation. Is it possible to have the whole error log, or a few hundred lines?
What I find relevant in your error log is:

  • at 07:15:10 : one thread missed a heartbeat: either locked, or very busy
  • from 07:15:13 to 07:15:19 : 40 threads missed a heartbeat: it is unlikely that 40 threads are all very busy, so it is more likely they are locked (maybe the first blocked thread is holding a mutex, thus blocking all the others)
  • 07:15:21 : errors about master connections are logged
  • the errors about missed heartbeats disappear: is this a coincidence or not?

At this point, I would investigate what causes an apparent lock.
It is possible that ProxySQL's worker threads get locked, and they report connection timeout just because they weren't able to process the connection itself. If that's the case, compression is potentially unrelated to this issue.
From the few lines I have of your error log, it looks like heartbeats may be missed at regular intervals (07:05:16, 07:15:10 and 07:20:10): having the whole error log will help us find a pattern.
Are you using mysql-threads=40 ? Unless you have an extremely beefy machine where ProxySQL is allowed to use 40 CPU cores, you should lower this value, especially if you are running ProxySQL local to your application. How much CPU does ProxySQL normally use?

Action items I would suggest right now:
a) drastically reduce mysql-threads based on its CPU utilization
b) provide the full error log, or a large portion of it

Thanks

@Reiner030
Author

Reiner030 commented May 5, 2018

Hi,
thanks for the quick answer 😄 Yesterday I set all my connections to compressed, so it is already working fine again, but this could be a common problem wherever mixed connections are needed.

  • I have no further logging from ProxySQL, which was the problem when trying to find the cause.
    That's why I asked for a how-to in the wiki pages for such cases as well.
    Have I overlooked such entries, or do I need to install the debug version of ProxySQL?
    I only found the profiling wiki pages for memory leaks.

  • As you assumed, I use no compression on the client side, only in the backend server configuration for ProxySQL.
    I thought the backend connection between ProxySQL and the MySQL/MariaDB/Percona server is transparent to the client, so independent of whether the backend connection is compressed or not, the (un)compressed connection type "only" has to be negotiated between client and ProxySQL?

  • Here are my current package versions for Debian Stretch/9 (all up to date), with MariaDB repository packages and sury.org PHP packages:

# apt-cache policy php7.2-mysql libmariadb3 libmariadbclient18 libmysqlclient18 mariadb-client mariadb-client-10.2 mariadb-client-core-10.2 mariadb-common | grep -e ^[a-z] -e Installed:
php7.2-mysql:
  Installed: 7.2.4-1+0~20180405085422.20+stretch~1.gbpbff9f0
libmariadb3:
  Installed: 10.2.14+maria~stretch
libmariadbclient18:
  Installed: 10.2.14+maria~stretch
libmysqlclient18:
  Installed: 10.2.14+maria~stretch
mariadb-client:
  Installed: 10.2.14+maria~stretch
mariadb-client-10.2:
  Installed: 10.2.14+maria~stretch
mariadb-client-core-10.2:
  Installed: 10.2.14+maria~stretch
mariadb-common:
  Installed: 10.2.14+maria~stretch

(EDITED because image links were not working)

@renecannao
Contributor

Hi.

You are welcome! :)
I still think that having a look at the error log is the best starting point for troubleshooting this issue.
Is it possible to have it?

About compression on clients:
that is correct: clients can use compression or not, connections to backends can use compression or not, and they do not have to match.
For example, a client can use no compression while proxysql uses compression to the backend, and vice versa.
My point in the previous comment is that proxysql's code only handles compression with clients: compression with backends is handled by the mariadb client library.
On the mariadb client library: proxysql uses an embedded, modified version, thus the version you have installed is not relevant.

About mysql-threads: ProxySQL worker threads use async calls, they do not block on network operations.
That means that a single thread can handle tens of thousands of connections.
That said, the way to tune mysql-threads is not based on number of connections, but on CPU usage.
In your setup it seems that mysql-threads=1 would actually be enough, but you can use the default mysql-threads=4. In any case, please stay away from mysql-threads=40.
(we need to make the wiki clearer on this).

Finally, the links you posted do seem to work.

@Reiner030
Author

Hi,

  • I already copied the "full log", from the logrotate line ~1 hour before to the stop/restart lines hours later.
    If you need more debugging/logging, I need to know how to create it. Thanks.
  • OK, then it is now clear which MariaDB library you were referring to.
  • I will then switch back to the default of 4 threads for ProxySQL on Monday.
  • The GitHub image links only give me a "Non-Image content-type returned" response from GitHub (already weeks ago, when writing in another issue), so to be safe I added the Google Drive links directly below them.

@Reiner030
Author

First the easy parts / found problems - it might be better to split these into separate issues, but I didn't know what should be separated, so feel free to reuse them if it makes sense:

  • I tried to reduce the threads as suggested, but proxysql (the 1.4.6 package) rejects it:
2018-05-07 07:51:31 [INFO] Received LOAD MYSQL VARIABLES TO RUNTIME command
2018-05-07 07:51:31 ProxySQL_Admin.cpp:3875:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set variable threads with value "4". Resetting to current "100".
2018-05-07 07:51:44 [INFO] Received SAVE MYSQL VARIABLES TO DISK command

On another instance with Debian Jessie I got the same rejection (1.4.6 package), and there the package default of 20 is set:

2018-05-09 08:05:41 [INFO] Received LOAD MYSQL VARIABLES TO RUNTIME command
2018-05-09 08:05:41 ProxySQL_Admin.cpp:3858:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set variable threads with value "4". Resetting to current "20".

It is also not even possible to lower the value from 100 to 20:

2018-05-24 19:24:35 ProxySQL_Admin.cpp:1384:admin_handler_command_load_or_save(): [INFO] Received LOAD MYSQL VARIABLES TO RUNTIME command
2018-05-24 19:24:35 ProxySQL_Admin.cpp:3875:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set variable threads with value "20". Resetting to current "100".
2018-05-24 19:24:42 ProxySQL_Admin.cpp:1368:admin_handler_command_load_or_save(): [INFO] Received SAVE MYSQL VARIABLES TO DISK command

Even after a ProxySQL restart, the "saved to disk" value is (20 or) 100 again ...
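
If I understand the warning correctly, mysql-threads is one of the few variables that only takes effect at process start, so LOAD MYSQL VARIABLES TO RUNTIME can never change it; the value has to be persisted and the service restarted. A sketch of what I assume is the intended sequence (what eventually worked for me, see further below, was setting threads directly in /etc/proxysql.cnf and restarting):

-- mysql-threads cannot be changed at runtime; persist it, then restart the proxysql service
UPDATE global_variables SET variable_value='4' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;
-- a LOAD MYSQL VARIABLES TO RUNTIME afterwards will still report the old value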


  • It would be great, if not too difficult, to get more information, e.g. against which servers the connecting threads were failing.
2018-05-07 23:30:18 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-07 23:30:19 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on staging-master:3306 : exceeded by 25713us

In this case it is clear, but if many threads fail in parallel, more than one backend server could be affected, like here, where no server is mentioned explicitly in any other log line:

2018-05-21 06:25:02 ProxySQL_Admin.cpp:7658:flush_error_log(): [INFO] ProxySQL version 1.4.8-32-g669c149_DEBUG
2018-05-21 06:25:02 ProxySQL_Admin.cpp:7661:flush_error_log(): [INFO] Detected OS: Linux gearman-worker-fs1dc12-01 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64
2018-05-21 15:47:13 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 15:47:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 15:52:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 19:17:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 19:27:07 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-21 23:30:08 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:00:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:00:18 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:28:34 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:35:13 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 00:35:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:05:20 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:15:18 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:15:21 main.cpp:1086:main(): [ERROR] Watchdog: 23 threads missed a heartbeat
2018-05-22 03:15:33 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:15:36 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:30:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:30:18 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 03:30:21 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-22 06:10:16 main.cpp:1086:main(): [ERROR] Watchdog: 15 threads missed a heartbeat
2018-05-22 06:10:19 main.cpp:1086:main(): [ERROR] Watchdog: 15 threads missed a heartbeat
2018-05-22 06:10:22 main.cpp:1086:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2018-05-22 06:25:02 ProxySQL_Admin.cpp:657:admin_handler_command_proxysql(): [INFO] Received PROXYSQL FLUSH LOGS command

  • This error happened a few times while checking the stats page; as found in the code, the error message comes from the htauth handler, which seems to have a buffer/array for only 1 session, and we were checking the stats page with 2 people:
Stale nonce received.  If this happens a lot, you should probably increase the size of the nonce array.

  • It would be great to get better formatted messages, so it is clear who is throwing them:
    • the client when trying to connect to ProxySQL
    • ProxySQL while trying to reach a backend
    • ...
      like this one:
Unable to execute SELECT statement [SELECT ... ] [wrapped: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60063ms] in ...

The question is: are these others all related only to the client <-> ProxySQL side?

Unable to open PDO connection [wrapped: SQLSTATE[HY000] [2002] Connection refused]' in ...
PropelException' with message 'Unable to open PDO connection [wrapped: SQLSTATE[HY000] [1040] Too many connections]' in ...
Unable to execute SELECT statement [SELECT ...] [wrapped: PDOStatement::execute(): MySQL server has gone away] in ...
Lost connection to MySQL server during query

=> which server is gone (if the message came from ProxySQL)?


  • In the first crash after the mixed-compression bug I hopefully found, I got this crash logged:
2018-05-09 12:01:34 main.cpp:1086:main(): [ERROR] Watchdog: 5 threads missed a heartbeat
2018-05-09 12:01:34 main.cpp:1090:main(): [ERROR] Watchdog: reached 10 missed heartbeats. Aborting!
proxysql: main.cpp:1091: int main(int, const char**): Assertion `0' failed.
Error: signal 6:
proxysql(_Z13crash_handleri+0x25)[0x4cf485]
/lib/x86_64-linux-gnu/libc.so.6(+0x350e0)[0x7fe9cec4c0e0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fe9cec4c067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fe9cec4d448]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7fe9cec45266]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7fe9cec45312]
proxysql(main+0x70d)[0x4cc2c4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fe9cec38b45]
proxysql[0x4c99b4]
2018-05-09 12:01:35 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-09 12:01:35 main.cpp:877:ProxySQL_daemonize_phase3(): [INFO] Angel process started ProxySQL process 3852
Standard ProxySQL Cluster rev. 0.1.0702_DEBUG -- ProxySQL_Cluster.cpp -- Thu Feb  1 02:43:15 2018
Standard ProxySQL Statistics rev. 1.4.1027_DEBUG -- ProxySQL_Statistics.cpp -- Thu Feb  1 02:43:15 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031_DEBUG -- ProxySQL_HTTP_Server.cpp -- Thu Feb  1 02:43:15 2018
Admin initialized in 0.213958 secs.
  • In further logs I also found these:
Error: signal 11:
proxysql(_Z13crash_handleri+0x1a)[0x55be8b035ffa]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7f4b2416d060]
proxysql(_ZN12MySQL_Thread3runEv+0x1460)[0x55be8b0632c0]
proxysql(_Z24mysql_worker_thread_funcPv+0x6c)[0x55be8b03435c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494)[0x7f4b25aa2494]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4b24222acf]
2018-05-05 01:25:11 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-05 01:25:11 [INFO] Angel process started ProxySQL process 30164
Standard ProxySQL Cluster rev. 0.1.0702 -- ProxySQL_Cluster.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Wed Apr  4 13:21:29 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Admin rev. 0.2.0902 -- ProxySQL_Admin.cpp -- Tue Apr 10 12:18:22 2018
Standard MySQL Threads Handler rev. 0.2.0902 -- MySQL_Thread.cpp -- Tue Apr  3 06:28:45 2018
Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Tue Apr  3 06:28:45 2018
...
Standard MySQL Monitor (StdMyMon) rev. 1.2.0723 -- MySQL_Monitor.cpp -- Wed Apr  4 13:21:29 2018
Error: signal 11:
proxysql(_Z13crash_handleri+0x1a)[0x55be8b035ffa]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7f4b2416d060]
proxysql(free_root+0x5b)[0x55be8b21afbb]
proxysql(mysql_free_result+0x36)[0x55be8b215676]
proxysql(_ZN16MySQL_Connection7handlerEs+0x3fc)[0x55be8b14870c]
proxysql(_ZN16MySQL_Connection11async_queryEsPcmPP13st_mysql_stmtP23stmt_execute_metadata_t+0xa6)[0x55be8b1494d6]
proxysql(_ZN13MySQL_Session7handlerEv+0x1ba0)[0x55be8b0723b0]
proxysql(_ZN12MySQL_Thread20process_all_sessionsEv+0x2df)[0x55be8b05b00f]
proxysql(_ZN12MySQL_Thread3runEv+0x1740)[0x55be8b0635a0]
proxysql(_Z24mysql_worker_thread_funcPv+0x6c)[0x55be8b03435c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494)[0x7f4b25aa2494]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4b24222acf]
2018-05-05 01:30:09 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-05 01:30:09 [INFO] Angel process started ProxySQL process 1481
Standard ProxySQL Cluster rev. 0.1.0702 -- ProxySQL_Cluster.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Wed Apr  4 13:21:29 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Admin rev. 0.2.0902 -- ProxySQL_Admin.cpp -- Tue Apr 10 12:18:22 2018
Standard MySQL Threads Handler rev. 0.2.0902 -- MySQL_Thread.cpp -- Tue Apr  3 06:28:45 2018
Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Tue Apr  3 06:28:45 2018

Since logging is not easy to expand even with the debug version, due to the missing wiki documentation, I have had 2 more crashes since then (and the instance running only for staging doesn't get enough traffic to reproduce the problem).

Judging from the latest issues it could be similar to #1522, but I get no real crashes.

Here is the log from around the last "attempted" production run failure after 8:00 UTC:

2018-05-17 06:50:17 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 06:50:20 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:15:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:15:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:21 main.cpp:1086:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2018-05-17 07:20:27 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:30 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:33 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:36 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:38 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 24629us
2018-05-17 07:50:19 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-17 07:50:34 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:50:37 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:00:22 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:15:10 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:13 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:16 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:19 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-slave-02:3306 , FD (Conn:670 , MyDS:670) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 115.
2018-05-17 08:15:21 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 29554us
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-slave-02:3306 , FD (Conn:667 , MyDS:667) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-master:3306 , FD (Conn:669 , MyDS:669) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-17 08:15:22 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-17 08:15:25 main.cpp:1086:main(): [ERROR] Watchdog: 15 threads missed a heartbeat
2018-05-17 08:15:34 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:13 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:19 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:22 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:23 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 30230us
2018-05-17 08:45:08 main.cpp:1086:main(): [ERROR] Watchdog: 19 threads missed a heartbeat
2018-05-17 09:30:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 09:45:15 main.cpp:1086:main(): [ERROR] Watchdog: 84 threads missed a heartbeat
2018-05-17 09:45:18 main.cpp:1086:main(): [ERROR] Watchdog: 84 threads missed a heartbeat
2018-05-17 09:45:21 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 09:57:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 11:35:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 11:50:06 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat

As we can see, the ProxySQL service "actively" disconnects from all MySQL backend servers - even the localhost one - which are then left without clients, and the client only gets exceptions like:

[wrapped: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60075ms]' 

My idea is that perhaps the deactivated multiplexing could cause such problems (amplified by mixed compressed/uncompressed backend connections)?
If this could be the problem: would it be possible to make multiplexing=on the default, and have only the connections with a different INCREMENT_OFFSET use multiplexing=off?
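
In case it helps: what I have in mind is roughly something like this (only a sketch - the rule_id and the match pattern are made up), keeping mysql-multiplexing='true' globally and disabling multiplexing just for the statements that set the different offset, via the multiplex column of mysql_query_rules:

-- hypothetical rule: switch multiplexing off only for sessions that set a non-default offset
INSERT INTO mysql_query_rules (rule_id, active, match_digest, multiplex, apply)
VALUES (50, 1, '^SET .*auto_increment_offset', 0, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;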


How is debugging done? ... the big question, which is not explained in the wiki and which I cannot work out from the code:

ProxySQL> UPDATE debug_levels SET verbosity=3 WHERE module in ('debug_mysql_server', 'debug_mysql_connection', 'debug_mysql_connpool');
Query OK, 3 rows affected (0.00 sec)

ProxySQL> SELECT * FROM debug_levels;
+-----------------------------+-----------+
| module                      | verbosity |
+-----------------------------+-----------+
| debug_generic               | 0         |
| debug_net                   | 0         |
| debug_pkt_array             | 0         |
| debug_poll                  | 0         |
| debug_mysql_com             | 0         |
| debug_mysql_server          | 3         |
| debug_mysql_connection      | 3         |
| debug_mysql_connpool        | 3         |
| debug_mysql_rw_split        | 0         |
| debug_mysql_auth            | 0         |
| debug_mysql_protocol        | 0         |
| debug_mysql_query_processor | 0         |
| debug_memory                | 0         |
| debug_admin                 | 0         |
| debug_sqlite                | 0         |
| debug_ipc                   | 0         |
| debug_query_cache           | 0         |
| debug_query_statistics      | 0         |
+-----------------------------+-----------+
18 rows in set (0.00 sec)
ProxySQL> LOAD DEBUG LEVELS TO RUNTIME;
ERROR 1045 (#2800): near "LOAD": syntax error
ProxySQL> SAVE DEBUG LEVELS TO DISK;
ERROR 1045 (#2800): near "SAVE": syntax error
ProxySQL>

This is not working as expected, and there are no log entries for these attempts...

@Reiner030
Author

New version, new chance ... but sadly no luck:

# wget https://github.com/sysown/proxysql/releases/download/v1.4.9/proxysql_1.4.9-dbg-debian9_amd64.deb
# dpkg -i proxysql_1.4.9-dbg-debian9_amd64.deb

threads set to 20 in /etc/proxysql.cnf

# /etc/init.d/proxysql stop && /etc/init.d/proxysql reload

threads checked: now 20 instead of 100

For the first hour, without clients, it was looking fine so far.

But after touching all the data so it would be processed around 18:30 UTC / 20:30 MEST, it took only 7 minutes to bring the new ProxySQL 1.4.9 down 😭

=> Could this be a problem caused by the deactivated multiplexing?
If so, would it be unproblematic to activate multiplexing for all client connections which have the same INCREMENT_OFFSET set?
Then we would only need to deactivate it for the 2 countries where the offset is different.

The PHP errors all look like this exception:

LOG: [COUNTRY] [2018-05-30 20:37:21] executing: ExportProductsSolr [id: 80d426cc-6437-11e8-adc4-4ccc6afcffd9]
<h1>Exception</h1><div style="background: #dadada; padding: 5px; font-weight: bold">PropelException - Unable to execute SELECT statement [SELECT gearman_synchronization.ID_GEARMAN_SYNCHRONIZATION, gearman_synchronization.UNIQUE_KEY, gearman_synchronization.ID_GROUP, gearman_synchronization.STATUS, gearman_synchronization.JOB_NAME, gearman_synchronization.USER, gearman_synchronization.EXCEPTION_MESSAGE, gearman_synchronization.CREATED_AT, gearman_synchronization.UPDATED_AT FROM `gearman_synchronization` WHERE gearman_synchronization.UNIQUE_KEY=:p1 LIMIT 1] [wrapped: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 401 after 60305ms]</div><br />
in /data/www/staging/releases/20180530-050346/vendor/propel/propel1/runtime/lib/query/ModelCriteria.php (line: 1328)<br />

An interesting bug I found while writing this - ProxySQL is shunning (flapping between ONLINE/SHUNNED) only the master, which is the fallback in the read group:

ProxySQL> SELECT hostgroup_id,hostname, status, weight, compression, max_connections, max_replication_lag, use_ssl, max_latency_ms FROM runtime_mysql_servers WHERE hostgroup_id IN (400,401);
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
| hostgroup_id | hostname       | status | weight  | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
| 400          | staging-master | ONLINE | 1       | 1           | 10000           | 0                   | 0       | 0              |
| 401          | 172.16.30.1    | ONLINE | 9999000 | 1           | 10000           | 10                  | 0       | 0              |
| 401          | staging-master | ONLINE | 1       | 1           | 10000           | 0                   | 0       | 0              |
| 401          | staging-slave  | ONLINE | 9999    | 1           | 10000           | 10                  | 0       | 0              |
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
4 rows in set (0.00 sec)

Ah, and compression is on again ... (=> next test: without compression on all backends).
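
To watch this flapping I can only think of polling the admin interface; a sketch of the checks I would run (assuming the Monitor module is enabled):

-- current runtime status of the read hostgroup (ONLINE / SHUNNED)
SELECT hostgroup_id, hostname, status, weight FROM runtime_mysql_servers WHERE hostgroup_id = 401;
-- recent connect attempts recorded by the Monitor module, newest first
SELECT hostname, port, time_start_us, connect_success_time_us, connect_error
FROM monitor.mysql_server_connect_log ORDER BY time_start_us DESC LIMIT 10;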

Here is the full start log, plus the first lines after the incident - since I still haven't found information on how to increase debugging, I cannot offer more yet:

2018-05-30 17:39:03 ProxySQL_Admin.cpp:7694:flush_error_log(): [INFO] ProxySQL version v1.4.9-0-gd9fd599_DEBUG
2018-05-30 17:39:03 ProxySQL_Admin.cpp:7697:flush_error_log(): [INFO] Detected OS: Linux gearman-worker-fs1dc12-01 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64
2018-05-30 17:39:03 main.cpp:829:ProxySQL_daemonize_phase2(): [INFO] Starting ProxySQL
2018-05-30 17:39:03 main.cpp:830:ProxySQL_daemonize_phase2(): [INFO] Sucessfully started
Main daemonize phase1 completed in 0.000378 secs.
2018-05-30 17:39:03 main.cpp:877:ProxySQL_daemonize_phase3(): [INFO] Angel process started ProxySQL process 16234
Standard ProxySQL Cluster rev. 0.1.0702_DEBUG -- ProxySQL_Cluster.cpp -- Wed May 30 12:20:00 2018
Standard ProxySQL Statistics rev. 1.4.1027_DEBUG -- ProxySQL_Statistics.cpp -- Wed May 30 12:20:00 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031_DEBUG -- ProxySQL_HTTP_Server.cpp -- Wed May 30 12:20:00 2018
2018-05-30 17:39:03 ProxySQL_Admin.cpp:7651:Read_ProxySQL_Servers_from_configfile(): [INFO] Cluster: Adding ProxySQL Servers gearman-worker-fs1dc12-01:6032 from config file
Admin initialized in 0.040146 secs.
Standard ProxySQL Admin rev. 0.2.0902_DEBUG -- ProxySQL_Admin.cpp -- Wed May 30 12:20:00 2018
Standard MySQL Threads Handler rev. 0.2.0902_DEBUG -- MySQL_Thread.cpp -- Wed May 30 12:20:00 2018
Main phase3 : GloMyLogger initialized in 3e-06 secs.
Standard MySQL Authentication rev. 0.2.0902_DEBUG -- MySQL_Authentication.cpp -- Wed May 30 12:20:00 2018
Main init phase2 completed in 0.042859 secs.
Main phase3 : GloMyLogger initialized in 2e-06 secs.
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:918:generate_mysql_servers_table(): [INFO] Dumping current MySQL Servers structures for hostgroup ALL
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
| hostgroup_id | hostname | port | weight | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment | mem_pointer |
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:587:commit(): [INFO] Dumping mysql_servers_incoming
+--------------+-------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------------------------------------------------+
| hostgroup_id | hostname          | port | weight  | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                           |
+--------------+-------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------------------------------------------------+
| 500          | production-master | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: master                                |
| 501          | production-master | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: slave                                 |
| 501          | 172.16.30.1       | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | production: local slave gearman-worker-fs1dc12-01 |
| 501          | production-slave  | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | production: slave                                 |
| 400          | staging-master    | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: master                                   |
| 401          | staging-master    | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: slave                                    |
| 401          | 172.16.30.1       | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: local slave gearman-worker-fs1dc12-01    |
| 401          | staging-slave     | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: slave                                    |
+--------------+-------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------------------------------------------------+
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1026:generate_mysql_replication_hostgroups_table(): [INFO] New mysql_replication_hostgroups table
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1065:generate_mysql_group_replication_hostgroups_table(): [INFO] New mysql_group_replication_hostgroups table
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:918:generate_mysql_servers_table(): [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 500 , address: production-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: production: master
HID: 501 , address: production-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: production: slave
HID: 501 , address: 172.16.30.1 , port: 3306 , weight: 9999000 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: production: local slave gearman-worker-fs1dc12-01
HID: 501 , address: production-slave , port: 3306 , weight: 9999 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: production: slave
HID: 400 , address: staging-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: master
HID: 401 , address: staging-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: slave
HID: 401 , address: 172.16.30.1 , port: 3306 , weight: 9999000 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: local slave gearman-worker-fs1dc12-01
HID: 401 , address: staging-slave , port: 3306 , weight: 9999 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: slave
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1014:generate_mysql_servers_table(): [INFO] Dumping mysql_servers
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
| hostgroup_id | hostname                             | port | weight  | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                               | mem_pointer     |
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
| 500          | production-master       | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: master                                | 140221738338944 |
| 401          | staging-slave    | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: slave                                    | 140221738338048 |
| 401          | 172.16.30.1                          | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: local slave gearman-worker-fs1dc12-01    | 140221738338176 |
| 401          | staging-master          | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: slave                                    | 140221738338304 |
| 400          | staging-master          | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: master                                   | 140221738338432 |
| 501          | production-slave | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | production: slave                                 | 140221738338560 |
| 501          | 172.16.30.1                          | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | production: local slave gearman-worker-fs1dc12-01 | 140221738338688 |
| 501          | production-master       | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: slave                                 | 140221738338816 |
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:247:ProxySQL_Node_Entry(): [INFO] Created new Cluster Node Entry for host gearman-worker-fs1dc12-01:6032
Main phase3 : GloAdmin initialized in 0.003388 secs.
Standard Query Processor rev. 0.2.0902_DEBUG -- Query_Processor.cpp -- Wed May 30 12:20:00 2018
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:60:ProxySQL_Cluster_Monitor_thread(): [INFO] Cluster: starting thread for peer gearman-worker-fs1dc12-01:6032
Main phase3 : Query Processor initialized in 0.000514 secs.
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
Main phase3 : MySQL Threads Handler initialized in 0.001844 secs.
In memory Standard Query Cache (SQC) rev. 1.2.0905_DEBUG -- Query_Cache.cpp -- Wed May 30 12:20:00 2018
Main phase3 : Query Cache initialized in 0.000174 secs.
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:334:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_query_rules from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x92405DB72E48C230 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:336:set_checksums(): [INFO] Cluster: checksum for mysql_query_rules from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x92405DB72E48C230 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:354:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_servers from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x1128419592D44BE4 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:356:set_checksums(): [INFO] Cluster: checksum for mysql_servers from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x1128419592D44BE4 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:374:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_users from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x1F26AE2C9089A939 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:376:set_checksums(): [INFO] Cluster: checksum for mysql_users from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x1F26AE2C9089A939 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:410:set_checksums(): [INFO] Cluster: detected a new checksum for proxysql_servers from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x85B2851AFCC5981D . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:412:set_checksums(): [INFO] Cluster: checksum for proxysql_servers from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x85B2851AFCC5981D , we won't sync.
Main phase3 : MySQL Threads Handler listeners started in 4.00432 secs.
Standard MySQL Monitor (StdMyMon) rev. 1.2.0723_DEBUG -- MySQL_Monitor.cpp -- Wed May 30 12:20:00 2018
Main phase3 : MySQL Monitor initialized in 7.8e-05 secs.
Main init phase3 completed in 4.01297 secs.
2018-05-30 18:07:08 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:07:11 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:30:12 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-30 18:30:15 main.cpp:1086:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2018-05-30 18:31:09 main.cpp:1086:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2018-05-30 18:31:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:31:15 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-30 18:32:09 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:32:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:34:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5410 , MyDS:5410) , 1040: Too many connections.
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5411 , MyDS:5411) , 1040: Too many connections.
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5410 , MyDS:5410) , 1040: Too many connections.
2018-05-30 18:37:17 MySQL_HostGroups_Manager.cpp:302:connect_error(): [ERROR] Shunning server staging-master:3306 with 5 errors/sec. Shunning for 10 seconds
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 MySQL_HostGroups_Manager.cpp:302:connect_error(): [ERROR] Shunning server staging-master:3306 with 5 errors/sec. Shunning for 10 seconds

@Reiner030
Author

Sadly, the cause has still not been found. I tried many possible changes, each time testing the behavior 3x with 30-45 minutes runtime each ... I guess you can imagine how much fun I had these last weeks, because there is no documentation on how to debug ProxySQL properly.

Because many tests were failing, I took the packaged config (with threads=4) and added only the urgently needed parts - servers, users, rules, and the maxsize/timeout values - from our config to the default one.

In my tests (with multiplexing deactivated because of the INCREMENT_OFFSET=... initialization) it behaved like this last Thursday/Friday:

  • system worked fine when I used directly the master only
  • system worked fine when I used master only over ProxySQL without compression
  • system worked fine when I used master with external slave over ProxySQL without compression
  • system worked fine when I used master with external and localhost slave over ProxySQL without compression
  • system worked badly when I used master with external and localhost slave over ProxySQL with a mixed compression setup (master and external slave compressed, but not the localhost slave)
  • system worked badly when I used master with external and localhost slave over ProxySQL with compression set up on all servers

From these tests it looks like the compression generally breaks our pools.

But today the production run failed again, even with the previously successful tested settings, and so did staging, which means that no "real" reproducible "experimental" set-up is possible. 😭

We can modify the setup/routine for this special case, but we have to keep in mind that connections running over ProxySQL could fail without a recognizable cause in many other high-throughput situations, so it seems ProxySQL would have to be replaced somehow, which I have tried to avoid; it is also nicely included in solutions like those from Severalnines, which could be a later enhancement.

Here is the failed staging run from today with the complete ProxySQL stats page:
https://drive.google.com/open?id=1c1CpSDVR2mE-CksEwt9UdGsz5V3h7nfl
Here with the server/client connections ignored, showing the other values:
https://drive.google.com/open?id=1bIdkkRNQGNp3GX63YUQYEwxg_HWt_Gv1
The other values are always 0.

I also tried it with free_connections_pct=0 instead of the default 10, but it failed as well.
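
For completeness, this is how I changed it (assuming mysql-free_connections_pct is the right variable name):

UPDATE global_variables SET variable_value='0' WHERE variable_name='mysql-free_connections_pct';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;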

If you are interested, I could also mail/share all the other test images, but since this seems to be non-reproducible even over several days, I guess it doesn't make much sense.

@joelhock

I have also run across an issue with mixed compressed/uncompressed clients, with an easy way to reproduce it. If an uncompressed client's backend response gets cached, the cached copy can be used correctly by either a future uncompressed or compressed client. However, if a compressed client's request to the backend gets cached, then when a future uncompressed client sends the same query, compressed cached data seems to get sent to it, and it drops the connection quickly.
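
For context, my repro relies on the query cache being enabled for the statement via a cache_ttl rule, roughly like this (rule_id and pattern are just examples); the same SELECT is then sent once from a compressed client and once from an uncompressed one:

-- cache matching SELECTs for 60s so the second client is served from the query cache
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (200, 1, '^SELECT ', 60000, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;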

@renecannao
Contributor

Thank you @joelhock, perhaps you have provided the missing information to get this worked on!

@joelhock

One more piece of information: in addition to my prior comment about compressed cached results not being usable for uncompressed clients, it also seems that if the client that triggers a new connection to a database backend is a compressed client, then that backend connection stays compressed forever, and future requests from uncompressed clients that are fulfilled by that backend connection will cause the uncompressed client to disconnect.

renecannao added a commit that referenced this issue May 17, 2020
Fixes #2793 and #1493: Connection issues in compressed connections and in mixed compressed/uncompressed connections.
@renecannao
Contributor

Reopening, this isn't fixed in 2.1.0

@renecannao renecannao reopened this May 17, 2020
JavierJF added a commit to JavierJF/proxysql that referenced this issue May 18, 2020
renecannao added a commit that referenced this issue May 20, 2020
Fixes #2793 and #1493: Connection issues in compressed connections and in mixed compressed/uncompressed connections.
@joelhock

My test cases now work with 2.0.13. Thank you!!
