
v2.0.6 and up: Way more than max_connections connections #2619

Closed
jbransen opened this issue Mar 27, 2020 · 10 comments · Fixed by #3195

Comments

@jbransen

jbransen commented Mar 27, 2020

I upgraded from v2.0.4 to v2.0.10, but when switching over the ProxySQL log was flooded with:

2020-03-27 10:12:40 mysql_connection.cpp:835:handler(): [ERROR] Failed to mysql_real_connect() on dbmaster:3306 , FD (Conn:1120 , MyDS:1120) , 1040: Too many connections.
2020-03-27 10:12:40 mysql_connection.cpp:835:handler(): [ERROR] Failed to mysql_real_connect() on dbmaster:3306 , FD (Conn:1120 , MyDS:1120) , 1040: Too many connections.
2020-03-27 10:12:40 mysql_connection.cpp:835:handler(): [ERROR] Failed to mysql_real_connect() on dbmaster:3306 , FD (Conn:1120 , MyDS:1120) , 1040: Too many connections.
2020-03-27 10:12:40 MySQL_HostGroups_Manager.cpp:905:connect_error(): [ERROR] Shunning server dbmaster:3306 with 5 errors/sec. Shunning for 10 seconds

I'm running ProxySQL v2.0.10 on Ubuntu 18.04 connecting to MariaDB 10.3.14.

After increasing several limits, timeouts, etc., I found out the following. On the ProxySQL side, in the (runtime_)mysql_servers table, max_connections is set to 200. On the mysql side max_connections is 2000. I had a few other processes connected directly to MySQL, and ProxySQL was using all of the remaining 2000 connections. Then I increased max_connections on the mysql side and within seconds ProxySQL was using 3000 more to saturate this. My upstream client uses a pool of size 100, mysql-max_connections is set to 2048, and with ProxySQL v2.0.4 there are only 58 open connections at this time, under the same load where ProxySQL v2.0.10 was using thousands.

Somehow there is a large difference between the two versions, and since this only happens in a production environment I am not so keen on debugging much further, but I am happy to provide more relevant info. To me this very much looks like a bug, but maybe there is some config setting or change that I overlooked that could cause this?

@renecannao
Contributor

Can you please share the output from stats_mysql_global and stats_mysql_connection_pool?
And also a large section (or all) of the error log.

What you described shouldn't happen, and if it did, that indeed seems like a bug.
The requested outputs may start giving some clues.

Thanks

@jbransen
Author

Uhm, I made a stupid mistake: I reverted to the older version, which means I also lost all the stats. Apologies for this; I will try to reproduce this within the next few days and hopefully share more data.

@jbransen
Author

jbransen commented Apr 2, 2020

With some effort I can reproduce this on a development instance. This is with select queries only, but I could not reproduce it with simple queries, so I had to use our custom software that uses transactions. My server setting is now such that I allow at most 20 parallel connections:

Admin> select * from runtime_mysql_servers;
+--------------+---------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname                        | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 0            | dbserver                        | 3306 | 0         | ONLINE | 1      | 0           | 20              | 0                   | 0       | 0              |         |
+--------------+---------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
1 row in set (0.00 sec)

On this I've twice run 100 parallel requests, where each request runs 4 queries over the same connection. When I ask ProxySQL, everything seems fine:

Admin> select * from stats_mysql_connection_pool;
+-----------+---------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| hostgroup | srv_host                        | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+---------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 0         | dbserver                        | 3306     | ONLINE | 0        | 2        | 200    | 0       | 10          | 800     | 0                 | 510800          | 17051786        | 0          |
+-----------+---------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
1 row in set (0.00 sec)

The global stats don't show anything interesting to me, except that the monitor user fails because I have not configured that on my dev instance. It was working fine on the production instance when I had this same problem, so I strongly believe we can ignore those.

Admin> select * from stats_mysql_global;
+---------------------------------------------+----------------+
| Variable_Name                               | Variable_Value |
+---------------------------------------------+----------------+
| ProxySQL_Uptime                             | 522            |
| Active_Transactions                         | 0              |
| Client_Connections_aborted                  | 0              |
| Client_Connections_connected                | 0              |
| Client_Connections_created                  | 20             |
| Server_Connections_aborted                  | 0              |
| Server_Connections_connected                | 2              |
| Server_Connections_created                  | 200            |
| Server_Connections_delayed                  | 0              |
| Client_Connections_non_idle                 | 0              |
| Queries_backends_bytes_recv                 | 17051786       |
| Queries_backends_bytes_sent                 | 510800         |
| Queries_frontends_bytes_recv                | 516400         |
| Queries_frontends_bytes_sent                | 17747986       |
| Query_Processor_time_nsec                   | 0              |
| Backend_query_time_nsec                     | 0              |
| mysql_backend_buffers_bytes                 | 0              |
| mysql_frontend_buffers_bytes                | 0              |
| mysql_session_internal_bytes                | 10176          |
| Com_autocommit                              | 20             |
| Com_autocommit_filtered                     | 20             |
| Com_commit                                  | 200            |
| Com_commit_filtered                         | 0              |
| Com_rollback                                | 0              |
| Com_rollback_filtered                       | 0              |
| Com_backend_change_user                     | 0              |
| Com_backend_init_db                         | 0              |
| Com_backend_set_names                       | 200            |
| Com_frontend_init_db                        | 0              |
| Com_frontend_set_names                      | 0              |
| Com_frontend_use_db                         | 0              |
| Com_backend_stmt_prepare                    | 0              |
| Com_backend_stmt_execute                    | 0              |
| Com_backend_stmt_close                      | 0              |
| Com_frontend_stmt_prepare                   | 0              |
| Com_frontend_stmt_execute                   | 0              |
| Com_frontend_stmt_close                     | 0              |
| Mirror_concurrency                          | 0              |
| Mirror_queue_length                         | 0              |
| Questions                                   | 820            |
| Selects_for_update__autocommit0             | 0              |
| Slow_queries                                | 0              |
| GTID_consistent_queries                     | 0              |
| GTID_session_collected                      | 0              |
| Servers_table_version                       | 2              |
| MySQL_Thread_Workers                        | 4              |
| Access_Denied_Wrong_Password                | 0              |
| Access_Denied_Max_Connections               | 0              |
| Access_Denied_Max_User_Connections          | 0              |
| MySQL_Monitor_Workers                       | 8              |
| MySQL_Monitor_Workers_Aux                   | 0              |
| MySQL_Monitor_Workers_Started               | 8              |
| MySQL_Monitor_connect_check_OK              | 0              |
| MySQL_Monitor_connect_check_ERR             | 6              |
| MySQL_Monitor_ping_check_OK                 | 0              |
| MySQL_Monitor_ping_check_ERR                | 66             |
| MySQL_Monitor_read_only_check_OK            | 0              |
| MySQL_Monitor_read_only_check_ERR           | 0              |
| MySQL_Monitor_replication_lag_check_OK      | 0              |
| MySQL_Monitor_replication_lag_check_ERR     | 0              |
| ConnPool_get_conn_latency_awareness         | 0              |
| ConnPool_get_conn_immediate                 | 0              |
| ConnPool_get_conn_success                   | 200            |
| ConnPool_get_conn_failure                   | 0              |
| generated_error_packets                     | 0              |
| max_connect_timeouts                        | 0              |
| backend_lagging_during_query                | 0              |
| backend_offline_during_query                | 0              |
| queries_with_max_lag_ms                     | 0              |
| queries_with_max_lag_ms__delayed            | 0              |
| queries_with_max_lag_ms__total_wait_time_us | 0              |
| mysql_unexpected_frontend_com_quit          | 0              |
| Client_Connections_hostgroup_locked         | 0              |
| hostgroup_locked_set_cmds                   | 0              |
| hostgroup_locked_queries                    | 0              |
| mysql_unexpected_frontend_packets           | 0              |
| aws_aurora_replicas_skipped_during_query    | 0              |
| automatic_detected_sql_injection            | 0              |
| whitelisted_sqli_fingerprint                | 0              |
| mysql_killed_backend_connections            | 0              |
| mysql_killed_backend_queries                | 0              |
| MyHGM_myconnpoll_get                        | 200            |
| MyHGM_myconnpoll_get_ok                     | 200            |
| MyHGM_myconnpoll_push                       | 255            |
| MyHGM_myconnpoll_destroy                    | 0              |
| MyHGM_myconnpoll_reset                      | 0              |
| SQLite3_memory_bytes                        | 3140792        |
| ConnPool_memory_bytes                       | 143352         |
| Stmt_Client_Active_Total                    | 0              |
| Stmt_Client_Active_Unique                   | 0              |
| Stmt_Server_Active_Total                    | 0              |
| Stmt_Server_Active_Unique                   | 0              |
| Stmt_Max_Stmt_id                            | 1              |
| Stmt_Cached                                 | 0              |
| Query_Cache_Memory_bytes                    | 0              |
| Query_Cache_count_GET                       | 0              |
| Query_Cache_count_GET_OK                    | 0              |
| Query_Cache_count_SET                       | 0              |
| Query_Cache_bytes_IN                        | 0              |
| Query_Cache_bytes_OUT                       | 0              |
| Query_Cache_Purged                          | 0              |
| Query_Cache_Entries                         | 0              |
+---------------------------------------------+----------------+
102 rows in set (0.00 sec)

However, on the mysql side I have a processlist that shows the following:

MariaDB [(none)]> select * from information_schema.processlist where user = 'debug';
+-------+-------+----------------------+--------+---------+------+-------+------+----------+-------+-----------+----------+-------------+-----------------+---------------+-----------+-------------+-------+
| ID    | USER  | HOST                 | DB     | COMMAND | TIME | STATE | INFO | TIME_MS  | STAGE | MAX_STAGE | PROGRESS | MEMORY_USED | MAX_MEMORY_USED | EXAMINED_ROWS | QUERY_ID  | INFO_BINARY | TID   |
+-------+-------+----------------------+--------+---------+------+-------+------+----------+-------+-----------+----------+-------------+-----------------+---------------+-----------+-------------+-------+
| 50754 | debug | proxysqlserver:47766 | somedb | Sleep   |    0 |       | NULL |   75.515 |     0 |         0 |    0.000 |       79072 |           79640 |             0 | 103145748 | NULL        | 17422 |
| 50753 | debug | proxysqlserver:47764 | somedb | Sleep   |    0 |       | NULL |   33.316 |     0 |         0 |    0.000 |       78016 |         1125696 |             0 | 103145767 | NULL        | 17421 |
| 50752 | debug | proxysqlserver:47762 | somedb | Sleep   |    0 |       | NULL |   30.991 |     0 |         0 |    0.000 |       78016 |         1125696 |             0 | 103145766 | NULL        | 17420 |
[...snip...many similar rows...]
| 50675 | debug | proxysqlserver:47208 | somedb | Sleep   |    4 |       | NULL | 4416.542 |     0 |         0 |    0.000 |       77752 |         1125432 |             0 | 103142437 | NULL        | 17236 |
| 50674 | debug | proxysqlserver:47204 | somedb | Sleep   |    4 |       | NULL | 4506.802 |     0 |         0 |    0.000 |       77752 |         1125432 |             0 | 103142338 | NULL        | 17235 |
| 50536 | debug | proxysqlserver:46682 | somedb | Sleep   |    6 |       | NULL | 6389.169 |     0 |         0 |    0.000 |       78016 |         1125696 |             0 | 103141122 | NULL        | 17164 |
| 50520 | debug | proxysqlserver:46542 | somedb | Sleep   |    8 |       | NULL | 8508.163 |     0 |         0 |    0.000 |       77752 |         1125432 |             0 | 103139295 | NULL        | 16980 |
+-------+-------+----------------------+--------+---------+------+-------+------+----------+-------+-----------+----------+-------------+-----------------+---------------+-----------+-------------+-------+
81 rows in set (0.010 sec)

and also

MariaDB [(none)]> show status like 'Threads_connected';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 105   |
+-------------------+-------+
1 row in set (0.001 sec)

(there were a few other connections, the number dropped to 25 after my test)

This mysql server is being used by other processes, so I left max_connections on the mysql side high enough to not be exceeded this time. The ProxySQL logs therefore show nothing except an error that the monitor user could not connect.

I'm convinced that if I lower max_connections in MySQL the connections will start failing again, because this happened in my production environment as well, and the above logs clearly show that there were more than 20 connections open even though ProxySQL reported there weren't.

@renecannao
Contributor

This is with select queries only, but I could not reproduce it with simple queries, so I had to use our custom software that uses transactions.

Is there anything else "special" in your custom software, other than the fact that it uses transactions?
I am asking to narrow the scope of this issue, to be able to reproduce it ourselves too.

Thanks

@jbransen
Author

jbransen commented Apr 2, 2020

Not that I am aware of, but so far I haven't been able to reproduce it with simpler queries, so there must be something in there that triggers this. As a reference, I am using this for the MySQL connection. I will try to narrow this down to a reproducible example where I can share the full code.

@jbransen jbransen changed the title v2.0.10: Way more than max_connections connections v2.0.6 and up: Way more than max_connections connections Apr 3, 2020
@jbransen
Author

jbransen commented Apr 3, 2020

I've bisected this: the bug was introduced in version 2.0.6, and I've updated the title accordingly.

Here is some code to reproduce it. It's in Haskell and can be built with the Stack build tool: put the files in a directory, run stack build to compile, and then stack exec proxysql-debug to execute. It assumes a ProxySQL instance on localhost:6033 with user debug and no password, but it should be obvious where to change that in the code.

This code makes 10 connections and for each connection runs 10 SELECT 1 queries, all sequentially. When running this with ProxySQL 2.0.6 or newer, where max_connections in ProxySQL is set to 20, I can see 100 open connections in MySQL directly after running the queries. Although the code should only make 10 connections and ProxySQL would need only a single connection, it somehow creates a new connection for each query. The runtime is also surprisingly different: with v2.0.5 this runs in 90ms, while with v2.0.6 it takes more than 300ms, which is well explained if a new connection is made for each query.
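The numbers above line up: a back-of-envelope sketch (mine, not part of the original report) shows that one fresh backend connection per query exactly accounts for the observed count.

```python
# Back-of-envelope check: the repro makes 10 client connections, each running
# 10 queries sequentially. If something forces a fresh backend connection per
# query instead of reusing pooled ones, the backend sees 10 * 10 connections.
clients = 10
queries_per_client = 10
backend_connections = clients * queries_per_client
print(backend_connections)  # 100, matching the open connections seen in MySQL
```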

What I have not been able to do is reproduce this with other MySQL clients. With tcpdump I can see that the only difference between the clients seems to be in the capability flags in the login packet. After this packet all query packets seem identical, so I am pretty sure it must be something that's set upon connection. Here is a Wireshark dump of such a login packet:

Frame 6: 110 bytes on wire (880 bits), 110 bytes captured (880 bits)
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 53112, Dst Port: 6033, Seq: 1, Ack: 79, Len: 44
MySQL Protocol
    Packet Length: 40
    Packet Number: 1
    Login Request
        Client Capabilities: 0xa30d
            .... .... .... ...1 = Long Password: Set
            .... .... .... ..0. = Found Rows: Not set
            .... .... .... .1.. = Long Column Flags: Set
            .... .... .... 1... = Connect With Database: Set
            .... .... ...0 .... = Don't Allow database.table.column: Not set
            .... .... ..0. .... = Can use compression protocol: Not set
            .... .... .0.. .... = ODBC Client: Not set
            .... .... 0... .... = Can Use LOAD DATA LOCAL: Not set
            .... ...1 .... .... = Ignore Spaces before '(': Set
            .... ..1. .... .... = Speaks 4.1 protocol (new flag): Set
            .... .0.. .... .... = Interactive Client: Not set
            .... 0... .... .... = Switch to SSL after handshake: Not set
            ...0 .... .... .... = Ignore sigpipes: Not set
            ..1. .... .... .... = Knows about transactions: Set
            .0.. .... .... .... = Speaks 4.1 protocol (old flag): Not set
            1... .... .... .... = Can do 4.1 authentication: Set
        Extended Client Capabilities: 0x0001
            .... .... .... ...1 = Multiple statements: Set
            .... .... .... ..0. = Multiple results: Not set
            .... .... .... .0.. = PS Multiple results: Not set
            .... .... .... 0... = Plugin Auth: Not set
            .... .... ...0 .... = Connect attrs: Not set
            .... .... ..0. .... = Plugin Auth LENENC Client Data: Not set
            .... .... .0.. .... = Client can handle expired passwords: Not set
            .... .... 0... .... = Session variable tracking: Not set
            .... ...0 .... .... = Deprecate EOF: Not set
            0000 000. .... .... = Unused: 0x00
        MAX Packet: 16777215
        Charset: utf8 COLLATE utf8_general_ci (33)
        Username: debug
        Schema: 
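Decoding the two capability words from the dump above highlights the asymmetry that later turns out to matter. This is a quick sketch of mine using the standard MySQL protocol flag values (CLIENT_MULTI_STATEMENTS is bit 16, CLIENT_MULTI_RESULTS is bit 17), not output from any of the tools in this thread:

```python
# Combine the low capability word (0xa30d) with the extended word (0x0001)
# exactly as they appear in the login packet dump above.
caps = 0xa30d | (0x0001 << 16)

# Standard MySQL protocol capability flag values.
CLIENT_MULTI_STATEMENTS = 1 << 16
CLIENT_MULTI_RESULTS = 1 << 17

multi_statements = bool(caps & CLIENT_MULTI_STATEMENTS)
multi_results = bool(caps & CLIENT_MULTI_RESULTS)
print(multi_statements, multi_results)  # True False: statements set, results not
```

The client requests multiple statements without multiple results, which the MySQL C API documentation says is not a valid combination.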

@renecannao
Contributor

Thank you, very informative.
I will try to reproduce it during the weekend.

A personal note: in 2.0.6 we added tracking of the capability flags CLIENT_FOUND_ROWS, CLIENT_MULTI_RESULTS and CLIENT_MULTI_STATEMENTS.
Maybe it is related.
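The flag tracking mentioned above could plausibly produce the symptom. A hypothetical sketch (the function and pool layout are mine, not ProxySQL's actual code): if the pool only reuses backend connections whose tracked flags exactly match the client's, a client with an unusual flag combination never finds a match and forces a fresh backend connection every time.

```python
# Hypothetical illustration of flag-based connection matching in a pool.
def find_reusable(pool, client_flags):
    """Return a pooled connection whose flags exactly match, else None."""
    for conn in pool:
        if conn["flags"] == client_flags:
            return conn
    return None  # no match: caller must open a fresh backend connection

MULTI_STATEMENTS = 1 << 16
MULTI_RESULTS = 1 << 17

# Pooled backend connections opened with the spec-compliant flag combination.
pool = [{"flags": MULTI_STATEMENTS | MULTI_RESULTS}]

# A client that set MULTI_STATEMENTS without MULTI_RESULTS never matches,
# so every one of its queries opens a new backend connection.
print(find_reusable(pool, MULTI_STATEMENTS))  # None
```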

@jbransen
Author

@renecannao ping, any luck looking into this?

@JavierJF
Collaborator

Hi @jbransen, this issue has been identified and should be fixed in v2.0.16. The issue was that mysql-haskell does not set the flags as required by the MySQL specification (see https://dev.mysql.com/doc/c-api/8.0/en/c-api-multiple-queries.html): CLIENT_MULTI_RESULTS should be set whenever CLIENT_MULTI_STATEMENTS is set, as it is implied by it. Because of this, a mismatch between the flags of the client and backend connections was being generated. This is now enforced on our side in case a client doesn't respect it. Thank you!
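The enforcement described can be sketched as a small flag-normalization step. This is an illustrative sketch of mine (not ProxySQL's actual C++ code), using the standard protocol flag values:

```python
# Standard MySQL protocol capability flag values.
CLIENT_MULTI_STATEMENTS = 1 << 16
CLIENT_MULTI_RESULTS = 1 << 17

def normalize_capabilities(flags):
    """Force CLIENT_MULTI_RESULTS on whenever CLIENT_MULTI_STATEMENTS is set,
    since the MySQL specification says the former is implied by the latter."""
    if flags & CLIENT_MULTI_STATEMENTS:
        flags |= CLIENT_MULTI_RESULTS
    return flags

# A non-compliant client flag set gets the implied bit added...
print(normalize_capabilities(CLIENT_MULTI_STATEMENTS) & CLIENT_MULTI_RESULTS != 0)  # True
# ...while already-compliant flag sets pass through unchanged.
print(normalize_capabilities(0) == 0)  # True
```

With the client and backend flags normalized to the same values, pooled backend connections can be matched and reused again.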

@jbransen
Author

Great, thanks for the heads up! I've also opened an upstream MR.

renecannao added a commit that referenced this issue Dec 20, 2020
Closes #2619: ProxySQL is creating more than 'max_connections' connections
renecannao added a commit that referenced this issue Dec 27, 2020
Closes #2619: ProxySQL is creating more than 'max_connections' connections