1.4.6: SHUNNED host didn't recover #1428

Closed
cloudufull opened this issue Mar 21, 2018 · 4 comments

@cloudufull

OS: CentOS release 6.7 (Final) x64
ProxySQL version: 1.4.6-7-g3b9e4a6

Issue:

  • Our slave host became SHUNNED for a moment because its replication lag exceeded the threshold.
  • When the lag dropped back to 0, the slave became ONLINE again.
  • But ProxySQL did not forward read queries to the slave host again.
  • I tried deleting the slave host from the mysql_servers table and adding it back manually; after that ProxySQL returned to normal and
    continued forwarding queries to the slave host.
  • I think it may be a bug.

Log details:

2018-03-21 12:00:18 MySQL_Session.cpp:2779:handler(): [WARNING] Error during query on (1000,10.13.43.41,3355): 1062, Duplicate entry '393248824-22959' for key 'user_prize_unq'
2018-03-21 12:00:20 MySQL_HostGroups_Manager.cpp:1521:replication_lag_action(): [WARNING] Shunning server 10.13.43.36:3355 with replication lag of 6 second
2018-03-21 12:00:20 MySQL_Session.cpp:2693:handler(): [ERROR] Detected an offline server during query: 10.13.43.36, 3355
2018-03-21 12:00:20 MySQL_Session.cpp:2693:handler(): [ERROR] Detected an offline server during query: 10.13.43.36, 3355
2018-03-21 12:00:20 MySQL_Session.cpp:2693:handler(): [ERROR] Detected an offline server during query: 10.13.43.36, 3355
2018-03-21 12:00:20 MySQL_Session.cpp:2693:handler(): [ERROR] Detected an offline server during query: 10.13.43.36, 3355
2018-03-21 12:00:20 MySQL_Session.cpp:2693:handler(): [ERROR] Detected an offline server during query: 10.13.43.36, 3355
2018-03-21 12:00:20 MySQL_Session.cpp:2693:handler(): [ERROR] Detected an offline server during query: 10.13.43.36, 3355
2018-03-21 12:00:20 MySQL_Session.cpp:2693:handler(): [ERROR] Detected an offline server during query: 10.13.43.36, 3355
2018-03-21 12:00:20 MySQL_Session.cpp:2693:handler(): [ERROR] Detected an offline server during query: 10.13.43.36, 3355
2018-03-21 12:00:30 MySQL_HostGroups_Manager.cpp:1532:replication_lag_action(): [WARNING] Re-enabling server 10.13.43.36:3355 with replication lag of 0 second
2018-03-21 12:01:14 MySQL_HostGroups_Manager.cpp:1521:replication_lag_action(): [WARNING] Shunning server 10.13.43.36:3355 with replication lag of 7 second
2018-03-21 12:01:24 MySQL_HostGroups_Manager.cpp:1532:replication_lag_action(): [WARNING] Re-enabling server 10.13.43.36:3355 with replication lag of 0 second
...................................
...........................
2018-03-21 12:34:20 MySQL_HostGroups_Manager.cpp:1521:replication_lag_action(): [WARNING] Shunning server 10.13.43.36:3355 with replication lag of 6 second
...................................
.................
2018-03-21 13:55:16 MySQL_HostGroups_Manager.cpp:1532:replication_lag_action(): [WARNING] Re-enabling server 10.13.43.36:3355 with replication lag of 0 second
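The log shows the slave flipping between SHUNNED and ONLINE as its lag crosses the threshold. Below is a minimal Python model of that transition, assuming a simplified rule (shun when lag exceeds max_replication_lag, re-enable once it is back at or below it) and assuming max_replication_lag = 5, the value used when the server is re-inserted later in this thread. It is a sketch for reasoning about the log, not ProxySQL's implementation, which also handles cases such as unknown lag.

```python
# Simplified model of lag-based shunning as it appears in the log.
# Assumption: shun when lag > max_replication_lag, re-enable when
# lag <= max_replication_lag. Real ProxySQL handles more cases
# (e.g. unknown lag), which this sketch ignores.


def next_status(status: str, lag: int, max_replication_lag: int) -> str:
    """Return the server status after one replication-lag check."""
    if status == "ONLINE" and lag > max_replication_lag:
        return "SHUNNED"  # cf. "Shunning server ... with replication lag of N second"
    if status == "SHUNNED" and lag <= max_replication_lag:
        return "ONLINE"  # cf. "Re-enabling server ... with replication lag of 0 second"
    return status


def replay(lags, max_replication_lag=5):
    """Apply a sequence of lag samples and record the status after each one."""
    status = "ONLINE"
    history = []
    for lag in lags:
        status = next_status(status, lag, max_replication_lag)
        history.append(status)
    return history


# Lag samples taken from the log: 6s, 0s, 7s, 0s.
print(replay([6, 0, 7, 0]))  # ['SHUNNED', 'ONLINE', 'SHUNNED', 'ONLINE']
```

Under this model the server is back to ONLINE after each recovery, which agrees with the ONLINE status in the stats_mysql_connection_pool output below; the missing read traffic needs a different explanation.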


mysql> select *  from stats_mysql_connection_pool;
+-----------+--------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host     | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+--------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 1000      | 10.13.43.41 | 3355     | ONLINE | 1        | 290      | 312    | 0       | 36038483 | 3182725816      | 2408319350      | 121        |
| 100       | 10.13.43.36 | 3355     | ONLINE | 0        | 0        | 62686  | 0       | 24554055 | 1959322368      | 15300735568     | 240        |
| 100       | 10.13.43.41 | 3355     | ONLINE | 680      | 0        | 6481   | 0       | 9036877  | 704968781       | 5814293986      | 121        |
+-----------+--------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
3 rows in set (0.00 sec)

mysql> select *  from stats_mysql_connection_pool_reset;
+-----------+--------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host     | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+--------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 1000      | 10.13.43.41 | 3355     | ONLINE | 1        | 290      | 312    | 0       | 36038509 | 3182727125      | 2408324706      | 138        |
| 100       | 10.13.43.36 | 3355     | ONLINE | 0        | 0        | 62686  | 0       | 24554055 | 1959322368      | 15300735568     | 201        |
| 100       | 10.13.43.41 | 3355     | ONLINE | 680      | 0        | 6481   | 0       | 9036896  | 704970166       | 5814296866      | 138        |
+-----------+--------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
3 rows in set (0.00 sec)

mysql> select *  from stats_mysql_connection_pool;
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host     | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1000      | 10.13.43.41 | 3355     | ONLINE | 0        | 291      | 0      | 0       | 257     | 16148           | 20986           | 125        |
| 100       | 10.13.43.36 | 3355     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 148        |
| 100       | 10.13.43.41 | 3355     | ONLINE | 680      | 0        | 0      | 0       | 175     | 13628           | 33017           | 125        |
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
3 rows in set (0.00 sec)

mysql> select *  from stats_mysql_connection_pool;
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host     | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1000      | 10.13.43.41 | 3355     | ONLINE | 0        | 291      | 0      | 0       | 328     | 20365           | 21542           | 134        |
| 100       | 10.13.43.36 | 3355     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 141        |
| 100       | 10.13.43.41 | 3355     | ONLINE | 680      | 0        | 0      | 0       | 215     | 16785           | 39668           | 134        |
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
3 rows in set (0.00 sec)




mysql> delete from mysql_servers where hostgroup_id=100 and hostname='10.13.43.41';
Query OK, 1 row affected (0.00 sec)

mysql> load mysql  servers to run;     
Query OK, 0 rows affected (0.00 sec)

mysql> insert into mysql_servers(hostgroup_id,hostname,port,weight,max_connections,max_replication_lag) values (100,'10.13.43.41',3355,10,3000,5);
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> load mysql servers to run;
Query OK, 0 rows affected (0.00 sec)

mysql> select *  from stats_mysql_connection_pool;                                  
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host     | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1000      | 10.13.43.41 | 3355     | ONLINE | 0        | 291      | 0      | 0       | 56849   | 3705323         | 6249318         | 104        |
| 100       | 10.13.43.36 | 3355     | ONLINE | 145      | 0        | 145    | 0       | 876     | 72309           | 185927          | 194        |
| 100       | 10.13.43.41 | 3355     | ONLINE | 35       | 0        | 35     | 0       | 201     | 16364           | 29955           | 104        |
+-----------+--------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
3 rows in set (0.00 sec)



This issue does not show up in the short term; it appears only after more than 20 minutes (or longer).

@renecannao
Contributor

@cloudufull, can you please attach the full error log from 2018-03-21 12:00:00 until a few hours later?
No grep; I want to look at it without filters.

Something confused me: the slave seems to be 10.13.43.36, but you removed 10.13.43.41, which you described as the slave.

Looking at your stats_mysql_connection_pool output, my hypothesis is that you are sending the slaves some queries that disable multiplexing (for example, SELECTs with @), so the same connections are always reused: this is why connections stuck to 10.13.43.41 until you removed it.
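A toy Python model of this hypothesis, not ProxySQL's routing code. Assumptions: each client session borrows the least-loaded ONLINE backend for a read, and with multiplexing disabled the session keeps that backend connection for good. Sessions that started while 10.13.43.36 was SHUNNED are pinned to 10.13.43.41 and stay there after 10.13.43.36 is re-enabled; with multiplexing enabled, the same reads spread across both servers. The 680 sessions mirror the ConnUsed value reported for 10.13.43.41.

```python
from collections import Counter

MASTER = "10.13.43.41"
SLAVE = "10.13.43.36"


def run_round(sessions, online, pinned, multiplexing):
    """Route one read query per session and return queries per backend.

    Assumed policy: a session that is not pinned borrows the least-loaded
    ONLINE backend. With multiplexing disabled, the borrowed connection is
    never returned, so the session stays pinned to that backend.
    """
    queries = Counter()
    for s in sessions:
        if s in pinned:
            backend = pinned[s]
        else:
            backend = min(online, key=lambda b: queries[b])
            if not multiplexing:
                pinned[s] = backend
        queries[backend] += 1
    return queries


def scenario(multiplexing, clients=680):
    sessions = range(clients)
    pinned = {}
    # Round 1: the slave is SHUNNED, so only the master serves reads.
    run_round(sessions, [MASTER], pinned, multiplexing)
    # Round 2: the slave is ONLINE again.
    return run_round(sessions, [SLAVE, MASTER], pinned, multiplexing)


print(dict(scenario(multiplexing=False)))  # {'10.13.43.41': 680}
print(dict(scenario(multiplexing=True)))   # {'10.13.43.36': 340, '10.13.43.41': 340}
```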

@cloudufull
Author

Er... sorry, I described it wrong.
The reader_hostgroup has two members: 10.13.43.36 is the slave and 10.13.43.41 is the master.
ProxySQL didn't forward SELECT queries to 10.13.43.36, so I deleted the host 10.13.43.41 from the reader hostgroup. After that, only the slave 10.13.43.36
was left in the reader_hostgroup, and ProxySQL began forwarding SELECT queries to 10.13.43.36.

mysql> select * from runtime_mysql_replication_hostgroups;
+------------------+------------------+-------------------------------------+
| writer_hostgroup | reader_hostgroup | comment                             |
+------------------+------------------+-------------------------------------+
| 1000             | 100              | read/write split, high availability |
+------------------+------------------+-------------------------------------+
1 row in set (0.00 sec)

Er, can you tell me your email? My leader does not allow me to post the ProxySQL log here!
-_-!

@renecannao
Contributor

np. My email is [email protected] .

From stats_mysql_connection_pool (the readers have 0 ConnFree, only ConnUsed) and what you are telling me (removing 10.13.43.41 enabled 10.13.43.36), I am confident that the problem is that your connections to the readers have multiplexing disabled; therefore no new connections were created on 10.13.43.36, because all clients' connections were on 10.13.43.41.
In this case, the error log seems unnecessary, as this is not a bug.
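The signal described above (no ConnFree on any reader, and all ConnUsed on one server while the other ONLINE reader sits idle) can be checked mechanically. A hedged Python sketch: the rows are typed in by hand from the earlier stats_mysql_connection_pool output for hostgroup 100, and the rule is a rough heuristic, not a ProxySQL feature. It can only hint at pinned sessions; it does not prove that multiplexing is disabled.

```python
from typing import NamedTuple


class PoolRow(NamedTuple):
    hostgroup: int
    srv_host: str
    status: str
    conn_used: int
    conn_free: int


# Reader-hostgroup rows copied from the stats_mysql_connection_pool output.
ROWS = [
    PoolRow(100, "10.13.43.36", "ONLINE", 0, 0),
    PoolRow(100, "10.13.43.41", "ONLINE", 680, 0),
]


def idle_online_servers(rows, hostgroup):
    """Return ONLINE servers with no connections in use while the hostgroup
    has zero free connections (a heuristic hint that sessions are pinned)."""
    members = [r for r in rows if r.hostgroup == hostgroup]
    if any(r.conn_free > 0 for r in members):
        return []  # free connections exist, so this heuristic does not apply
    if not any(r.conn_used > 0 for r in members):
        return []  # no traffic at all; nothing to conclude
    return [r.srv_host for r in members
            if r.status == "ONLINE" and r.conn_used == 0]


print(idle_online_servers(ROWS, 100))  # ['10.13.43.36']
```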

@cloudufull
Author

OK. Thank you very much for your reply.
