ProxySQL sets state of all server to OFFLINE_HARD for no obvious reasons #850
The galera checker and node monitor scripts being used can be found at https://github.com/twindb/proxysql/tree/master/support/percona
Excerpt from the Galera checker script log for the same time period:

```
Mon Dec 19 11:10:56 UTC 2016 Check server 10:172.31.15.163:3306 , status ONLINE , wsrep_local_state 4
Mon Dec 19 11:10:56 UTC 2016 Check server 11:172.31.10.29:3306 , status ONLINE , wsrep_local_state 4
Mon Dec 19 11:10:56 UTC 2016 Check server 11:172.31.11.165:3306 , status ONLINE , wsrep_local_state 4
Mon Dec 19 11:10:56 UTC 2016 Check server 11:172.31.5.18:3306 , status ONLINE , wsrep_local_state 4
Mon Dec 19 11:10:56 UTC 2016 Number of writers online: 0 : hostgroup: 10
Mon Dec 19 11:10:56 UTC 2016 Trying to set an available reader node as the writer node of the cluster
Mon Dec 19 11:10:56 UTC 2016 Number of writers online: 0 : hostgroup: 10
Mon Dec 19 11:10:56 UTC 2016 Trying to enable last available node of the cluster (in Donor/Desync state)
Mon Dec 19 11:10:56 UTC 2016 Enabling config
```
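The per-node decision visible in these log lines can be sketched roughly as follows. This is a hypothetical simplification, not the actual script from the linked repo (which does considerably more); the function name is made up, and the state codes are Galera's documented `wsrep_local_state` values (4 = Synced, 2 = Donor/Desynced):

```shell
#!/bin/sh
# Hypothetical simplification of the checker's per-node decision:
# wsrep_local_state 4 (Synced) is fully usable, 2 (Donor/Desynced) is
# a last-resort candidate, anything else is taken out of rotation.
node_status() {
  case "$1" in
    4) echo "ONLINE" ;;          # Synced: can serve reads and writes
    2) echo "DONOR_DESYNC" ;;    # Donor/Desynced: last-resort only
    *) echo "OFFLINE_SOFT" ;;    # Joining/Joined/unknown: drain traffic
  esac
}

node_status 4   # prints ONLINE
```

In the log above every node reports `wsrep_local_state 4`, which is why it is surprising that the writer count for hostgroup 10 still comes out as 0.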
Relevant galera checker code:
@ovaistariq : I think something has happened before that time. Could you please attach an hour or so of logs from both ProxySQL's error log and the Galera checker script log, before such event?
Sure, will do that tonight.
Let me know if you need more info. As you can see, just before seeing 0 writers online, galera_checker did see all nodes in ONLINE state:
Similarly, this is also interesting:
ProxySQL receives a
All the versions up to 1.3.2 handle this without errors. Extracting a portion of the error log, we can divide it this way:
But what happened 5 seconds later, at 11:10:56, is interesting:
There is a lot of output, but I think what is most relevant is the fact that there are 2. The output of the second:
Only 2 nodes seem available in "phase c.2":
But suddenly there are 4 nodes in "phase c.4":
I am not yet sure what happened, but I believe there is some race condition involved. I also noticed that in this log file there are 2 calls to
So I think the question is: what script ran?

One more note:

Final note: version 1.4.0 is a lot faster than 1.3.x and there are fewer moving parts. A lot of code was simplified due to work on #829, and therefore 1.4.0 is now able to process a large number of hosts over 30 times faster than 1.3.x.
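On the race-condition hypothesis: when two invocations of a checker script overlap, each can read backend state the other is in the middle of rewriting, which would explain two interleaved runs appearing in one log file. A common way to rule this out is an exclusive `flock(1)` guard; the sketch below is an assumption about how one could be added, not code from the actual script, and the lock-file path and `run_check` body are placeholders:

```shell
#!/bin/sh
# Hypothetical guard against overlapping checker runs: take an exclusive
# lock on a lock file via flock(1) and skip this run if another instance
# already holds it. Lock-file path and run_check body are placeholders.
LOCKFILE="${TMPDIR:-/tmp}/galera_checker.lock"

run_check() {
  echo "checking cluster state"
}

(
  flock -n 9 || { echo "another checker is already running" >&2; exit 1; }
  run_check
) 9>"$LOCKFILE"
```

With `-n` the second invocation fails fast instead of queueing, so two schedulers firing the script at the same moment can never interleave their work.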
|
@ovaistariq : closing this issue as it seems not relevant anymore. Thanks
It does indeed not seem like a ProxySQL bug.
ProxySQL is configured in an environment where it's acting as a load balancer for PXC nodes. Randomly, it sets the nodes to OFFLINE_HARD state for no obvious reason. This has happened 3 times in the last 4 days. The cluster doesn't have any load at all; it's a test cluster used for periodic load testing. The ProxySQL node and all the cluster nodes are in Amazon AWS within the same private subnet.
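For anyone trying to reproduce this, the status ProxySQL has assigned to each backend (ONLINE, OFFLINE_SOFT, OFFLINE_HARD, SHUNNED) can be checked through its admin interface, which listens on port 6032 by default. The snippet below just builds and prints the query; the `admin`/`admin` credentials in the comment are ProxySQL's defaults and may differ in your setup:

```shell
#!/bin/sh
# Build the admin query used to see what status ProxySQL currently
# assigns to each backend. runtime_mysql_servers is ProxySQL's standard
# runtime table; credentials in the comment below are placeholders.
QUERY="SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers"
echo "$QUERY"

# To run it against a live ProxySQL admin interface (default port 6032):
#   mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "$QUERY"
```

Capturing this output alongside the checker log at the moment of the OFFLINE_HARD transition would show whether the change came from the checker script or from ProxySQL's own monitoring.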
Excerpts from ProxySQL's error log: