Client requests hang on DB backend timeout #3133
Comments
Can you please attach the full error log?
Sure. Thanks!
At 12:44:41:
When do you run
Any time after 12:44:41. The app unfreezes immediately after I unpause pxc-node1, even before ProxySQL marks the node as ONLINE.
This suggests the problem has nothing to do with ProxySQL. Maybe some network issue in Docker? Having the full ProxySQL error log after 12:44:41 would surely help.
There are no log events in ProxySQL after 12:44:41 until I unpause pxc-node1. I don't believe this is a network issue with Docker. What I see so far is that ProxySQL still keeps a connection in the "ESTABLISHED" state to the paused pxc-node1 container (which is a mysql process). That node should be treated as dead, and technically there shouldn't be any connections to it except healthcheck attempts.
The source port number 43868 hasn't changed in the last 10 minutes, which makes me believe this is not a healthcheck attempt but rather the connection for my application. And after I terminate my PHP application, the connection immediately changes state to FIN_WAIT.
I also noticed that even if I telnet to the paused pxc-node1, the connection is still accepted, but there is no MySQL response on the socket, because the OS manages the sockets and its TCP stack accepts the connection.
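The telnet observation above can be reproduced without Docker or MySQL. The following is a minimal Python sketch, assuming a local listener that never calls accept() is a fair stand-in for the suspended mysqld process; the function name `probe_unresponsive_listener` is made up for illustration.

```python
import socket


def probe_unresponsive_listener(read_timeout=0.5):
    """Connect to a listener that never calls accept(), then try to read."""
    # Stand-in for the paused server (an assumption for illustration):
    # the socket listens, but no application code ever accepts or replies.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(5)
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(read_timeout)
    connected = False
    timed_out = False
    try:
        # The kernel completes the TCP handshake on the listener's behalf,
        # so connect() succeeds even though nothing accepted the connection.
        client.connect(("127.0.0.1", port))
        connected = True
        try:
            client.recv(1024)  # no server greeting will ever arrive
        except socket.timeout:
            timed_out = True
    finally:
        client.close()
        server.close()
    return connected, timed_out


if __name__ == "__main__":
    print(probe_unresponsive_listener())
```

Without the `settimeout` call, the `recv` would block indefinitely, which matches the hang seen by the application.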
Please ... |
OK, I've restarted ProxySQL from scratch and prepared a new log for you, with my comments added to give you more info about what happens at each moment. My comments are prefixed/postfixed with
Thanks a lot.
I just ran a few tests with different versions, and v2.0.10 is the newest one that works fine with this test. Everything newer than that behaves as described in the issue.
Closes #3133: Fixes client connection hanging forever when backend is already gone
Hi,
We have a simple Docker test setup with a 3-node PXC cluster and ProxySQL in front of it, using mysql_galera_hostgroups for dynamic topology configuration, as it should be. All settings in ProxySQL are default, nothing special.
Then we have a simple PHP application that uses the standard mysqli library to run SELECT queries against ProxySQL, which in turn randomly sends the queries to all 3 PXC backends.
The problem arises with the following test:
We simply run
docker pause pxc-node1
which suspends the mysqld process, thus simulating a timeout on the backend or a very unresponsive DB.
At this point the PHP application hangs forever. It is visible that the PHP application is able to connect to ProxySQL and ProxySQL accepts the connection, but then nothing happens and the application waits forever for a response from ProxySQL, because the real backend DB is obviously not responding with anything.
In the ProxySQL logs it's clearly visible that it detected via a healthcheck timeout that pxc-node1 is gone and put it into OFFLINE_HARD.
If I terminate the PHP application and run it again, it works fine and no longer sends requests to the dead node, as ProxySQL is not routing requests to it anymore. This is expected, of course.
What is not expected is that all requests and connections that were created or running during the node outage freeze the whole application without any chance to recover.
It's also worth noting that if I run
docker unpause pxc-node1
then the application immediately unfreezes and keeps going. ProxySQL then also reports that the health check is restored and puts the node back to ONLINE, and everything looks normal again.
I tried all possible combinations of the %timeout% variables in ProxySQL, but nothing helped.
IMHO, once the healthcheck marks a node as dead, ProxySQL should send an error back to the client for any request that is still waiting for a response from that node.
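The behaviour requested above can be sketched in a few lines of Python. This is only an illustration of the idea (bound the wait on the backend and turn silence into an error for the client), not ProxySQL's implementation or the fix that later closed this issue. `BackendUnavailable`, `query_backend` and `demo` are hypothetical names, and a `socket.socketpair()` stands in for the proxy-to-backend connection.

```python
import socket


class BackendUnavailable(Exception):
    """Hypothetical error a proxy could hand back to the client."""


def query_backend(backend, request, timeout):
    """Send a request to the backend and wait at most `timeout` seconds."""
    backend.settimeout(timeout)
    backend.sendall(request)
    try:
        reply = backend.recv(4096)
    except socket.timeout:
        # The backend is silent (for example, its process is suspended):
        # fail the request instead of waiting forever.
        raise BackendUnavailable(f"no reply within {timeout}s") from None
    if not reply:
        raise BackendUnavailable("backend closed the connection")
    return reply


def demo(timeout=0.2):
    """Query a backend that never answers and report what the client sees."""
    proxy_side, backend_side = socket.socketpair()
    try:
        try:
            query_backend(proxy_side, b"SELECT 1", timeout)
        except BackendUnavailable as exc:
            return f"ERROR: {exc}"
        return "OK"
    finally:
        proxy_side.close()
        backend_side.close()


if __name__ == "__main__":
    print(demo())
```

A real proxy would presumably tie this to the healthcheck result rather than to a fixed per-request deadline, but the effect for the client is the same: an error instead of an indefinite hang.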
Any advice?
Thank you!