Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client requests hang on DB backend timeout #3133

Closed
andreygolev opened this issue Nov 3, 2020 · 11 comments · Fixed by #3176
Closed

Client requests hang on DB backend timeout #3133

andreygolev opened this issue Nov 3, 2020 · 11 comments · Fixed by #3176

Comments

@andreygolev
Copy link

Hi,
We have a simple Docker test setup with 3 node PXC and ProxySQL in front of it, using mysql_galera_hostgroups for dynamic topology configuration as it should be. All settings in ProxySQL are default, nothing special.

Then we have a simple PHP application that is using mysqli standard library to run SELECT queries to ProxySQL which in turn randomly sends queries to all 3 PXC backends.

The problem arises with the following test:
We simply run docker pause pxc-node1 which suspends mysqld process, thus simulating timeout on backend or very unresponsive DB.

At this point PHP application hangs forever. It is visible that PHP application is able to connect to ProxySQL, ProxySQL accepts connection, but then nothing happens and application waits for a response from ProxySQL forever, because real backend DB is obviously not responding with anything.

In ProxySQL logs it's clearly visible that it detected that pxc-node1 is gone with healthcheck timeout and puts it to OFFLINE_HARD.

2020-11-03 11:49:36 MySQL_Monitor.cpp:1549:monitor_galera_thread(): [ERROR] Error on Galera check for pxc-node1:3306 after 1002ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 110.
2020-11-03 11:49:36 MySQL_Monitor.cpp:1785:monitor_galera_thread(): [ERROR] Server pxc-node1:3306 missed 3 Galera checks. Assuming offline

If I terminate PHP application and run it again, then it works fine and simply not sending requests to a dead node anymore as ProxySQL is not routing requests to a dead node anymore. This is expected of course.

What is not expected is that all requests and connections that were created/running during node outage are freezing the whole application without any chance to recover.

It's also worth nothing that if I run docker unpause pxc-node1, then application immediately unfreezes and continue going. Then ProxySQL also reports that health check restored and puts node back to ONLINE and everything looks normal again.

I tried all possible combinations of %timeout% variables in ProxySQL, but nothing helped.

IMHO, ProxySQL should send error back to client for requests that ProxySQL waiting for respond from an already dead DB, in case healthcheck marks this node as dead.

Any advices there?

Thank you!

mysql_servers =
(
    { address="pxc-node1", port=3306, hostgroup=10, max_connections=100, weight=100 },
    { address="pxc-node2", port=3306, hostgroup=10, max_connections=100, weight=100  },
    { address="pxc-node3", port=3306, hostgroup=10, max_connections=100, weight=100 }
)

mysql_galera_hostgroups =
(
    {
        writer_hostgroup=10
        backup_writer_hostgroup=20
        reader_hostgroup=30
        offline_hostgroup=9999
        max_writers=1
        writer_is_also_reader=1
        max_transactions_behind=100
        active=1
    }
)


#defines MySQL Query Rules
mysql_query_rules:
(
    {
        rule_id=1
        active=1
        match_pattern="^SELECT .* FOR UPDATE$"
        destination_hostgroup=10
        apply=1
    },
    {
        rule_id=2
        active=1
        match_pattern="^SELECT"
        destination_hostgroup=30
        apply=1
    },
    {
        rule_id=3
        active=1
        match_pattern=".*"
        destination_hostgroup=10
        apply=1
    }
)
@renecannao
Copy link
Contributor

Can you please attach the full error log?
Thanks

@andreygolev
Copy link
Author

Sure. Thanks!
log.txt

@renecannao
Copy link
Contributor

At 12:44:41:

2020-11-03 12:44:41 MySQL_Monitor.cpp:1785:monitor_galera_thread(): [ERROR] Server pxc-node1:3306 missed 3 Galera checks. Assuming offline

When do you run docker unpause pxc-node1 ?

@andreygolev
Copy link
Author

After 12:44:41 any time. App unfreezes immediately right after I unpause pxc-node1. Even before ProxySQL marks node as ONLINE.

@renecannao
Copy link
Contributor

Even before ProxySQL marks node as ONLINE.

This suggests the problem has nothing to do with ProxySQL . Maybe some network issue in Docker ?

Having the full error log in proxysql after 12:44:41 can surely help

@andreygolev
Copy link
Author

There are no log events after 12:44:41 in ProxySQL, until I unpause pxc-node1.

I don't believe this is a network issue with docker.

What I see so far, is that ProxySQL still keeps "ESTABLISHED" state connection with a paused pxc-node1 container (which is a mysql process) which should be treated as dead and there technically shouldn't be any connections, except healthcheck attemps.

root@731bd0dae272:/# netstat -atup|grep pxc-node1
tcp        0      0 731bd0dae272:43868      pxc-node1.net1:mysql    ESTABLISHED 1/proxysql

The source port number 43868 wasn't changed in last 10 minutes which makes me believe, this is not healthcheck attempt, but rather the connection for my application.

And after I terminate my PHP application, the connection immediately changes state to FIN_WAIT

tcp        0      0 731bd0dae272:43868      pxc-node1.net1:mysql    FIN_WAIT2   -

I also noticed that even if I telnet to a paused pxc-node1, I still get a connection accepted, but no MySQL response to a socket, because that's OS manages sockets and TCP stack accepts a connection.

@renecannao
Copy link
Contributor

Having the full error log in proxysql after 12:44:41 can surely help

Please ...
There may be more hypothesis (and potential bugs), but having information from error log will surely help.

@andreygolev
Copy link
Author

Ok, I've restarted ProxySQL from scratch and prepared you a new log and added my comments to logs in order to give you more info what happens during these moments. My comments are prefixed/postfixed with ------ text ------

log2.txt

@renecannao
Copy link
Contributor

Thank you.
This seems a regression bug, because commits d559459 and d1b467e (related to #1085) should fix this behavior.

We will investigate further.

@andreygolev
Copy link
Author

Thanks a lot.

@andreygolev
Copy link
Author

Just made a few tests with different versions and v2.0.10 is the newest one that works fine with this test. Everything newer than that, behaves as in issue described.

renecannao added a commit that referenced this issue Dec 2, 2020
Closes #3133: Fixes client connection hanging forever when backend is already gone
renecannao added a commit that referenced this issue Dec 2, 2020
Closes #3133: Fixes client connection hanging forever when backend is already gone
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants