Consistent timeouts/retries in multi-layer proxysql architecture routing queries to RDS Aurora MySQL read replicas #4158
Comments
From the description of the issue, my guess is that you are being affected by the NLB connection idle timeout. |
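To illustrate the idle-timeout hypothesis, here is a minimal Python sketch of keeping a TCP connection alive through a load balancer that drops idle flows. It assumes the NLB default idle timeout of 350 seconds for TCP flows; the function name and the default probe timings are hypothetical and chosen only for illustration. In ProxySQL itself the equivalent knobs are the mysql-use_tcp_keepalive and mysql-tcp_keepalive_time variables rather than application code.

```python
import socket

# Assumption: the NLB drops TCP flows that stay idle for 350 seconds (AWS default).
NLB_IDLE_TIMEOUT_S = 350


def enable_keepalive(sock, idle_s=120, interval_s=30, probes=3):
    """Enable TCP keepalive so probes start well before the NLB idle timeout.

    The default timings are illustrative, not recommended values.
    """
    if idle_s >= NLB_IDLE_TIMEOUT_S:
        raise ValueError("keepalive idle time must be below the NLB idle timeout")

    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The fine-grained TCP_KEEP* options are platform-specific (available on Linux),
    # so each one is applied only if this platform exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

The point of the check at the top is that keepalive only helps if the first probe is sent before the load balancer forgets the flow; an idle time at or above the timeout would leave the connection exposed to silent drops.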
Thanks for the speedy reply, @renecannao. Unfortunately, enabling TCP keepalives has not resolved the problem:
|
Can you share the full error log? You probably also need to use |
I'll dig back in with |
Sorry for the delayed response. I'm looking into sharing the error log and packet captures, but the data they contain could be sensitive, so that might take a while and they won't be shareable on GitHub. Since my last post I've removed the NLB from the equation: the issue still arises even with the local proxysql talking directly to the remote proxysql. It does not happen with the local proxysql talking directly to the database. I'm not very familiar with using
And a similar packet capture from the same time on the local side:
Nothing in particular stands out to me here, but as I said this isn't my area of expertise. One thing I did find interesting when looking at the packets using
But with various different unknown commands:
I don't know if that's relevant or just user error on my part. |
V2.x change user compression #4158
To give more context for who is going to read this issue. |
We're encountering issues in production using proxysql for sending SELECTs to our read replicas in RDS Aurora. Unfortunately, I've been unable to create a reproducible test case, though this behaviour is consistently reproducible in our production environment. When we route SELECT traffic through two layers of proxysql to our read replicas, we immediately begin to see a small percentage of queries failing and being retried. Our production architecture has a local proxysql process on every instance. Multiple processes of our Rails application connect to that local proxysql, and it forwards queries through an NLB to another proxysql process running on dedicated hardware, which then forwards the queries to our backend databases.
As soon as we begin routing traffic, the "local layer" proxysql logs begin to fill with log lines of this form:
And at the same time, the "remote layer" proxysql logs begin to fill with log lines of this form:
Once this starts happening we see some queries randomly being delayed by one or more seconds. Despite not needing very significant traffic in production to create the problem, I've been unable to reproduce it anywhere other than in our production environment.
The "local layer" proxysql is version 2.4.4-41-g83ffb72 running on Amazon Linux 2, installed from the officially distributed 2.4.4 RPM for centos7. The "remote layer" proxysql has been tried on versions 2.4.4, 2.4.8 and now 2.5.1-90-gbedaa6c, and all have exhibited the same behaviour.
Enabling mysql-connection_warming on the local layer goes some way to mitigating the issue. With it enabled we don't see it occurring constantly, only in a burst once mysql-connection_max_age_ms is reached.
We don't see the same behaviour in our existing legacy proxysql deployment, which only serves queries for our primary replicas and not readers. That deployment uses the same local layer, but the "remote layer" runs proxysql version 1.4.13-15-g69d4207.
global_variables table from the local layer:
Remote layer:
Let me know if there's anything else you need me to provide.
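For readers who want to try the mitigation mentioned above, the following is a minimal Python sketch that builds the ProxySQL admin statements for changing entries in the global_variables table and applying them. The helper name admin_statements is hypothetical, and the mysql-connection_max_age_ms value of 600000 is a placeholder, not the value used in this deployment. The statements would still need to be run against the ProxySQL admin interface with a MySQL client.

```python
def admin_statements(settings):
    """Build ProxySQL admin statements that update global_variables and apply them.

    `settings` maps variable names (for example "mysql-connection_warming")
    to the values they should take.
    """
    statements = []
    for name, value in settings.items():
        # Only mysql-* variables are handled, since only those are reloaded below.
        if not name.startswith("mysql-") or "'" in name:
            raise ValueError(f"unexpected variable name: {name!r}")
        escaped = str(value).replace("'", "''")  # escape single quotes for SQL
        statements.append(
            "UPDATE global_variables "
            f"SET variable_value='{escaped}' WHERE variable_name='{name}';"
        )
    # Changes take effect only after being loaded to runtime; saving persists them.
    statements.append("LOAD MYSQL VARIABLES TO RUNTIME;")
    statements.append("SAVE MYSQL VARIABLES TO DISK;")
    return statements


# Placeholder values for illustration only.
example = admin_statements({
    "mysql-connection_warming": "true",
    "mysql-connection_max_age_ms": 600000,
})
for statement in example:
    print(statement)
```

Generating the statements rather than executing them keeps the sketch free of any client library dependency, and makes it easy to review exactly what would be applied to each layer.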