Resetting connection hangs on Aurora 2 (MySQL 5.7.12) #1321
Comments
This is a known bug with Aurora 2.x: https://mysqlconnector.net/troubleshooting/aurora-freeze/. ( To work around it, add |
Duplicate of #486 |
This is literally the first thing we've tried - only to find out this setting is only supported on 2.x, and we're still on 1.3.14. I thought the pipelining was only added in 2.x, no? I also still wonder why there's no timeout at this point (getting connection timeouts and retrying would probably be okay for us). |
Sorry, my mistake; I forgot that that optimisation (and thus the bug) wasn't in 1.3.x. |
Just out of curiosity, what's the conflicting dependency? |
Pomelo.Extensions.Caching.MySql: we're on 2.0.4, and the earliest that allows |
An update: we've managed an experimental upgrade to 2.2.6, added We have 19 threads blocked with the same symptoms. Here's what's interesting. Let's look at the ReadPacketAfterHeader:
    var payloadLength = (int) SerializationUtility.ReadUInt32(headerBytes[..3]);
    int packetSequenceNumber = headerBytes[3];
    Exception? packetOutOfOrderException = null;
    var expectedSequenceNumber = getNextSequenceNumber() % 256;
    if (expectedSequenceNumber != -1 && packetSequenceNumber != expectedSequenceNumber)
        packetOutOfOrderException = MySqlProtocolException.CreateForPacketOutOfOrder(expectedSequenceNumber, packetSequenceNumber);
In all 19 threads We have 6 threads with If I am not mistaken, this gives us the following packet headers (?), which throw us off the path:
First five cases I've looked at show these 4 bytes straight at the top of According to https://stackoverflow.com/a/26657124/1105881 (random Google search), We'll try to disable TLS to see how that affects us. |
An update: disabling TLS seems to fix the issue even on |
Excellent detective work. This really does sound like MySqlConnector is being given the raw bytes of a TLS packet, even though those should have been processed and decrypted by Either way, it sounds like bad data from the server, which is corrupting MySqlConnector's understanding of the current state of the stream and causing it to hang, waiting for more data. Definitive proof would probably require a Wireshark packet capture, although without the SSL key material, we would only see the encrypted bytes (which wouldn't help). |
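For readers who want to check the arithmetic behind the TLS hypothesis, here is a minimal sketch. It assumes only the standard MySQL packet header layout (3-byte little-endian payload length, then a 1-byte sequence number) and the TLS record header layout (content type, then a two-byte version). The class and method names are hypothetical and are not part of MySqlConnector. It uses the values reported in the issue description (payloadLength 197399, received sequence number 30), not bytes taken from the later dumps.

```csharp
using System;

// Hypothetical helper for illustration only; not MySqlConnector code.
public static class PacketHeaderDemo
{
    // Encodes a payload length and sequence number the way a MySQL packet
    // header lays them out: 3-byte little-endian length, then 1 byte of sequence.
    public static byte[] ToHeaderBytes(int payloadLength, byte sequenceNumber) =>
        new[]
        {
            (byte) (payloadLength & 0xFF),
            (byte) ((payloadLength >> 8) & 0xFF),
            (byte) ((payloadLength >> 16) & 0xFF),
            sequenceNumber,
        };

    // Decodes the same 4 bytes back into (payload length, sequence number).
    public static (int PayloadLength, int SequenceNumber) ParseHeader(ReadOnlySpan<byte> header) =>
        (header[0] | (header[1] << 8) | (header[2] << 16), header[3]);

    // Heuristic: a TLS record starts with a content type in 0x14..0x17
    // (0x17 is application data) followed by a version whose major byte is 0x03.
    public static bool LooksLikeTlsRecordHeader(ReadOnlySpan<byte> header) =>
        header.Length >= 3
        && header[0] >= 0x14 && header[0] <= 0x17
        && header[1] == 0x03
        && header[2] <= 0x04;
}
```

PacketHeaderDemo.ToHeaderBytes(197399, 30) yields the bytes 17 03 03 1E, and 17 03 03 is how a TLS 1.2 application-data record begins. That is consistent with encrypted bytes being parsed as a MySQL packet header, though it is not proof on its own.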
I would still propose that we shouldn't wait in this state indefinitely (which is what's happening). I believe there's a connection timeout setting, which is by default at 15s, and it's not honored. |
Agreed; the |
Is there any small chance this fix will be backported to 1.x? |
Fixed in 2.2.7.
I'm not sure if that will happen; the last commit on that branch was two years ago so it may not even compile right now. (It also targets out-of-support target frameworks, which may cause issues.) |
We have a weird behavior showing on a system working with Amazon Aurora 2. The behavior exhibits only under load and only sporadically, so (a) it is hard to reproduce it systematically, thus we resort to reading memory dumps, and (b) we suspect it might be related to some concurrency issue. I have two dumps from two nodes in the cluster having this issue and they both have the same characteristics.
We are using version 1.3.14 of MySqlConnector, and we cannot currently upgrade to 2.x due to conflicting dependencies.
The behavior exhibits as a thread permanently (or at least long enough to produce a cascade effect throughout the system) hung in this call stack:
As we unroll the stack and examine the state, we observe the following:
In BufferedByteReader.ReadBytesAsync, totalBytesToRead is 197399 (which, I suppose, is way too much).
In ProtocolUtility.ReadPacketAfterHeader, payloadLength is set to the same 197399, and packetOutOfOrderException is set to a MySqlProtocolException with the message "Packet received out-of-order. Expected 1; got 30."
Our guess is that while talking to Aurora something happens to the packet order, the connector gets the wrong byte count and tries to read a lot of data that never arrives, and so gets stuck in Socket.Receive.
The question is: why do we wait in Socket.Receive? Shouldn't there be a timeout for this operation? My knowledge of lower-level networking is very limited, and I couldn't find anywhere in the code where we'd set a timeout on the Socket/TcpClient/SslStream/NetworkStream. Am I missing something?
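On the timeout question, here is a rough sketch of one general way to bound a read in .NET. This is not MySqlConnector's code and not the fix that shipped in 2.2.7. TimedRead and ReadExactlyAsync are hypothetical names, and the sketch assumes the underlying stream honors cancellation tokens in ReadAsync (NetworkStream does on current .NET; older target frameworks may not). Socket.ReceiveTimeout is a different mechanism and applies only to synchronous Receive calls.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not MySqlConnector code.
public static class TimedRead
{
    // Reads exactly `count` bytes into `buffer`, or throws TimeoutException
    // if the whole read does not complete within `timeout`.
    public static async Task ReadExactlyAsync(Stream stream, byte[] buffer, int count, TimeSpan timeout)
    {
        // The token is cancelled automatically once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);
        var total = 0;
        try
        {
            while (total < count)
            {
                var read = await stream
                    .ReadAsync(buffer.AsMemory(total, count - total), cts.Token)
                    .ConfigureAwait(false);
                if (read == 0)
                    throw new EndOfStreamException($"Stream ended after {total} of {count} bytes.");
                total += read;
            }
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // Surface the cancellation as a timeout so callers can retry on a new connection.
            throw new TimeoutException($"Read {total} of {count} bytes before the {timeout} timeout.");
        }
    }
}
```

With something like this in place, a bogus payloadLength such as 197399 would fail after the timeout instead of blocking forever. After a timeout the position in the protocol stream is unknown, so the connection should probably be closed and not reused.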