Azure SQL. long query - driver: bad connection #166
Comments
We also see this with any query over 1000 bytes. We only just got this error, so we are going to debug a little and see what we can find. This only happens with encrypted connections, too. |
Curiously, when developing a test application using techniques similar to those in the *_test.go files, a 1 KB query works fine. We will continue hunting, but we are seeing these issues in production, so for now we have disabled encryption until we can figure out what is going on. We will provide any info we can. |
We're seeing this in production as well, on a large merge statement. Edit: Not on SQL Azure, on SQL Server 2012 (12.0.4422) |
I'm seeing this issue (the "error: driver: bad connection", not the wsarecv...) with longer running queries (encrypted, I haven't tried turning it off). I increased the connection timeout (connection timeout=300; in the connection string) and it solved my problem for now. |
Try again on latest version, it now sets TCP keep alive to 30 seconds by default |
Turning encryption off is not an option. My keep-alive and connection timeout are 30 and 600, and the queries still fail. This needs to be reopened. |
Can you provide a test program that connects to an azure database with encryption and fails? That would be super useful. |
|
Great! I've got a confirmed failing test case. I'll be able to diagnose and fix now. Thanks! |
There was an ISSUE that suggested that setting the keepalive=30 and connection timeout=300 would work. It worked for a while on different queries but then crashed today. FOO! |
I'm debugging the code and I noticed one thing.... Since I'm using pooled connections.... immediately before this SQL was executed the driver opened a new connection to the driver. I inserted a small SQL before the LONG SQL and it still failed on the long. |
This link to the MS TDS documentation suggests that maybe TDS 7.4 is required for SQL Server 2016: https://msdn.microsoft.com/en-us/library/dd339982.aspx |
Here is some interesting information.... There is clearly a problem with the server in that it must be performing some sort of sanity check on the SQL, and if the SQL looks suspicious it resets the connection. I suppose this must be some sort of DOS or PEN protection. For example I had a small SQL that looks like: select getdate()*/}} d |
And another note:
produced these results. I would have expected one row or one row per supported TDS version
|
So far I have tested a few things. No fixes yet. Issue is probably in this driver's handling of encrypted connection in the netlib TDS layer. I can replicate the issue locally on both a linux and windows installed SQL Server if I turn encryption on and run TestAzureDatabase. I'll continue to look into it as I get a chance. If someone else wants to look into it, I would recommend comparing the TDS spec for how to do encrypted connections with the implementation. |
well... I have a SQL String that is 1923 bytes in length. It works on SQL Server 2008 but not on SQL Server 2016. |
The Azure documentation talks about SQL Server firewalls. Since I'm using pooling, could that be the issue? Also... I execute a query, read the first row, then generate some SQL and open a second connection, execute that query... which may or may not reset... close that result set, read another record from the first result set and repeat until exhausted. |
Hello! Do you have an ETA on this issue? |
No. I'm currently working on some database/sql issues for go1.9 that need to get done before the first cut. I'll see if I can work on it tomorrow morning. |
database/sql changes are merged and docs are sent. I'll work on this next. |
I know you have not started on this yet... but I wanted to point out that the error has changed due to changes made in a different issue (#269):
|
still no progress here. |
This issue got me again, so we found a workaround (we don't clearly understand why it works). While reproducing the issue with a local MSSQL 2016 with "encrypt=true", the problem goes away when using "packet size=2048" (with 4096 the issue reproduces just like with Azure SQL). Then I switched to testing this with Azure and got an "Invalid packet size, it is longer than buffer size" error when using the smaller packet size. It seems Azure ignores the packet size during the handshake and sends a 4 KB packet anyway. So I slightly modified the source code to use a bigger buffer size while sending packet size=2048 in the login packet, and then it worked. I need to test it more. |
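For readers who want to try the workaround described above, here is a minimal sketch of a connection string that carries the `encrypt=true` and `packet size=2048` settings mentioned in this thread. The `buildDSN` helper and the server, user, and password values are hypothetical placeholders, and the resulting string is only assumed to be passed to `sql.Open` with this driver.

```go
package main

import (
	"fmt"
	"strings"
)

// buildDSN is a hypothetical helper that assembles a semicolon-separated
// connection string. Only "encrypt" and "packet size" come from the thread;
// the server, user, and password values are placeholders.
func buildDSN(server, user, password string, packetSize int) string {
	parts := []string{
		"server=" + server,
		"user id=" + user,
		"password=" + password,
		"encrypt=true",
		fmt.Sprintf("packet size=%d", packetSize),
	}
	return strings.Join(parts, ";")
}

func main() {
	// 2048 is the value reported above to avoid the failure on a local server.
	fmt.Println(buildDSN("example.database.windows.net", "user", "secret", 2048))
}
```

As reported later in the thread, Azure may still answer with 4 KB packets, so this setting alone may not be enough there.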
Thanks. That is in line with my suspicion that this is in the netlib layer of TDS. We are probably clobbering part of the packet sometimes, and it only becomes readily apparent when we turn on encryption. |
this continues to be a problem for me.... but I keep working around the issue by refactoring my SQL but it's getting harder with each failure. |
This issue hasn't been closed and I am running into a very similar error:
I am seeing this only when I have encryption enabled, but have not dug into the issue further than that. |
it seems to be a buffer problem... change your buffer size to 2K. The solution is not very happy but at least I can work again. I'm anxious for a proper fix. I tried a code review with no luck. |
I am pretty new to sql land. I was reading that the buffer solution may require refactoring of some of the (bigger?) sql queries. Is that right? Is there an easy explanation of why? Btw the buffer fix has definitely gotten the errors to go away at least so far |
How to change buffer size to 2k? Can't seem to do it through connection string. |
add |
Yes, I did that already, then it says 'invalid packet size, it is longer than buffer size'. To explain, I use a SQL Azure database. I was having issues with a big query/big number of parameters, I changed the packet size, and now I don't know how to change the above-mentioned buffer size. |
Pretty sure you cannot change the buffer size and that it's static. There was mention someplace about the packet size needing to be an even number to handle double-byte characters, so check that. Also, when I saw a similar error I was trying a packet size < 1024 (I think). You'll have to read the code to find the conditions where that error occurs. One last thing to check... I remember something about timeouts or keepalive. Here are the other params I use. I'm certain the LOG setting is meaningless to the problem.
|
As I mentioned before, there is an issue with Azure when using a smaller packet size. |
I've added a test for this in the master branch. I can reproduce it on Azure and local databases. |
So, my initial issue is #300, but I thought it might be related to this one. If I change the packet size to 2048 and SQL Azure ignores it and uses 4096, what's the point? In case I missed something: I changed the packet size to 2048, manually changed the buffer size in the driver code to 4096, and I still have the same issue I had before changing the packet size at all (connection reset by peer). So, if I understand correctly, it is impossible to use golang with SQL Azure with this driver at the moment? |
Just chiming in to give our user experience. We are currently using MsSQL in AWS in production, but where our implementation differs is we were unable to use full encryption. There are 3 levels, none, communication encryption, and full encryption - this includes Authentication. In our experience when using full encryption and passing across a SQL query which is larger than the buffer the decryption on MsSQL's side fails and then closes the connection. I have Go debugged my way through this a lot and not been able to figure out under what circumstances the query and the connection fails (i.e.: where the failure is). Go's encryption is a dark and scary place (at least for me) so we have opted to just secure the communication for now. We would much rather have full encryption, but we had to choose stability over security (it was not taken lightly). In future, I would love to jump back into seeing if I can fix this, but the time to do so has been hard to find. So, I wouldn't say it’s impossible, and this is a big issue for us as well, however we have managed to get by. |
If the packet size is set too small, the read buffer size will overflow. Prevent this by setting the buffer size independent of the packet size. For #166
@sunnyque In be9747b (branch kardianos-buf-size) I've de-coupled the packet size from the buffer size. While it does allow you to set the packet size to 512 (before it would overflow), it doesn't solve this issue. @kylescottmcgill It is possible that if you use the above branch/commit (not in master as of this writing) and set "packet size=512" in the DSN, you may have better luck connecting with full encryption. I don't think this is a Go TLS issue, at least not directly. I'm reaching out to the SQL Server Drivers team for assistance, but I haven't heard back as of yet from the team themselves. |
thanks, will test it with our environment |
I was finally able to use Microsoft Message Analyzer on it.
According to MMA, it isn't seeing an EOM message type at the end of the SQL Batch packet. Sometimes this isn't a problem, but maybe sometimes it is? Unsure. I'm also seeing odd NULL bytes inserted into the SQL Batch data. I'm unsure if that is a problem with MMA or if there really is an issue with the driver writing NULL values. I don't think I've seen any issue when the SQL can fit in a single TDS packet. |
It looks like the above issues of not marking the EOM byte is just MMA being buggy. I bet the internal NULL bytes are also a product of buggy MMA. |
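For context on the EOM discussion, the following sketch shows how a message that does not fit in one TDS packet is split, with the EOM status bit set only on the last packet. It is an illustration under assumptions, not the driver's code: `splitPackets` is a hypothetical helper, the payload is treated as opaque bytes (the real SQL Batch payload also carries headers and UTF-16 text), and the packet size is assumed to be larger than the header and at most 65535.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	headerSize   = 8    // TDS packet header length
	typeSQLBatch = 0x01 // packet type for a SQL batch
	statusNormal = 0x00 // more packets follow
	statusEOM    = 0x01 // last packet of the message
)

// splitPackets is a hypothetical helper that cuts payload into TDS packets
// of at most packetSize bytes and marks only the final one with EOM.
func splitPackets(payload []byte, packetSize int) [][]byte {
	maxData := packetSize - headerSize
	var packets [][]byte
	var id byte = 1
	for {
		n := len(payload)
		if n > maxData {
			n = maxData
		}
		last := n == len(payload)

		pkt := make([]byte, headerSize+n)
		pkt[0] = typeSQLBatch
		if last {
			pkt[1] = statusEOM
		} else {
			pkt[1] = statusNormal
		}
		// Total length, header included, big-endian in bytes 2-3.
		binary.BigEndian.PutUint16(pkt[2:4], uint16(len(pkt)))
		// Bytes 4-5 (SPID) and byte 7 (window) are left as zero.
		pkt[6] = id // packet sequence number, wraps modulo 256
		copy(pkt[headerSize:], payload[:n])
		packets = append(packets, pkt)

		if last {
			return packets
		}
		payload = payload[n:]
		id++
	}
}

func main() {
	// A 5000-byte payload does not fit in one 4096-byte packet.
	for i, p := range splitPackets(make([]byte, 5000), 4096) {
		fmt.Printf("packet %d: length=%d status=0x%02x\n", i+1, len(p), p[1])
	}
}
```

A capture tool that inspects only the first packet of a multi-packet batch would see a status of 0x00 there, which may be one reason a missing EOM was reported.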
Hello! I stumbled upon this issue several months ago, and now I'm just wondering if you have an ETA on this? |
Nope. I tried to solve it, but came up short. I haven't heard back from the MS SQL Server drivers team either. There is a test you can un-skip and work with to try to debug. |
So I followed @kardianos's steps for debugging this issue with MMA and compared the network traffic with the .NET implementation of the same SQL query. The investigation resulted in a pull request: #327 |
I've got a headache with strange behavior when working with SQL Azure.
This code causes a "driver: bad connection" error when executing the second select:
log:
So as we see, there are 3 attempts to execute the long query (why???), and we get "wsarecv error by unknown reason" for each of them.
I tried connecting and running against SQL 2005: no problems at all.
Wireshark doesn't help because the connection with SQL Azure is encrypted.
The problem occurs only with long queries (I'm inserting images into the database like 0x123456....); all the rest of the DB routine works well.
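To give a sense of how quickly such queries grow, here is a small sketch that renders image bytes as a hex literal and estimates the size of the query text on the wire. The table and column names are hypothetical, and the estimate only assumes that SQL batch text is carried as UTF-16, two bytes per code unit.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// hexLiteral renders data as a T-SQL binary literal such as 0x1234ab.
func hexLiteral(data []byte) string {
	return "0x" + hex.EncodeToString(data)
}

// wireSize estimates the bytes the query text occupies once encoded as
// UTF-16 (two bytes per code unit). Packet headers are not counted.
func wireSize(query string) int {
	return 2 * len(utf16.Encode([]rune(query)))
}

func main() {
	image := make([]byte, 1024) // stand-in for 1 KB of image data
	// The table and column names are made up for this example.
	query := "insert into images (data) values (" + hexLiteral(image) + ")"
	fmt.Println("query characters:", len(query))
	fmt.Println("approximate bytes on the wire:", wireSize(query))
}
```

Each image byte becomes two hex characters and each character two bytes on the wire, so even a 1 KB image produces a query of roughly 4 KB, enough to span more than one packet at the default packet size.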