-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"right truncation" error from sqlsrv_fetch_array of long VARCHAR(max) column containing 8-bit characters #1261
Comments
Thanks @bjj for bringing this to our attention. We can reproduce it in Linux Ubuntu 18.04 as well and will get back to you after some investigation. |
Hi @bjj , when connecting with In your test scenario, because of the 8 bit character ñ, your php script should not be saved in UTF-8 format. In other words, make sure the output of the Otherwise, if the output of If you run this php script in command line, make sure your terminal uses the right character set and the correct locale has been generated and set as well (e.g. Once you make sure you have the right settings, you should not run into this truncation issue again. |
Thanks for looking into this @yitam
I think that might be the root cause of the bug. It is doing conversion of the characters. You can easily observe (with a slightly modified version of that script) that there are 2 bytes of data in the table (which correspond to one desired UTF-8 codepoint) but because they get converted again to UTF-8 (as if they were something else, probably ISO-8859-1) they expand to 4 bytes (2 UTF-8 characters representing what the original binary happened to be). I have not found any settings that avoid this conversion on Linux. This same script works fine on Windows. I created that minimum repro case just to report this problem. The encoding of that PHP script is not the cause of (or solution to) this problem. The original data that showed this issue was already in the database. It can be inserted into the database in other ways. You can verify that the correct bytes
The special two binary bytes |
Hi @bjj I see your points, but because Linux uses UTF-8 as default please set your desired locale (this is not required in Windows) in your php script (must be done before you make the first connection):
If you use both Windows and Linux, you may consider shifting to UTF-8. |
Hi @yitam , sorry to let this comment thread get derailed by character encoding issues. While I do believe they are related, the core elements of this bug are:
So while I think the C++ code in (1) is quite possibly choosing the wrong buffer length due to encoding assumption mismatches, that is not relevant to (3). I'm calling a PHP function whose contract is "return an arbitrary length string". There's no reason for that to return an error that effectively says "sorry, for reasons outside your control, the internal buffer allocated by the C++ implementation of this PHP function was not large enough to contain the string you wanted". (also I did try your suggested |
@bjj I see your points, but using |
My mistake -- I did the test in a container that did not have |
Glad to hear that using After consulting with the ODBC team, you might want to read the documentation about autotranslation when using different codepages |
I would love to turn off |
FYI, @bjj , ISO-8859-1 is not exactly the same as Windows 1252 If you're using both Windows and Linux or have no control how the data gets inserted, the clients and SQL Server might be using different ACPs. With The recommendation is to deliberately use |
@bjj FYI, in light of ODBC driver's autotranslation when using different codepages, we have decided to fix this issue in PHP drivers to make it more flexible in handling the possible UTF-8 characters when fetching ASCII data. |
Closing this issue as per 5.10.0-beta1 |
I'm still getting this error when using 5.10.0-beta1 with PHP 8.1 https://github.com/shlinkio/shlink/runs/4110069594?check_suite_focus=true |
Hi @acelaya can you provide repro steps or a script to help us investigate? |
Sure thing. You can reproduce it on this repository: https://github.com/shlinkio/shlink The workflow above is this one: https://github.com/shlinkio/shlink/blob/develop/.github/workflows/ci.yml#L179 It basically starts a SQL server instance via docker compose, and then runs some database integration tests ( The jobs are run both with PHP 8 and pdo_sqlsrv 5.9, and with PHP 8.1 with pdo_sqlsrv 5.10 beta. In the second case they throw this error: The test failing is this one https://github.com/shlinkio/shlink/blob/develop/module/Core/test-db/Domain/Repository/DomainRepositoryTest.php, which covers this class https://github.com/shlinkio/shlink/blob/develop/module/Core/src/Domain/Repository/DomainRepository.php I will try to isolate the failing query this afternoon after work, if the information above is not enough. |
@acelaya, did you mean that with 5.10beta1 this exception is thrown but not with 5.9? |
Yes, exactly.
Perfect, will do. |
@yitam sorry for the delay, but I took me a lot of time to reproduce the issue on a small context, and I had to debug a lot on the DBAL library I use to figure out what was going on, but I have finally created the ticket #1329 I included a supre small repo with reproducible steps. I hope it simplifies debugging it. |
PHP Driver version or file name
sqlsrv support => enabled
ExtensionVer => 5.9.0+14119
Directive => Local Value => Master Value
sqlsrv.WarningsReturnAsErrors => On => On
sqlsrv.LogSeverity => 0 => 0
sqlsrv.LogSubsystems => 0 => 0
sqlsrv.ClientBufferMaxKBSize => 10240 => 10240
sqlsrv.SetLocaleInfo => 2 => 2
SQL Server version
15.0.2000.5 (earlier versions also repro)
Client operating system
Ubuntu 20.04.2 LTS
PHP version
PHP 7.4.3 (cli) (built: Oct 6 2020 15:47:56) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Zend OPcache v7.4.3, Copyright (c), by Zend Technologies
with Xdebug v2.9.2, Copyright (c) 2002-2020, by Derick Rethans
Microsoft ODBC Driver version
unixodbc-dev | 2.3.7 | https://packages.microsoft.com/ubuntu/20.04/prod focal/main amd64 Packages
unixodbc-dev | 2.3.6-0.1build1 | http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
Problem description
When using
sqlsrv_fetch_array
fails withString data, right truncation
under the following conditions:VARCHAR(max)
or the column type is just barely big enough to fit the exact dataNOT LIKE '%UTF8'
(i.e. it is something likeSQL_Latin1_General...
and notLatin1_General_100_CI_AS_SC_UTF8
)CharacterSet
is not set or set toSQLSRV_ENC_CHAR
(and not set to'UTF-8'
)In this case the column's binary data happens to be UTF-8, but I don't think that matters since all byte sequences are legal Latin1 and I think it's converting from suspected-Latin1 to UTF-8.
I suspect that some code that does dynamic allocation to fetch results is starting with a 2048 byte buffer and mis-handling the case where the Latin1->UTF-8 conversion causes the string to grow. In the
VARCHAR(n)
cases that same code probably does exact allocation and avoids the bug whenevern
is big enough to contain the encoded result.Expected behavior and actual behavior
Expected successful read of string data (possibly, in this case, with the 8-bit characters presumed to be Latin1 and actually turned into UTF-8 codepoints).
Actually got
String data, right truncation
Repro code or steps to reproduce
Note that the 8 bit characters
ñ
represent n-with-tilde-on-top as raw UTF-8-expressed-as-Latin1The text was updated successfully, but these errors were encountered: