Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hebrew strings truncation #555

Closed
shemi opened this issue Sep 26, 2017 · 18 comments
Closed

Hebrew strings truncation #555

shemi opened this issue Sep 26, 2017 · 18 comments

Comments

@shemi
Copy link

shemi commented Sep 26, 2017

When I try to pull data from our database I get shorter strings.
in this example, I use a stored procedure but I try to do select and the results the same

$serverName = "myServer";
$connectionOptions = array(
    "Database" => "$dbName",
    "UID" => "$dbUser",
    "PWD" => "$dbPassword"
);
//Establishes the connection
$conn = sqlsrv_connect($serverName, $connectionOptions);
if ($conn === false) {
    die(FormatErrors(sqlsrv_errors()));
}
//Select Query
$tsql = "exec AlumnusDataGet @id = 30000000";
//Executes the query
$getResults = sqlsrv_query($conn, $tsql);
//Error handling

if ($getResults == FALSE)
    die(FormatErrors(sqlsrv_errors()));
?>
    <h1> Results : </h1>
<?php
while ($row = sqlsrv_fetch_array($getResults, SQLSRV_FETCH_ASSOC)) {
    var_dump($row);
    echo("<br/>");
}
sqlsrv_free_stmt($getResults);
function FormatErrors($errors)
{
    /* Display errors. */
    echo "Error information: <br/>";

    foreach ($errors as $error) {
        echo "SQLSTATE: " . $error['SQLSTATE'] . "<br/>";
        echo "Code: " . $error['code'] . "<br/>";
        echo "Message: " . $error['message'] . "<br/>";
    }
}

The expected output is:

array 
  'TitleHeb' => 'גב''
  'LastNameHeb' => 'אלמן'
  'FirstNameHeb' => 'דנה'
  'TitleEng' => 'Ms.'
  'LastNameEng' => 'Elman'
 ...

The actual output:

array 
  'TitleHeb' => 'גב''
  'LastNameHeb' => 'אלמ'
  'FirstNameHeb' => 'דנ'
  'TitleEng' => 'Ms.'
  'LastNameEng' => 'Elman'
 ...

As you can see in the "LastNameHeb" and "FirstNameHeb" the last character is missing.

PHP Version: 7.1.3-3
Tested with both drivers (versions: 4., 5.)
OS: Ubuntu 16.04

@david-puglielli
Copy link
Contributor

Hi @shemi, could you provide a table schema and some information on how Hebrew values are inserted into the table (like an example INSERT statement) so that this is easier for us to reproduce? Thanks!

@shemi
Copy link
Author

shemi commented Sep 26, 2017

Hi @david-puglielli, I can't share the original schema (Hope you can understand) but I test the code with new DB that I created just now and the same problem appears.

The new DB I created using SQL Manager Studio.
The only things I change in the "New DB" dialog was in the "Options" tab:
Collection: Hebrew_CI_AS
Compatibility level: SQL Server 2008 (100)

To insert the new data I used also the SQL Manager Studio right click on the table "Edit Top 200 Rows "

test DB backup

$serverName = "ServerName";
$connectionOptions = array(
    "Database" => "HebTest",
    "UID" => "test",
    "PWD" => "test",
    "CharacterSet" => SQLSRV_ENC_CHAR
);
//Establishes the connection
$conn = sqlsrv_connect($serverName, $connectionOptions);
if ($conn === false) {
    die(FormatErrors(sqlsrv_errors()));
}
//Select Query
$tsql = "SELECT TOP (1000) [ID]
      ,[NameEng]
      ,[NameHeb]
  FROM [HebTest].[dbo].[Users]";
//Executes the query
$getResults = sqlsrv_query($conn, $tsql);
//Error handling

if ($getResults == FALSE)
    die(FormatErrors(sqlsrv_errors()));
?>
    <h1> Results : </h1>
<?php
while ($row = sqlsrv_fetch_array($getResults, SQLSRV_FETCH_ASSOC)) {
    var_dump($row);
    echo("<br/>");
}
sqlsrv_free_stmt($getResults);
function FormatErrors($errors)
{
    /* Display errors. */
    echo "Error information: <br/>";

    foreach ($errors as $error) {
        echo "SQLSTATE: " . $error['SQLSTATE'] . "<br/>";
        echo "Code: " . $error['code'] . "<br/>";
        echo "Message: " . $error['message'] . "<br/>";
    }
}

The expected output is:

array
  'ID' => 1
  'NameEng' => 'shemi'
  'NameHeb' => 'שמי'

The actual output:

array
  'ID' => 1
  'NameEng' => 'shemi'
  'NameHeb' => 'שמ'

@david-puglielli
Copy link
Contributor

Okay thank you @shemi, I am able to reproduce the issue. We will look into it and keep you updated.

@shemi
Copy link
Author

shemi commented Oct 2, 2017

Hi @david-puglielli, I wonder if you have any idea on how to I can workaround the problem? Thanks

@shemi
Copy link
Author

shemi commented Oct 10, 2017

Hi @david-puglielli, I am really stuck here. I tried every version (stable and pre-release) and the same problem appeared. I really need some direction or any sort of help here will be welcome. thank you.

@Hadis-Knj
Copy link

@shemi at this moment we're working on this issue, I'll get back to you once we have more information

@david-puglielli
Copy link
Contributor

Hi @shemi, as a workaround could you try using nchar or nvarchar data types instead of char or varchar? I was able to get the correct output by changing the data type for the Hebrew column to nvarchar.

This problem is likely related to the UTF-8 limitation in the Linux driver, which we are working on fixing. For more information, see #515.

@shemi
Copy link
Author

shemi commented Oct 18, 2017

Hi @david-puglielli,

Thank you for your response.

The database I work with is pretty big and already in production so the option to switch to nchar or nvarchar not possible...

I found another workaround, we concat a space to the end of the string and its solve the problem.

I notice that the truncation happens only on the last character and only if it's a Hebrew character.

@david-puglielli
Copy link
Contributor

@shemi Glad you found a functional workaround. We are continuing to investigate, and it turns out that there is a bug in the ODBC driver involving encoding autodetection which also contributes to the string truncation. This bug has been fixed in version 17 of the ODBC driver, for which preview builds are available here. A full fix probably still requires lifting the UTF-8 limitation in the PHP drivers, though, and we are currently working on that fix.

@yitam
Copy link
Contributor

yitam commented Oct 27, 2017

@shemi , as @david-puglielli mentioned, we are in the process of fixing this problem. I'd appreciate if you can answer these questions at your earliest convenience:

  1. the system locale in your Ubuntu environment
  2. the system locale in your Windows (assuming you run your SQL Manager Studio in Windows)
  3. have you tried to insert Hebrew characters using PHP script? If so, can you send us a sample script?

@yitam yitam added the bug label Oct 27, 2017
@shemi
Copy link
Author

shemi commented Oct 28, 2017

  1. English us
  2. English us
  3. no. I only read data.

@yitam
Copy link
Contributor

yitam commented Oct 30, 2017

Thanks for getting back to us @shemi . Could you please post the output of locale command in Ubuntu?

@shemi
Copy link
Author

shemi commented Oct 30, 2017

LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

@yitam
Copy link
Contributor

yitam commented Oct 30, 2017

Thanks @shemi
We will keep you informed.

@yitam yitam added the odbc label Nov 8, 2017
@yitam
Copy link
Contributor

yitam commented Nov 15, 2017

Hi @shemi this is to inform you that the ODBC team is currently looking into this truncation issue.

When there is any update, we will let you know. Also, since your env locale is UTF-8, when you call sqlsrv_connect() you should use 'CharacterSet'=>'UTF-8' instead of 'CharacterSet'=>SQLSRV_ENC_CHAR.

@yitam
Copy link
Contributor

yitam commented Dec 20, 2017

Hi @shemi this issue has been resolved with our 5.2.0-RC together with the latest ODBC preview

Please check when you have a chance. Looking forward to hearing from you.

@yitam
Copy link
Contributor

yitam commented Jan 12, 2018

@shemi just want to see if you have a chance to test this. Due to inactivity we may close this issue. Thanks!

@yitam yitam added the resolved label Jan 25, 2018
@yitam
Copy link
Contributor

yitam commented Jan 31, 2018

Closing this due to inactivity. Please feel free to reopen if you still have issues.

@yitam yitam closed this as completed Jan 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants