Encoding issue in the NTriplesSerializer #67

coreation · 2014-10-17T13:47:58Z

Hi,

I found a problem with the escape function.

Problem:
André -> the é is correctly escaped as \u00E9 (iirc)
Andréé -> the éé is replaced with \uAAA9 (a square character)

What I found through debugging is that when the characters are passed through preg_replace_callback, the "éé" sequence is treated as a single character, even by mb_strlen. However, if I comment out the utf8_decode call on the second line of the escape function, the "éé" sequence is escaped correctly as two \u00E9 sequences.

My guess is that utf8_decode unintentionally decodes a string that is already valid UTF-8 (why is this necessary in the first place?), and that this breaks mb_strlen: UTF-8 is given as the character encoding, yet after utf8_decode the string is ISO-8859-1.
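To illustrate the suspected mechanism outside PHP, here is a minimal Python sketch. It is not the serializer's code: escape_non_ascii and is_valid_utf8 are hypothetical helpers, and re-encoding to Latin-1 is only a rough stand-in for what utf8_decode does. It shows that the Latin-1 bytes for "éé" are no longer valid UTF-8, which would explain why UTF-8-aware functions miscount them, while the untouched UTF-8 string escapes to two \u00E9 sequences.

```python
def escape_non_ascii(text: str) -> str:
    """Hypothetical helper: escape only non-ASCII characters as \\uXXXX
    or \\UXXXXXXXX. Quotes, backslashes and control characters are left
    alone, so this is not a complete N-Triples escaper."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "Andréé", written with escapes so the source file stays pure ASCII.
name = "Andr\u00e9\u00e9"

# The string as it arrives at the escape function: valid UTF-8.
utf8_bytes = name.encode("utf-8")        # b'Andr\xc3\xa9\xc3\xa9'

# Assumption: a rough equivalent of PHP's utf8_decode(), which converts
# UTF-8 to ISO-8859-1 (Latin-1). Each "é" becomes the single byte 0xE9.
latin1_bytes = name.encode("latin-1")    # b'Andr\xe9\xe9'

# 0xE9 is a UTF-8 lead byte that must be followed by continuation bytes,
# so the Latin-1 bytes cannot be read as UTF-8 any more.
utf8_ok = is_valid_utf8(utf8_bytes)
latin1_ok = is_valid_utf8(latin1_bytes)

# Escaping the original UTF-8 text, with no intermediate decode step.
escaped = escape_non_ascii(utf8_bytes.decode("utf-8"))
```

If this reading is right, skipping the utf8_decode step (or telling the multibyte functions the string is ISO-8859-1 afterwards) should keep the two encodings from being mixed.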
