Encoding issue in the NTriplesSerializer #67

Open
coreation opened this issue Oct 17, 2014 · 0 comments
Hi,

I found a problem with the escape function.

Problem:
André -> the é is correctly escaped as \u00E9 (iirc)
Andréé -> the éé is replaced with \uAAA9 (a square character)

What I found through debugging is that when the characters are passed through preg_replace_callback, the "éé" sequence is treated as one character, even by mb_strlen. However, if I comment out the utf8_decode call on the second line of the escape function, the "éé" sequence is escaped correctly as two \u00E9 sequences.
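For reference, here is a minimal sketch of the output I would expect, written in Python rather than PHP just to keep it short. The function name escape_non_ascii is made up for this example and is not the serializer's code. It only covers non-ASCII characters; a full N-Triples escaper also has to handle backslashes, quotes and control characters.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII code point with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in text:  # iterates over code points, not bytes
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)  # code points in the Basic Multilingual Plane
        else:
            out.append("\\U%08X" % cp)  # code points above U+FFFF
    return "".join(out)


# "\u00e9" is the letter é, written as an escape to keep this file ASCII-only
single = escape_non_ascii("Andr\u00e9")
double = escape_non_ascii("Andr\u00e9\u00e9")
```

Each é is handled as its own code point here, so the doubled case produces two separate escapes.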

My guess is that utf8_decode converts an already valid UTF-8 string to ISO-8859-1 (why is this necessary in the first place?), and this breaks mb_strlen: it is told the character encoding is UTF-8, yet after utf8_decode the string is ISO-8859-1.
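To illustrate that guess at the byte level, here is a rough Python analogue (not the PHP code itself). It assumes utf8_decode behaves like "decode as UTF-8, re-encode as ISO-8859-1", which is my understanding of it. The variable names are only for this example.

```python
# "\u00e9\u00e9" is the string éé
utf8_bytes = "\u00e9\u00e9".encode("utf-8")  # 4 bytes: C3 A9 C3 A9

# Analogue of utf8_decode: the same text re-encoded as ISO-8859-1
latin1_bytes = utf8_bytes.decode("utf-8").encode("latin-1")  # 2 bytes: E9 E9

# In UTF-8, a byte matching 1110xxxx starts a three-byte sequence
lead = latin1_bytes[0]
looks_like_three_byte_lead = (lead & 0xF0) == 0xE0

# The following byte would have to match 10xxxxxx to be a continuation byte
next_is_continuation = (latin1_bytes[1] & 0xC0) == 0x80

# A strict UTF-8 decoder therefore rejects the converted bytes
try:
    latin1_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False
```

So once the string is ISO-8859-1, the first E9 looks like the start of a multi-byte UTF-8 character and the second E9 is not a valid continuation byte. Anything that still treats the string as UTF-8 is then working on malformed input, which would fit the two characters being lumped together.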
