n-quads is UTF-8, but Blazegraph only supports US-ASCII #206

Open
jpmccu opened this issue Aug 11, 2021 · 3 comments


jpmccu commented Aug 11, 2021

According to the IANA record [1], N-Quads must always be interpreted as UTF-8, but UTF-8 data posted as N-Quads is currently interpreted as ASCII. You claim to support the appropriate charset for each format, so N-Quads needs to be read as UTF-8.

Encoding considerations: 8bit
The syntax of N-Quads is expressed over code points in Unicode. The encoding is always UTF-8.
Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-F]

[1] https://www.iana.org/assignments/media-types/application/n-quads
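The \uXXXX / \UXXXXXXXX escapes quoted above also suggest a possible client-side workaround: rewrite every non-ASCII character as an escape so the payload is pure ASCII before posting it. The sketch below is illustrative only; `escape_non_ascii` is a hypothetical helper, not part of Blazegraph or any RDF library, and it assumes non-ASCII characters appear only in literals and IRIs (where the N-Quads grammar permits these escapes), not in blank node labels.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII code point with an N-Quads \\u or \\U escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # U+0080..U+FFFF: four hex digits
        else:
            out.append(f"\\U{cp:08X}")    # U+10000 onwards: eight hex digits
    return "".join(out)


# Example statement with hypothetical example.org IRIs.
line = '<http://example.org/s> <http://example.org/p> "café 😀" <http://example.org/g> .'
escaped = escape_non_ascii(line)
escaped.encode("ascii")  # raises UnicodeEncodeError if anything non-ASCII remains
print(escaped)
# <http://example.org/s> <http://example.org/p> "caf\u00E9 \U0001F600" <http://example.org/g> .
```

The escaped form decodes identically under US-ASCII and UTF-8, so it sidesteps the charset question rather than fixing it.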

Contributor

thompsonbry commented Aug 12, 2021 via email

Author

jpmccu commented Aug 12, 2021 via email


nvbach91 commented Jul 6, 2023

Adding the JVM options -Dfile.encoding=UTF-8 -Dfile.client.encoding=UTF-8 -Dclient.encoding.override=UTF-8 did the trick.
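For anyone wondering why a default-charset setting would matter here, the sketch below shows the general failure mode: the same UTF-8 bytes decoded with two different charsets. It is written in Python purely as an illustration and does not reproduce Blazegraph's actual decoding code; the assumption is that the server falls back to an ASCII-like default when the options above are not set.

```python
# Bytes as a client would send them for an N-Quads statement (hypothetical IRIs).
payload = '<http://example.org/s> <http://example.org/p> "café" .\n'.encode("utf-8")

# Decoding as UTF-8, as the N-Quads media type requires, recovers the text.
as_utf8 = payload.decode("utf-8")

# Decoding as US-ASCII cannot represent the two bytes of "é";
# errors="replace" substitutes U+FFFD for each undecodable byte.
as_ascii = payload.decode("ascii", errors="replace")

print(as_utf8.strip())   # ... "café" .
print(as_ascii.strip())  # ... "caf" followed by two U+FFFD replacement characters
```

Once the replacement characters are stored, the original text cannot be recovered, so the charset has to be right at load time.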
