rdf_query works incorrectly on UTF-8 characters #36

Open
ostroganov opened this issue May 30, 2020 · 4 comments

@ostroganov

See attached example:
test.txt

rdf <- rdf_parse("test.txt", format = "nquads")
rdf

The result looks good:

Total of 1 triples, stored in hashes
-------------------------------
<http://subject> <http://predicate> "Error. Ошибка. 错误"@en .

The data frame produced by rdf_query, however, contains incorrect characters:

rdf_query(rdf,"SELECT ?s ?p ?o { ?s ?p ?o }")
# A tibble: 1 x 3
  s              p                o                          
  <chr>          <chr>            <chr>                      
1 http://subject http://predicate Error. Ошибка. 错误
@cboettig
Member

@gothub Any idea if this is a limitation of the underlying redland C libs? I believe we're already requesting UTF-8 in the rdf_query() call to redland:

https://github.com/ropensci/rdflib/blob/master/R/rdf_query.R#L57-L62

@gothub

gothub commented Jun 1, 2020

@cboettig how are the RDF graphs being constructed? redland can use the RDF language tag to specify encoding, for example, this test

@cboettig
Member

cboettig commented Jun 1, 2020

Construction is with redland::parseFileIntoModel; see https://github.com/ropensci/rdflib/blob/master/R/rdf_parse.R#L68. As the OP shows, this part seems fine.

The query is done by redland::librdf_query_results_to_string2, as shown in the link above, rather than with getNextResult, since the iterator performed very poorly on large graphs.

@Tesla2509

Is there any way to get the correct characters?
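
One thing that may be worth trying, as a sketch rather than a confirmed fix: if the strings coming back from the query are valid UTF-8 bytes that R has simply not marked as UTF-8 (an assumption; nothing in this thread confirms that this is the cause), then declaring the encoding with base R's `Encoding<-` could restore the display. The helper `mark_utf8` below is hypothetical and not part of rdflib or redland; the example simulates the unmarked string with `rawToChar()` instead of calling the query.

```r
# UTF-8 bytes of "Ошибка", built from raw values so the example does not
# depend on the encoding of this script file.
utf8_bytes <- as.raw(c(0xd0, 0x9e, 0xd1, 0x88, 0xd0, 0xb8,
                       0xd0, 0xb1, 0xd0, 0xba, 0xd0, 0xb0))

# rawToChar() yields a string with no declared encoding, which stands in
# for what the query result might look like (assumption).
x <- rawToChar(utf8_bytes)
df <- data.frame(o = x, stringsAsFactors = FALSE)

# Hypothetical helper: declare every unmarked character value as UTF-8.
# This only relabels the strings; the underlying bytes are left unchanged.
mark_utf8 <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col)) {
      unmarked <- Encoding(col) == "unknown"
      Encoding(col[unmarked]) <- "UTF-8"
      df[[nm]] <- col
    }
  }
  df
}

fixed <- mark_utf8(df)
Encoding(df$o)     # declared encoding before the fix
Encoding(fixed$o)  # declared encoding after the fix
```

To diagnose a real result, `Encoding()` and `charToRaw()` on the affected column of the tibble returned by `rdf_query()` would show whether the bytes are intact UTF-8. If they are, passing the tibble through a helper like this one might be enough; if the bytes are already damaged, relabeling will not help.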
