
Exception on GET: imported ontology in RDF/XML is parsed with the Turtle parser #22

Open
keiligch opened this issue Sep 21, 2022 · 2 comments


@keiligch

Server response with stack trace:

<body>Exception occured while processing /models
    <pre>Message: net.enilink.komma.core.KommaException: Invalid RDF data:
IRI included an unencoded space: '32' [line 1]
    net.enilink.komma.model.ModelUtil.readData(ModelUtil.java:644)
    net.enilink.komma.model.rdf4j.SerializableModelSupport$1.run(SerializableModelSupport.java:211)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    java.base/java.lang.Thread.run(Thread.java:829)
Caught and thrown by:
Message: org.eclipse.rdf4j.rio.RDFParseException: IRI included an unencoded space: '32' [line 1]
    org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportError(RDFParserHelper.java:276)
    org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportError(AbstractRDFParser.java:584)
    org.eclipse.rdf4j.rio.turtle.TurtleParser.reportError(TurtleParser.java:1298)
    org.eclipse.rdf4j.rio.turtle.TurtleParser.parseURI(TurtleParser.java:900)
    org.eclipse.rdf4j.rio.turtle.TurtleParser.parseValue(TurtleParser.java:568)
    org.eclipse.rdf4j.rio.turtle.TurtleParser.parseSubject(TurtleParser.java:395)
    org.eclipse.rdf4j.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:330)
    org.eclipse.rdf4j.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:200)
    org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:162)
    org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:125)
    net.enilink.komma.model.ModelUtil.readData(ModelUtil.java:642)
    net.enilink.komma.model.rdf4j.SerializableModelSupport$1.run(SerializableModelSupport.java:211)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    java.base/java.lang.Thread.run(Thread.java:829)
</pre>
</body>

Steps to reproduce

  • Post a new model in Turtle format to the /models endpoint. The ontology imports another ontology in RDF/XML that is accessible on the web (example Turtle below).
  • Send a GET request with the same model as parameter: the server throws an exception, but only the first time.
  • Send the GET request again: everything works.

Reason (most likely)
When the GET request is processed, the model is parsed with the RDF4J Rio TurtleParser, and apparently so is the imported RDF/XML. The first line of the XML file is <?xml version="1.0" encoding="UTF-8"?>, which the Turtle parser probably interprets as the start of an IRI because it begins with '<'. The exception is thrown at the space that follows (32 is the code point of the space character).
In my test I isolated the problematic ontology, http://spinrdf.org/sp. It has no further imports, so the error must originate there.
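
The suspected mechanism can be illustrated with a small stdlib-only Java sketch. It is a simplified imitation of how a Turtle parser scans an IRI reference (after '<', read up to '>' and reject whitespace), not RDF4J's actual TurtleParser code; the class and exception names are hypothetical. Applied to the XML declaration, it fails at the space after "<?xml" with the same message as in the server log.

```java
/**
 * Simplified imitation (NOT RDF4J's code) of how a Turtle parser reads an
 * IRI reference: after '<' it consumes characters up to '>' and rejects a
 * space, reporting the offending character's code point.
 */
class TurtleIriScanSketch {

    /** Hypothetical error type; the message mimics the RDF4J error text. */
    static class ParseError extends RuntimeException {
        ParseError(String message) {
            super(message);
        }
    }

    /** Reads an IRI that starts at input.charAt(0) == '<' and returns its contents. */
    static String parseIri(String input) {
        if (input.isEmpty() || input.charAt(0) != '<') {
            throw new ParseError("Expected '<'");
        }
        StringBuilder iri = new StringBuilder();
        for (int i = 1; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '>') {
                return iri.toString();
            }
            if (c == ' ') {
                // (int) ' ' is 32, hence "'32'" in the server log
                throw new ParseError("IRI included an unencoded space: '" + (int) c + "'");
            }
            iri.append(c);
        }
        throw new ParseError("Unexpected end of input in IRI");
    }

    public static void main(String[] args) {
        // First line of the RDF/XML document served at http://spinrdf.org/sp
        String xmlFirstLine = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        try {
            parseIri(xmlFirstLine);
        } catch (ParseError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints the message "IRI included an unencoded space: '32'", matching the log.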

HTTP request for testing (insert your domain)

POST /models?model=http://example.com/test HTTP/1.1
Host: {my own domain}
Content-Type: text/turtle

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://www.example.com/test#> rdf:type owl:Ontology ;
                                owl:imports <http://spinrdf.org/sp> .
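
The reproduction steps can be scripted with the JDK's java.net.http client (Java 11+). This is a sketch under assumptions: http://localhost:8080 is a placeholder for your own domain, and the GET request is assumed to use the same `model` query parameter as the POST. The model IRI is URL-encoded here, which the server should decode to the same value.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Sketch of the reproduction steps for issue #22 (base URL is a placeholder). */
class ReproduceIssue22 {

    static final String MODEL = "http://example.com/test";

    static final String TURTLE = String.join("\n",
            "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
            "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
            "",
            "<http://www.example.com/test#> rdf:type owl:Ontology ;",
            "    owl:imports <http://spinrdf.org/sp> .");

    /** Builds {baseUrl}/models?model=... with the model IRI encoded as one query value. */
    static URI modelsUri(String baseUrl) {
        return URI.create(baseUrl + "/models?model="
                + URLEncoder.encode(MODEL, StandardCharsets.UTF_8));
    }

    /** Step 1: POST the Turtle model. */
    static HttpRequest post(String baseUrl) {
        return HttpRequest.newBuilder(modelsUri(baseUrl))
                .header("Content-Type", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(TURTLE))
                .build();
    }

    /** Steps 2 and 3: GET the same model (query shape is an assumption). */
    static HttpRequest get(String baseUrl) {
        return HttpRequest.newBuilder(modelsUri(baseUrl)).GET().build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default; pass your server's base URL as the first argument
        String base = args.length > 0 ? args[0] : "http://localhost:8080";
        HttpClient client = HttpClient.newHttpClient();
        client.send(post(base), HttpResponse.BodyHandlers.discarding());
        // Per the report, only the first GET returns the exception page
        for (int attempt = 1; attempt <= 2; attempt++) {
            HttpResponse<String> r = client.send(get(base), HttpResponse.BodyHandlers.ofString());
            System.out.println("GET #" + attempt + ": HTTP " + r.statusCode()
                    + (r.body().contains("Exception") ? " (exception in body)" : ""));
        }
    }
}
```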
@kenwenzel
Member

We should use IURIConverter.contentDescription at

https://github.com/komma/komma/blob/22424c4a21394d510455f1148c99faa41ffcc755/bundles/core/net.enilink.komma.model/src/main/java/net/enilink/komma/model/base/ModelSupport.java#L729

if the options map contains neither a MIME type nor a content description.
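
A hypothetical sketch of the proposed fallback order: an explicit MIME type in the options wins, then a content description supplied with the options, and only when both are missing is the converter asked for a content description. The option keys (`mimeType`, `contentDescription`, `contentType`) and the `Function` standing in for `IURIConverter.contentDescription` are assumptions for illustration, not KOMMA's actual API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical sketch of the proposed MIME type fallback. Option keys and the
 * contentDescription function are stand-ins, NOT KOMMA's actual API.
 */
class MimeTypeFallbackSketch {

    // Hypothetical option / property keys, chosen for illustration only
    static final String OPTION_MIME_TYPE = "mimeType";
    static final String OPTION_CONTENT_DESCRIPTION = "contentDescription";
    static final String CONTENT_TYPE_PROPERTY = "contentType";

    /**
     * @param options            load options as passed to the model
     * @param contentDescription stand-in for IURIConverter.contentDescription
     * @param uri                the document to be loaded (e.g. an owl:imports target)
     */
    static Optional<String> resolveMimeType(Map<String, ?> options,
            Function<String, Map<String, ?>> contentDescription, String uri) {
        // 1. An explicit MIME type in the options takes precedence
        Object mimeType = options.get(OPTION_MIME_TYPE);
        if (mimeType instanceof String) {
            return Optional.of((String) mimeType);
        }
        // 2. Next, a content description supplied with the options
        Map<?, ?> description;
        Object supplied = options.get(OPTION_CONTENT_DESCRIPTION);
        if (supplied instanceof Map) {
            description = (Map<?, ?>) supplied;
        } else {
            // 3. Neither present: ask the converter to describe the content (the proposed change)
            description = contentDescription.apply(uri);
        }
        Object contentType = description == null ? null : description.get(CONTENT_TYPE_PROPERTY);
        return contentType instanceof String ? Optional.of((String) contentType) : Optional.empty();
    }
}
```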

@kenwenzel
Member

The response from http://spinrdf.org/sp does not declare a Content-Type header:
[Screenshot of the HTTP response headers]

Therefore we cannot determine the correct parser up front. One solution would be some kind of trial-and-error mechanism that guesses the correct MIME type.
See also https://github.com/apache/any23/blob/master/mime/src/main/java/org/apache/any23/mime/TikaMIMETypeDetector.java
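
One simple form of such guessing is content sniffing: peek at the first bytes of the response and look for format-specific markers. The sketch below is an assumption-based heuristic using only the JDK, much cruder than Tika-based detection. It treats an XML declaration, a DOCTYPE or `<rdf:RDF` as RDF/XML, `@prefix`/`@base`/`PREFIX`/`BASE` as Turtle and a leading `{` as JSON-LD, and returns empty otherwise so that a caller could fall back to trying parsers in turn.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/**
 * Crude content-sniffing heuristic for a few RDF serializations
 * (illustrative assumption, not Tika's or KOMMA's algorithm).
 */
class RdfFormatSniffer {

    /** Number of bytes inspected at the start of the stream. */
    static final int PEEK_BYTES = 1024;

    /**
     * Guesses a MIME type from the first bytes without consuming them.
     * The stream must support mark/reset (e.g. a BufferedInputStream).
     */
    static Optional<String> guessMimeType(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PEEK_BYTES);
        byte[] buf = in.readNBytes(PEEK_BYTES);
        in.reset(); // leave the stream at its start for the real parser

        String head = new String(buf, StandardCharsets.UTF_8);
        if (head.startsWith("\uFEFF")) {
            head = head.substring(1); // drop a UTF-8 byte order mark
        }
        head = head.stripLeading();

        if (head.startsWith("<?xml") || head.startsWith("<!DOCTYPE") || head.startsWith("<rdf:RDF")) {
            return Optional.of("application/rdf+xml");
        }
        if (head.startsWith("@prefix") || head.startsWith("@base")
                || head.regionMatches(true, 0, "PREFIX", 0, 6)
                || head.regionMatches(true, 0, "BASE", 0, 4)) {
            return Optional.of("text/turtle");
        }
        if (head.startsWith("{")) {
            return Optional.of("application/ld+json");
        }
        // Ambiguous (e.g. Turtle or N-Triples starting with "<http://..."): caller decides
        return Optional.empty();
    }
}
```

A URL connection's stream would need to be wrapped in a BufferedInputStream before calling guessMimeType, so that the parser chosen afterwards still sees the whole document.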
