URIRef with base and fragment #1594

aothms · 2017-07-20T14:50:31Z

Base URIs that do not end in a slash have parts stripped off (this becomes apparent, for example, when translating SPARQL queries with a BASE).

>>> URIRef('string', base='http://www.w3.org/2001/XMLSchema#')
rdflib.term.URIRef(u'http://www.w3.org/2001/string')

I tried to determine from https://tools.ietf.org/html/rfc3986#section-5.2 whether this is correct, but remain unsure. The behaviour comes from using urlparse.urljoin(). I tried to test it on other implementations. The Virtuoso instance at https://dbpedia.org/sparql does not trim the parts off for the following query:

BASE <http://www.w3.org/2001/XMLSchema#>

SELECT * {
  ?s ?p ?o .
  bind(<string> as ?x)
} LIMIT 1

s	p	o	x
http://dbpedia.org/ontology/deathDate	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://www.w3.org/2002/07/owl#FunctionalProperty	http://www.w3.org/2001/XMLSchema#string

ghost · 2021-12-26T14:39:20Z

TL;DR - the behaviour of URIRef is correct. Strip any “#” fragment identifier off the end of the base URI and prefix the value with it instead.

Incorrect

>>> URIRef('string', base='http://www.w3.org/2001/XMLSchema#')
rdflib.term.URIRef(u'http://www.w3.org/2001/string')

Correct

>>> URIRef("#string", base="http://www.w3.org/2001/XMLSchema")
rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')

Explanation

Less cryptically, you're “holding it wrong”, but the lack of documentation doesn't help.

The thing is that, on its own, the # (octothorpe) at the end of a URI signifies an empty fragment, i.e. there's nothing after it, and so it is optimised away when the URI is used as a base. The popular approach of publishing namespace URIs suffixed with an empty fragment, e.g. http://www.w3.org/2001/XMLSchema#, is presumably meant to enable simple string concatenation, <namespace> + <concept>. It is slightly misleading, though, in that as a base URI http://www.w3.org/2001/XMLSchema# is treated identically to http://www.w3.org/2001/XMLSchema, and a simple string concatenation in the latter case would produce a wrong URI.
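
To make the empty-fragment point concrete, here is a small standard-library sketch (not RDFLib code; the variable names are just mine for illustration). It assumes nothing beyond `urllib.parse.urldefrag`, and shows that the `#`-suffixed form carries an empty fragment and that naive concatenation only works with that form:

```python
from urllib.parse import urldefrag

with_hash = "http://www.w3.org/2001/XMLSchema#"
without_hash = "http://www.w3.org/2001/XMLSchema"

# urldefrag splits a URI into the part before "#" and the fragment after it.
base_part, fragment = urldefrag(with_hash)
print(base_part == without_hash)  # True: only the trailing "#" differs
print(repr(fragment))             # '': the fragment is empty

# Plain string concatenation only gives the intended URI for the "#"-suffixed form.
concat_ok = with_hash + "string"
concat_bad = without_hash + "string"
print(concat_ok)   # http://www.w3.org/2001/XMLSchema#string
print(concat_bad)  # http://www.w3.org/2001/XMLSchemastring
```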

The example you provided

>>> URIRef('string', base='http://www.w3.org/2001/XMLSchema#')
rdflib.term.URIRef(u'http://www.w3.org/2001/string')

is correct according to the Python documentation for urljoin (which is what the RDFLib implementation of URIRef uses): because the fragment is empty, the “#” is stripped off and “XMLSchema” is replaced by “string”, just like the example in the Python docs wherein “Python.html” is replaced by “FAQ.html”:

>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'
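
The same thing can be checked against the XSD base directly with `urljoin`, without going through RDFLib at all. This is only a sketch of the standard-library behaviour; the variable names are my own:

```python
from urllib.parse import urljoin

# Relative path reference: the empty fragment on the base is dropped and
# the last path segment ("XMLSchema") is replaced by "string".
replaced = urljoin("http://www.w3.org/2001/XMLSchema#", "string")
print(replaced)  # http://www.w3.org/2001/string

# Fragment-only reference: the base path is kept and only the fragment is set.
kept = urljoin("http://www.w3.org/2001/XMLSchema", "#string")
print(kept)  # http://www.w3.org/2001/XMLSchema#string

# A trailing "#" on the base makes no difference for a fragment-only reference.
kept_hash_base = urljoin("http://www.w3.org/2001/XMLSchema#", "#string")
print(kept_hash_base)  # http://www.w3.org/2001/XMLSchema#string
```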

Now what you wanted was:

>>> URIRef("#string", base="http://www.w3.org/2001/XMLSchema")
rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')

At first glance, that might seem a bit nit-picky but it's necessary because of the equivalence of a URI with and without a fragment identifier appended. Using the XSD namespace as an example, solid-namespace publishes it with a fragment identifier appended, whereas Linked Open Vocabularies doesn't.

The Wikipedia entry for URI_fragment notes:

“In RDF vocabularies, such as RDFS, OWL, or SKOS, fragment identifiers are used to identify resources in the same XML Namespace, but are not necessarily corresponding to a specific part of a document. For example, http://www.w3.org/2004/02/skos/core#broader identifies the concept "broader" in SKOS Core vocabulary, but it does not refer to a specific part of the resource identified by http://www.w3.org/2004/02/skos/core, a complete RDF file in which semantics of this specific concept is declared, along with other concepts in the same vocabulary.”

Note: “a complete RDF file”, so a fragment identifier is needed to (notionally) index into the document. OTOH, a trailing slash indicates that it's not a complete RDF file but (again, notionally) a collection of smaller ones, and so the indexing is just straightforward slash-based and doesn't use, or need, a fragment identifier:

>>> URIRef("deathDate", base="http://dbpedia.org/ontology/")
rdflib.term.URIRef('http://dbpedia.org/ontology/deathDate')

Some more examples, just for completeness ...

With fragment identifier:

>>> URIRef("#type", base="http://www.w3.org/1999/02/22-rdf-syntax-ns")
rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type')

>>> URIRef("#FunctionalProperty", base="http://www.w3.org/2002/07/owl")
rdflib.term.URIRef('http://www.w3.org/2002/07/owl#FunctionalProperty')

>>> URIRef("#string", base="http://www.w3.org/2001/XMLSchema")
rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')

Without fragment identifier:

>>> URIRef("title", base="http://purl.org/dc/terms/")
rdflib.term.URIRef('http://purl.org/dc/terms/title')

>>> URIRef("abstract", base="http://purl.org/dc/dcmitype/")
rdflib.term.URIRef('http://purl.org/dc/dcmitype/abstract')
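
If you'd rather not think about which style a given namespace uses, the two cases can be folded into one small helper. This is a hypothetical function of my own (`expand` is not part of RDFLib), built only on `urllib.parse`, and it assumes that any namespace not ending in a slash is meant to be a hash-style one:

```python
from urllib.parse import urldefrag, urljoin


def expand(namespace: str, local_name: str) -> str:
    """Resolve local_name against a hash- or slash-style namespace URI.

    Hypothetical helper: assumes a namespace ending in "/" is slash-style
    and anything else is hash-style.
    """
    # Drop any (possibly empty) fragment so both spellings behave the same.
    base, _ = urldefrag(namespace)
    if base.endswith("/"):
        # Slash-style: the local name becomes the last path segment.
        return urljoin(base, local_name)
    # Hash-style: the local name becomes the fragment.
    return urljoin(base, "#" + local_name)


print(expand("http://www.w3.org/2001/XMLSchema#", "string"))
print(expand("http://www.w3.org/2001/XMLSchema", "string"))
print(expand("http://purl.org/dc/terms/", "title"))
```

The first two calls both give http://www.w3.org/2001/XMLSchema#string and the third gives http://purl.org/dc/terms/title; the resulting string can then be passed to URIRef without a base.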

I hope this clarifies how to use URIRef to get the result you want.
