New shape type to check RDF-repository level constraints #83

amivanoff · 2024-09-14T15:03:31Z

I'm proposing to consider a new type of SHACL shape (e.g. 'RepositoryShape') to enforce vendor-neutral constraints on RDF graphs at the repository level (or at the triplestore level, if a specific triplestore has no notion of 'repositories').

Constraints on prefixes/namespaces:

- a list of allowed prefix-namespace pairs (e.g. allow 'dcterms:' only for 'http://purl.org/dc/terms/', but not 'dc:' or 'dct:', or allow 'sh:' but not 'shacl:'); could be a closed or open list
- a list of allowed namespace IRI patterns; could be a closed or open list (in the open case, a list of disallowed namespace IRI patterns could be helpful)
- a list of disallowed namespace IRI patterns
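As a rough illustration of the first and third items, here is a minimal Python sketch that checks a store's declared prefix map against an allow-list of prefix-namespace pairs and a list of disallowed namespace patterns. `NamespacePolicy` and `check_prefix_map` are hypothetical names, and the sketch assumes the store's prefix map can be read as a plain dict; how (or whether) a given triplestore exposes such a map is vendor-specific.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NamespacePolicy:
    """Hypothetical policy object; not part of SHACL or any triplestore API."""
    allowed_pairs: dict[str, str]  # prefix -> namespace
    closed: bool = False           # closed list: prefixes not listed are violations
    disallowed_patterns: list[str] = field(default_factory=list)  # regexes over namespace IRIs


def check_prefix_map(prefix_map: dict[str, str], policy: NamespacePolicy) -> list[str]:
    """Return human-readable violations for a prefix -> namespace map."""
    violations = []
    # Reverse lookup: which prefix is the allowed one for a given namespace
    allowed_prefix_for = {ns: p for p, ns in policy.allowed_pairs.items()}
    for prefix, ns in sorted(prefix_map.items()):
        if prefix in policy.allowed_pairs:
            if policy.allowed_pairs[prefix] != ns:
                violations.append(
                    f"prefix {prefix}: bound to {ns}, expected {policy.allowed_pairs[prefix]}")
        elif ns in allowed_prefix_for:
            # e.g. 'dct:' or 'shacl:' used for a namespace that has a designated prefix
            violations.append(
                f"namespace {ns}: uses prefix {prefix}, expected {allowed_prefix_for[ns]}")
        elif policy.closed:
            violations.append(f"prefix {prefix}: not in the closed allow-list")
        for pattern in policy.disallowed_patterns:
            if re.search(pattern, ns):
                violations.append(f"namespace {ns}: matches disallowed pattern {pattern}")
    return violations
```

With `closed=False` the allow-list only constrains the prefixes and namespaces it mentions; setting `closed=True` additionally rejects any prefix it does not list.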

Constraints on graphs:

- a list of allowed graph IRI patterns; could be a closed or open list (in the open case, a list of disallowed graph IRI patterns could be helpful)
- a list of disallowed graph IRI patterns
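A minimal sketch of the graph-level checks, assuming the store can list its named-graph IRIs and that patterns are Python regular expressions matched from the start of the IRI. `check_graph_iris` is a hypothetical name, and modelling an "open" list as `allowed=None` is just one reading of the proposal.

```python
import re


def check_graph_iris(graph_iris, allowed=None, disallowed=()):
    """Check named-graph IRIs against regex patterns (re.match, i.e. anchored at the start).

    allowed=None models an open list: every graph passes unless a disallowed
    pattern matches. Passing a list of patterns models a closed list.
    """
    violations = []
    for iri in sorted(graph_iris):
        if allowed is not None and not any(re.match(p, iri) for p in allowed):
            violations.append((iri, "not matched by any allowed pattern"))
        for p in disallowed:
            if re.match(p, iri):
                violations.append((iri, f"matches disallowed pattern {p}"))
    return violations
```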

Maybe there are other high-level constraints which are not covered by existing shape types.

tpluscode · 2024-09-14T19:47:18Z

Hello @amivanoff. Thanks for sharing. I gotta say though, I have some reservations...

> a list of allowed prefix-namespace pairs

Prefixes are a purely syntactic feature of serializations, so this one does not really make sense.

> a list of allowed namespace IRI patterns
> a list of disallowed namespace IRI patterns

This could work, but not necessarily at the "repository level". If you had a wide-target shape, such as sh:targetClass rdfs:Resource, then in theory you would target everything. Stores which already implement SHACL would just run with it.

The real question is, "why?". Also, which positions: predicates, subjects, objects, or all of them? What is the use case? Something like an "I'd like to ban schema.org" kind of restriction?
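To make the "which positions?" question concrete, here is a minimal Python sketch that scans triples for IRIs matching banned namespace patterns, restricted to chosen positions. It assumes triples are plain string tuples and does not distinguish literals from IRIs (a real implementation would); `find_banned_iris` is a hypothetical name.

```python
import re

# Index of each position within a (subject, predicate, object) tuple
POSITIONS = {"subject": 0, "predicate": 1, "object": 2}


def find_banned_iris(triples, banned_patterns, positions=("subject", "predicate", "object")):
    """Return (triple, position) pairs where a term in a checked position
    matches one of the banned regex patterns (re.match, anchored at the start)."""
    hits = []
    for triple in triples:
        for pos in positions:
            term = triple[POSITIONS[pos]]
            if any(re.match(p, term) for p in banned_patterns):
                hits.append((triple, pos))
    return hits
```

Restricting `positions` to `("predicate",)` would ban schema.org properties while still allowing links to schema.org resources as objects, which is exactly the kind of choice the question above asks about.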

> a list of allowed graph IRI patterns
> a list of disallowed graph IRI patterns

These are kinda interesting, but again, what is the use case? For me, this sounds unnecessarily limiting without a clear purpose.

tpluscode · 2024-09-14T19:49:12Z

That said, Constraints on graphs strike me as a little out of scope. From the spec, SHACL is:

> a language for validating RDF graphs against a set of conditions

That implicitly, if perhaps poorly phrased, means the contents of RDF graphs. Named graphs are a different matter, somewhat unrelated to the actual graph data.

Well, maybe not poorly phrased because "RDF graph" is by definition a data structure. Organising that data in named graphs, or files if you'd be talking about a filesystem, is an orthogonal concern.

amivanoff · 2024-09-14T23:48:50Z

> Prefixes are a purely syntactic feature of serializations, so this one does not really make sense.

Yes, according to the RDF 1.1 spec, prefixes are purely syntactic. But many triplestores (if not all) keep and expand prefix-namespace maps while parsing/serializing every uploaded RDF file. Triplestores even have an editing API for these maps. Namespace collisions have appeared before and continue to appear now, such as dc/dct/dcterm/dcterms and sh/shacl.

Today the Pandora's box of profiles has been opened. More and more people are authoring profiles in SHACL, and these collisions will appear more frequently.

But what problems could it cause?

1. It could affect serialization, as described in SEMICeu/MLDCAT-AP#21.

2. We have a way to declare prefixes and namespaces, and the SHACL spec authors used it. So these 'namespace semantics' have already leaked into the SHACL spec. If it weren't important, as you suggest, why bother to specify and check it at all?

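```turtle
# Excerpts from the SHACL-for-SHACL shapes graph and the SHACL vocabulary.
# Prefix declarations added so the snippet parses on its own.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix shsh: <http://www.w3.org/ns/shacl-shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

shsh:
	rdfs:label "SHACL for SHACL"@en ;
	rdfs:comment "This shapes graph can be used to validate SHACL shapes graphs against a subset of the syntax rules."@en ;
	sh:declare [
		sh:prefix "shsh" ;
		sh:namespace "http://www.w3.org/ns/shacl-shacl#" ;
	] .

sh:
	a owl:Ontology ;
	rdfs:label "W3C Shapes Constraint Language (SHACL) Vocabulary"@en ;
	rdfs:comment "This vocabulary defines terms used in SHACL, the W3C Shapes Constraint Language."@en ;
	sh:declare [
		sh:prefix "sh" ;
		sh:namespace "http://www.w3.org/ns/shacl#"^^xsd:anyURI ;
	] ;
	sh:suggestedShapesGraph <http://www.w3.org/ns/shacl-shacl#> .
```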

Well, because these declarations are used internally by the SHACL-SPARQL processor. And the validator takes some namespace semantics into account when dealing with owl:imports*/sh:declare values in owl:Ontology and sh:prefixes in sh:SPARQLConstraint.
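The mechanism described here can be sketched roughly in Python: starting from the node given as the value of sh:prefixes, follow owl:imports transitively, collect the sh:declare bindings, and prepend them to the query as PREFIX lines. The triple representation (plain string tuples, literals as bare strings), the function names, and the choice to raise an error on conflicting bindings are assumptions of this sketch, not wording taken from the SHACL spec.

```python
SH = "http://www.w3.org/ns/shacl#"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"


def collect_prefixes(triples, start):
    """Gather sh:declare prefix bindings reachable from `start` via owl:imports*.

    `triples` is a list of (subject, predicate, object) strings; blank nodes are
    written "_:b1" and literals are bare strings (a simplification).
    """
    seen, stack, bindings = set(), [start], {}
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against owl:imports cycles
        seen.add(node)
        for s, p, o in triples:
            if s != node:
                continue
            if p == OWL_IMPORTS:
                stack.append(o)
            elif p == SH + "declare":
                # o is the declaration node carrying sh:prefix and sh:namespace
                prefix = next(v for s2, p2, v in triples if s2 == o and p2 == SH + "prefix")
                ns = next(v for s2, p2, v in triples if s2 == o and p2 == SH + "namespace")
                if bindings.get(prefix, ns) != ns:
                    raise ValueError(f"prefix {prefix!r} bound to {bindings[prefix]} and {ns}")
                bindings[prefix] = ns
    return bindings


def prepend_prefixes(bindings, query):
    """Turn the bindings into SPARQL PREFIX lines and put them before the query."""
    header = "".join(f"PREFIX {p}: <{ns}>\n" for p, ns in sorted(bindings.items()))
    return header + query
```

The conflict check is where a "same prefix, different namespace" collision surfaces inside a validator today; the proposal is essentially about making that kind of check available outside the SHACL-SPARQL machinery.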

In our case, we generate SPARQL queries and SPARQL Update requests on the client side, based on shapes from different domains living in one virtual or non-virtual RDF multi-graph. These collisions hurt us a little as well. Today we solve this manually, by forking profiles that don't play nicely together.

Maybe it could be done in a more general way, not only for a validator's internal SPARQL processor, but for the rest of us: at the RDF namespace level, without owl:Ontology and owl:imports. Maybe it could help clean up the current prefix-namespace mess caused by the RDF 1.1 spec.

In fact, I am not proposing a RepositoryShape as such, but a way to constrain repository namespaces, and perhaps other things at this level. Maybe it could be done with another syntax, or maybe SHACL shapes are not the appropriate place for it. Or it is not a priority today.

UC1:

A profile author wants to validate a third-party profile expressed in SHACL.
He uses the new SHACL spec shapes as constraints for the validator. These shapes enforce the sh: prefix for the http://www.w3.org/ns/shacl# namespace in third-party profiles expressed in SHACL.
The validator checks the prefix-namespace declarations in the uploaded RDF files, for example:
- whether there are multiple prefixes for the same namespace
- whether there are multiple namespaces for the same prefix
- whether a prefix in an uploaded document differs from the recommended prefix for its namespace
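The three UC1 checks could look roughly like the following Python sketch. It reads `@prefix`/`PREFIX` directives from uploaded Turtle text with a simplified regular expression (it ignores comments and string literals), and `recommended` stands in for a hypothetical registry of preferred prefixes; neither is an existing SHACL or triplestore feature.

```python
import re
from collections import defaultdict

# Simplified: matches "@prefix p: <ns> ." and "PREFIX p: <ns>" at line start;
# an empty prefix (":") is captured as "".
PREFIX_RE = re.compile(r"^\s*(?:@prefix|PREFIX)\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>",
                       re.IGNORECASE | re.MULTILINE)


def check_documents(docs, recommended):
    """docs: file name -> Turtle text; recommended: namespace -> preferred prefix."""
    ns_by_prefix = defaultdict(set)
    prefixes_by_ns = defaultdict(set)
    for text in docs.values():
        for prefix, ns in PREFIX_RE.findall(text):
            ns_by_prefix[prefix].add(ns)
            prefixes_by_ns[ns].add(prefix)
    report = []
    # Check 1: multiple prefixes for the same namespace
    for ns, prefixes in sorted(prefixes_by_ns.items()):
        if len(prefixes) > 1:
            report.append(f"multiple prefixes for {ns}: {sorted(prefixes)}")
    # Check 2: multiple namespaces for the same prefix
    for prefix, namespaces in sorted(ns_by_prefix.items()):
        if len(namespaces) > 1:
            report.append(f"multiple namespaces for {prefix}: {sorted(namespaces)}")
    # Check 3: prefix differs from the recommended one for its namespace
    for ns, prefixes in sorted(prefixes_by_ns.items()):
        preferred = recommended.get(ns)
        for prefix in sorted(prefixes):
            if preferred is not None and prefix != preferred:
                report.append(f"{prefix}: differs from recommended {preferred} for {ns}")
    return report
```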

amivanoff · 2024-09-15T01:14:06Z

@tpluscode, am I understanding correctly that:

- This proposal extends the RDF model by adding prefixes and namespaces to it, both of which the model has ignored for decades.
- The SHACL spec cannot extend RDF 1.2 Concepts and Abstract Syntax in any way.
- Therefore, in a lightweight RDFS + SHACL stack, nothing can be done about namespaces in a vendor-neutral way at all. Not at the SHACL level, and not at any other level.
- At least not until (and unless) RDF Concepts starts treating namespaces as part of the RDF model.
- Only discussion, informal "best practices" and the like could help.
- And non-standard innovations in triplestores (like prefix-namespace maps, APIs to control them, and other extensions beyond the model).

HolgerKnublauch · 2024-09-15T14:39:58Z

@amivanoff I sympathize with your problem statement. It would be useful to be able to define constraints about prefixes. As discussed here, the underlying issue is that prefixes are not part of the RDF data model even though many triple stores do in fact have infrastructure to store them. For SHACL we therefore had to invent a vocabulary to turn these declarations into triples.

If RDF had a deeper integration of prefixes, there would also be some way to query those. For example, there could be standard SPARQL functions to query the declared prefixes. If that were the case, then SHACL-SPARQL could be used to express constraints or define constraint components for typical use cases. In the absence of this, maybe enough triple store implementers could agree on a set of such de-facto standard functions.

This does not require a new shape type (SHACL has node shapes and property shapes). To express graph-wide constraints, it is sufficient to just declare a node shape that is linked to some focus node that represents the graph itself. Often that resource exists and has the graph URI and type owl:Ontology. People could then use sh:targetClass owl:Ontology to talk about these graphs, or use a singleton pattern based on sh:targetNode.

Having said all this, I believe this topic is outside of the SHACL spec, and cannot be solved by "us". SHACL defines the sh:prefixes vocabulary which could also be leveraged for other purposes, but it's not up to "us" to influence what RDF or SPARQL standards should do, as we kind-of sit downstream from them.

TallTed · 2024-09-16T19:30:01Z

> But many triplestores (if not all) keep and expand prefix-namespace maps while parsing/serializing every uploaded RDF file.

Certainly not all.

Virtuoso does allow users with certain permissions to modify a persistent list of prefixes and their namespaces (see the set on the DBpedia public SPARQL endpoint, for instance), but this is not automatically populated based on ingested data. See documentation of the built-in xml_set_ns_decl function.

As documented, Virtuoso will use this list when producing RDF output serializations that support such prefixes, as well as when executing SPARQL queries that lack PREFIX declarations for any prefixes used in that query that are found in the list.

I urge caution in assuming that a feature or function that you observe in some number of software tools (such as these prefix-namespace maps) is (almost) universal, especially but not only if that feature or function is not part of a standard that is implemented in/by those tools. Such assumptions can lead to unexpected lock-in and/or malfunction of your own software or process that is built to depend on such un-standardized feature or function, down the road.

Further, I encourage you to explore the prefix.cc service (see the entry for dc, for instance) to get an idea of how common "collisions" are.

As things stand, prefix declarations should be treated as limited in scope, usually to a given document (like a Turtle document — though a single Turtle document may contain multiple declarations of the same prefix with different namespaces, each of which declarations is active and authoritative until the next declaration of that prefix) or a given query (like a SPARQL query), and sometimes to a given instance of a given RDF store (such as the Virtuoso-powered DBpedia and URIBurner SPARQL endpoints, for which the stored lists and the URIs through which they are viewable are quite different) — though any query or document that includes different declarations for the same prefix string SHOULD over-ride the stored namespace for purposes of that query or document. Failure to do so will inevitably lead to unexpected results, though realizing that such a problem is active may take some time, never mind determining its cause and/or subsequent solution!
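The scoping rules described above can be sketched in Python: a declaration in a document (re)binds a prefix from that line onward, and a store-level map, like the persistent list described for Virtuoso, is consulted only for prefixes the document never declared. The line-based parsing and the name `expand_document` are simplifying assumptions for illustration; this is not how any particular store implements it, and default (empty) prefixes in prefixed names are not handled.

```python
import re

# Simplified patterns: a directive line, and prefixed names like dc:title
DECL_RE = re.compile(r"^\s*(?:@prefix|PREFIX)\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>")
NAME_RE = re.compile(r"\b([A-Za-z][\w-]*):(\w+)")


def expand_document(lines, store_map):
    """Expand prefixed names line by line.

    A directive line (re)binds its prefix from that point on, overriding any
    earlier binding; names whose prefix the document never declared fall back
    to `store_map`. An undeclared, unknown prefix raises KeyError.
    """
    local = {}
    out = []
    for line in lines:
        m = DECL_RE.match(line)
        if m:
            local[m.group(1) or ""] = m.group(2)  # later declarations win
            continue

        def expand(match):
            prefix, name = match.groups()
            ns = local.get(prefix, store_map.get(prefix))
            if ns is None:
                raise KeyError(f"undeclared prefix {prefix!r}")
            return f"<{ns}{name}>"

        out.append(NAME_RE.sub(expand, line))
    return out
```

Here a document that binds dc: to DCMI Terms overrides a store default that binds dc: to the Dublin Core Elements namespace, which is the "SHOULD over-ride" behavior described above; ignoring the local declaration would silently produce different IRIs.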
