New shape type to check RDF-repository level constraints #83
Comments
Hello @amivanoff. Thanks for sharing. I gotta say though, I have some reservations...
Prefixes are a purely syntactic feature of serializations, so this one does not really make sense
This could work, but not necessarily on "repository level". If you had a wide-target shape, such as The real question is, "why?". Also, predicates, or subjects, or objects, or all of them? What is the use case? Something like, "I'd like to ban
These are kinda interesting, but again, what is the use case? For me, this sounds unnecessarily limiting without a clear purpose.
That said,
That implicitly, if not poorly phrased, means the contents of RDF graphs. Named graphs are a different matter, a little unrelated to the actual graph data. Well, maybe not poorly phrased because "RDF graph" is by definition a data structure. Organising that data in named graphs, or files if you'd be talking about a filesystem, is an orthogonal concern.
Yes, according to the RDF 1.1 spec, prefixes are purely syntactic. But many triplestores (if not all) keep and expand prefix-to-namespace maps while parsing and serializing every uploaded RDF file. Triplestores even have an editing API for these maps.

Some namespace collisions have appeared before and continue to appear now, like dc/dct/dcterm/dcterms and sh/shacl. Today the Pandora's box of profiles has been opened. More and more people are authoring profiles in SHACL, and these collisions will appear more frequently.

But what problems could it cause?
1. It could affect serialization, as described in SEMICeu/MLDCAT-AP#21.
2. We have a way to declare prefixes and namespaces, and the SHACL spec authors used it. So these 'namespace semantics' have already leaked into the SHACL spec. If it wasn't important, as you mentioned, why bother to specify and check it at all?

shsh:
    rdfs:label "SHACL for SHACL"@en ;
    rdfs:comment "This shapes graph can be used to validate SHACL shapes graphs against a subset of the syntax rules."@en ;
    sh:declare [
        sh:prefix "shsh" ;
        sh:namespace "http://www.w3.org/ns/shacl-shacl#" ;
    ] .

sh:
    a owl:Ontology ;
    rdfs:label "W3C Shapes Constraint Language (SHACL) Vocabulary"@en ;
    rdfs:comment "This vocabulary defines terms used in SHACL, the W3C Shapes Constraint Language."@en ;
    sh:declare [
        sh:prefix "sh" ;
        sh:namespace "http://www.w3.org/ns/shacl#"^^xsd:anyURI ;
    ] ;
    sh:suggestedShapesGraph <http://www.w3.org/ns/shacl-shacl#> .

Well, because these declarations are used by the SHACL-SPARQL processor internally. And the validator takes into account some namespace semantics when dealing with

In our case we are generating SPARQL Query and SPARQL Update queries on the client side, based on shapes from different domains living in one virtual or non-virtual RDF multi-graph. These collisions hurt us a little too. Today we are solving it manually, by forking profiles that do not play nicely together. Maybe it could be done in a more general way, not only for a validator's internal SPARQL processor, but for the rest of us. On RDF namespaces level, without

In fact, I am proposing not a RepositoryShape itself, but a way to constrain repository namespaces and maybe other things on this level. Maybe it could be done with other syntax, or maybe SHACL shapes are not the appropriate place for it. Or it is not a priority today.

UC1:
@tpluscode, am I understanding correctly, that
@amivanoff I sympathize with your problem statement. It would be useful to be able to define constraints about prefixes. As discussed here, the underlying issue is that prefixes are not part of the RDF data model, even though many triple stores do in fact have infrastructure to store them. For SHACL we therefore had to invent a vocabulary to turn these declarations into triples.

If RDF had a deeper integration of prefixes, there would also be some way to query them. For example, there could be standard SPARQL functions to query the declared prefixes. If that were the case, SHACL-SPARQL could be used to express constraints or define constraint components for typical use cases. In the absence of this, maybe enough triple store implementers could agree on a set of such de facto standard functions.

This does not require a new shape type (SHACL has node shapes and property shapes). To express graph-wide constraints, it is sufficient to declare a node shape that is linked to some focus node that represents the graph itself. Often that resource exists and has the graph URI and type owl:Ontology. People could then use sh:targetClass owl:Ontology to talk about these graphs, or use a singleton pattern based on sh:targetNode.

Having said all this, I believe this topic is outside of the SHACL spec and cannot be solved by "us". SHACL defines the sh:prefixes vocabulary, which could also be leveraged for other purposes, but it's not up to "us" to influence what the RDF or SPARQL standards should do, as we kind of sit downstream from them.
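As a rough illustration of the node-shape approach mentioned above, the following sketch targets every resource of type owl:Ontology and checks its sh:declare entries. The `ex:` namespace and both shape names are hypothetical, and the last constraint (pinning the prefix "sh" to the SHACL namespace) is only one assumed example of a collision rule. It works only where prefix declarations are present as triples in the data graph; it cannot see prefix maps held internally by a triplestore.

```turtle
@prefix ex:  <http://example.org/shapes#> .   # hypothetical namespace
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Applies to each graph-level resource typed owl:Ontology.
ex:OntologyPrefixShape
    a sh:NodeShape ;
    sh:targetClass owl:Ontology ;
    sh:property [
        sh:path sh:declare ;
        sh:node ex:PrefixDeclarationShape ;
    ] .

# Each declaration needs exactly one prefix and one namespace.
ex:PrefixDeclarationShape
    a sh:NodeShape ;
    sh:property [
        sh:path sh:prefix ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path sh:namespace ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:anyURI ;
    ] ;
    # Assumed collision rule: if the prefix is "sh",
    # the namespace must be the SHACL namespace.
    sh:or (
        [ sh:not [ sh:property [ sh:path sh:prefix ; sh:hasValue "sh" ] ] ]
        [ sh:property [
            sh:path sh:namespace ;
            sh:hasValue "http://www.w3.org/ns/shacl#"^^xsd:anyURI ;
        ] ]
    ) .
```

For a single known graph, replacing `sh:targetClass owl:Ontology` with `sh:targetNode` and the graph's URI would give the singleton variant.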
Certainly not all. Virtuoso does allow users with certain permissions to modify a persistent list of prefixes and their namespaces (see the set on the DBpedia public SPARQL endpoint, for instance), but this is not automatically populated based on ingested data. See documentation of the built-in

As documented, Virtuoso will use this list when producing RDF output serializations that support such prefixes, as well as when executing SPARQL queries that lack

I urge caution in assuming that a feature or function that you observe in some number of software tools (such as these prefix-namespace maps) is (almost) universal, especially but not only if that feature or function is not part of a standard that is implemented in or by those tools. Such assumptions can lead, down the road, to unexpected lock-in and/or malfunction of your own software or process that is built to depend on such an unstandardized feature or function.

Further, I encourage you to explore the prefix.cc service (see the entry for

As things stand, prefix declarations should be treated as limited in scope, usually to a given document (like a Turtle document, though a single Turtle document may contain multiple declarations of the same prefix with different namespaces, each of which is active and authoritative until the next declaration of that prefix) or a given query (like a SPARQL query), and sometimes to a given instance of a given RDF store (such as the Virtuoso-powered DBpedia and URIBurner SPARQL endpoints, for which the stored lists and the URIs through which they are viewable are quite different). However, any query or document that includes a different declaration for the same prefix string SHOULD override the stored namespace for purposes of that query or document. Failure to do so will inevitably lead to unexpected results, though realizing that such a problem is active may take some time, never mind determining its cause and/or subsequent solution!
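The point about document-scoped declarations can be seen in a small Turtle sketch, where the same prefix string is declared twice in one document. The subject URI is hypothetical; the two namespaces are the Dublin Core elements and terms namespaces.

```turtle
# First declaration: active from here until dc: is declared again.
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# dc:title expands to http://purl.org/dc/elements/1.1/title
<http://example.org/doc1> dc:title "First" .

# Redeclaration: replaces the mapping for all following statements.
@prefix dc: <http://purl.org/dc/terms/> .

# dc:title now expands to http://purl.org/dc/terms/title
<http://example.org/doc1> dc:title "Second" .
```

After parsing, the graph holds two triples with different predicates and no trace of the prefix strings, which is why a prefix label alone cannot be relied on to identify a namespace.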
I'm proposing to consider a new type of SHACL Shape (e.g. 'RepositoryShape') to enforce vendor-neutral constraints on RDF graphs at the repository level (or at the triplestore level, in case there are no 'repositories' in a specific triplestore).
Constraints on prefixes/namespaces
Constraints on graphs:
Maybe there are other high-level constraints which are not covered by existing shape types.