SPARQL Endpoints
The architecture of RDFUnit decouples test case generation from test execution. This makes it possible to run the same tests against different data sources, and SPARQL endpoints are one of them.
An overview of the command-line options, including those for SPARQL endpoints, is available on the CLI wiki page. This page describes in more detail the options that can be tuned for better results and/or performance.
Some constraints need the schema to be present in the dataset, e.g. to traverse the type hierarchy. When you validate in-memory datasets, RDFUnit loads the schema to avoid this problem. However, this is not possible for a dataset that RDFUnit cannot modify, such as a SPARQL endpoint. Make sure you keep the relevant schemas together with the data or in a named graph. Otherwise, you might get false negatives.
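The effect of a missing schema can be illustrated with a toy example. This is only a sketch in plain Python and not RDFUnit code: the triples, the ex: names and the "every ex:Person needs an ex:name" constraint are all made up for illustration.

```python
# Toy illustration (not RDFUnit code): a class-based check misses violations
# when the rdfs:subClassOf statements are not stored next to the data.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical instance data: ex:alice is a Student and has no ex:name.
data = {("ex:alice", RDF_TYPE, "ex:Student")}
# Hypothetical schema: every Student is a Person.
schema = {("ex:Student", SUBCLASS_OF, "ex:Person")}


def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, triples):
    """Resources typed as cls, either directly or through a subclass."""
    return {
        s for s, p, o in triples
        if p == RDF_TYPE and cls in superclasses(o, triples)
    }


def missing_name(cls, triples):
    """Instances of cls that have no ex:name value (the toy constraint)."""
    named = {s for s, p, _ in triples if p == "ex:name"}
    return instances_of(cls, triples) - named


without_schema = missing_name("ex:Person", data)
with_schema = missing_name("ex:Person", data | schema)
print(without_schema)  # set()
print(with_schema)     # {'ex:alice'}
```

Without the schema triples the check reports nothing, which is exactly the kind of false negative described above.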
Automated schema discovery is also available in SPARQL endpoints. This will create some load when RDFUnit profiles the endpoint but should be quick.
Note that RDFUnit cannot yet load/discover SHACL constraints directly from the endpoint. SHACL constraints need to be passed with the -s command option.
By default, RDFUnit targets the default graph. Depending on the configuration of the endpoint, this may mean that all graphs are queried as well. Check your endpoint configuration to make sure.
With the -g option you may specify the graph or graphs that RDFUnit will validate. The contents of all other graphs are then ignored.
If you want to validate the default graph together with other graphs, please consult your endpoint documentation to find the IRI of your default graph, e.g. for Stardog this is <tag:stardog:api:context:default>.
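At the SPARQL level, restricting a query to a set of graphs can be expressed with FROM clauses, which merge the listed graphs into the dataset the query sees. The sketch below only illustrates that idea. It does not claim to show how RDFUnit passes the graphs to the endpoint, and build_query and the example.org graph IRI are hypothetical.

```python
def build_query(pattern, graphs=()):
    """Build a SELECT query with one FROM clause per graph IRI.

    With no graphs the query runs against whatever the endpoint
    treats as its default graph.
    """
    from_clauses = "".join(f"FROM <{g}>\n" for g in graphs)
    return f"SELECT ?s\n{from_clauses}WHERE {{ {pattern} }}"


# Default behaviour: no explicit graph, the endpoint decides what is queried.
default_query = build_query("?s a ?type")

# Stardog's default graph IRI combined with a hypothetical named graph.
combined_query = build_query(
    "?s a ?type",
    ["tag:stardog:api:context:default", "http://example.org/graph/a"],
)
print(combined_query)
```

Listing the default graph IRI explicitly is what makes the combination possible, since naming graphs otherwise excludes everything that is not listed.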
To avoid abuse of public SPARQL endpoints, RDFUnit applies a default delay between queries. This also helps to keep the load on your endpoint down.
If you want to remove this delay, use -D 0. The value is in milliseconds.
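The idea of a delay between queries can be sketched as a small wrapper that waits before each request. This is an illustration under assumptions and not RDFUnit's implementation: ThrottledClient and the stand-in run_query callable are hypothetical names.

```python
import time


class ThrottledClient:
    """Wrap a query function so consecutive calls are at least delay_ms apart."""

    def __init__(self, run_query, delay_ms):
        self.run_query = run_query
        self.delay_s = delay_ms / 1000.0
        self._last_finished = None  # monotonic timestamp of the previous call

    def query(self, sparql):
        if self._last_finished is not None and self.delay_s > 0:
            remaining = self.delay_s - (time.monotonic() - self._last_finished)
            if remaining > 0:
                time.sleep(remaining)
        result = self.run_query(sparql)
        self._last_finished = time.monotonic()
        return result


# Stand-in for a real endpoint call; it just echoes the query back.
client = ThrottledClient(run_query=lambda q: q, delay_ms=50)
start = time.monotonic()
client.query("ASK { ?s ?p ?o }")
client.query("ASK { ?s ?p ?o }")
elapsed = time.monotonic() - start  # roughly 0.05 s because of the delay
```

A delay of 0 skips the wait entirely, which corresponds to the -D 0 setting described above.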
By default, no limit is set on the results you request. When you use the aggregated or status execution type, limit has no effect since every query returns only one result. For shacl or shacl-lite you can specify limit to force the endpoint to return a maximum of X violations per constraint.
Besides reducing the load on the server, this option can be used to return error samples per constraint.
This is very useful for large datasets where the same constraint can have thousands or millions of violation instances.
By default, there is no pagination of the results. If your endpoint limits the number of results it can return in a single query, use e.g. -P 1000. This splits the results into chunks of 1000 and assembles them internally. As with limit, this option is relevant only when you use the shacl or shacl-lite execution type.
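Pagination of this kind is commonly done with LIMIT and OFFSET. The following is a minimal sketch of the chunk-and-assemble idea, assuming a hypothetical fetch_all helper and a fake endpoint that caps every response at 1000 rows. It is not taken from RDFUnit, and a real query would normally also need a stable ORDER BY for reliable paging.

```python
import re

# Fake endpoint state: 2500 result rows, but at most 1000 per response.
ROWS = [f"violation-{i}" for i in range(2500)]
SERVER_CAP = 1000
offsets_requested = []


def fake_endpoint(query):
    """Answer a query by slicing ROWS according to its LIMIT/OFFSET."""
    match = re.search(r"LIMIT (\d+) OFFSET (\d+)", query)
    limit, offset = int(match.group(1)), int(match.group(2))
    offsets_requested.append(offset)
    return ROWS[offset:offset + min(limit, SERVER_CAP)]


def fetch_all(run_query, base_query, page_size):
    """Request page_size rows at a time and concatenate the pages."""
    results = []
    offset = 0
    while True:
        page = run_query(f"{base_query} LIMIT {page_size} OFFSET {offset}")
        results.extend(page)
        if len(page) < page_size:  # a short page means there is nothing left
            return results
        offset += page_size


all_rows = fetch_all(fake_endpoint, "SELECT ?v WHERE { ?v ?p ?o }", 1000)
print(len(all_rows), offsets_requested)  # 2500 [0, 1000, 2000]
```

A single unpaginated request against this fake endpoint would silently return only the first 1000 rows, which is the situation the pagination option is meant to handle.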
By default, RDFUnit creates a file cache when you validate SPARQL endpoints. This speeds up consecutive validations, assuming the data in the endpoint remains static or changes very infrequently. If this is not your case, add -T 0 to disable caching, or -T xx to keep a query cache for xx minutes.
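The time-to-live behaviour can be sketched as follows. RDFUnit's cache is file based, so this in-memory version only illustrates the semantics of a TTL in minutes and of a TTL of 0. QueryCache and its clock parameter are hypothetical and exist purely for the example.

```python
import time


class QueryCache:
    """Cache query results for ttl_minutes; a TTL of 0 disables caching."""

    def __init__(self, ttl_minutes, clock=time.time):
        self.ttl_s = ttl_minutes * 60
        self.clock = clock  # injectable so the example can control time
        self._store = {}    # query string -> (timestamp, result)

    def get(self, query, run_query):
        now = self.clock()
        hit = self._store.get(query)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # still fresh: reuse the cached result
        result = run_query(query)
        if self.ttl_s > 0:
            self._store[query] = (now, result)
        return result


# Simulated clock and an endpoint stub that counts how often it is called.
now = [0.0]
calls = []


def run(query):
    calls.append(query)
    return len(calls)


cache = QueryCache(ttl_minutes=10, clock=lambda: now[0])
first = cache.get("q", run)   # miss: calls the endpoint
now[0] = 599
second = cache.get("q", run)  # within 10 minutes: served from the cache
now[0] = 600
third = cache.get("q", run)   # TTL elapsed: calls the endpoint again
print(first, second, third)   # 1 1 2
```

If the data behind the endpoint changes between the first and second call, the cached answer is stale, which is why a short TTL or no cache at all is the safer choice for frequently updated endpoints.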