Martin Ledvinka edited this page Sep 20, 2023 · 11 revisions

OntoDriver is a data access layer used by JOPA. Splitting storage access and the object-triple/ontology mapping allows storage-specific OntoDriver implementations to be added easily. In addition, the application can then switch the underlying storage by merely changing a few configuration parameters in the persistence setup.

This page gives an overview of setting up the persistence and some basic insight into the configuration of the particular OntoDriver implementations.

OntoDriver Configuration

Common OntoDriver configuration parameters can be found, together with their explanation, in class cz.cvut.kbss.ontodriver.config.OntoDriverProperties.

Jena OntoDriver

The Jena OntoDriver is the latest implementation of OntoDriver. It can use the following underlying storages:

  • In-memory - a transactional Jena Dataset is created.
  • File - Jena RDFDataMgr is used to load a model from a file.
  • TDB

Support for Jena SDB was also considered, but since development of the SDB project has stopped, it is unlikely to be implemented in the Jena OntoDriver.

Jena OntoDriver configuration parameters and their possible values can be found in cz.cvut.kbss.ontodriver.jena.config.JenaOntoDriverProperties.

Note that the cz.cvut.jopa.reasonerFactoryClass parameter also applies to the Jena driver. It allows a Jena-compatible reasoner implementation to be specified.

The driver supports two transaction isolation strategies:

Read-committed

Each transaction keeps a local model consisting of added and removed statements. Statement find operations are run against a shared model and their results are then adjusted by the transaction-local changes. However, this means that when the shared model changes, e.g. when another transaction commits changes to it, a find operation may produce different results in subsequent calls.

The local model changes do not apply to SPARQL query results, which are run against the shared model only.
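
The read-committed behaviour described above can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: statements are modelled as strings, thread safety is ignored, and the class and method names (ReadCommittedSketch, SharedModel, Transaction, find, query) are hypothetical and are not the Jena OntoDriver's actual classes.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; all names here are hypothetical. */
final class ReadCommittedSketch {

    /** Stand-in for the shared model: a set of statements as plain strings. */
    static final class SharedModel {
        final Set<String> statements = new HashSet<>();
    }

    /** A transaction that records its own additions and removals locally. */
    static final class Transaction {
        private final SharedModel shared;
        private final Set<String> added = new HashSet<>();
        private final Set<String> removed = new HashSet<>();

        Transaction(SharedModel shared) {
            this.shared = shared;
        }

        void add(String statement) {
            removed.remove(statement);
            added.add(statement);
        }

        void remove(String statement) {
            added.remove(statement);
            removed.add(statement);
        }

        /** Find: shared statements plus local additions, minus local removals. */
        Set<String> find() {
            Set<String> result = new HashSet<>(shared.statements);
            result.addAll(added);
            result.removeAll(removed);
            return result;
        }

        /** Query: evaluated against the shared model only, so local changes are invisible. */
        Set<String> query() {
            return new HashSet<>(shared.statements);
        }

        /** Commit: apply the local changes to the shared model. */
        void commit() {
            shared.statements.removeAll(removed);
            shared.statements.addAll(added);
            added.clear();
            removed.clear();
        }
    }
}
```

In this sketch, if one transaction adds a statement and a second transaction commits another statement in the meantime, the first transaction's next find() call returns both, while its query() call returns only what is in the shared model.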

Snapshot-based

When it begins, each transaction creates a complete snapshot of the dataset and operates on it. On commit, the changes made to the snapshot are merged into the main dataset. This strategy provides better isolation, but is more demanding in terms of memory.

This strategy is also used when a reasoner is specified for the driver, because the reasoner has to operate on a single model (i.e. it is not possible to combine a reasoner with the read-committed strategy).
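
A minimal sketch of the snapshot-based strategy follows, again with statements modelled as strings and with hypothetical names (SnapshotSketch, SnapshotTransaction). How the real driver merges changes on commit is not described here, so the merge below (replaying recorded additions and removals onto the main dataset) is an assumption made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; all names here are hypothetical. */
final class SnapshotSketch {

    static final class SnapshotTransaction {
        private final Set<String> mainDataset;
        private final Set<String> snapshot;
        private final Set<String> added = new HashSet<>();
        private final Set<String> removed = new HashSet<>();

        /** Beginning a transaction copies the whole dataset, which costs memory. */
        SnapshotTransaction(Set<String> mainDataset) {
            this.mainDataset = mainDataset;
            this.snapshot = new HashSet<>(mainDataset);
        }

        void add(String statement) {
            snapshot.add(statement);
            removed.remove(statement);
            added.add(statement);
        }

        void remove(String statement) {
            snapshot.remove(statement);
            added.remove(statement);
            removed.add(statement);
        }

        /** Reads see only the snapshot, never later changes to the main dataset. */
        Set<String> find() {
            return new HashSet<>(snapshot);
        }

        /** Commit merges the recorded changes into the main dataset (assumed merge strategy). */
        void commit() {
            mainDataset.removeAll(removed);
            mainDataset.addAll(added);
        }
    }
}
```

Unlike the read-committed sketch, a change committed to the main dataset after the transaction began does not show up in find() here, which is the stronger isolation the text refers to.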

OWL API OntoDriver

The OWL API driver supports access to proper OWL (2) ontologies via OWL API.

The driver supports some specific features, namely module extraction where signature-based submodules of the main ontology can be extracted to improve performance. Also, OWL API allows a mapping file to be specified which is used when resolving logical IRIs of ontologies, e.g. in imports.

The parameters are explained in cz.cvut.kbss.ontodriver.owlapi.config.OwlapiOntoDriverProperties.

The OWL API driver can also make use of the cz.cvut.jopa.reasonerFactoryClass parameter - an OWLReasoner-compatible implementation class has to be specified.

The driver supports SPARQL-DL queries by relying on OWL2Query. However, the range of operators implemented is limited.

The OWL API driver uses the snapshot-based transaction isolation strategy, described above.

RDF4J OntoDriver

Driver configuration parameters and their explanation can be found in cz.cvut.kbss.ontodriver.rdf4j.config.Rdf4jOntoDriverProperties. The main decision is whether an RDFS forward-chaining rule-based reasoner should be used.

Supported storages are:

  • In-memory - has to be configured via the cz.cvut.kbss.ontodriver.rdf4j.use-volatile-storage property.
  • RDF4J native store - specify a valid folder on file system and RDF4J will create the necessary binary files (or load the data if they already exist).
  • RDF4J server repository - if a URL of remote RDF4J repository is specified, the driver will connect to it.

Repository Configuration

It is also possible to pass a path to a repository configuration file (usually TTL) to the driver. The content of this file is then used to configure the embedded storage (in-memory or local native). This way, for example, a SHACL or Lucene storage can be created in memory or as a local native repository without starting a full-blown RDF4J server. To use this configuration, pass the path to the repository configuration file to the driver using the cz.cvut.kbss.ontodriver.rdf4j.repository-config property. The path may be relative or absolute (see the Javadoc of the Rdf4jOntoDriverProperties class for details). Examples of various repository configuration files can be found in the RDF4J GitHub repository - https://github.com/eclipse/rdf4j/tree/master/core/repository/api/src/main/resources/org/eclipse/rdf4j/repository/config.

The RDF4J driver uses the read-committed transaction isolation strategy described above.

Inference

Since version 0.18.6, the RDF4J driver is able to recognize when it is connected to a GraphDB repository (by checking for the presence of internal GraphDB identifiers) and, if so, adjust the resolution of contexts for inferred statements. This is based on the fact that GraphDB uses a pseudo-context for accessing inferred statements. The driver thus augments contexts specified by the descriptors to include this implicit pseudo-context for statement loading.

Default Context for Inferred Statements

In 0.18.6, a new configuration parameter was introduced to the driver - cz.cvut.kbss.ontodriver.rdf4j.inference-in-default-context. This parameter configures whether inferred statements in a regular RDF4J repository are expected to reside in the default context or in the context of the statements from which they were inferred. This is useful in two scenarios:

  1. When using rules in RDF4J. Statements generated by rules (typically SPIN, although that is deprecated) are inserted into the default repository context.
  2. For compatibility with GraphDB-based data access (see above). For example, one may be using GraphDB in a deployed application, but use an in-memory RDF4J repository for tests. To be able to use the same context configuration (via descriptors) for production and test code, this parameter will need to be set to true.
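
For the second scenario, a test setup might assemble driver properties along the following lines. This is a hedged sketch: the two property keys are the ones named on this page, but the class and method names are hypothetical, the values are assumed to be passed as the strings "true"/"false", and how the map is handed to the persistence setup is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; class and method names are hypothetical. */
final class Rdf4jTestConfigSketch {

    /** Builds driver properties for an in-memory test repository that mimics GraphDB context handling. */
    static Map<String, String> testDriverProperties() {
        Map<String, String> props = new HashMap<>();
        // Use an in-memory (volatile) RDF4J store for tests.
        props.put("cz.cvut.kbss.ontodriver.rdf4j.use-volatile-storage", Boolean.toString(true));
        // Expect inferred statements in the default context, so that the same
        // descriptors work against both GraphDB (production) and RDF4J (tests).
        props.put("cz.cvut.kbss.ontodriver.rdf4j.inference-in-default-context", Boolean.toString(true));
        return props;
    }
}
```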

Connection Pool

When connecting to a remote repository (RDF4J-compatible server), RDF4J uses Apache HTTP client with a connection pool. This pool has a maximum size that may prove insufficient for highly concurrent environments. Since version 1.1.2, it is possible to configure the maximum size of this connection pool as well as the timeout after which a waiting connection request fails with an exception. The following properties can be used to configure these values:

  • cz.cvut.kbss.ontodriver.rdf4j.max-connections - maximum size of the connection pool. Defaults to Math.max(20, #CPUs * 2), where #CPUs is the number of available CPUs.
  • cz.cvut.kbss.ontodriver.rdf4j.connection-request-timeout - connection request timeout in milliseconds. Defaults to 30000 ms (30 s).
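
As a worked example of the default, a machine with 4 available CPUs gets max(20, 8) = 20 connections, while one with 16 CPUs gets max(20, 32) = 32. The sketch below reproduces that formula and shows one way the two properties could be set explicitly. The class and method names are hypothetical, and passing the values as strings is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; class and method names are hypothetical. */
final class Rdf4jConnectionPoolSketch {

    /** Reproduces the documented default: Math.max(20, #CPUs * 2). */
    static int defaultMaxConnections(int availableCpus) {
        return Math.max(20, availableCpus * 2);
    }

    /** Builds driver properties that override the pool size and the request timeout. */
    static Map<String, String> poolProperties(int maxConnections, long requestTimeoutMillis) {
        Map<String, String> props = new HashMap<>();
        props.put("cz.cvut.kbss.ontodriver.rdf4j.max-connections", Integer.toString(maxConnections));
        props.put("cz.cvut.kbss.ontodriver.rdf4j.connection-request-timeout", Long.toString(requestTimeoutMillis));
        return props;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Default pool size on this machine: " + defaultMaxConnections(cpus));
        // Example override: 64 connections, 10 s request timeout.
        System.out.println(poolProperties(64, 10_000L));
    }
}
```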