SaGe is a SPARQL query engine for public Linked Data providers that implements Web preemption. The engine consists of a smart SaGe client and a SaGe SPARQL query server that hosts RDF datasets (for example, stored as HDT files). This repository contains the Python implementation of the SaGe SPARQL query server.
The web server suspends SPARQL queries after a fixed time quantum and resumes them upon client request. Using Web preemption, SaGe ensures stable response times for query execution and complete results under high load.
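To make the suspend/resume cycle concrete, here is a toy, in-memory sketch in Python. It is not SaGe's implementation: the "query" is just a precomputed list of solutions, the quantum is counted in evaluation steps rather than wall-clock time so that the example is deterministic, and the function names execute_quantum and run_to_completion are hypothetical.

```python
from typing import List, Optional, Tuple

# Toy stand-in for the solutions a query would produce.
# In a real engine these would come from scanning RDF indexes.
Solutions = List[str]


def execute_quantum(
    solutions: Solutions, start: int, quantum: int
) -> Tuple[Solutions, Optional[int]]:
    """Hypothetical server side: evaluate at most `quantum` steps, then suspend.

    Returns the solutions produced during this quantum and the saved
    position to resume from, or None once the query is complete.
    """
    end = min(start + quantum, len(solutions))
    page = solutions[start:end]
    # In this sketch the saved state is only the position where evaluation
    # stopped; it is handed back to the caller rather than kept on the server.
    next_state = end if end < len(solutions) else None
    return page, next_state


def run_to_completion(solutions: Solutions, quantum: int) -> Tuple[Solutions, int]:
    """Hypothetical client side: keep resuming until no saved state is returned."""
    results: Solutions = []
    state: Optional[int] = 0
    requests = 0
    while state is not None:
        page, state = execute_quantum(solutions, state, quantum)
        results.extend(page)
        requests += 1
    return results, requests


if __name__ == "__main__":
    data = [f"solution-{i}" for i in range(10)]
    results, requests = run_to_completion(data, quantum=4)
    print(len(results), requests)  # prints: 10 3 (pages of 4 + 4 + 2)
```

Each request does a bounded amount of work, so no single query can monopolize the server, while the client still ends up with every solution by resuming from the saved state.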
The complete approach and experimental results are described in a research paper accepted at The Web Conference 2019: Thomas Minier, Hala Skaf-Molli and Pascal Molli. "SaGe: Web Preemption for Public SPARQL Query Services", in Proceedings of the 2019 World Wide Web Conference (WWW'19), San Francisco, USA, May 13-17, 2019.
Please send feedback, comments and questions to our mailing list or to our issue tracker on GitHub.
Installation in a virtualenv is strongly advised!
Requirements:
- Python 3.7 (or higher)
- pip
- gcc/clang with c++11 support
- Python Development headers
You should have the Python.h header available on your system. For example, for Python 3.7, install the python3.7-dev package on Debian/Ubuntu systems.
The core engine of the SaGe SPARQL query server, together with its HDT and PostgreSQL backends, can be installed with pip as follows:
pip install sage-engine[hdt,postgres]
The SaGe query engine supports several backends for loading RDF datasets. Each backend is installed as an extra dependency; the above command installs both the HDT and PostgreSQL backends.
The SaGe SPARQL query server can also be manually installed using the poetry dependency manager.
git clone https://github.com/sage-org/sage-engine
cd sage-engine
poetry install --extras "hdt postgres"
As with pip, the SaGe backends are installed as extra dependencies, using the --extras flag.
A SaGe server is configured using a configuration file in YAML syntax.
Below is a minimal working example of such a configuration file.
A full example is available in the config_examples/ directory.
name: SaGe Test server
maintainer: Chuck Norris
quota: 75
max_results: 2000
graphs:
  -
    name: dbpedia
    uri: http://example.org/dbpedia
    description: DBPedia
    backend: hdt-file
    file: datasets/dbpedia.2016.hdt
The quota and max_results fields set, respectively, the maximum time quantum and the maximum number of results allowed per request.
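The sketch below illustrates how a single request can be bounded by both limits at once: it pulls solutions from a lazy iterator and stops as soon as either the time quantum expires or max_results solutions have been collected. It is a hypothetical illustration, not SaGe's code; the function name run_one_request is invented, and it assumes the quota is expressed in milliseconds, which you should check against the configuration reference.

```python
import time
from typing import Iterator, List, Tuple


def run_one_request(
    solutions: Iterator[dict], quota_ms: float, max_results: int
) -> Tuple[List[dict], bool]:
    """Hypothetical sketch of one request bounded by a time quota and a result cap.

    Returns (page, suspended). suspended=True means the budget ran out and
    more solutions *may* remain, so the client should resume; False means
    the solution iterator was exhausted.
    """
    # ASSUMPTION: the quota is expressed in milliseconds.
    deadline = time.monotonic() + quota_ms / 1000.0
    page: List[dict] = []
    while True:
        # Check both budgets before pulling the next solution.
        if len(page) >= max_results or time.monotonic() >= deadline:
            return page, True
        try:
            page.append(next(solutions))
        except StopIteration:
            return page, False


if __name__ == "__main__":
    # A lazy stream of toy solutions; reusing the same iterator across calls
    # stands in for resuming a suspended query plan.
    stream = ({"?s": f"http://example.org/s{i}"} for i in range(5000))
    suspended = True
    while suspended:
        page, suspended = run_one_request(stream, quota_ms=75, max_results=2000)
        print(len(page), "solutions, suspended:", suspended)
```

Checking the budget before pulling each solution means a request never exceeds max_results, and the deadline is only overshot by the cost of producing one solution.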
Each entry in the graphs field declares an RDF dataset with a name, a URI, a description, a backend, and options specific to this backend (here, the path to the HDT file).
The hdt-file backend used in this example allows a SaGe server to load RDF datasets from HDT files. SaGe uses pyHDT to load and query HDT files.
The sage executable, installed alongside the SaGe server, starts a SaGe server from a configuration file using Gunicorn, a Python WSGI HTTP server.
# launch Sage server with 4 workers on port 8000
sage my_config.yaml -w 4 -p 8000
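This README does not document the server's HTTP API, so the following is only a hypothetical sketch of how a client could build a query request for the server started above, using Python's standard library. It assumes the server exposes a POST /sparql endpoint that accepts a JSON body with query, defaultGraph and (when resuming a suspended query) next fields; these names are assumptions, so check the server's actual API before relying on them. The helper build_query_request is likewise hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(
    server: str, graph_uri: str, query: str, next_token: Optional[str] = None
) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for a SaGe-style endpoint."""
    # ASSUMPTION: endpoint path "/sparql" and JSON keys "query",
    # "defaultGraph" and "next"; verify against the server's API.
    body = {"query": query, "defaultGraph": graph_uri}
    if next_token is not None:
        # Resuming a suspended query: send back the saved state.
        body["next"] = next_token
    return urllib.request.Request(
        url=server.rstrip("/") + "/sparql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_query_request(
        "http://localhost:8000",
        "http://example.org/dbpedia",  # the graph URI from the example config
        "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it to a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to loop: send the request, read the saved state from the response, and build the next request with next_token until the server reports that the query is complete.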
The full usage of the sage executable is detailed below:
Usage: sage [OPTIONS] CONFIG

  Launch the Sage server using the CONFIG configuration file

Options:
  -p, --port INTEGER              The port to bind  [default: 8000]
  -w, --workers INTEGER           The number of server workers  [default: 4]
  --log-level [debug|info|warning|error]
                                  The granularity of log outputs  [default:
                                  info]
  --help                          Show this message and exit.
The SaGe server is also available as a Docker image. To use it, mount the directory that contains your configuration file and your datasets into the container.
docker pull callidon/sage
docker run -v /path/to/data-directory:/opt/data/ -p 8000:8000 callidon/sage sage /opt/data/config.yaml -w 4 -p 8000
To generate the documentation, navigate to the docs directory and build it:
cd docs/
make html
open build/html/index.html
Copyright 2017-2019 - GDD Team, LS2N, University of Nantes