
An in-memory incremental Datalog engine based on Differential Dataflow


mihaibudiu/differential-datalog

 
 


License: MIT

Differential Datalog (DDlog)

DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. With DDlog, the programmer does not need to worry about writing incremental algorithms. Instead, they specify the desired input-output mapping declaratively, using a dialect of Datalog. The DDlog compiler then synthesizes an efficient incremental implementation. DDlog is based on Frank McSherry's excellent differential dataflow library.

DDlog has the following key properties:

  1. Relational: A DDlog program transforms a set of input relations (or tables) into a set of output relations. It is thus well suited for applications that operate on relational data, ranging from real-time analytics to cloud management systems and static program analysis tools.

  2. Dataflow-oriented: At runtime, a DDlog program accepts a stream of updates to input relations. Each update inserts, deletes, or modifies a subset of input records. DDlog responds to an input update by outputting an update to its output relations.

  3. Incremental: DDlog processes input updates by performing the minimum amount of work necessary to compute changes to output relations. This has significant performance benefits for many queries.

  4. Bottom-up: DDlog starts from a set of input facts and computes all possible derived facts by following user-defined rules, in a bottom-up fashion. In contrast, top-down engines are optimized to answer individual user queries without computing all possible facts ahead of time. For example, given a Datalog program that computes pairs of connected vertices in a graph, a bottom-up engine maintains the set of all such pairs. A top-down engine, on the other hand, is triggered by a user query to determine whether a pair of vertices is connected and handles the query by searching for a derivation chain back to ground facts. The bottom-up approach is preferable in applications where all derived facts must be computed ahead of time and in applications where the cost of initial computation is amortized across a large number of queries.

  5. In-memory: DDlog stores and processes data in memory. In a typical use case, a DDlog program is used in conjunction with a persistent database, with database records being fed to DDlog as ground facts and the derived facts computed by DDlog being written back to the database.

    At the moment, DDlog can only operate on databases that fit entirely in the memory of a single machine. We are working on a distributed version of DDlog that will be able to partition its state and computation across multiple machines.

  6. Typed: In its classical textbook form, Datalog is more of a mathematical formalism than a practical tool for programmers. In particular, pure Datalog does not have concepts like types, arithmetic, strings, or functions. To facilitate writing safe, clear, and concise code, DDlog extends pure Datalog with:

    1. A powerful type system, including Booleans, unlimited precision integers, bitvectors, floating point numbers, strings, tuples, tagged unions, vectors, sets, and maps. All of these types can be stored in DDlog relations and manipulated by DDlog rules. Thus, with DDlog one can perform relational operations, such as joins, directly over structured data, without having to flatten it first (as is often done in SQL databases).

    2. Standard integer, bitvector, and floating point arithmetic.

    3. A simple procedural language that allows expressing many computations natively in DDlog without resorting to external functions.

    4. String operations, including string concatenation and interpolation.

    5. Syntactic sugar for writing imperative-style code using for/let/assignments.

  7. Integrated: While DDlog programs can be run interactively via a command line interface, DDlog's primary use case is integration with other applications that require deductive database functionality. A DDlog program is compiled into a Rust library that can be linked against a Rust, C/C++, Java, or Go program (bindings for other languages can be easily added). This enables good performance but somewhat limits flexibility, as changes to the relational schema or rules require recompilation.

Documentation

Installation

Installing DDlog from a binary release

To install a precompiled version of DDlog, download the latest binary release, extract it from the archive, add ddlog/bin to your $PATH, and set $DDLOG_HOME to point to the ddlog directory. You will also need to install the Rust toolchain (see instructions below).
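The environment setup described above can be sketched in shell as follows. The archive name, its .tar.gz format, and the install directory are placeholders chosen for illustration, not names taken from an actual release, so adjust them to match the file you downloaded.

```shell
# Hypothetical locations: adjust both to match where you downloaded
# the release and where you want to unpack it.
ARCHIVE="${ARCHIVE:-$HOME/Downloads/ddlog-release.tar.gz}"   # placeholder file name
INSTALL_DIR="${INSTALL_DIR:-$HOME}"

# Unpack the archive if it is present. It is assumed to be a gzipped
# tarball containing a top-level ddlog/ directory.
if [ -f "$ARCHIVE" ]; then
    tar -xzf "$ARCHIVE" -C "$INSTALL_DIR"
fi

# Point DDLOG_HOME at the extracted directory and put its bin/ on PATH.
export DDLOG_HOME="$INSTALL_DIR/ddlog"
export PATH="$DDLOG_HOME/bin:$PATH"
```

To make the settings persistent, the two export lines can be added to ~/.profile or to your shell's startup file.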

If you're using macOS, you will need to override the binary's security settings by following these instructions. Otherwise, the first time you run the DDlog compiler (by calling ddlog), you will see the following warning dialog:

"ddlog" cannot be opened because the developer cannot be verified.
macOS cannot verify that this app is free from malware.

You are now ready to start coding in DDlog.

Compiling DDlog from sources

Installing dependencies manually

  • Haskell stack:
    wget -qO- https://get.haskellstack.org/ | sh
    
  • Rust toolchain v1.52.1 or later:
    curl https://sh.rustup.rs -sSf | sh
    . $HOME/.cargo/env
    rustup component add rustfmt
    rustup component add clippy
    
    Note: The rustup script adds the path to the Rust toolchain binaries (typically, $HOME/.cargo/bin) to ~/.profile, so that it takes effect at the next login. To configure your current shell, run source $HOME/.cargo/env.
  • JDK, e.g.:
    apt install default-jdk
    
  • Google FlatBuffers library. Download and build FlatBuffers release 1.11.0 from GitHub. Make sure that the flatc tool is in your $PATH. Additionally, make sure that the FlatBuffers Java classes are in your $CLASSPATH:
    ./tools/install-flatbuf.sh
    cd flatbuffers
    export CLASSPATH=`pwd`"/java":$CLASSPATH
    export PATH=`pwd`:$PATH
    cd ..
    
  • Static versions of the following libraries: libpthread.a, libc.a, libm.a, librt.a, libutil.a, libdl.a, libgmp.a, and libstdc++.a can be installed from distro-specific packages. On Ubuntu:
    apt install libc6-dev libgmp-dev
    
    On Fedora:
    dnf install glibc-static gmp-static libstdc++-static
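
Before building, it can help to confirm that the command-line tools from the list above are actually on your $PATH. The following is a minimal sketch: the missing_tools helper is a hypothetical name introduced here and is not part of DDlog, and the check covers only executables, not the static libraries.

```shell
# Print the name of every tool given as an argument that is not found
# on PATH, one per line. Prints nothing if all tools are found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Tools named in the dependency list: Haskell stack, the Rust toolchain,
# a JDK, and the FlatBuffers compiler.
missing=$(missing_tools stack rustc cargo java flatc)
if [ -n "$missing" ]; then
    printf 'Missing build tools:\n%s\n' "$missing"
else
    echo "All build tools found on PATH."
fi
```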
    

Building

To build the software once you've installed the dependencies, clone this repository and set the $DDLOG_HOME variable to point to the root of the repository. Run

stack build

anywhere inside the repository to build the DDlog compiler. To install DDlog binaries in Haskell stack's default binary directory:

stack install

To install to a different location:

stack install --local-bin-path <custom_path>

To test basic DDlog functionality:

stack test --ta '-p path'

Note: this takes a few minutes.

You are now ready to start coding in DDlog.

vim syntax highlighting

The easiest way to enable differential datalog syntax highlighting for .dl files in Vim is by creating a symlink from <ddlog-folder>/tools/vim/syntax/dl.vim into ~/.vim/syntax/.
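
As a sketch, the symlink can be created as shown below. This assumes $DDLOG_HOME points at the root of your DDlog checkout or release. The fallback path is only a placeholder.

```shell
# DDLOG_HOME is assumed to point at the DDlog root directory;
# the fallback below is a placeholder path, not a required location.
DDLOG_HOME="${DDLOG_HOME:-$HOME/differential-datalog}"

# Create Vim's per-user syntax directory if it does not exist yet.
mkdir -p "$HOME/.vim/syntax"

# Link the DDlog syntax file into it, replacing any existing link.
ln -sf "$DDLOG_HOME/tools/vim/syntax/dl.vim" "$HOME/.vim/syntax/dl.vim"
```

If highlighting does not activate automatically for a .dl file, running :set filetype=dl inside Vim loads the syntax file for the current buffer.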

If you are using a plugin manager you may be able to directly consume the file from the upstream repository as well. In the case of Vundle, for example, configuration could look as follows:

call vundle#begin('~/.config/nvim/bundle')
...
" The next line is the relevant one:
Plugin 'vmware/differential-datalog', {'rtp': 'tools/vim'}
...
call vundle#end()

Debugging with GHCi

To run the test suite with the GHCi debugger:

stack ghci --ghci-options -isrc --ghci-options -itest differential-datalog:differential-datalog-test

and type do main at the command prompt.

Building with profiling info enabled

stack clean

followed by

stack build --profile

or

stack test --profile
