/ɣu.jə.ˈdɑx/: hello, good day (Dutch greeting used during daytime)
goeieDAG provides a unified Python API to the Ninja and Make (TODO) build systems, aiming to make it extremely easy to benefit from parallel processing in any graph-like workflow.
pip install goeieDAG==0.0.2
from pathlib import Path
import goeiedag
from goeiedag import ALL_INPUTS, INPUT, OUTPUT
workdir = Path("output")
graph = goeiedag.Graph()
# Extract OS name from /etc/os-release
graph.add(["grep", "^NAME=", INPUT, ">", OUTPUT],
          inputs=["/etc/os-release"],
          outputs=["os-name.txt"])
# Get username
graph.add(["whoami", ">", OUTPUT],
          inputs=[],
          outputs=["username.txt"])
# Glue together to produce output
# (can also use a dictionary and refer to inputs/outputs by name)
graph.add(["cat", ALL_INPUTS, ">", OUTPUT.result],
          inputs=["os-name.txt", "username.txt"],
          outputs=dict(result="result.txt"))
goeiedag.build_all(graph, workdir)
# Print output
print((workdir / "result.txt").read_text())
- It is a tested and proven paradigm (make traces back to 1976!)
- It provides an obvious way of evaluating which products need rebuilding (subject to an accurate dependency graph)
- It naturally isolates and parallelizes individual build tasks
- It is agnostic as to how data objects are serialized (convenient for the library author...)
- Graph edges are implicitly defined by input/output file names
- A high-quality executor (Ninja) is available and installable via a Python package
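Two of the points above, implicit edges and rebuild evaluation, can be illustrated with a minimal standard-library sketch. This is not goeieDAG's or Ninja's actual implementation: the TASKS table and the helpers infer_edges and is_stale are hypothetical names introduced only for this illustration, and the staleness rule is a simplified Make-style timestamp comparison (real executors such as Ninja track more than modification times).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import Path

# Hypothetical task table for illustration: name -> (input files, output files).
TASKS = {
    "os-name": (["/etc/os-release"], ["os-name.txt"]),
    "username": ([], ["username.txt"]),
    "result": (["os-name.txt", "username.txt"], ["result.txt"]),
}


def infer_edges(tasks):
    """Map each task to the set of tasks that produce one of its inputs."""
    producer = {out: name for name, (_, outs) in tasks.items() for out in outs}
    return {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in tasks.items()
    }


def is_stale(inputs, outputs, root=Path(".")):
    """Simplified Make-style check: rebuild when an output is missing
    or when any input is newer than the oldest output."""
    outs = [root / o for o in outputs]
    if not outs or not all(o.exists() for o in outs):
        return True
    oldest_output = min(o.stat().st_mtime for o in outs)
    return any((root / i).stat().st_mtime > oldest_output for i in inputs)


# No edge is declared explicitly: "result" depends on the other two tasks
# only because its input file names match their output file names.
edges = infer_edges(TASKS)
order = list(TopologicalSorter(edges).static_order())
print(order)
```

Tasks with no path between them in the inferred graph (here "os-name" and "username") are the ones an executor is free to run in parallel.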
- Simpler mental model & usage: no need to separately define build rules or think about implicit/explicit inputs and outputs
- API accepts Paths; no need to cast everything to str!
- Higher-level API in general (for example, the output directory is created automatically)
The code makes use of PEP 604 (the X | Y syntax for union types).
- Ninja (Python package) -- provides a lower-level API, used by goeieDAG as back-end
- TaskGraph -- similar project, but centered around Python functions and in-process parallelism
- Snakemake -- similar goals, but a stand-alone tool rather than a library
- Dask -- different execution model; caching of intermediate results is left up to the user
- doit