Succinct tree sequences are a highly efficient way of storing a set of related DNA sequences by encoding their ancestral history as a set of correlated trees along the genome. The tree sequence format is output by a number of software libraries and programs (such as msprime, SLiM, fwdpp, and tsinfer) that either simulate or infer the evolutionary history of genetic sequences. The evolutionary history of genetic sequences is often technically referred to as an Ancestral Recombination Graph (ARG); succinct tree sequences are fully compatible with this formulation, and tskit is therefore a powerful platform for processing ARGs.
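To make the idea of "correlated trees along the genome" concrete, here is a toy sketch in plain Python. It does not use tskit and is not how tskit is implemented; the `Edge` tuple, the `local_trees` helper, and the example data are all hypothetical names chosen for illustration. Each edge records a parent-child link together with the genomic interval over which it holds, so links shared by neighbouring trees are stored only once.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """A parent-child link that holds over the half-open interval [left, right)."""
    left: float
    right: float
    parent: int
    child: int


# Hypothetical example: samples 0, 1, 2 and ancestors 3, 4 on a genome of length 10.
# Two edges span the whole genome and are shared by both local trees.
EDGES = [
    Edge(0, 10, 3, 1),
    Edge(0, 5, 3, 0),
    Edge(5, 10, 3, 2),
    Edge(0, 10, 4, 3),
    Edge(0, 5, 4, 2),
    Edge(5, 10, 4, 0),
]


def local_trees(edges):
    """Return (left, right, parent_of) for each interval between breakpoints."""
    breakpoints = sorted({e.left for e in edges} | {e.right for e in edges})
    trees = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        # Keep only the edges that cover the whole of [left, right).
        parent_of = {
            e.child: e.parent
            for e in edges
            if e.left <= left and right <= e.right
        }
        trees.append((left, right, parent_of))
    return trees


TREES = local_trees(EDGES)
for left, right, parent_of in TREES:
    print(f"[{left}, {right}): {parent_of}")
```

Storing the two trees separately would need eight parent-child links; the shared-edge encoding above needs six, and the saving grows as more adjacent trees share structure.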
The tskit
library provides the underlying functionality used to load, examine, and
manipulate tree sequences, including efficient methods for calculating genetic
statistics. It often forms part of an installation of other software packages such as
those listed above. Please see the
documentation, which includes
installation instructions, for further details.
Also see the road map for
planned improvements and additions to the library.
To get started with tskit, see the tutorials and other content at http://tskit.dev. For help and support from the community, you can use the discussions here on GitHub, or raise an issue for a specific bug or feature request.
We warmly welcome contributions from the community. Raise an issue if you have an idea you'd like to work on, or submit a PR for comments and help.
The base tskit
library provides both a Python
and a C API. A Rust API is provided in the
tskit-rust repository.
Most users of tskit
will use the Python API, as it provides a convenient, high-level interface
to access, analyse and create tree sequences. Full documentation is
here.
The tskit
C API provides comprehensive, low-level methods for manipulating and
processing tree sequences. Written to the C99 standard and fully thread-safe, it can be
used with either C or C++. Full documentation is
here.