Authors: Mix Irving [email protected], Andre Staltz [email protected]
Date: 2023-03-23
License: CC0-1.0
In any given set of SSB messages, it is impossible to determine their partial or total order based on timestamps alone. However, including the hashes of previous messages in a message allows us to determine the order of messages. In this SIP, we specify a consistent way of declaring hashes of previous messages, such that they form a directed acyclic graph (DAG). This DAG is useful for many use cases, including replication, and multi-writer "records".
As a permissionless, decentralized, and eventually consistent database, SSB cannot rely on timestamps to determine the order of messages. Instead, when a message refers to the hash of another message, we can infer that the message that was referred to must have been created first, because we cannot create that hash without knowing the original data. These referred hashes effectively form a proof that a message was created after another message.
A "tangle" in SSB is a pattern of declaring previous message hashes, which will define a directed acyclic graph (DAG) of messages. A tangle is a useful way to determine a partial ordering which in turn is useful for everything from replication to building multi-writer "records".
A topological sort of a tangle renders a linear ordering of messages, which is useful for displaying messages in user interfaces.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
There are different types of tangle, but they all MUST specify:
- Candidates messages: the set of messages which could be part of the tangle
- Recipe: how the tangle (the DAG) is constructed from these candidates
It is RECOMMENDED for every recipe to start with some "root message" and extend out from that point, checking the validity of messages as they are added. Some candidate messages may not have connections to the graph, or may be "invalid" extensions, in which case they are excluded from the tangle.
A given message MAY belong to zero, one, or many tangles. Each tangle is a separate DAG. A tangle is identified by a human-readable string.
We define:
- the root message as the earliest message in a tangle. Causally it is the "oldest", and is often used to identify all other candidate messages.
- the tip(s) of the tangle are the message(s) at the leading or "newest" end of the tangle i.e. it can transitively connected to the root, and no other messages yet link back to it.
There MUST only be one root message per tangle.
Each message m
in a tangle SHOULD have a tangles
field in the content
field, i.e. m.content.tangles
. If m.content.tangles
exists, its value MUST
be an object with one field for each tangle. The key of each field MUST be a
human-readable identifier string (lets refer to it here as x
) for that tangle,
and the value MUST be an object (called the "tangle data") with the fields:
root
: the ID of the root message of the tangle ornull
ifm
is the rootprevious
: an Array of message IDs of the known tip(s) of the DAG at the timem
was published, ornull
ifm
is the root
The most trivial tangle is the classic SSB feed. In this case, the tangle for
feed A
is defined as:
- Candidates messages:
- MUST have
author
field equal toA
- MUST have
- Recipe:
- If the candidate message has
sequence
field equal to1
, then its fieldprevious
MUST benull
- If the candidate message has
sequence
field equal toi
wherei
is a number greater than1
, then its fieldprevious
MUST be the message ID of the message that hassequence
field equal toi - 1
- If the candidate message has
flowchart RL
A(A):::root;B(B);C(C);D(D);
subgraph "feed"
direction RL
D-->C-->B-->A
end
classDef default fill: #e953da, stroke-width: 0, color: white
classDef root fill: #4c44cf, stroke-width: 0, color: white
classDef cluster stroke:#e953da,fill:none;
The tangle data these messages carry looks like:
A => { sequence: 1, previous: null, ... }
B => { sequence: 2, previous: A, ... }
C => { sequence: 3, previous: B, ... }
D => { sequence: 4, previous: C, ... }
(The fields author
, signature
, timestamp
, hash
, and content
have been
ommited here to make the backlinking pattern clearer)
This is a special case of a tangle, for a few reasons:
- This tangle does not have "tangle data" with
root
andprevious
- The DAG is linear, with no branches or merges
While the classic feed is a special case of a tangle without branches, in the case of multiple authors, we need to support branches and merges. This is because different authors may contribute concurrently, or while offline.
The following is not a description of a specific "tangle data" with candidates and recipes, but rather a template for tangle data when there are messages from multiple authors.
As a general pattern for multi-author tangle data: when a message is published,
its previous
field SHOULD have the IDs of all known tips of the tangle,
and it SHOULD NOT have any other message IDs in its previous
field.
As an example, suppose there are messages with tangle data that look like:
A => { root: null, previous: null }
B => { root: A, previous: [A] }
X => { root: A, previous: [B] }
Y => { root: A, previous: [B] }
M => { root: A, previous: [X, Y] }
Then the directed acyclic graph (DAG) looks like:
flowchart RL
A(A):::root;B(B);X(X);Y(Y);M(M);
M-->X-->B-->A
M-->Y--->B
classDef default fill: #e953da, stroke-width: 0, color: white
classDef root fill: #4c44cf, stroke-width: 0, color: white
Diagram where messages X, Y were both published concurrently (so were unaware of one another). Message M is aware of both X and Y, and extends the tangle from them. M is now the new "tip" of the tangle.
Note that message M
points to X
and Y
as its previous
messages. This
conforms to the general pattern for multi-author tangle data.
Further, the root message MUST NOT include its own ID (not known until
published), so it sets its root
value as null
, which means "I am the root".
The "tangle data" { root, previous }
for e.g. a "chess game tangle" SHOULD be
stored at msg.value.content.tangles.chess
, note the choice of a human-friendly
name chess
for the domain of a chess game.
Some tangles may have candidates that define a strict subset of messages as valid, e.g. only messages authored by peers the original author follows.
Privacy Considerations – if a tangle is encrypted (e.g. to a group) then all messages in it should be encrypted similarly, because message backlinks reveal the other authors who may have participated in the tangle, which may be a privacy risk. This means that you also want to be careful of revealing individual messages in a tangle (a feature box1, box2 support).
Implementation Considerations – it's important that when defining a tangle, the candidate messages and recipe are very clearly defined. The recipe may also need to include how a tangle should be presented to users (e.g. whether a tangle is linearised, and how).
- ssbdrv's document about tangles
- Tangle Origin Story
- tangle-js - a collection of NodeJS modules used for building tangles data, and reducing data held in messages. Effectively turns tangles into CRDTs
JavaScript:
ssb-sort
– used for tangle sorting used by many JS clients.ssb-tribes/lib/get-group-tangle.js
– uses@tangle
modules to determine tangle tips