Add EIP-7495 specifications for StableContainer[N] and Profile[B] #3789

Draft: wants to merge 6 commits into base: dev
Changes from 5 commits
100 changes: 99 additions & 1 deletion ssz/simple-serialize.md
@@ -15,16 +15,20 @@
- [Default values](#default-values)
- [`is_zero`](#is_zero)
- [Illegal types](#illegal-types)
- [`Profile[B]` constraints](#profileb-constraints)
- [Serialization](#serialization)
- [`uintN`](#uintn)
- [`boolean`](#boolean)
- [`Bitvector[N]`](#bitvectorn)
- [`Bitlist[N]`](#bitlistn)
- [Vectors, containers, lists](#vectors-containers-lists)
- [`StableContainer[N]`](#stablecontainern)
- [`Profile[B]`](#profileb)
- [Union](#union)
- [Deserialization](#deserialization)
- [Merkleization](#merkleization)
- [Summaries and expansions](#summaries-and-expansions)
- [`StableContainer` stability guarantees](#stablecontainer-stability-guarantees)
- [Implementations](#implementations)
- [JSON mapping](#json-mapping)

@@ -55,6 +59,21 @@
    foo: uint64
    bar: boolean
```
* **stablecontainer**: ordered heterogeneous collection of optional values, with `N` indicating the potential maximum number of fields to which it can ever grow in the future
* notation `StableContainer[N]`, with `Optional[T]` referring to Python's `typing.Optional`, e.g.
```python
class Shape(StableContainer[4]):
    side: Optional[uint16]
    color: Optional[uint8]
    radius: Optional[uint16]
```
* **profile**: ordered heterogeneous collection of a subset of values of a base `StableContainer` type `B`
* notation `Profile[B]`, e.g.
```python
class Square(Profile[Shape]):
    side: uint16
    color: Optional[uint8]
```
* **vector**: ordered fixed-length homogeneous collection, with `N` values
* notation `Vector[type, N]`, e.g. `Vector[uint64, N]`
* **list**: ordered variable-length homogeneous collection, limited to `N` values
@@ -70,7 +89,7 @@

### Variable-size and fixed-size

We recursively define "variable-size" types to be lists, unions, `Bitlist` and all types that contain a variable-size type. All other types are said to be "fixed-size".
We recursively define "variable-size" types to be lists, unions, `Bitlist`, `StableContainer`, any `Profile` that contains optional fields, and all types that contain a variable-size type. All other types are said to be "fixed-size".

### Byte

@@ -94,6 +113,8 @@ Assuming a helper function `default(type)` which returns the default value for `
| `uintN` | `0` |
| `boolean` | `False` |
| `Container` | `[default(type) for type in container]` |
| `StableContainer[N]` | `[None] * N` |
| `Profile[B]` | `[default(type) for type in profile]` |
| `Vector[type, N]` | `[default(type)] * N` |
| `Bitvector[N]` | `[False] * N` |
| `List[type, N]` | `[]` |
@@ -108,8 +129,26 @@ An SSZ object is called zeroed (and thus, `is_zero(object)` returns true) if it

- Empty vector types (`Vector[type, 0]`, `Bitvector[0]`) are illegal.
- Containers with no fields are illegal.
- StableContainers with `N <= 0` are illegal.
- StableContainers with non-optional fields are illegal.
- The `None` type option in a `Union` type is only legal as the first option (i.e. with index zero).

### `Profile[B]` constraints

- Fields in `Profile[B]` correspond to fields with the same field name in `B`.
- Fields in `Profile[B]` follow the same order as in `B`.
- Fields in the base `StableContainer` type `B` are all `Optional`.
- Fields MAY be disallowed in `Profile[B]` by omitting them.
- Fields MAY be kept optional in `Profile[B]` by retaining them as `Optional`.
- Fields MAY be required in `Profile[B]` by unwrapping them from `Optional`.
- All field types in `Profile[B]` MUST be compatible with the corresponding field types in `B`.
- Field types are compatible with themselves.
- `byte` is compatible with `uint8` and vice versa.
- `Bitlist[N]` / `Bitvector[N]` field types are compatible if they share the same capacity `N`.
- `List[T, N]` / `Vector[T, N]` field types are compatible if `T` is compatible and if they also share the same capacity `N`.
- `Container` / `StableContainer[N]` field types are compatible if all inner field types are compatible, if they also share the same field names in the same order, and for `StableContainer[N]` if they also share the same capacity `N`.
- `Profile[X]` field types are compatible with `StableContainer` types compatible with `X`, and are compatible with `Profile[Y]` where `Y` is compatible with `X` if also all inner field types are compatible. Differences solely in optionality do not affect merkleization compatibility.
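
The name, order, and type rules above can be illustrated with a small check. This is only a sketch under assumptions: the schema representation (an ordered list of `(name, type_name, is_optional)` tuples) and the helper names `compatible` and `check_profile` are hypothetical, and only the two simplest compatibility rules (identical types, `byte` versus `uint8`) are covered.

```python
# Hypothetical schema representation: ordered (name, type_name, is_optional) tuples.
SHAPE = [("side", "uint16", True), ("color", "uint8", True), ("radius", "uint16", True)]
SQUARE = [("side", "uint16", False), ("color", "uint8", True)]


def compatible(a: str, b: str) -> bool:
    # Only two rules are modeled: a type is compatible with itself,
    # and `byte` is compatible with `uint8` (and vice versa).
    return a == b or {a, b} == {"byte", "uint8"}


def check_profile(base, profile) -> bool:
    """Return True if `profile` respects the name, order and type rules for `base`."""
    # All fields of the base StableContainer must be Optional.
    if not all(optional for _, _, optional in base):
        return False
    base_names = [name for name, _, _ in base]
    base_types = {name: type_name for name, type_name, _ in base}
    position = -1
    for name, type_name, _ in profile:
        # Each profile field must exist in the base under the same name.
        if name not in base_types:
            return False
        # Profile fields must appear in the same relative order as in the base.
        index = base_names.index(name)
        if index <= position:
            return False
        position = index
        if not compatible(type_name, base_types[name]):
            return False
    return True


print(check_profile(SHAPE, SQUARE))  # True
```

Omitting `radius` from `SQUARE` disallows it, keeping `color` optional retains it as `Optional`, and marking `side` as non-optional makes it required.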

## Serialization

We recursively define the `serialize` function which consumes an object `value` (of the type specified) and returns a bytestring of type `bytes`.
@@ -171,6 +210,45 @@ fixed_parts = [part if part != None else variable_offsets[i] for i, part in enum
return b"".join(fixed_parts + variable_parts)
```

### `StableContainer[N]`

Serialization of `StableContainer[N]` is defined similarly to `Container`. Notable changes are:

- A `Bitvector[N]` is constructed, indicating active fields within the `StableContainer[N]`. For fields with a present value (not `None`), a `True` bit is included. For fields with a `None` value, a `False` bit is included. The `Bitvector[N]` is padded with `False` bits up through length `N`.
- Only active fields are serialized, i.e., fields with a corresponding `True` bit in the `Bitvector[N]`
- The serialization of the `Bitvector[N]` is prepended to the serialized active fields
- If variable-length fields are serialized, their offsets are relative to the start of serialized active fields, after the `Bitvector[N]`

```python
def is_active_field(element):
    return not is_optional(element) or element is not None

# Determine active fields
active_fields = Bitvector[N](([is_active_field(element) for element in value] + [False] * N)[:N])
active_values = [element for element in value if is_active_field(element)]

# Recursively serialize
fixed_parts = [serialize(element) if not is_variable_size(element) else None for element in active_values]
variable_parts = [serialize(element) if is_variable_size(element) else b"" for element in active_values]

# Compute and check lengths
fixed_lengths = [len(part) if part != None else BYTES_PER_LENGTH_OFFSET for part in fixed_parts]
variable_lengths = [len(part) for part in variable_parts]
assert sum(fixed_lengths + variable_lengths) < 2**(BYTES_PER_LENGTH_OFFSET * BITS_PER_BYTE)

# Interleave offsets of variable-size parts with fixed-size parts
variable_offsets = [serialize(uint32(sum(fixed_lengths + variable_lengths[:i]))) for i in range(len(active_values))]
fixed_parts = [part if part != None else variable_offsets[i] for i, part in enumerate(fixed_parts)]

# Return the concatenation of the active fields `Bitvector` with the active
# fixed-size parts (offsets interleaved) and the active variable-size parts
return serialize(active_fields) + b"".join(fixed_parts + variable_parts)
```
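
As a worked illustration of the steps above, the following self-contained sketch serializes the `Shape` example when all fields are fixed-size `uintN` values. The helper names (`serialize_bitvector`, `serialize_stable`) and the `(value, byte_length)` field representation are assumptions made for this example; variable-size fields and offsets are not covered.

```python
def serialize_bitvector(bits) -> bytes:
    # SSZ packs bit i into byte i // 8 at bit position i % 8.
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def serialize_stable(fields, n: int) -> bytes:
    """Serialize uintN-only fields given as ordered (value_or_None, byte_length) pairs."""
    assert 0 < len(fields) <= n
    # One bit per field, padded with False bits up to capacity N.
    active = [value is not None for value, _ in fields]
    active += [False] * (n - len(fields))
    # Only active fields are serialized (little-endian uintN).
    body = b"".join(
        value.to_bytes(byte_length, "little")
        for value, byte_length in fields
        if value is not None
    )
    return serialize_bitvector(active) + body


# Shape(side=0x42, color=1, radius=None) as a StableContainer[4]
shape = [(0x42, 2), (1, 1), (None, 2)]
print(serialize_stable(shape, 4).hex())  # 03420001
```

The leading byte `03` is the `Bitvector[4]` with the first two bits set, followed by `side` as two little-endian bytes and `color` as one byte.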

### `Profile[B]`

Serialization of `Profile[B]` is similar to that of its base `StableContainer[N]`, except that the leading `Bitvector` is replaced by a sparse representation that only includes information about fields that are optional in `Profile[B]`. Bits for required fields of `Profile[B]`, as well as the zero-padding to capacity `N`, are not included. If there are no optional fields in `Profile[B]`, the `Bitvector` is omitted.
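
A minimal sketch of this rule for fixed-size `uintN` fields follows. The helper names and the `(value, byte_length, is_optional)` field representation are assumptions for illustration only, and the output is derived from the description above rather than from published test vectors.

```python
def serialize_bitvector(bits) -> bytes:
    # SSZ packs bit i into byte i // 8 at bit position i % 8.
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def serialize_profile(fields) -> bytes:
    """Serialize uintN-only fields given as (value_or_None, byte_length, is_optional)."""
    optional_bits = []
    body = b""
    for value, byte_length, optional in fields:
        if optional:
            # Only fields that are optional in the profile get a bit.
            optional_bits.append(value is not None)
        elif value is None:
            raise ValueError("required field is missing")
        if value is not None:
            body += value.to_bytes(byte_length, "little")
    # The Bitvector is omitted entirely when the profile has no optional fields.
    prefix = serialize_bitvector(optional_bits) if optional_bits else b""
    return prefix + body


# Square(side=0x42, color=1): `side` is required, `color` is optional.
square = [(0x42, 2, False), (1, 1, True)]
print(serialize_profile(square).hex())  # 01420001
```

Here the prefix is a single-bit `Bitvector[1]` for `color`; a profile in which both fields are required would serialize to `420001` with no prefix at all.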

### Union

A `value` as `Union[T...]` type has properties `value.value` with the contained value, and `value.selector` which indexes the selected `Union` type option `T`.
@@ -203,6 +281,8 @@ Deserialization can be implemented using a recursive algorithm. The deserializat
* Using the first offset, we can compute the length of the list (divide by `BYTES_PER_LENGTH_OFFSET`), as it gives us the total number of bytes in the offset data.
* The size of each object in the vector/list can be inferred from the difference of two offsets. To get the size of the last object, the total number of bytes has to be known (it is not generally possible to deserialize an SSZ object of unknown length)
* Containers follow the same principles as vectors, with the difference that there may be fixed-size objects in a container as well. This means the `fixed_parts` data will contain offsets as well as fixed-size objects.
* `StableContainer[N]`: The serialized data will start with a `Bitvector[N]`. That value MUST be validated: all extra bits in the `Bitvector[N]` that exceed the number of fields MUST be `False`. The rest of the data is deserialized in the same way as a regular SSZ `Container`, consulting the `Bitvector[N]` to determine which fields are present in the data. Absent fields are skipped during deserialization and assigned `None` values.
* `Profile[B]`: If there are optional fields in `Profile[B]`, the serialized data will start with a `Bitvector[O]` with `O` set to the total number of optional fields. The rest of the data is deserialized in the same way as a regular SSZ `Container`, consulting the `Bitvector[O]` to determine which fields are present in the data. Absent fields are skipped during deserialization and assigned `None` values.
Review comment (Contributor):

Here it implies "in the case there is no optional field, deserialize like a regular container". Is it worth pointing this out more explicitly?

Reply (Contributor Author):

I think it's quite explicit already, as the Bitvector[O] is only there "if there are optional fields".

* In the case of bitlists, the length in bits cannot be uniquely inferred from the number of bytes in the object. Because of this, they have a bit at the end that is always set. This bit has to be used to infer the size of the bitlist in bits.
* In the case of unions, the first byte of the deserialization scope is deserialized as type selector, the remainder of the scope is deserialized as the selected type.
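
The `StableContainer[N]` rule above can be sketched as follows for the simple case where every field is a fixed-size `uintN`. The function name `deserialize_stable` and the `field_sizes` parameter (byte length of each field, in declaration order) are hypothetical; offsets for variable-size fields and validation of padding bits inside the final `Bitvector` byte are left out.

```python
def deserialize_stable(data: bytes, field_sizes, n: int):
    """Decode uintN-only fields of a StableContainer[N]; absent fields become None."""
    bitvector_length = (n + 7) // 8
    if len(data) < bitvector_length:
        raise ValueError("data too short for Bitvector[N]")
    bits = [bool((data[i // 8] >> (i % 8)) & 1) for i in range(n)]
    # Bits beyond the number of defined fields MUST be False.
    if any(bits[len(field_sizes):]):
        raise ValueError("bit set for an undefined field")
    values = []
    offset = bitvector_length
    for active, size in zip(bits, field_sizes):
        if not active:
            # Absent fields are skipped and assigned None.
            values.append(None)
            continue
        chunk = data[offset:offset + size]
        if len(chunk) != size:
            raise ValueError("truncated field data")
        values.append(int.from_bytes(chunk, "little"))
        offset += size
    if offset != len(data):
        raise ValueError("unexpected trailing bytes")
    return values


# Field sizes for Shape: side (uint16), color (uint8), radius (uint16)
print(deserialize_stable(bytes.fromhex("03420001"), [2, 1, 2], 4))  # [66, 1, None]
```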

@@ -226,6 +306,7 @@ We first define helper functions:
* `List[B, N]` and `Vector[B, N]`, where `B` is a basic type: `(N * size_of(B) + 31) // 32` (dividing by chunk size, rounding up)
* `List[C, N]` and `Vector[C, N]`, where `C` is a composite type: `N`
* containers: `len(fields)`
* `StableContainer[N]`: always `N`, regardless of the actual number of fields in the type definition
* `pack(values)`: Given ordered objects of the same basic type:
1. Serialize `values` into bytes.
2. If not aligned to a multiple of `BYTES_PER_CHUNK` bytes, right-pad with zeroes to the next multiple.
@@ -242,6 +323,7 @@ We first define helper functions:
- If `1` chunk: the root is the chunk itself.
- If `> 1` chunks: merkleize as binary tree.
* `mix_in_length`: Given a Merkle root `root` and a length `length` (`"uint256"` little-endian serialization) return `hash(root + length)`.
* `mix_in_aux`: Given a Merkle root `root` and an auxiliary SSZ object root `aux` return `hash(root + aux)`.
* `mix_in_selector`: Given a Merkle root `root` and a type selector `selector` (`"uint256"` little-endian serialization) return `hash(root + selector)`.

We now define Merkleization `hash_tree_root(value)` of an object `value` recursively:
@@ -252,6 +334,8 @@ We now define Merkleization `hash_tree_root(value)` of an object `value` recursi
* `mix_in_length(merkleize(pack_bits(value), limit=chunk_count(type)), len(value))` if `value` is a bitlist.
* `merkleize([hash_tree_root(element) for element in value])` if `value` is a vector of composite objects or a container.
* `mix_in_length(merkleize([hash_tree_root(element) for element in value], limit=chunk_count(type)), len(value))` if `value` is a list of composite objects.
* `mix_in_aux(merkleize(([hash_tree_root(element) if is_active_field(element) else Bytes32() for element in value.data] + [Bytes32()] * N)[:N]), hash_tree_root(value.active_fields))` if `value` is a `StableContainer[N]`.
* Merkleization of `Profile[B]` follows the merkleization of base type `B`.
* `mix_in_selector(hash_tree_root(value.value), value.selector)` if `value` is of union type, and `value.value` is not `None`
* `mix_in_selector(Bytes32(), 0)` if `value` is of union type, and `value.value` is `None`
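
To make the `StableContainer[N]` rule concrete, here is a rough sketch for `uintN`-only fields using SHA-256 as the hash. The names `merkleize`, `stable_root`, and the `(value, byte_length)` field representation are assumptions for this example, and it only handles `N <= 256` so that the `Bitvector[N]` root is a single chunk.

```python
import hashlib

ZERO_CHUNK = bytes(32)


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def merkleize(chunks, limit: int) -> bytes:
    # Pad with zero chunks to the next power of two of `limit`, then hash pairwise.
    width = 1
    while width < limit:
        width *= 2
    layer = list(chunks) + [ZERO_CHUNK] * (width - len(chunks))
    while len(layer) > 1:
        layer = [hash_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


def stable_root(fields, n: int) -> bytes:
    """hash_tree_root sketch for uintN fields given as (value_or_None, byte_length)."""
    assert 0 < len(fields) <= n <= 256
    # Inactive fields contribute a zero chunk; the tree always has N leaves.
    leaves = [
        value.to_bytes(size, "little").ljust(32, b"\x00") if value is not None else ZERO_CHUNK
        for value, size in fields
    ]
    leaves += [ZERO_CHUNK] * (n - len(fields))
    # Root of the active-fields Bitvector[N]: a single chunk when N <= 256.
    bits = bytearray(32)
    for i, (value, _) in enumerate(fields):
        if value is not None:
            bits[i // 8] |= 1 << (i % 8)
    # mix_in_aux: hash(root + aux)
    return hash_pair(merkleize(leaves, n), bytes(bits))


old = stable_root([(0x42, 2), (1, 1), (None, 2)], 4)
new = stable_root([(0x42, 2), (1, 1), (None, 2), (None, 8)], 4)
print(old == new)  # True
```

Appending a new field that is left as `None` leaves the root unchanged, because the leaf count is fixed at `N` and absent fields hash as zero chunks.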

@@ -261,6 +345,18 @@ Let `A` be an object derived from another object `B` by replacing some of the (p

We similarly define "summary types" and "expansion types". For example, [`BeaconBlock`](../specs/phase0/beacon-chain.md#beaconblock) is an expansion type of [`BeaconBlockHeader`](../specs/phase0/beacon-chain.md#beaconblockheader). Notice that objects expand to at most one object of a given expansion type. For example, `BeaconBlockHeader` objects uniquely expand to `BeaconBlock` objects.

## `StableContainer` stability guarantees

The serialization and merkleization of a `StableContainer[N]` remains stable as long as:

- The maximum capacity `N` does not change
- The order of fields does not change
- New fields are always appended to the end
- All fields have immutable SSZ schemas, or recursively adopt `StableContainer[N]`
- `List`/`Bitlist` capacities do not change; shortening is possible via application logic

While the serialization of `Profile[B]` is more compact, note that it is not forward-compatible and that the context information that determines the underlying data type has to be indicated out of band. If forward-compatibility is required, `Profile[B]` SHALL be converted to its base type `B` and subsequently serialized according to `B`.

## Implementations

See https://github.com/ethereum/eth2.0-specs/issues/2138 for a list of current known implementations.
@@ -277,6 +373,8 @@ When decoding JSON data, all fields in the SSZ schema must be present with a val
| `byte` | hex-byte-string | `"0x00"` |
| `boolean` | bool | `false` |
| `Container` | object | `{ "field": ... }` |
| `StableContainer[N]` | object | `{ "field": ... }`; Fields with a `None` value SHALL be omitted when serializing to JSON |
| `Profile[B]` | object | `{ "field": ... }`; Fields with a `None` value SHALL be omitted when serializing to JSON |
| `Vector[type, N]` | array | `[element, ...]` |
| `Vector[byte, N]` | hex-byte-string | `"0x1122"` |
| `Bitvector[N]` | hex-byte-string | `"0x1122"` |