SSZ encoding/decoding for bit lists, bit vectors, recursive types. #1630
The length of the given ssz object is known from the type definition. Refer to definitions in the phase 0 spec for examples of ssz objects. If you expect a bit vector, you also know its length so that you can determine which bits are {un,}set in the encoded data.
We still have lists; the linked issue was resolved by adding a notion of "max length", which helps with stable merkleization.
they do; in general you have a
no
These functions are part of the offset encoding: we first encode the fixed parts, then the offsets of the variable parts in the remainder of the encoding, and then the actual data of the variable parts themselves. This allows for efficient retrieval of various parts of the encoding without having to decode the entire blob of SSZ data.
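To make the "fixed parts first, then offsets, then variable data" shape concrete, here is a minimal sketch of the scheme. It is illustrative only (the function name and the `(is_fixed, bytes)` input shape are mine, not any client's API); the 4-byte little-endian offset matches the spec's `BYTES_PER_LENGTH_OFFSET`.

```python
BYTES_PER_LENGTH_OFFSET = 4

def serialize_container(parts):
    """parts: list of (is_fixed, serialized_bytes) pairs, in field order.

    Fixed-size fields are written inline; each variable-size field
    contributes a 4-byte offset to the fixed section and its bytes to
    the variable section that follows.
    """
    # The fixed section's total length is where the first variable part begins.
    fixed_len = sum(len(b) if fixed else BYTES_PER_LENGTH_OFFSET
                    for fixed, b in parts)
    fixed_section = b""
    variable_section = b""
    offset = fixed_len
    for fixed, b in parts:
        if fixed:
            fixed_section += b
        else:
            # Write the position where this variable part will start.
            fixed_section += offset.to_bytes(BYTES_PER_LENGTH_OFFSET, "little")
            variable_section += b
            offset += len(b)
    return fixed_section + variable_section
```

For a container holding a uint64 (fixed, 8 bytes) followed by a 2-byte list (variable), the fixed section is 8 + 4 = 12 bytes, so the single offset is 12, and the list's bytes follow at the end.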
Thanks a lot @ralexstokes for the detailed explanation. I have just checked some of "the implementations" in more detail. Isn't the purpose of the SSZ spec to make sure clients can talk to each other? Regarding the fixed_size part, do I understand correctly:
A given SSZ object is encoded as a single binary blob; the important bit is the relative ordering of its parts, so that it is efficient to find variable-size (sub)elements of the object.
The serialization is here: https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/simple-serialize.md#vectors-containers-lists-unions Re: Cava, I'd open an issue with that repo if something seems off. That discussion does not belong in this repo, which should contain client-agnostic concerns.
thanks @ralexstokes.
yes I had a look at that, but I must admit it is not crystal clear. It is also unclear to me what the deserialiser is supposed to do.
Sure. But simple-serialize.md references "the implementations" (including Cava) to find efficient algorithms that can serialise. And as not much information is available in the simple-serialize.md file, "the implementations" are the next port of call for help. The specs may benefit from clearer definitions for the recursive types to make sure that every client implements the proper spec.
@ralexstokes @MrChico It is still a bit dry. I am trying to write a formal specification (by formal I mean formal methods) of the specs and I'd like to capture the actual specification properly.
@ralexstokes @protolambda First, is there a proper definition (e.g. an algebraic datatype) of what a fixed-size type is? Second, is there any example of how the encoding works for, say, List[uint8]? How is a List[uint8] encoded? For instance List[8, 111]? Again I have tried to follow some links to get a better understanding, but to no avail. Any help would be much appreciated.
@franck44, hey, just got a notification, sorry I missed this issue earlier. For some more reference on SSZ, check my repo (draft, but mostly complete): https://github.com/protolambda/eth2.0-ssz/

Regarding encoding of lists: SSZ is a combination of 2 things to derive lengths: type information, and scope in bytes. With scope I mean the total byte count that is deserialized, or produced by serializing. And you can recurse deeper: the difference between two consecutive offsets (or between the last offset and the end of the parent scope) is itself a scope. For dynamic elements, such as a list of lists, offsets are used.
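The scope + offset idea above can be sketched for the `List[uint8]` case asked about earlier. This is an illustration under my own naming, not py-ssz or zssz code: a byte list serializes as just its bytes (its length is the scope, supplied from outside), while a list of byte lists prefixes one 4-byte offset per element, and each element's scope is the gap between consecutive offsets, the last one closing at the end of the parent scope.

```python
def decode_list_of_byte_lists(data: bytes):
    """Decode a serialized List[List[uint8, ...]] from its scope `data`.

    All elements are variable-size, so the fixed section is purely
    offsets; the first offset therefore tells us how many there are.
    """
    if not data:
        return []  # empty scope: empty list
    first_offset = int.from_bytes(data[:4], "little")
    count = first_offset // 4  # 4 bytes of offset per element
    offsets = [int.from_bytes(data[4 * i:4 * i + 4], "little")
               for i in range(count)]
    # The end of the parent scope closes the last element's scope.
    offsets.append(len(data))
    # Element i is the byte range between consecutive offsets.
    return [data[offsets[i]:offsets[i + 1]] for i in range(count)]
```

For example, the list `[[1, 2], [3]]` serializes as two offsets (8 and 10, since the fixed section is 2 × 4 bytes) followed by the concatenated element bytes, i.e. `080000000a000000010203` in hex, and the decoder above recovers both elements from that blob.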
Hope that explains the type info + offsets. Generally we do not nest small dynamic items much, so the offsets are fine and speed up lookups by a lot. Other than offsets (and selects for the Union type), SSZ is completely free of runtime information, and everything can be derived from the types. If you need more practical examples, there are some human-readable test vectors here: https://github.com/protolambda/remerkleable/blob/master/remerkleable/test_impl.py For more low-level code, you can also look at my ZSSZ library: github.com/protolambda/zssz
Also, regarding Cava, I think it is outdated. We did not have offsets until some time in April 2019 or so.
@protolambda Thanks a lot for your quick reply. Very much appreciated. Just to provide you with some context, we are trying to write a formal specification of SSZ/merkleisation, some implementation of it, and prove that it is correct (wrt the spec). The main problem we are facing is navigating the information: there is a lot out there and it is sometimes not consistent (for instance, the specs refer to the implementations for "efficient algorithms", but the implementations do not seem to agree on how to serialise/deserialise). To write a formal spec, we really need a precise description of what a function should do (e.g. deserialise) rather than how it does it.
They definitely should, but only on the surface of serialize/deserialize/hash-tree-root outputs. Internally they can organize however suits them best. There's no such thing as a "canonical representation" of data in memory, only a local understanding of the type structure that it expresses. The type is the API and usage contract, not the implementation. (The spec sets a bad example here; implementation is assumed to be done by clients their own way, meeting the same outputs.)

A client can choose how to deserialize, e.g. directly into a binary tree structure (see remerkleable) or into native data types of the given language (e.g. Prysm has methods to deserialize into Go structs that also conform to protobuf). And the implementations are out of sync for different use cases and niches, e.g. fast reads/writes of small objects (ZSSZ and the Lighthouse SSZ are very fast here), or re-use of earlier computed merkle work when the state only changes slightly (e.g. remerkleable).

Most implementations (except the archived/outdated ones) specify what is necessary to stay in consensus with the eth2 beacon chain spec. The current spec should be deterministic and unambiguous, although understandably hard to read and understand the finer details from. If you are looking for more description, the diagrams in the eth2-docs repo and the eth2.0-ssz draft repo I linked should help.
Regarding usage of these primitives:
Unfortunately the specs are more focused on being non-ambiguous yet practical than explanatory. If you want to read background on choices in the eth2-specs in general, I recommend https://notes.ethereum.org/@djrtwo/Bkn3zpwxB and https://benjaminion.xyz/eth2-annotated-spec/ Also, if you have any link to your work-in-progress (if open source?), please share :)
@franck44 any questions left? Can I close this issue?
@protolambda |
I have tried to understand how bit vectors and bit lists (and other vectors and lists) are serialised but I am afraid I am missing a few steps.
Forgive me if these are silly questions and feel free to ignore and close if not relevant.
First, I had a look at "the implementations" py-ssz and Cava and found a few differences compared to the simple-serialize.md guidelines:
for bit vectors, the py-ssz tests seem to consider bit vectors of sizes that are multiples of 8, which makes the deserialisation easier. The Cava implementation does not seem to deal with bit vectors (nor, indeed, with any vectors).
How is the actual length of a bit vector encoded so that we can deserialise correctly?
For instance, the bit vector (true, true) is encoded as 0x03 in py-ssz, and we presumably want to decode it back to (true, true), but the length is missing from the encoding.
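For what it's worth, the answer protolambda gives above (length comes from the type, not the encoding) can be sketched as follows. Function names are mine, not py-ssz's: bits are packed LSB-first into ceil(N/8) bytes, the Bitvector length N lives only in the type and is passed to the deserializer, and a Bitlist instead appends a sentinel bit so its length does survive in the encoding.

```python
def serialize_bitvector(bits):
    """Pack a list of booleans LSB-first; N is implied by the type."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def deserialize_bitvector(data, n):
    """n comes from the type Bitvector[n], never from the bytes."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(n)]

def serialize_bitlist(bits):
    """A Bitlist appends a sentinel 1-bit past the last data bit,
    so its actual length is recoverable from the encoding itself."""
    return serialize_bitvector(list(bits) + [True])
```

So `(true, true)` as a Bitvector[2] indeed becomes `0x03`, and the decoder recovers both bits because the `2` is supplied from the type; the same value as a Bitlist becomes `0x07` because of the sentinel.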
what is the status of lists? According to issue Reforming dynamic lists in SSZ #1160, there was a discussion about removing lists from the SSZ legal types.
lists should have homogeneous types. It does not seem to rule out lists of containers, and containers can have fields that are containers. simple-serialise.md refers to a recursive encoding but I could not find any example in py-ssz nor cava.
fourth, Cava provides an encoding for String, but it does not seem to be a type defined in simple-serialize.md. Is String a legal (basic) SSZ type?
it is unclear to me why we need is_fixed_size, and get_size is not defined for variable-sized types. When serialising, every list or vector has a given size. What is the purpose of is_fixed_size?
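As the earlier comments on offset encoding suggest, is_fixed_size classifies a *type*, not a value: it decides whether a field is written inline in the fixed section or replaced by a 4-byte offset. A hedged sketch of that recursion, using my own toy type representation rather than the spec's Python classes:

```python
def is_fixed_size(typ):
    """typ is a toy (kind, args) pair, e.g. ("uint", 64),
    ("vector", elem_type), ("list", elem_type),
    or ("container", [field_type, ...])."""
    kind, args = typ
    if kind == "uint":
        return True                      # basic types are fixed-size
    if kind in ("list", "bitlist"):
        return False                     # length varies at runtime
    if kind == "vector":
        return is_fixed_size(args)       # fixed iff the element type is
    if kind == "container":
        return all(is_fixed_size(f) for f in args)
    raise ValueError(f"unknown type kind: {kind}")
```

The point is that even though any concrete list value has a size at serialization time, a *type* whose values can differ in size must be reached through an offset, so the layout of the fixed section is computable from the type alone.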
Where can we get more details about the efficient algorithms to encode these datatypes as sequences of bytes?
Thanks
Franck