This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

DagCBOR: tighten up spec, additional strictness requirements
Ref: #227
rvagg committed Mar 31, 2020
1 parent 4f82fc0 commit 51eb2f6
Showing 1 changed file with 69 additions and 17 deletions.
86 changes: 69 additions & 17 deletions block-layer/codecs/dag-cbor.md
@@ -2,40 +2,92 @@

**Status: Descriptive - Draft**

* [Format](#format)
* [Links](#links)
* [Map Keys](#map-keys)
* [Strictness](#strictness)
* [Floating Point Encoding (Unresolved)](#floating-point-encoding-unresolved)
* [Implementations](#implementations)
* [JavaScript](#javascript)
* [Go](#go)

DagCBOR supports the full [IPLD Data Model].

CBOR already natively supports all [IPLD Data Model Kinds].
DagCBOR uses the [Concise Binary Object Representation (CBOR)] data format, which natively supports all [IPLD Data Model Kinds].

## Format

The CBOR IPLD format is called DagCBOR to disambiguate it from regular CBOR.
Most CBOR objects are valid DagCBOR. The only hard restriction is that any field
with the tag 42 must be a valid CID.
The CBOR IPLD format is called DagCBOR to disambiguate it from regular CBOR. Most CBOR objects are valid DagCBOR. The primary differences are:
* tag `42` is interpreted as a CID
* maps may only be keyed by strings
* additional strictness requirements about valid data encoding forms

## Links

As with all IPLD formats, DagCBOR must be able to encode [Links]. In DagCBOR, links are the binary form of a [CID] encoded using the raw-binary identity [Multibase]. That is, the Multibase identity prefix (`0x00`) is prepended to the binary form of a CID and this new byte array is encoded into CBOR as a byte-string (major type 2), with the tag `42`.

The inclusion of the Multibase prefix exists for historical reasons and the identity prefix *must not* be omitted.
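The link layout described above can be sketched in a few lines of Python. This is only an illustration, not code from any IPLD library: the names `encode_head` and `encode_link` are hypothetical, and the CID used in the usage lines carries a placeholder all-zero digest.

```python
def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head (major type plus unsigned argument) in its shortest form."""
    if value < 0:
        raise ValueError("argument must be non-negative")
    prefix = major << 5
    if value < 24:
        # Small arguments are stored directly in the initial byte.
        return bytes([prefix | value])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([prefix | additional]) + value.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_link(cid_bytes: bytes) -> bytes:
    """Wrap a binary CID as tag 42 around a byte string with the 0x00 Multibase prefix."""
    payload = b"\x00" + cid_bytes  # identity Multibase prefix, must not be omitted
    return encode_head(6, 42) + encode_head(2, len(payload)) + payload


# Hypothetical CIDv1: version 1, dag-cbor codec (0x71), sha2-256 (0x12), 32-byte digest.
placeholder_cid = bytes([0x01, 0x71, 0x12, 0x20]) + bytes(32)
encoded = encode_link(placeholder_cid)
print(encoded[:9].hex())  # d82a58250001711220
```

The leading `d8 2a` is tag `42`, `58 25` is a byte-string head of length 37 (the 36-byte CID plus the prefix), and the next byte is the `0x00` identity prefix.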

## Map Keys

In DagCBOR, map keys must be strings, as defined by the [IPLD Data Model].

## Strictness

DagCBOR requires that there exist a single way of encoding any given object, and that encoded forms contain no superfluous data that may be ignored or lost in a round-trip decode/encode.

Therefore the DagCBOR codec must:

1. Use no tags other than the CID tag (`42`). A valid DagCBOR encoder must not encode using any additional tags and a valid DagCBOR decoder must reject objects containing additional tags as invalid.
2. Use the canonical CBOR encoding defined by the suggestions in [section 3.9 of the CBOR specification]. A valid DagCBOR decoder should reject objects not following these restrictions as invalid. Specifically:
   * Integer encoding must be as short as possible.
   * The expression of lengths in major types 2 through 5 must be as short as possible.
   * The keys in every map must be sorted lowest value to highest. Sorting is performed on the bytes of the representation of the keys.
     - If two keys have different lengths, the shorter one sorts earlier;
     - If two keys have the same length, the one with the lower value in (byte-wise) lexical order sorts earlier.
   * Indefinite-length items must be made into definite-length items.
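As a rough illustration of two of these rules, the Python sketch below orders map keys length-first and checks, on the decode side, that a CBOR head uses its shortest form. The function names `canonical_key_order` and `read_strict_head` are hypothetical and not taken from any existing library; the sketch assumes string keys compared by their UTF-8 bytes and deliberately leaves major type 7 (floats and simple values) out of scope.

```python
def canonical_key_order(keys):
    """Sort string keys: shorter first, then byte-wise lexical order."""
    def sort_key(key):
        raw = key.encode("utf-8")
        return (len(raw), raw)
    return sorted(keys, key=sort_key)


# Smallest argument that justifies each extended head size.
_MIN_VALUE = {24: 24, 25: 1 << 8, 26: 1 << 16, 27: 1 << 32}


def read_strict_head(data: bytes, offset: int = 0):
    """Return (major, argument, next_offset), or raise ValueError for a
    non-shortest, indefinite-length, or reserved head."""
    initial = data[offset]
    major, additional = initial >> 5, initial & 0x1F
    if major == 7:
        raise ValueError("floats and simple values are not handled in this sketch")
    if additional < 24:
        return major, additional, offset + 1
    if additional not in _MIN_VALUE:
        # 28-30 are reserved; 31 marks indefinite length, which is not allowed.
        raise ValueError("indefinite-length or reserved additional information")
    size = 1 << (additional - 24)
    end = offset + 1 + size
    if end > len(data):
        raise ValueError("truncated head")
    value = int.from_bytes(data[offset + 1:end], "big")
    if value < _MIN_VALUE[additional]:
        raise ValueError("argument is not encoded in its shortest form")
    return major, value, end


print(canonical_key_order(["b", "aa", "a", "c", "ab"]))  # ['a', 'b', 'c', 'aa', 'ab']
print(read_strict_head(b"\x18\x18"))  # (0, 24, 2)
```

A strict decoder built along these lines would reject `18 17` (the value 23 stored in two bytes) and `5f` (an indefinite-length byte string) rather than accept them silently.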

### Floating Point Encoding (Unresolved)

Strict **floating point** encoding rules need to be resolved. Current CBOR encoding implementations used by IPLD libraries are _not_ unified in their approach.

[borc], for JavaScript (used via [dag-cbor]), uses a smallest-possible approach:

* Floating point values must be encoded as the smallest of 16-, 32-, or 64-bit floating point that accurately represents the value, even for integral values.

[refmt], for Go (used via [ipld-cbor] and [ipld-prime]), uses a consistent 64-bit approach:

## Link Format
* All floating point values must be encoded as 64-bit floating point, even for integral values.

As with all IPLD formats, DagCBOR must be able to encode [Links].
In DagCBOR, links are [CIDs] encoded using the raw-binary identity [Multibase]. That Multibase prefix (`0x00`) *must not* be omitted. They are stored as byte-string type (major type 2), with the tag 42.
One of these approaches will be chosen and the libraries for the other language will be adjusted or replaced to harmonize.
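The difference between the two approaches can be seen in a small Python sketch. It is not the actual borc or refmt code: `encode_float_smallest` and `encode_float_64` are hypothetical names, and the handling of NaN (which falls through to 64 bits here because it never compares equal to itself) is an assumption of this sketch only.

```python
import struct


def encode_float_smallest(value: float) -> bytes:
    """Smallest-possible approach: use 16 or 32 bits when the value survives a round trip."""
    for additional, fmt in ((25, ">e"), (26, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except (OverflowError, struct.error):
            continue  # value is out of range for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([0xE0 | additional]) + packed
    return bytes([0xE0 | 27]) + struct.pack(">d", value)


def encode_float_64(value: float) -> bytes:
    """Consistent approach: always use 64 bits."""
    return b"\xfb" + struct.pack(">d", value)


print(encode_float_smallest(1.0).hex())  # f93c00
print(encode_float_64(1.0).hex())        # fb3ff0000000000000
```

The same value therefore produces different bytes, and different hashes, depending on which library encoded it, which is why a single rule is needed.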

(the inclusion of the Multibase exists for historical reasons)
## Implementations

## Map Key Restriction
### JavaScript

In DagCBOR, map keys must be strings.
[dag-cbor], used by [ipld] and [@ipld/block], adheres to this specification, with the following caveats:

## Canonical DagCBOR
* Strictness is not yet enforced on decode; encoded blocks that don't follow the strictness rules are not rejected
* Floating point values are encoded as their smallest form (see above)

Canonical DagCBOR must:
### Go

1. Use no tags other than the CID tag (42). Other tags may be lost in
conversion.
2. Use the [canonical CBOR](https://tools.ietf.org/html/rfc7049#section-3.9)
encoding.
[ipld-cbor] and [ipld-prime] adhere to this specification, with the following caveats:

* Strictness is not yet enforced on decode; encoded blocks that don't follow the strictness rules are not rejected
* All floating point values are encoded as 64-bit

[IPLD Data Model]: ../../data-model-layer/data-model.md
[Concise Binary Object Representation (CBOR)]: https://tools.ietf.org/html/rfc7049
[IPLD Data Model Kinds]: ../../data-model-layer/data-model.md#kinds
[Links]: ../../data-model-layer/data-model.md#link-kind
[CIDs]: ../CID.md
[CID]: ../CID.md
[Multibase]: https://github.com/multiformats/multibase
[canonical CBOR]: https://tools.ietf.org/html/rfc7049#section-3.9
[section 3.9 of the CBOR specification]: https://tools.ietf.org/html/rfc7049#section-3.9
[borc]: https://github.com/dignifiedquire/borc
[dag-cbor]: https://github.com/ipld/js-ipld-dag-cbor/
[refmt]: https://github.com/polydawn/refmt/
[ipld-cbor]: https://github.com/ipfs/go-ipld-cbor
[ipld-prime]: http://github.com/ipld/go-ipld-prime
[ipld]: https://github.com/ipld/js-ipld
[@ipld/block]: https://github.com/ipld/js-block
