
fxamacker/cbor is a library for encoding and decoding CBOR and CBOR Sequences.

CBOR is a trusted alternative to JSON, MessagePack, Protocol Buffers, etc.  CBOR is an Internet Standard defined by IETF STD 94 (RFC 8949) and is designed to be relevant for decades.

fxamacker/cbor is used in projects by Arm Ltd., Cisco, EdgeX Foundry, Flow Foundation, Fraunhofer‑AISEC, Kubernetes, Let's Encrypt (ISRG), Linux Foundation, Microsoft, Mozilla, Oasis Protocol, Tailscale, Teleport, etc.

See Quick Start and Releases. 🆕 UnmarshalFirst and DiagnoseFirst can decode CBOR Sequences. MarshalToBuffer and UserBufferEncMode accept a user-specified buffer.

fxamacker/cbor


fxamacker/cbor is a CBOR codec in full conformance with IETF STD 94 (RFC 8949). It also supports CBOR Sequences (RFC 8742) and Extended Diagnostic Notation (Appendix G of RFC 8610).

Features include full support for CBOR tags, Core Deterministic Encoding, duplicate map key detection, etc.

The API is mostly the same as encoding/json, plus interfaces that simplify concurrency and CBOR options.

Design balances trade-offs between security, speed, concurrency, encoded data size, usability, etc.

🔎  Highlights

🚀  Speed

Encoding and decoding are fast without using Go's unsafe package. Slower settings are opt-in. Default limits allow very fast and memory-efficient rejection of malformed CBOR data.

🔒  Security

Decoder has configurable limits that defend against malicious inputs. Duplicate map key detection is supported. By contrast, encoding/gob is not designed to be hardened against adversarial inputs.

The codec passed multiple confidential security assessments in 2022. No vulnerabilities were found in the subset of the codec covered by a nonconfidential security assessment prepared by NCC Group for Microsoft Corporation.

🗜️  Data Size

Struct tags (toarray, keyasint, omitempty) automatically reduce size of encoded structs. Encoding optionally shrinks float64→32→16 when values fit.

🧩  Usability

The API is mostly the same as encoding/json, plus interfaces that simplify concurrency for CBOR options. Encoding and decoding modes can be created at startup and reused by any goroutine.

Presets include Core Deterministic Encoding, Preferred Serialization, CTAP2 Canonical CBOR, etc.

📆  Extensibility

Features include CBOR extension points (e.g. CBOR tags) and extensive settings. API has interfaces that allow users to create custom encoding and decoding without modifying this library.


Secure Decoding with Configurable Settings

fxamacker/cbor has configurable limits, etc. that defend against malicious CBOR data.

Some other codecs can crash or use excessive resources while decoding untrusted data.

Warning

Notably, encoding/gob is not designed to be hardened against adversarial inputs.

🔎  gob fatal error (out of memory) 💥 decoding 181 bytes

// Example of encoding/gob having "fatal error: runtime: out of memory"
// while decoding 181 bytes (all Go versions as of Oct. 5, 2024).
package main
import (
	"bytes"
	"encoding/gob"
	"encoding/hex"
	"fmt"
)

// Example data is from https://github.com/golang/go/issues/24446
// (shortened to 181 bytes).
const data = "4dffb503010102303001ff30000109010130010800010130010800010130" +
	"01ffb80001014a01ffb60001014b01ff860001013001ff860001013001ff" +
	"860001013001ff860001013001ffb80000001eff850401010e3030303030" +
	"30303030303030303001ff3000010c0104000016ffb70201010830303030" +
	"3030303001ff3000010c000030ffb6040405fcff00303030303030303030" +
	"303030303030303030303030303030303030303030303030303030303030" +
	"30"

type X struct {
	J *X
	K map[string]int
}

func main() {
	raw, _ := hex.DecodeString(data)
	decoder := gob.NewDecoder(bytes.NewReader(raw))

	var x X
	decoder.Decode(&x) // fatal error: runtime: out of memory
	fmt.Println("Decoding finished.")
}

fxamacker/cbor is fast at rejecting malformed CBOR data.

Note

Benchmarks rejecting 10 bytes of malicious CBOR data decoding to []byte:

Codec                  | Speed (ns/op) | Memory        | Allocs
---------------------- | ------------- | ------------- | ------------
fxamacker/cbor 2.7.0   | 47 ± 7%       | 32 B/op       | 2 allocs/op
ugorji/go 1.2.12       | 5878187 ± 3%  | 67111556 B/op | 13 allocs/op

Faster hardware (overclocked DDR4 or DDR5) can reduce the speed difference.

🔎  Benchmark details

Latest comparison for decoding CBOR data to Go []byte:

  • Input: []byte{0x9B, 0x00, 0x00, 0x42, 0xFA, 0x42, 0xFA, 0x42, 0xFA, 0x42}
  • go1.22.7, linux/amd64, i5-13600K (DDR4-2933, disabled e-cores)
  • go test -bench=. -benchmem -count=20

Prior comparisons

Codec                        | Speed (ns/op)   | Memory        | Allocs
---------------------------- | --------------- | ------------- | ------------
fxamacker/cbor 2.5.0-beta2   | 44.33 ± 2%      | 32 B/op       | 2 allocs/op
fxamacker/cbor 0.1.0 - 2.4.0 | ~44.68 ± 6%     | 32 B/op       | 2 allocs/op
ugorji/go 1.2.10             | 5524792.50 ± 3% | 67110491 B/op | 12 allocs/op
ugorji/go 1.1.0 - 1.2.6      | 💥 runtime: out of memory: cannot allocate
  • Input: []byte{0x9B, 0x00, 0x00, 0x42, 0xFA, 0x42, 0xFA, 0x42, 0xFA, 0x42}
  • go1.19.6, linux/amd64, i5-13600K (DDR4)
  • go test -bench=. -benchmem -count=20

Smaller Encodings with Struct Tags

Struct tags automatically reduce encoded size of structs and improve speed.

We can write less code by using struct tags:

  • toarray: encode without field names (decode back to original struct)
  • keyasint: encode field names as integers (decode back to original struct)
  • omitempty: omit empty fields when encoding


Note

fxamacker/cbor can encode a 3-level nested Go struct to 1 byte!

  • encoding/json: 18 bytes of JSON
  • fxamacker/cbor: 1 byte of CBOR
🔎  Encoding 3-level nested Go struct with omitempty

https://go.dev/play/p/YxwvfPdFQG2

// Example encoding nested struct (with omitempty tag)
// - encoding/json:  18 byte JSON
// - fxamacker/cbor:  1 byte CBOR

package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"

	"github.com/fxamacker/cbor/v2"
)

type GrandChild struct {
	Quux int `json:",omitempty"`
}

type Child struct {
	Baz int        `json:",omitempty"`
	Qux GrandChild `json:",omitempty"`
}

type Parent struct {
	Foo Child `json:",omitempty"`
	Bar int   `json:",omitempty"`
}

func cb() {
	results, _ := cbor.Marshal(Parent{})
	fmt.Println("hex(CBOR): " + hex.EncodeToString(results))

	text, _ := cbor.Diagnose(results) // Diagnostic Notation
	fmt.Println("DN: " + text)
}

func js() {
	results, _ := json.Marshal(Parent{})
	fmt.Println("hex(JSON): " + hex.EncodeToString(results))

	text := string(results) // JSON
	fmt.Println("JSON: " + text)
}

func main() {
	cb()
	fmt.Println("-------------")
	js()
}

Output (DN is Diagnostic Notation):

hex(CBOR): a0
DN: {}
-------------
hex(JSON): 7b22466f6f223a7b22517578223a7b7d7d7d
JSON: {"Foo":{"Qux":{}}}

Quick Start

Install: go get github.com/fxamacker/cbor/v2 and import "github.com/fxamacker/cbor/v2".

Tip

Tinygo users can try beta/experimental branch feature/cbor-tinygo-beta.

🔎  More about tinygo feature branch

Tinygo

Branch feature/cbor-tinygo-beta is based on fxamacker/cbor v2.7.0 and it can be compiled using tinygo v0.33 (also compiles with golang/go).

It passes unit tests (with both go1.22 and tinygo v0.33) and is considered beta/experimental for tinygo.

⚠️ The feature/cbor-tinygo-beta branch is not fuzz tested yet.

Changes in this feature branch only affect tinygo compiled software. Summary of changes:

  • Default DecOptions.MaxNestedLevels is reduced to 16 (was 32). Users can specify a higher limit, but 24+ crashes tests when compiled with tinygo v0.33.
  • Decoding CBOR tag data to Go interface is disabled because tinygo v0.33 is missing a needed feature.
  • The encoding error message can be different when encoding a function type.

Related tinygo issues:

Key Points

This library can encode and decode CBOR (RFC 8949) and CBOR Sequences (RFC 8742).

  • A CBOR data item is a single piece of CBOR data, and its structure may contain 0 or more nested data items.
  • A CBOR sequence is a concatenation of 0 or more encoded CBOR data items.

Configurable limits and options can be used to balance trade-offs.

  • Encoding and decoding modes are created from options (settings).
  • Modes can be created at startup and reused.
  • Modes are safe for concurrent use.

Default Mode

Package level functions only use this library's default settings.
They provide the "default mode" of encoding and decoding.

// API matches encoding/json for Marshal, Unmarshal, Encode, Decode, etc.
b, err = cbor.Marshal(v)        // encode v to []byte b
err = cbor.Unmarshal(b, &v)     // decode []byte b to v
decoder = cbor.NewDecoder(r)    // create decoder with io.Reader r
err = decoder.Decode(&v)        // decode a CBOR data item to v

// v2.7.0 added MarshalToBuffer() and UserBufferEncMode interface.
err = cbor.MarshalToBuffer(v, &buf) // encode v to bytes.Buffer buf instead of using the built-in buffer pool

// v2.5.0 added new functions that return remaining bytes.

// UnmarshalFirst decodes first CBOR data item and returns remaining bytes.
rest, err = cbor.UnmarshalFirst(b, &v)   // decode []byte b to v

// DiagnoseFirst translates first CBOR data item to text and returns remaining bytes.
text, rest, err = cbor.DiagnoseFirst(b)  // decode []byte b to Diagnostic Notation text

// NOTE: Unmarshal() returns ExtraneousDataError if there are remaining bytes, but
// UnmarshalFirst() and DiagnoseFirst() allow trailing bytes.

Important

CBOR settings allow trade-offs between speed, security, encoding size, etc.

  • Different CBOR libraries may use different default settings.
  • CBOR-based formats or protocols usually require specific settings.

For example, WebAuthn uses "CTAP2 Canonical CBOR" which is available as a preset.

Presets

Presets can be used as-is or as a starting point for custom settings.

// EncOptions is a struct of encoder settings.
func CoreDetEncOptions() EncOptions              // RFC 8949 Core Deterministic Encoding
func PreferredUnsortedEncOptions() EncOptions    // RFC 8949 Preferred Serialization
func CanonicalEncOptions() EncOptions            // RFC 7049 Canonical CBOR
func CTAP2EncOptions() EncOptions                // FIDO2 CTAP2 Canonical CBOR

Presets are used to create custom modes.

Custom Modes

Modes are created from settings. Once created, modes have immutable settings.

💡 Create the mode at startup and reuse it. It is safe for concurrent use.

// Create encoding mode.
opts := cbor.CoreDetEncOptions()   // use preset options as a starting point
opts.Time = cbor.TimeUnix          // change any settings if needed
em, err := opts.EncMode()          // create an immutable encoding mode

// Reuse the encoding mode. It is safe for concurrent use.

// API matches encoding/json.
b, err := em.Marshal(v)            // encode v to []byte b
encoder := em.NewEncoder(w)        // create encoder with io.Writer w
err = encoder.Encode(v)            // encode v to io.Writer w

Default mode and custom modes automatically apply struct tags.

User Specified Buffer for Encoding (v2.7.0)

The UserBufferEncMode interface extends the EncMode interface to add MarshalToBuffer(). It accepts a user-specified buffer instead of using the built-in buffer pool.

em, err := myEncOptions.UserBufferEncMode() // create UserBufferEncMode mode

var buf bytes.Buffer
err = em.MarshalToBuffer(v, &buf) // encode v to provided buf

Struct Tags

Struct tags (toarray, keyasint, omitempty) reduce encoded size of structs.


🔎  Example using several struct tags


Struct tags simplify use of CBOR-based protocols that require CBOR arrays or maps with integer keys.

CBOR Tags

CBOR tags are specified in a TagSet.

Custom modes can be created with a TagSet to handle CBOR tags.

// Alternatives (use one):
em, err := opts.EncMode()                  // no CBOR tags
em, err := opts.EncModeWithTags(ts)        // immutable CBOR tags
em, err := opts.EncModeWithSharedTags(ts)  // mutable shared CBOR tags

TagSet and modes using it are safe for concurrent use. Equivalent API is available for DecMode.

🔎  Example using TagSet and TagOptions

// Use signedCWT struct defined in "Decoding CWT" example.

// Create TagSet (safe for concurrency).
tags := cbor.NewTagSet()
// Register tag COSE_Sign1 18 with signedCWT type.
if err := tags.Add(
	cbor.TagOptions{EncTag: cbor.EncTagRequired, DecTag: cbor.DecTagRequired},
	reflect.TypeOf(signedCWT{}),
	18,
); err != nil {
	return err
}

// Create DecMode with immutable tags.
dm, err := cbor.DecOptions{}.DecModeWithTags(tags)
if err != nil {
	return err
}

// Unmarshal to signedCWT with tag support.
var v signedCWT
if err := dm.Unmarshal(data, &v); err != nil {
	return err
}

// Create EncMode with immutable tags.
em, err := cbor.EncOptions{}.EncModeWithTags(tags)
if err != nil {
	return err
}

// Marshal signedCWT with tag number using the tag-aware mode
// (package-level cbor.Marshal uses the default mode without registered tags).
data, err = em.Marshal(v)
if err != nil {
	return err
}

Functions and Interfaces

🔎  Functions and interfaces at a glance

Common functions with same API as encoding/json:

  • Marshal, Unmarshal
  • NewEncoder, (*Encoder).Encode
  • NewDecoder, (*Decoder).Decode

NOTE: Unmarshal returns ExtraneousDataError if there are remaining bytes, because RFC 8949 treats a CBOR data item followed by remaining bytes as malformed.

  • 💡 Use UnmarshalFirst to decode first CBOR data item and return any remaining bytes.

Other useful functions:

  • Diagnose, DiagnoseFirst produce human-readable Extended Diagnostic Notation from CBOR data.
  • UnmarshalFirst decodes the first CBOR data item and returns any remaining bytes.
  • Wellformed checks whether the CBOR data item is well-formed.

Interfaces identical or comparable to Go encoding packages include:
Marshaler, Unmarshaler, BinaryMarshaler, and BinaryUnmarshaler.

The RawMessage type can be used to delay CBOR decoding or precompute CBOR encoding.

Security Tips

🔒 Use Go's io.LimitReader to limit size when decoding very large or indefinite size data.

Default limits may need to be increased for systems handling very large data (e.g. blockchains).

DecOptions can be used to modify default limits for MaxArrayElements, MaxMapPairs, and MaxNestedLevels.

Status

v2.7.0 (June 23, 2024) adds features and improvements that help large projects (e.g. Kubernetes) use CBOR as an alternative to JSON and Protocol Buffers. Other improvements include speedups, improved memory use, bug fixes, new serialization options, etc. It passed fuzz tests (5+ billion executions) and is production quality.

For more details, see release notes.

Prior Release

v2.6.0 (February 2024) adds important new features, optimizations, and bug fixes. It is especially useful to systems that need to convert data between CBOR and JSON. New options and optimizations improve handling of bignum, integers, maps, and strings.

v2.5.0 was released on Sunday, August 13, 2023 with new features and important bug fixes. It is fuzz tested and production quality after extended beta v2.5.0-beta (Dec 2022) -> v2.5.0 (Aug 2023).

IMPORTANT: 👉 Before upgrading from v2.4 or older release, please read the notable changes highlighted in the release notes. v2.5.0 is a large release with bug fixes to error handling for extraneous data in Unmarshal, etc. that should be reviewed before upgrading.

See v2.5.0 release notes for list of new features, improvements, and bug fixes.

See the "Versions and API Changes" section for more info about version numbering, etc.

Who uses fxamacker/cbor

fxamacker/cbor is used in projects by Arm Ltd., Berlin Institute of Health at Charité, Chainlink, Cisco, Confidential Computing Consortium, ConsenSys, EdgeX Foundry, F5, FIDO Alliance, Flow Foundation, Fraunhofer‑AISEC, Kubernetes, Let's Encrypt (ISRG), Linux Foundation, Matrix.org, Microsoft, Mozilla, National Cybersecurity Agency of France (govt), Netherlands (govt), Oasis Protocol, Smallstep, Tailscale, Taurus SA, Teleport, TIBCO, and others.

fxamacker/cbor passed multiple confidential security assessments. A nonconfidential security assessment (prepared by NCC Group for Microsoft Corporation) includes a subset of fxamacker/cbor v2.4.0 in its scope.

Standards

fxamacker/cbor is a CBOR codec in full conformance with IETF STD 94 (RFC 8949). It also supports CBOR Sequences (RFC 8742) and Extended Diagnostic Notation (Appendix G of RFC 8610).

Notable CBOR features include:

CBOR Feature Description
CBOR tags API supports built-in and user-defined tags.
Preferred serialization Integers encode to fewest bytes. Optional float64 → float32 → float16.
Map key sorting Unsorted, length-first (Canonical CBOR), and bytewise-lexicographic (CTAP2).
Duplicate map keys Always forbid for encoding and option to allow/forbid for decoding.
Indefinite length data Option to allow/forbid for encoding and decoding.
Well-formedness Always checked and enforced.
Basic validity checks Optionally check UTF-8 validity and duplicate map keys.
Security considerations Prevent integer overflow and resource exhaustion (RFC 8949 Section 10).

Known limitations are noted in the Limitations section.

Go nil values for slices, maps, pointers, etc. are encoded as CBOR null. Empty slices, maps, etc. are encoded as empty CBOR arrays and maps.

Decoder checks for all required well-formedness errors, including all "subkinds" of syntax errors and too little data.

After well-formedness is verified, basic validity errors are handled as follows:

  • Invalid UTF-8 string: Decoder has option to check and return invalid UTF-8 string error. This check is enabled by default.
  • Duplicate keys in a map: Decoder has options to ignore or enforce rejection of duplicate map keys.

When decoding well-formed CBOR arrays and maps, decoder saves the first error it encounters and continues with the next item. Options to handle this differently may be added in the future.

By default, decoder treats time values of floating-point NaN and Infinity as if they are CBOR Null or CBOR Undefined.


🔎  Duplicate Map Keys

This library provides options for fast detection and rejection of duplicate map keys based on applying a Go-specific data model to CBOR's extended generic data model in order to determine duplicate vs distinct map keys. Detection relies on whether the CBOR map key would be a duplicate "key" when decoded and applied to the user-provided Go map or struct.

DupMapKeyQuiet turns off detection of duplicate map keys. It tries to use a "keep fastest" method by choosing either "keep first" or "keep last" depending on the Go data type.

DupMapKeyEnforcedAPF enforces detection and rejection of duplicate map keys. Decoding stops immediately and returns DupMapKeyError when the first duplicate key is detected. The error includes the duplicate map key and the index number.

APF suffix means "Allow Partial Fill" so the destination map or struct can contain some decoded values at the time of error. It is the caller's responsibility to respond to the DupMapKeyError by discarding the partially filled result if that's required by their protocol.

🔎  Tag Validity

This library checks tag validity for built-in tags (currently tag numbers 0, 1, 2, 3, and 55799):

  • Inadmissible type for tag content
  • Inadmissible value for tag content

Unknown tag data items (not tag number 0, 1, 2, 3, or 55799) are handled in two ways:

  • When decoding into an empty interface, unknown tag data item will be decoded into cbor.Tag data type, which contains tag number and tag content. The tag content will be decoded into the default Go data type for the CBOR data type.
  • When decoding into other Go types, unknown tag data item is decoded into the specified Go type. If Go type is registered with a tag number, the tag number can optionally be verified.

Decoder also has an option to forbid tag data items (treat any tag data item as error) which is specified by protocols such as CTAP2 Canonical CBOR.

For more information, see decoding options and tag options.

Limitations

If any of these limitations prevent you from using this library, please open an issue along with a link to your project.

  • CBOR Undefined (0xf7) value decodes to Go's nil value. CBOR Null (0xf6) more closely matches Go's nil.
  • CBOR map keys with data types not supported by Go for map keys are ignored and an error is returned after continuing to decode remaining items.
  • When decoding registered CBOR tag data to interface type, decoder creates a pointer to registered Go type matching CBOR tag number. Requiring a pointer for this is a Go limitation.

Fuzzing and Code Coverage

Code coverage is always 95% or higher (with go test -cover) when tagging a release.

Coverage-guided fuzzing must pass billions of executions before tagging a release. Fuzzing is done using nonpublic code, which may eventually get merged into this project. Until then, reports like OpenSSF Scorecard can't detect the fuzz tests used by this project.


Versions and API Changes

This project uses Semantic Versioning, so the API is always backwards compatible unless the major version number changes.

These functions have signatures identical to encoding/json and their API will continue to match encoding/json even after major new releases:
Marshal, Unmarshal, NewEncoder, NewDecoder, (*Encoder).Encode, and (*Decoder).Decode.

Exclusions from SemVer:

  • Newly added API documented as "subject to change".
  • Newly added API in the master branch that has never been tagged in non-beta release.
  • If function parameters are unchanged, bug fixes that change behavior (e.g. returning an error for an edge case that was missed in a prior version). We try to highlight these in the release notes and add an extended beta period. E.g. v2.5.0-beta (Dec 2022) -> v2.5.0 (Aug 2023).

This project avoids breaking changes to behavior of encoding and decoding functions unless required to improve conformance with supported RFCs (e.g. RFC 8949, RFC 8742, etc.). Visible changes that don't improve conformance to standards are typically made available as new opt-in settings or new functions.

Code of Conduct

This project has adopted the Contributor Covenant Code of Conduct. Contact [email protected] with any questions or comments.

Contributing

Please open an issue before beginning work on a PR. The improvement may have already been considered, etc.

For more info, see How to Contribute.

Security Policy

Security fixes are provided for the latest released version of fxamacker/cbor.

For the full text of the Security Policy, see SECURITY.md.

Acknowledgements

Many thanks to all the contributors on this project!

I'm especially grateful to Bastian Müller and Dieter Shirley for suggesting and collaborating on CBOR stream mode, and much more.

I'm very grateful to Stefan Tatschner, Yawning Angel, Jernej Kos, x448, ZenGround0, and Jakob Borg for their contributions or support in the very early days.

Big thanks to Ben Luddy for his contributions in v2.6.0 and v2.7.0.

This library clearly wouldn't be possible without Carsten Bormann authoring CBOR RFCs.

Special thanks to Laurence Lundblade and Jeffrey Yasskin for their help on IETF mailing list or at 7049bis.

Huge thanks to The Go Authors for creating a fun and practical programming language with batteries included!

This library uses x448/float16 which used to be included. As a standalone package, x448/float16 is useful to other projects as well.

License

Copyright © 2019-2024 Faye Amacker.

fxamacker/cbor is licensed under the MIT License. See LICENSE for the full license text.