parity with cbor-x #69

jimmywarting · 2022-11-29T14:23:48Z

I'm trying out both cbor-x and cborg...

Here is the deal:

cbor-x supports both typed arrays and arrayBuffer
cborg don't support typed arrays and produces regular byte array instead (same as arrayBuffer)
when i use ArrayBuffer with cbor-x then it will produce similar response as when i'm using cborg (example at bottom)

So here is an example where i use cbor-x and encoding Uint8Array's (it will produce a uint8 arrays)

const { encode } = await import('https://cdn.jsdelivr.net/npm/cbor-x/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]) }))

/*
HEX: b900016464617461d84043616263

b9 0001        # map(1)
   64          #   text(4)
      64617461 #     "data"
   d8 40       #   typed array of u8, tag(64)
      43       #     bytes(3)
         61    #       unsigned(97)
         62    #       unsigned(98)
         63    #       unsigned(99)
*/

Where as when i using cborg it would produce another result:

const { encode } = await import('https://cdn.jsdelivr.net/npm/cborg/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]) }))
/*
HEX: a1646461746143616263

a1             # map(1)
   64          #   text(4)
      64617461 #     "data"
   43          #   bytes(3)
      616263   #     "abc"
*/

To achieve the same result using `cbor-x` then i can use regular ArrayBuffers

const { encode } = await import('https://cdn.jsdelivr.net/npm/cbor-x/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]).buffer }))

/*
HEX: b90001646461746143616263

b9 0001        # map(1)
   64          #   text(4)
      64617461 #     "data"
   43          #   bytes(3)
      616263   #     "abc
*/

Now the actual issue/feature request:

I'm not exactly asking you to support TypedArrays or any other tags.
I'm asking you to rather stop using Uint8Arrays in your codebase and your code example and switch to using ArrayBuffer instead of Uint8Arrays.

Why?

To make it easier to switch between other cbor encoder/decoders
to make parity with other encoders to work similar
And lastly allow plugin to write tag supports for Uint8Array and other typed arrays

^{(credit to @Nemo157 for its online diagnostic tool

(also want to ping @kriszyp to ask if he has any saying to this)

(ps: i don't like how every package name their own function encode/decode it's conflicting with other hex/base64/TextDecoder etc - just name name according to what kind of thing they do: like `encodeToHex(input)`)}

The text was updated successfully, but these errors were encountered:

rvagg · 2022-12-01T23:36:42Z

Yeah, OK I'm not against this, it'd be a breaking change but I think this is a reasonable ask - but to be clear, supporting tags out of the box is out of scope for cborg, it's not intended to be fully-featured out of the box. I think you understand this from your current work though.
I'm not a super-fan of the differentiation between major 2 and tag 64; to me they're the same thing. But since you're coming to this from the perspective of structured cloning then yeah .. we're losing information in this process.

dlongley · 2024-03-09T19:34:48Z

We're using this library and prefer that Uint8Arrays continue to, by default, use major 2 and not tag 64. Most isomorphic JS is written using Uint8Arrays to express and manipulate arrays of bytes and it would be awkward to have to do something special here. I'd think that not requiring tags at all is the more common use case (i.e., "I just want to work with bytes and CBOR in JS, just like I do in other languages").

Additionally, ArrayBuffers in JS are not directly manipulated, I'd expect most of the time they are not resizable, and the size of an underlying buffer is not necessarily the same size of the view (e.g., Uint8Array). It's not hard to imagine some unpleasant bugs being introduced if developers must convert the Uint8Arrays in their applications to properly sized ArrayBuffers to use this library w/major 2.

Given implementation experience, it seems that it was perhaps a mistake for the CBOR spec to make a distinction between Uint8Array and byte strings, and if that is or starts to become the community consensus, implementations shouldn't further entrench the (potential) mistake.

Perhaps what is needed here are some simple examples showing how to cause Uint8Arrays to be encoded using tag 64 for those that really need it. If something more than that is needed, perhaps a non-breaking feature (similar to the useMaps option this library provides) could be created to switch from always using Uint8Arrays (current default) to using ArrayBuffers for major 2 and Uint8Arrays for tag 64.

EDIT: I saw just now that this issue is going on 2 years old -- perhaps it should actually just be closed at this point.

Ref: #69

rvagg · 2024-03-11T01:19:11Z

This is all pretty interesting and with the recent exposure of Tokenizer I think this can be fully implemented externally if someone cares enough about this, so I've put up an example and some short documentation of how this can be done in #112.

I'd also be open to entertaining additional code in here to support some or all of this. We have a taglib.js where the tag encoders and decoders could exist. Maybe an export that enables this wholesale for people that want it, while retaining the default behaviour as entirely untagged.

To be clear, the default mode of cborg is always going to be without tags--encode and decode. You have to opt in to that, and currently the only way to deal with byte arrays is to squash them all down to a single type. I've chosen Uint8Array as the base form rather than ArrayBuffer and I think this is a sensible default. But for the structured clone case, I think you could implement it using options, and we could potentially even offer an export like the json one to do this.

Ref: #69

rvagg added a commit that referenced this issue Mar 11, 2024

docs: add byte strings example using tags to retain typedarray types

38a467b

Ref: #69

rvagg added a commit that referenced this issue Mar 11, 2024

docs: add byte strings example using tags to retain typedarray types

0c4d5c8

Ref: #69

rvagg mentioned this issue Mar 11, 2024

docs: add byte strings example using tags to retain typedarray types #112

Merged

rvagg added a commit that referenced this issue Mar 11, 2024

docs: add byte strings example using tags to retain typedarray types

5bf989b

Ref: #69

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parity with cbor-x #69

parity with cbor-x #69

jimmywarting commented Nov 29, 2022 •

edited

Loading

rvagg commented Dec 1, 2022

dlongley commented Mar 9, 2024 •

edited

Loading

rvagg commented Mar 11, 2024

parity with cbor-x #69

parity with cbor-x #69

Comments

jimmywarting commented Nov 29, 2022 • edited Loading

Here is the deal:

Now the actual issue/feature request:

Why?

rvagg commented Dec 1, 2022

dlongley commented Mar 9, 2024 • edited Loading

rvagg commented Mar 11, 2024

jimmywarting commented Nov 29, 2022 •

edited

Loading

dlongley commented Mar 9, 2024 •

edited

Loading