Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parity with cbor-x #69

Open
jimmywarting opened this issue Nov 29, 2022 · 3 comments
Open

parity with cbor-x #69

jimmywarting opened this issue Nov 29, 2022 · 3 comments

Comments

@jimmywarting
Copy link

jimmywarting commented Nov 29, 2022

I'm trying out both cbor-x and cborg...

Here is the deal:

  • cbor-x supports both typed arrays and arrayBuffer
  • cborg don't support typed arrays and produces regular byte array instead (same as arrayBuffer)
  • when i use ArrayBuffer with cbor-x then it will produce similar response as when i'm using cborg (example at bottom)

So here is an example where i use cbor-x and encoding Uint8Array's (it will produce a uint8 arrays)

const { encode } = await import('https://cdn.jsdelivr.net/npm/cbor-x/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]) }))

/*
HEX: b900016464617461d84043616263

b9 0001        # map(1)
   64          #   text(4)
      64617461 #     "data"
   d8 40       #   typed array of u8, tag(64)
      43       #     bytes(3)
         61    #       unsigned(97)
         62    #       unsigned(98)
         63    #       unsigned(99)
*/

Where as when i using cborg it would produce another result:

const { encode } = await import('https://cdn.jsdelivr.net/npm/cborg/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]) }))
/*
HEX: a1646461746143616263

a1             # map(1)
   64          #   text(4)
      64617461 #     "data"
   43          #   bytes(3)
      616263   #     "abc"
*/
To achieve the same result using `cbor-x` then i can use regular ArrayBuffers
const { encode } = await import('https://cdn.jsdelivr.net/npm/cbor-x/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]).buffer }))

/*
HEX: b90001646461746143616263

b9 0001        # map(1)
   64          #   text(4)
      64617461 #     "data"
   43          #   bytes(3)
      616263   #     "abc
*/

Now the actual issue/feature request:

I'm not exactly asking you to support TypedArrays or any other tags.
I'm asking you to rather stop using Uint8Arrays in your codebase and your code example and switch to using ArrayBuffer instead of Uint8Arrays.

Why?

  • To make it easier to switch between other cbor encoder/decoders
  • to make parity with other encoders to work similar
  • And lastly allow plugin to write tag supports for Uint8Array and other typed arrays

(credit to @Nemo157 for its online diagnostic tool
(also want to ping @kriszyp to ask if he has any saying to this)
(ps: i don't like how every package name their own function encode/decode it's conflicting with other hex/base64/TextDecoder etc - just name name according to what kind of thing they do: like `encodeToHex(input)`)
@rvagg
Copy link
Owner

rvagg commented Dec 1, 2022

Yeah, OK I'm not against this, it'd be a breaking change but I think this is a reasonable ask - but to be clear, supporting tags out of the box is out of scope for cborg, it's not intended to be fully-featured out of the box. I think you understand this from your current work though.
I'm not a super-fan of the differentiation between major 2 and tag 64; to me they're the same thing. But since you're coming to this from the perspective of structured cloning then yeah .. we're losing information in this process.

@dlongley
Copy link

dlongley commented Mar 9, 2024

We're using this library and prefer that Uint8Arrays continue to, by default, use major 2 and not tag 64. Most isomorphic JS is written using Uint8Arrays to express and manipulate arrays of bytes and it would be awkward to have to do something special here. I'd think that not requiring tags at all is the more common use case (i.e., "I just want to work with bytes and CBOR in JS, just like I do in other languages").

Additionally, ArrayBuffers in JS are not directly manipulated, I'd expect most of the time they are not resizable, and the size of an underlying buffer is not necessarily the same size of the view (e.g., Uint8Array). It's not hard to imagine some unpleasant bugs being introduced if developers must convert the Uint8Arrays in their applications to properly sized ArrayBuffers to use this library w/major 2.

Given implementation experience, it seems that it was perhaps a mistake for the CBOR spec to make a distinction between Uint8Array and byte strings, and if that is or starts to become the community consensus, implementations shouldn't further entrench the (potential) mistake.

Perhaps what is needed here are some simple examples showing how to cause Uint8Arrays to be encoded using tag 64 for those that really need it. If something more than that is needed, perhaps a non-breaking feature (similar to the useMaps option this library provides) could be created to switch from always using Uint8Arrays (current default) to using ArrayBuffers for major 2 and Uint8Arrays for tag 64.

EDIT: I saw just now that this issue is going on 2 years old -- perhaps it should actually just be closed at this point.

@rvagg
Copy link
Owner

rvagg commented Mar 11, 2024

This is all pretty interesting and with the recent exposure of Tokenizer I think this can be fully implemented externally if someone cares enough about this, so I've put up an example and some short documentation of how this can be done in #112.

I'd also be open to entertaining additional code in here to support some or all of this. We have a taglib.js where the tag encoders and decoders could exist. Maybe an export that enables this wholesale for people that want it, while retaining the default behaviour as entirely untagged.

To be clear, the default mode of cborg is always going to be without tags--encode and decode. You have to opt in to that, and currently the only way to deal with byte arrays is to squash them all down to a single type. I've chosen Uint8Array as the base form rather than ArrayBuffer and I think this is a sensible default. But for the structured clone case, I think you could implement it using options, and we could potentially even offer an export like the json one to do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants