This repository has been archived by the owner on Oct 10, 2019. It is now read-only.

Truncate BigInts when setting UintXArray or IntXArray instead of throwing #137

Closed
seishun opened this issue Mar 29, 2018 · 27 comments

Comments

@seishun

seishun commented Mar 29, 2018

Currently, the following throws a TypeError:

const val = 123412341234n;
const buf = new Uint8Array(8);
buf[0] = val;

Simply casting to Number is not sufficient, since it can cause a loss of precision. The code for writing a 64-bit integer into a Node.js Buffer instance (which is also a Uint8Array instance) would look something like this:

  buf[offset++] = Number(value & 0xffn);
  value = value >> 8n;
  buf[offset++] = Number(value & 0xffn);
  value = value >> 8n;
  buf[offset++] = Number(value & 0xffn);
  value = value >> 8n;
  buf[offset++] = Number(value & 0xffn);
  value = value >> 8n;
  buf[offset++] = Number(value & 0xffn);
  value = value >> 8n;
  buf[offset++] = Number(value & 0xffn);
  value = value >> 8n;
  buf[offset++] = Number(value & 0xffn);
  value = value >> 8n;
  buf[offset++] = Number(value & 0xffn);

This is not only verbose, but also weird, because both Number and BigInt are supersets of Uint8.
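For what it's worth, the repetition can be folded into a loop; under the current spec, the explicit Number() cast on each byte is still required. (The helper name below is illustrative, not an existing API.)

```javascript
// Illustrative helper: write a BigInt as 8 little-endian bytes into a
// Uint8Array. Each byte must be cast to Number explicitly today.
function writeBigUInt64LE(buf, value, offset = 0) {
  for (let i = 0; i < 8; i++) {
    buf[offset + i] = Number(value & 0xffn); // low byte, cast to Number
    value >>= 8n;                            // shift to the next byte
  }
  return offset + 8;
}
```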

This is clearly not a case of avoiding silent truncations, since all TypedArrays (including the new BigInt64Array and BigUint64Array) still allow them when setting values of the correct type. For example, the following silently truncates according to the spec:

const val = 123412341234123412341234123412341234123412341234123412341234n;
const buf = new BigUint64Array(1);
buf[0] = val;
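Reading the element back confirms the wrap-around; per the spec's ToBigUint64, this check should hold in any conforming engine:

```javascript
const val = 123412341234123412341234123412341234123412341234123412341234n;
const buf = new BigUint64Array(1);
buf[0] = val;
// The stored value is val reduced modulo 2^64, per the spec.
console.log(buf[0] === val % (1n << 64n)); // true
```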

My informal proposal: when setting indexes of Int8Array, Uint8Array, Int16Array, Uint16Array, Int32Array or Uint32Array to BigInt values, it should do the same as with Number values - that is, set the index to the value modulo 2^8, 2^16 or 2^32, respectively.
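Under this proposal, a plain assignment would be equivalent to reducing the BigInt modulo 2^8 and storing the result — which today has to be spelled out by hand. A sketch of the intended semantics, not current behavior:

```javascript
const val = 123412341234n;
const buf = new Uint8Array(1);
// Proposed: `buf[0] = val;` would store val mod 2^8.
// Today, this line is the manual equivalent:
buf[0] = Number(val % 256n);
console.log(buf[0]); // 242
```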

Note: this is not the same as #106. That patch performed a conversion to Number, which can cause a loss of precision.

@littledan
Member

The decision to not do #106 wasn't centered on losing data after the decimal point. It was more about having a mental model which is based on keeping track of whether you are working with a Number or a BigInt. See the committee notes. What do you think of this argument?

@seishun
Author

seishun commented Apr 14, 2018

losing data after the decimal point

That's not the issue. Consider the following code:

const val = 0x10000000000000000000000000001n;
const buf = new Uint8Array(8);
buf[0] = val;

With #106, buf[0] would be equal to 0 instead of the expected 1 due to the implicit conversion to Number.
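The precision loss is easy to reproduce: 0x10000000000000000000000000001n is 2^112 + 1, and the nearest double is exactly 2^112, so the low-order 1 vanishes in the cast:

```javascript
const val = 0x10000000000000000000000000001n; // 2^112 + 1
// Casting to Number first (what #106 effectively did) loses the low bit:
console.log(Number(val) % 256);  // 0
// Reducing the BigInt directly keeps it:
console.log(Number(val % 256n)); // 1
```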

It was more about having a mental model which is based on keeping track of whether you are working with a Number or a BigInt. See the committee notes. What do you think of this argument?

The only argument I found in the notes that seems to apply to my suggestion is the following:

With the ones we're introducing, if you write a value into a typed array and the value is either a Number or a BigInt, in the range of the typed array, and read it back, depending on which typed array you use, it may be the same type or it might be the other type.

I don't see what issues this can cause in practice. In fact, it's already the case with IntXArrays and UintXArrays, since they allow you to write a string, for example.
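For example, writing a string to a Uint8Array already goes through ToNumber and then truncation today:

```javascript
const buf = new Uint8Array(1);
buf[0] = "300";      // string -> ToNumber -> 300 -> 300 mod 2^8
console.log(buf[0]); // 44
```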

I do agree with the following argument:

For some you get the correct modulo mathematical value result, for others you get rounding then modulo, pandora's box

But it applies to #106 and is not an issue if modulo is applied to the numeric value directly, without converting to Number.

@littledan
Member

@waldemarhorwat said,

I'd either like to see no coercions, or if we want genericity we should add BigInt arrays of all widths. It would let folks standardize on using just those and not have to wonder whether they're getting a Number or a BigInt each time

What do you say to this concern?

@seishun
Author

seishun commented Apr 15, 2018

I'd either like to see no coercions

Me too. But my proposal doesn't involve any coercions (any more than Numbers are now "coerced" into uint8 when writing into Uint8Array).

we should add BigInt arrays of all widths. It would let folks standardize on using just those and not have to wonder whether they're getting a Number or a BigInt each time

Adding a set of BigIntXArrays and BigUintXArrays just for the type you get when reading from it doesn't seem worthwhile to me, considering you can convert it yourself by calling BigInt(). Even if they are added, I would still argue for allowing writing both Numbers and BigInts to IntXArrays and UintXArrays. For instance, Node.js's Buffer is backed by Uint8Array and can't switch to BigUint8Array.

@littledan
Member

But my proposal doesn't involve any coercions (any more than Numbers are now "coerced" into uint8 when writing into Uint8Array).

We discussed this point at TC39, about there already being coercions and information loss and this not making it worse. However, the point here is about coercions between types. I don't think TC39 will permit coercions between types here, as it's a big mismatch with our general no-coercions design.

Adding a set of BigIntXArrays and BigUintXArrays just for the type you get when reading from it doesn't seem worthwhile to me, considering you can convert it yourself by calling BigInt()

Agreed. I don't want to add these. For this reason, if we frame things according to @waldemarhorwat's choice, I'd rather take the no coercions route. I believe we have discussed the idea of adding coercions but not array types and it was rejected already, without resting on the decision of whether to round or throw after the decimal point.

@seishun
Author

seishun commented Apr 15, 2018

However, the point here is about coercions between types.

This proposal doesn't add any new coercions between types either. TypedArrays don't store values as Numbers. When you write a Number to a Uint8Array and then read it, the conversion goes Number -> byte -> Number. With this proposal, if you write a BigInt and then read it, it goes BigInt -> byte -> Number. Where's the new coercion here?
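The existing Number round trip, for comparison — what is stored is raw bytes, not a Number:

```javascript
const buf = new Uint16Array(1);
buf[0] = 70000;      // Number -> uint16 (70000 mod 65536)
console.log(buf[0]); // 4464, read back as a Number
```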

For this reason, if we frame things according to @waldemarhorwat's choice, I'd rather take the no coercions route.

I don't understand why not adding these leads to the no coercions route.

without resting on the decision of whether to round

@waldemarhorwat mentioned rounding as a problem, see the last quote in #137 (comment).

throw after the decimal point.

What does this mean? The decimal point is not relevant here, see my example in #137 (comment).

@littledan
Member

I'd really prefer not to have to present a change to the committee unless there is a very strong motivation for the change, given the back and forth so far. It seems like you can get by without this change with an explicit cast. When working with BigInts and Numbers, it's necessary to understand which type you are working with, given how arithmetic operations do not permit mixed types.

Given that, and the Big in the 64-bit array names, will it ever be unintuitive to know whether you need to insert an explicit cast? A concrete example would help. The long series of 8-bit sets at the beginning of this post doesn't seem so relevant to the question; that could be handled by making a BigInt64Array out of the underlying ArrayBuffer.
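Creating the 64-bit view over the same buffer looks like this (the bytes land in the platform's byte order):

```javascript
const bytes = new Uint8Array(8);
// A BigInt64Array view sharing the same underlying ArrayBuffer:
const big = new BigInt64Array(bytes.buffer, bytes.byteOffset, 1);
big[0] = 123412341234n; // written in the platform's byte order
console.log(big[0]);    // 123412341234n
```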

@seishun
Author

seishun commented Apr 15, 2018

I'd really prefer not to have to present a change to the committee unless there is a very strong motivation for the change, given the back and forth so far. It seems like you can get by without this change with an explicit cast.

Consider the following scenario:

  1. A developer wants to set an index in a Uint8(16,32)Array to a BigInt value modulo 2^8(16,32).
  2. Simple assignment throws TypeError, so they add an explicit cast to Number.
  3. The resulting code seems to work correctly. They don't test values greater than 9007199254740991.
  4. At some point the code runs with values greater than 9007199254740991, resulting in incorrect behavior. Many hours are spent tracking down the bug.

will it ever be unintuitive to know whether you need to insert an explicit cast?

That's the problem, an explicit cast is an "obvious" but incorrect solution. You have to perform the modulo operation yourself before casting, which is less than intuitive.
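A concrete instance of step 4, using 2^53 + 1 (the first integer a Number cannot represent):

```javascript
const val = 9007199254740993n; // 2^53 + 1
const buf = new Uint8Array(1);
buf[0] = Number(val);          // the "obvious" cast: rounds to 2^53
console.log(buf[0]);           // 0 -- wrong
buf[0] = Number(val % 256n);   // modulo first, then cast
console.log(buf[0]);           // 1 -- correct
```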

that could be handled by making a BigInt64Array out of the underlying ArrayBuffer.

That's not an option for Node.js, or anyone who needs to write in non-platform byte order.

@littledan
Member

Consider the following scenario:

This scenario helps a lot. Thanks for explaining. Another way to look at this issue, rather than it being about TypedArrays, would be about whether we should throw exceptions when casting from BigInt to Number and losing information.

The cast from BigInt to Number will silently round if the BigInt is too big. Maybe this is the problematic decision. By contrast, the cast from Number to BigInt will throw an exception if the Number holds something that cannot be accurately represented.

@jakobkummerow previously pointed this out. We decided on the rounding behavior for BigInt-to-Number in #14 based on @waldemarhorwat's guidance, and didn't really revisit it when we decided to throw some exceptions in #15. Was this an oversight?

@seishun
Author

seishun commented Apr 16, 2018

Another way to look at this issue, rather than it being about TypedArrays, would be about whether we should throw exceptions when casting from BigInt to Number and losing information.

I agree that casting from BigInt to Number is worth discussing, but it's a separate issue. If casting BigInt to Number throws when the BigInt is too big, then the step 4 in my scenario will be an error. That's better than incorrect behavior, but less than ideal.

If setting an index in a Uint8(16,32)Array to a BigInt value performs a modulo operation on the BigInt value, then the developers won't have to wonder how to correctly convert BigInt to Number without rounding or errors.

@littledan
Member

Developers who do things that are mixed between BigInts and Numbers will have to use explicit cast operations. This is core to the design of BigInt. If you're using TypedArrays, you're doing an even more advanced thing than basic arithmetic, so I expect the need for explicit casts to be even less of a problem. The Number constructor applied to a string does rounding as well.

@sjrd

sjrd commented Apr 16, 2018

whether we should throw exceptions when casting from BigInt to Number and losing information.

If we do that, then it will complicate the implementation of a 64-bit integer -> double conversion as found in many source languages. For example, in Java (double) longVal is supposed to lose precision, and not throw. If the equivalent Number(bigIntVal) throws, it will be harder to implement it.
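Today, Number() on a BigInt behaves exactly like Java's (double) cast — it rounds rather than throws:

```javascript
// 2^64 + 1 is not representable as a double; it rounds to 2^64.
console.log(Number(2n ** 64n + 1n) === 2 ** 64); // true
```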

It's OK if the default Number(bigIntVal) throws, but then I would ask that there be another function, BigInt.toNumberUnsafe(bigIntVal) or something similar, that would silently drop precision, as the current implementation does.

@littledan
Member

OK, at this point, I'm strongly leaning to maintaining the current specification as is.

@seishun
Author

seishun commented Apr 16, 2018

Developers who do things that are mixed between BigInts and Numbers will have to use explicit cast operations.

Writing a BigInt to a UintXArray or IntXArray is a "thing that is mixed between BigInts and Numbers" only because the current spec requires the value you write to be a Number. My proposal makes it a thing that is not mixed between BigInts and Numbers.

Also, the explicit cast is not the problem - having to do the modulo operation yourself to avoid rounding errors is the problem.

The Number constructor applied to a string does rounding as well.

I'm not proposing to change the behavior of the Number constructor, so I don't see how this is relevant.

It seems you misunderstand my proposal. Perhaps I should draft up a diff for the spec to make things clearer?

@littledan
Member

Feel free to draft up a specification change. I think we've already discussed this in committee, though, and we're unlikely to make this change.

@seishun
Author

seishun commented Apr 16, 2018

It's quite simple:

diff --git a/spec.html b/spec.html
index 69525af..8e2aebe 100644
--- a/spec.html
+++ b/spec.html
@@ -1665,7 +1665,7 @@ <h1>IntegerIndexedElementSet ( _O_, _index_, _value_ )</h1>
           1. Let _arrayTypeName_ be the String value of _O_.[[TypedArrayName]].
           1. Let _elementType_ be the String value of the Element Type value in <emu-xref href="#table-49"></emu-xref> for _arrayTypeName_.
           1. <ins>If _arrayTypeName_ is `"BigUint64Array"` or `"BigInt64Array"`, let _numValue_ be ? ToBigInt(_value_).</ins>
-          1. <ins>Otherwise,</ins> let _numValue_ be ? ToNumber(_value_).
+          1. <ins>Otherwise,</ins> let _numValue_ be ? ToNumeric(_value_).
           1. Let _buffer_ be _O_.[[ViewedArrayBuffer]].
           1. If IsDetachedBuffer(_buffer_) is *true*, throw a *TypeError* exception.
           1. If IsInteger(_index_) is *false*, return *false*.
@@ -1688,7 +1688,7 @@ <h1>SetValueInBuffer ( _arrayBuffer_, _byteIndex_, _type_, _value_, _isTypedArra
           1. Assert: IsDetachedBuffer(_arrayBuffer_) is *false*.
           1. Assert: There are sufficient bytes in _arrayBuffer_ starting at _byteIndex_ to represent a value of _type_.
           1. Assert: _byteIndex_ is an integer value &ge; 0.
-          1. Assert: Type(_value_) is <ins>BigInt if _type_ is `"BigInt64"` or `"BigUint64"`; otherwise, Type(_value_) is Number</ins>.
+          1. Assert: Type(_value_) is <ins>BigInt if _type_ is `"BigInt64"` or `"BigUint64"`; otherwise, Type(_value_) is Number or BigInt</ins>.
           1. Let _block_ be _arrayBuffer_.[[ArrayBufferData]].
           1. Let _elementSize_ be the Number value of the Element Size value specified in <emu-xref href="#table-49"></emu-xref> for Element Type _type_.
           1. If _isLittleEndian_ is not present, set _isLittleEndian_ to the value of the [[LittleEndian]] field of the surrounding agent's Agent Record.

Would you like me to create a PR with it?

@littledan
Member

I don't think this change would be enough to implement the behavior you've been describing; you'd really want to reland most of #106, with some small tweaks. Anyway, I was just saying feel free to write it since you suggested it; ultimately making the design decision is the hard part, and we'll be able to work out the spec text.

I think the next step would be to find a TC39 delegate who is interested in championing your change. I don't plan to champion it, because I'm unconvinced it makes sense as a change, based on the reasoning I gave earlier in this thread.

@seishun
Author

seishun commented Apr 16, 2018

I don't think this change would be enough to implement the behavior you've been describing

You're right - the conversion operations for TypedArrays must not have a first step that calls ToNumber; the argument is already a Number or a BigInt.

What else is missing in your opinion?

I don't plan to champion it, because I'm unconvinced it makes sense as a change, based on the reasoning I gave earlier in this thread.

As I understood, the reasoning you have given so far refers to avoiding implicit casts to Number. But as you can see, there are no implicit casts to Number here. In fact, my proposal allows one to avoid a cast to Number, explicit or implicit, in the scenario I described in #137 (comment). I really don't understand your opposition.

@littledan
Member

Whether you are doing some kind of underlying cast or not depends on your perspective. One way of thinking about TypedArrays is in terms of the underlying ArrayBuffer; another way is in terms of containing Numbers (or now BigInts) in a certain range.

It's not only about casts, but about overloading. In general, operations are made to work for BigInt or Number, but not both, except in particular cases where it is very justified. The idea is to reinforce the concept that programmers keep track of types and keep things generally monomorphic.

@waldemarhorwat

Consider the following scenario:

  • A developer wants to set an index in a Uint8(16,32)Array to a BigInt value modulo 2^8(16,32).
  • Simple assignment throws TypeError, so they add an explicit cast to Number.
  • The resulting code seems to work correctly. They don't test values greater than 9007199254740991.
  • At some point the code runs with values greater than 9007199254740991, resulting in incorrect behavior. Many hours are spent tracking down the bug.

I'd prefer to solve problems like this by just supporting BigUint8Array (and BigInt typed arrays of the other integer widths) directly so you never need to convert to Number on reading or writing.

@littledan
Member

OK, let's consider adding BigInt8Array as a possible follow-on proposal, and stick with the minimal proposal for now with strict checks and without this type.

@seishun
Author

seishun commented Apr 17, 2018

In general, operations are made to work for BigInt or Number, but not both, except in particular cases where it is very justified.

I provided a scenario where requiring Numbers on writing can lead to incorrect code, is that not sufficient justification?

The idea is to reinforce the concept that programmers keep track of types and keep things generally monomorphic.

What's the value of this idea in this particular case? Does it help prevent bugs? Does it make things simpler?

I'd prefer to solve problems like this by just supporting BigUint8Array (and BigInt typed arrays of the other integer widths) directly so you never need to convert to Number on reading or writing.

Why?

Adding BigUint8Array won't help people who are already using Uint8Array, as I mentioned in #137 (comment).

@littledan
Member

See this section of the explainer for background.

@seishun
Author

seishun commented Apr 17, 2018

I agree with that section, but I don't see how it applies here. This is not a "messy situation", and there is a good (and intuitive) answer to converting a BigInt to a [u]int{8|16|32}.

@BlueRaja

Rather than all this BigInt8Array nonsense, how about a constructor parameter for Uint8Array that accepts a BigInt (and vice versa)? Then you can easily convert between the two, without having to add unintuitive automatic-casting logic.

The only potential concern I see is that you'd have to specify the endianness of BigInts, at least when converting to/from Uint8Array.

@littledan
Member

@BlueRaja Interesting idea, but is it worth the complexity? I don't see significant problems with requiring an explicit cast here.

@littledan
Member

OK, I haven't heard any strong evidence for why we need this truncation. It's possible to write incorrect code with BigInts, but the Number constructor is an explicit, information-losing cast that can be pointed to and located as the source of the problem. Given that the proposal is already at Stage 3, and BigInt8Array is a compatible follow-on proposal, I'm closing this issue.
