TextDecoder: ERR_ENCODING_INVALID_ENCODED_DATA on very long array buffer #47645

martian17 · 2023-04-20T16:10:17Z

Version

v18.14.1

Platform

Linux 5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

When I decode a long utf-16le encoded buffer, ERR_ENCODING_INVALID_ENCODED_DATA is thrown instead of ERR_STRING_TOO_LONG, even though the data itself is valid UTF-16LE.

new TextDecoder("utf-16le").decode(new Uint16Array(2**27).fill(48))
// Uncaught TypeError: The encoded data was not valid for encoding utf-16le
//     at TextDecoder.decode (node:internal/encoding:448:14) {
//   code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
// }
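To see which error a given input triggers without the REPL printing a full stack trace, a small probe helps. This is a minimal sketch: `tryDecode` is a hypothetical helper, not part of Node's API, and the result for the 2**27 case depends on the Node.js version being tested.

```javascript
// Hypothetical helper: decode `view` with the given encoding and report either
// the decoded string length or the error code, instead of throwing.
function tryDecode(encoding, view) {
  try {
    const text = new TextDecoder(encoding).decode(view);
    return { ok: true, length: text.length };
  } catch (err) {
    // Node attaches a `code` such as 'ERR_ENCODING_INVALID_ENCODED_DATA' or
    // 'ERR_STRING_TOO_LONG'; fall back to the error name if there is none.
    return { ok: false, code: err.code ?? err.name };
  }
}

// Small sanity check:
console.log(tryDecode("utf-16le", new Uint16Array(4).fill(48))); // { ok: true, length: 4 }

// The failing case from this report (allocates 256 MiB):
// console.log(tryDecode("utf-16le", new Uint16Array(2 ** 27).fill(48)));
```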

With the default encoding (utf-8), decoding works correctly and throws the appropriate error:

new TextDecoder().decode(new Uint8Array(2**29).fill(48))
// Uncaught Error: Cannot create a string longer than 0x1fffffe8 characters
//     at TextDecoder.decode (node:internal/encoding:433:16) {
//   code: 'ERR_STRING_TOO_LONG'
// }

I also noticed that TextDecoder() (utf-8) can consume an array buffer twice as long as TextDecoder("utf-16le") can without throwing an error, producing a string four times as long.
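The size relationship can be made concrete with some arithmetic. This sketch assumes ASCII-only input like the repro's fill(48), so utf-8 yields one character per byte and utf-16le one character per two bytes; 0x1fffffe8 is the limit quoted in the ERR_STRING_TOO_LONG message above, and `decodedLength` is a hypothetical helper.

```javascript
// Maximum string length quoted in the ERR_STRING_TOO_LONG message.
const MAX_STRING_LENGTH = 0x1fffffe8;

// Hypothetical helper: UTF-16 code units produced by decoding `byteLength`
// bytes of ASCII-only data.
function decodedLength(encoding, byteLength) {
  if (encoding === "utf-8") return byteLength;        // 1 byte per character
  if (encoding === "utf-16le") return byteLength / 2; // 2 bytes per character
  throw new RangeError(`unsupported encoding: ${encoding}`);
}

// The two inputs from the report, expressed in bytes.
const cases = [
  ["utf-16le", 2 ** 27 * Uint16Array.BYTES_PER_ELEMENT], // 2**28 bytes
  ["utf-8", 2 ** 29 * Uint8Array.BYTES_PER_ELEMENT],     // 2**29 bytes
];

for (const [encoding, bytes] of cases) {
  const chars = decodedLength(encoding, bytes);
  const status = chars > MAX_STRING_LENGTH ? "exceeds limit" : "within limit";
  console.log(encoding, bytes, chars, status);
}
```

The utf-16le input decodes to 2**27 characters, about a quarter of the limit, while the utf-8 input (2**29 characters) exceeds it by 24.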

How often does it reproduce? Is there a required condition?

Confirmed both when running a script file and in the Node.js REPL.

What is the expected behavior? Why is that the expected behavior?

new TextDecoder("utf-16le") should be able to create strings of up to 0x1fffffe8 characters, the same limit the default utf-8 decoder enforces.
It should throw ERR_STRING_TOO_LONG only when that length is exceeded.
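One possible workaround is to decode in smaller slices using TextDecoder's streaming mode (`{ stream: true }`), which per the Encoding Standard carries incomplete byte sequences over to the next call. This is a sketch under the unverified assumption that the failure depends on the size of a single decode() call; `decodeInChunks` and its 1 MiB default slice size are hypothetical choices. The joined result is still subject to the engine's maximum string length.

```javascript
// Hypothetical helper: decode a large buffer in fixed-size slices so that no
// single decode() call sees the whole input.
// Assumption (not confirmed in this report): the error depends on per-call size.
function decodeInChunks(source, encoding = "utf-16le", chunkBytes = 1 << 20) {
  if (!(chunkBytes > 0)) throw new RangeError("chunkBytes must be positive");
  // Accept an ArrayBuffer or any ArrayBufferView (respecting its byteOffset).
  const bytes = source instanceof ArrayBuffer
    ? new Uint8Array(source)
    : new Uint8Array(source.buffer, source.byteOffset, source.byteLength);
  const decoder = new TextDecoder(encoding);
  const parts = [];
  for (let start = 0; start < bytes.length; start += chunkBytes) {
    // stream: true keeps an incomplete trailing sequence for the next call.
    parts.push(decoder.decode(bytes.subarray(start, start + chunkBytes), { stream: true }));
  }
  parts.push(decoder.decode()); // flush anything still buffered
  // join() still throws a RangeError if the total exceeds the max string length.
  return parts.join("");
}
```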

What do you see instead?

ERR_ENCODING_INVALID_ENCODED_DATA is thrown when the input Uint16Array has length 2**27:

Uncaught TypeError: The encoded data was not valid for encoding utf-16le
    at TextDecoder.decode (node:internal/encoding:448:14) {
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

Additional information

No response


martian17 · 2023-04-20T16:32:14Z

In Google Chrome, TextDecoder with the "utf-16le" encoding appears able to decode a Uint16Array of up to roughly 2**29-100 elements. Node should be able to do the same.
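To compare the two runtimes more precisely than "around 2**29-100", the largest decodable length can be located with a binary search. This sketch assumes failure is monotonic in the input length (if length N fails, every larger length fails too); `largestPassing` and `utf16Decodes` are hypothetical names, and each probe allocates a large buffer, so a full search is slow and memory-hungry.

```javascript
// Hypothetical helper: largest n in [lo, hi] for which pred(n) is true,
// assuming pred(lo) is true and pred is monotone (true...true, false...false).
function largestPassing(pred, lo, hi) {
  while (lo < hi) {
    const mid = lo + Math.ceil((hi - lo) / 2); // round up so the loop terminates
    if (pred(mid)) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Hypothetical probe: does a Uint16Array of length n decode as utf-16le?
// Any exception, including an allocation failure, counts as a failure.
function utf16Decodes(n) {
  try {
    new TextDecoder("utf-16le").decode(new Uint16Array(n).fill(48));
    return true;
  } catch {
    return false;
  }
}

// Example (slow; probes allocate up to about 1 GiB):
// console.log(largestPassing(utf16Decodes, 1, 2 ** 29));
```

The same code runs unchanged in Chrome's console and in Node.js, so the two results are directly comparable.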

VoltrexKeyva added the util label (issues and PRs related to the built-in util module) on Apr 20, 2023.
