TextDecoder: ERR_ENCODING_INVALID_ENCODED_DATA on very long array buffer #47645

Open
martian17 opened this issue Apr 20, 2023 · 1 comment
Labels
util Issues and PRs related to the built-in util module.

martian17 commented Apr 20, 2023

Version

v18.14.1

Platform

Linux 5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

When I try to decode a long utf-16le encoded buffer, ERR_ENCODING_INVALID_ENCODED_DATA is thrown instead of ERR_STRING_TOO_LONG.

new TextDecoder("utf-16le").decode(new Uint16Array(2**27).fill(48))
// Uncaught TypeError: The encoded data was not valid for encoding utf-16le
//     at TextDecoder.decode (node:internal/encoding:448:14) {
//   code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
// }

The default-encoding (UTF-8) version seems to work correctly and throws an appropriate error:

new TextDecoder().decode(new Uint8Array(2**29).fill(48))
// Uncaught Error: Cannot create a string longer than 0x1fffffe8 characters
//     at TextDecoder.decode (node:internal/encoding:433:16) {
//   code: 'ERR_STRING_TOO_LONG'
// }

I also noticed that TextDecoder() seems to be capable of consuming an array buffer twice as long as TextDecoder("utf-16le") can without throwing an error, and it produces a string that is 4 times as long.
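
The size relationship described above can be checked with a little arithmetic. This is only a sketch: the variable names are illustrative, the limit is the value quoted in the ERR_STRING_TOO_LONG message, and `buffer.constants.MAX_STRING_LENGTH` is printed on the assumption that the running Node.js build may report a different limit.

```javascript
const { constants } = require('node:buffer');

// Sizes taken from the two snippets in this report.
const utf16Units = 2 ** 27; // Uint16Array length that fails
const utf16Bytes = utf16Units * Uint16Array.BYTES_PER_ELEMENT; // 2 ** 28 bytes
const utf8Bytes = 2 ** 29; // Uint8Array length that throws ERR_STRING_TOO_LONG

// Limit quoted in the ERR_STRING_TOO_LONG message above.
const reportedLimit = 0x1fffffe8; // 2 ** 29 - 24

console.log('utf-16le input bytes:', utf16Bytes);
console.log('utf-16le code units within reported limit:', utf16Units <= reportedLimit);
console.log('utf-8 bytes (one ASCII char each) within reported limit:', utf8Bytes <= reportedLimit);
// The limit of the running Node.js build; it may differ from the reported one.
console.log('MAX_STRING_LENGTH here:', constants.MAX_STRING_LENGTH);
```

The failing utf-16le input would decode to 2**27 characters, well under the quoted limit, which is why ERR_STRING_TOO_LONG would not be the right error at that size either.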

How often does it reproduce? Is there a required condition?

I confirmed this bug both when running a file normally and in the Node.js REPL.

What is the expected behavior? Why is that the expected behavior?

new TextDecoder("utf-16le") should be able to create a string of up to 0x1fffffe8 characters.
It should throw ERR_STRING_TOO_LONG only when this length is exceeded.
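
To compare the expected and observed behavior across sizes, a small helper can report the error code of a decode call. `decodeErrorCode` is a hypothetical name used only for this sketch; it is not part of Node.js.

```javascript
// Hypothetical helper (not part of Node.js): run a decode and return the
// error code, or undefined if decoding succeeded.
function decodeErrorCode(encoding, input) {
  try {
    new TextDecoder(encoding).decode(input);
    return undefined;
  } catch (err) {
    return err.code;
  }
}

// Small input: decodes without error, so no code is reported.
console.log(decodeErrorCode('utf-16le', new Uint16Array(4).fill(48)));

// The failing case from this report (allocates 256 MiB, so it is left
// commented out). Expected: undefined, since 2 ** 27 characters is below
// the limit. Observed on v18.14.1: 'ERR_ENCODING_INVALID_ENCODED_DATA'.
// console.log(decodeErrorCode('utf-16le', new Uint16Array(2 ** 27).fill(48)));
```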

What do you see instead?

ERR_ENCODING_INVALID_ENCODED_DATA is thrown when the input Uint16Array length is 2**27

Uncaught TypeError: The encoded data was not valid for encoding utf-16le
    at TextDecoder.decode (node:internal/encoding:448:14) {
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

Additional information

No response

martian17 commented Apr 20, 2023

In Google Chrome, TextDecoder with the "utf-16le" encoding seems to be able to decode a Uint16Array with a length of up to around 2**29-100. Node.js should be capable of this as well.
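
One way to locate such a threshold more precisely is to bisect on the array length. The sketch below assumes decoding succeeds for every length up to some threshold and fails beyond it; `largestPassing` and `utf16Decodes` are hypothetical names, and running the real predicate over large ranges allocates a lot of memory.

```javascript
// Hypothetical helper: largest n in [lo, hi] for which passes(n) is true,
// assuming passes is true up to some threshold and false beyond it.
// Returns undefined if even lo does not pass.
function largestPassing(lo, hi, passes) {
  if (!passes(lo)) return undefined;
  while (lo < hi) {
    const mid = lo + Math.ceil((hi - lo) / 2);
    if (passes(mid)) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Predicate for this report: does a Uint16Array of `length` decode?
function utf16Decodes(length) {
  try {
    new TextDecoder('utf-16le').decode(new Uint16Array(length).fill(48));
    return true;
  } catch {
    return false;
  }
}

// Cheap self-check with a stand-in predicate (the threshold 1000 is arbitrary).
console.log(largestPassing(1, 4096, (n) => n <= 1000));

// Real search, left commented out because each probe allocates up to 1 GiB:
// console.log(largestPassing(1, 2 ** 29, utf16Decodes));
```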

VoltrexKeyva added the util label Apr 20, 2023