Size difference between node and browser #82

Marius-Romanus · 2023-02-06T21:36:41Z

Hi, sizeof returns a different size for the same string in the browser and in Node.

Since it is only a plain string, with no objects or anything unusual, the size should be the same, right?

Greetings!

console.log("node sizeof()", sizeof('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ac vestibulum lacus, sit amet maximus libero. Aliquam erat volutpat. Quisque at orci tortor. Donec at mi nunc.'));
node sizeof() 184

console.log("browser sizeof()", sizeof('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ac vestibulum lacus, sit amet maximus libero. Aliquam erat volutpat. Quisque at orci tortor. Donec at mi nunc.'));
browser sizeof() 342


miktam · 2023-02-07T08:51:42Z

OK, the difference comes from here: https://github.com/miktam/sizeof/blob/master/indexv2.js#L88

Node.js uses a precise string calculation (see PR #80).

The browser uses a simpler approach that assumes every character in a string takes 2 bytes.

To make the browser result precise as well, let me check whether different VM implementations differ.
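
The two-bytes-per-character estimate described above can be sketched roughly as follows. The name `estimateUtf16Bytes` is hypothetical and used only for illustration; it is not the function in the library.

```javascript
// Hypothetical helper (not the library's code): estimates string size by
// assuming every UTF-16 code unit, which is what String.prototype.length
// counts, takes 2 bytes.
function estimateUtf16Bytes(str) {
  return str.length * 2;
}

const sample = 'Lorem ipsum dolor sit amet';
console.log(estimateUtf16Bytes(sample)); // 52 (26 code units * 2)
```

For an ASCII-only string this estimate is exactly twice the UTF-8 size, which is consistent with the browser reporting 342 for the example in this issue.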

Marius-Romanus · 2023-02-07T09:43:19Z

Hello, I've been doing some research, and the best options seem to be:

For node: Buffer.byteLength(string);
For browser: (new TextEncoder().encode(string)).length;
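
A minimal sketch of how the two options above could be combined into one function. The name `utf8ByteLength` is hypothetical, and the environment check is an assumption for illustration, not how this library or string-byte-length does it.

```javascript
// Hypothetical helper: returns the UTF-8 byte length of a string.
// Uses Buffer when it is available (Node) and TextEncoder otherwise (browsers).
function utf8ByteLength(str) {
  if (typeof Buffer !== 'undefined' && typeof Buffer.byteLength === 'function') {
    return Buffer.byteLength(str, 'utf8');
  }
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength('hello')); // 5
console.log(utf8ByteLength('😀'));    // 4
```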

I think this library has a good approach:
https://github.com/ehmicky/string-byte-length

What I don't know is which Node versions you want to support, since that library requires Node >=14.18.0.

Regarding TextEncoder it seems to have good compatibility: https://caniuse.com/?search=TextEncoder

For the example I gave, both approaches return a size of 171, which does not match either of the current results.

A complex emoji, 🏳️‍🌈, gives 14.
A simple emoji, 😀, gives 4.

I also don't know if it differs with Cyrillic, Arabic, Chinese characters, etc.
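
As a quick check for other scripts, the sketch below measures a few short sample words with TextEncoder. The helper name `utf8Size` and the sample words are illustrative choices. Since UTF-8 assigns a fixed number of bytes to each code point, the result should not depend on the environment.

```javascript
const scriptEncoder = new TextEncoder();
// Hypothetical helper: UTF-8 size of a string in bytes.
const utf8Size = (s) => scriptEncoder.encode(s).length;

console.log(utf8Size('привет')); // 12: six Cyrillic letters, 2 bytes each
console.log(utf8Size('مرحبا'));  // 10: five Arabic letters, 2 bytes each
console.log(utf8Size('你好'));    // 6: two Chinese characters, 3 bytes each
```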

Greetings

miktam · 2023-02-07T17:27:45Z

@Marius-Romanus thank you for the investigation!
The browser-based implementation seems useful; I added it in #83.

Regarding the Node.js version, compatibility might be an issue, as you rightly noted.
The current implementation gives a similar result (184 in the current version vs. 171).

Marius-Romanus · 2023-02-07T18:24:31Z

Hello, Buffer.byteLength has existed in Node since the earliest versions, but I think it has been modified many times, and I don't know what result to expect from each version or which bugs it may have had:
https://nodejs.org/docs/latest-v0.10.x/api/buffer.html#buffer_class_method_buffer_bytelength_string_encoding

You have probably seen it already, but here is the documentation (you can pass the encoding):
https://nodejs.org/dist/latest-v18.x/docs/api/buffer.html#static-method-bufferbytelengthstring-encoding
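
For illustration, this is how the optional encoding argument changes the result on a recent Node version. The sample string is an arbitrary choice, and this runs in Node only, since Buffer is not available in browsers without a polyfill.

```javascript
// 'héllo': 5 characters, one of them non-ASCII (é is U+00E9).
const text = 'h\u00e9llo';

console.log(Buffer.byteLength(text, 'utf8'));    // 6: 'é' takes 2 bytes
console.log(Buffer.byteLength(text, 'utf16le')); // 10: 2 bytes per code unit
console.log(Buffer.byteLength(text, 'latin1'));  // 5: 1 byte per character
console.log(Buffer.byteLength(text));            // 6: the default encoding is 'utf8'
```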

@ehmicky may have set that version requirement for some other reason, or even for ECMAScript module imports in Node. ;)

Greetings

ehmicky · 2023-02-07T18:57:01Z

Hi everyone,

I am not completely sure I am answering your question correctly, but the reason this module does not support Node 12 is that Node 12 is no longer officially supported. Also, please note that official support for Node 14 will be dropped in 2 months.

The main advantage of using string-byte-length directly instead of inlining Buffer.byteLength(string) and (new TextEncoder().encode(string)).length is that this library switches between 3 different implementations depending on the platform and input size, in order to give the best performance (see benchmarks).

Also, I think you might want to distinguish between UTF-8 and UTF-16 when discussing sizes. A string only has a specific byte size for a given encoding. As pointed out in your README, the JavaScript specification considers strings to be conceptually "somewhat" UTF-16, i.e. each character is 2 bytes. I say "somewhat" because surrogate characters (U+D800 to U+DFFF) and astral characters (U+10000 and above) are handled a little differently, depending on the JavaScript method being used.
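
A small sketch of the "somewhat UTF-16" behaviour mentioned above, using one astral character as an arbitrary example. Different built-in methods see it either as two code units or as one code point.

```javascript
const grin = '😀'; // U+1F600, an astral character

console.log(grin.length);                      // 2: counted as two UTF-16 code units
console.log([...grin].length);                 // 1: string iteration walks code points
console.log(grin.charCodeAt(0).toString(16));  // d83d: the high surrogate only
console.log(grin.codePointAt(0).toString(16)); // 1f600: the full code point
```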

However, in memory, over the network, or in a file, those strings are likely to be encoded in UTF-8, where each character can be 1, 2, 3 or 4 bytes long. string-byte-length gives the UTF-8 size, not the UTF-16 size, and so do Buffer.from() and new TextEncoder(). IMHO knowing the UTF-8 size is more useful than knowing the UTF-16 size in most use cases.
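
To make the 1-to-4-byte range concrete, the sketch below compares the UTF-8 size with a UTF-16 size (2 bytes per code unit) for one sample character of each width. The helper name `byteSizes` and the sample characters are illustrative.

```javascript
const widthEncoder = new TextEncoder();

// Hypothetical helper: UTF-8 and UTF-16 byte sizes of a string.
function byteSizes(str) {
  return {
    utf8: widthEncoder.encode(str).length, // 1 to 4 bytes per code point
    utf16: str.length * 2,                 // 2 bytes per UTF-16 code unit
  };
}

for (const ch of ['a', 'é', '€', '😀']) {
  const { utf8, utf16 } = byteSizes(ch);
  console.log(ch, utf8, utf16);
}
// a 1 2
// é 2 2
// € 3 2
// 😀 4 4
```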

If you're interested in this topic, I wrote the following article, which details the differences.


miktam · 2023-02-11T16:14:45Z

OK, the latest PR works in Node v12 but does not work in v10.

Let's see if this is the best we can have.
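
For readers following along, here is a rough sketch of what a Node-side calculation based on Buffer.from().byteLength with an explicit empty-string case might look like. The function name `nodeStringSize` is hypothetical, and this is not the code from the PR.

```javascript
// Hypothetical sketch, not the actual code from the PR.
function nodeStringSize(str) {
  // Explicit empty-string case (an assumption about what such a case does).
  if (str === '') return 0;
  // Buffer.from encodes the string as UTF-8 here; byteLength is the encoded size.
  return Buffer.from(str, 'utf8').byteLength;
}

console.log(nodeStringSize(''));    // 0
console.log(nodeStringSize('abc')); // 3
console.log(nodeStringSize('😀'));  // 4
```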

miktam self-assigned this Feb 7, 2023

miktam added a commit that referenced this issue Feb 7, 2023

Calculate string size in a browser precisely. Fixes #82

1aded8c

miktam linked a pull request Feb 7, 2023 that will close this issue

Calculate string size in a browser precisely. #83

Open

miktam added a commit that referenced this issue Feb 11, 2023

Use new Buffer.from().byteLength for node. Special case for an empty …

f912efb

…string. Tested on node v12 - works. Does not work on node v10. Continuation of #82
