
buffer: use dynamic buffer pool #30661

Closed
wants to merge 4 commits

Conversation

@BridgeAR (Member) commented Nov 26, 2019

  1. commit: buffer: use internal allocate function instead of public one
     This makes sure that `Buffer.concat` does not use exposed functions that might have been tampered with. Instead, it calls the internal `allocate` function directly and skips the size check.
  2. commit: buffer: only validate if necessary
     This moves a validation check and removes another one that is performed twice. The validation is obsolete in the case of `Buffer.concat`. For `Buffer.copy` only one check is required (one side is already guaranteed to be a `Buffer` instance).
  3. commit: test: harden test/parallel/test-zlib-unused-weak.js
     This makes sure the buffer pool size is taken into account while checking for the GC overhead. It would otherwise report a faulty result.
  4. commit: buffer: use a dynamic pool size instead of a fixed one
     This significantly improves performance when many smallish buffers are used. When they are allocated frequently, the pool grows up to 2 MB. It starts with no pool at all, so the default footprint is smaller than the current fixed default, which makes it better for devices with hard memory constraints.
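A rough sketch of the grow-when-busy, shrink-when-idle behavior the fourth commit describes (an illustration only; the constants, names, and exact rule are assumptions, not the actual patch):

```javascript
// Hypothetical sketch of a time-based dynamic pool size.
const kMinPoolSize = 8 * 1024;        // assumed smallest growth step
const kMaxPoolSize = 2 * 1024 * 1024; // the 2 MB cap named in the PR

let poolSize = 0;                     // "starts with no pool"
let lastAllocation = Date.now();

function nextPoolSize() {
  const now = Date.now();
  if (now - lastAllocation < 1000) {
    // Buffers are being allocated frequently: grow toward the cap.
    poolSize = Math.min(Math.max(poolSize * 2, kMinPoolSize), kMaxPoolSize);
  } else {
    // The application went idle: shrink so less memory is retained.
    poolSize = poolSize >>> 1;
  }
  lastAllocation = now;
  return poolSize;
}
```

Under this sketch, a burst of allocations ratchets the pool from zero up to the 2 MB ceiling, while a quiet period halves it again, which is the behavior the commit message claims for memory-constrained devices.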

Fixes: #30611 (at least partially)
Refs: #27121

Benchmark:
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/471/
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/472

Benchmark results (at least one star for accuracy):

                                                                     confidence improvement accuracy (*)    (**)    (***)
buffers/buffer-creation.js n=600000 len=1024 type='fast-allocUnsafe'        ***    293.90 %      ±24.37% ±32.82%  ±43.53%
buffers/buffer-creation.js n=600000 len=1024 type='slow-allocUnsafe'          *      9.16 %       ±9.04% ±12.04%  ±15.68%
buffers/buffer-creation.js n=600000 len=4096 type='fast-allocUnsafe'        ***   1288.00 %      ±66.61% ±89.72% ±118.99%
buffers/buffer-creation.js n=600000 len=8192 type='fast-allocUnsafe'        ***    846.83 %      ±38.02% ±51.23%  ±67.99%

buffers/buffer-from.js n=800000 len=100 source='buffer'                     ***      7.86 %       ±3.00%  ±3.99%  ±5.19%
buffers/buffer-from.js n=800000 len=100 source='string-base64'               **     10.76 %       ±7.04%  ±9.41% ±12.32%
buffers/buffer-from.js n=800000 len=100 source='string-utf8'                  *      9.05 %       ±8.76% ±11.70% ±15.32%
buffers/buffer-from.js n=800000 len=2048 source='array'                     ***     20.93 %       ±7.66% ±10.27% ±13.52%
buffers/buffer-from.js n=800000 len=2048 source='buffer'                    ***     11.19 %       ±5.99%  ±7.98% ±10.43%
buffers/buffer-from.js n=800000 len=2048 source='string'                    ***     22.33 %       ±7.46%  ±9.92% ±12.91%
buffers/buffer-from.js n=800000 len=2048 source='string-base64'              **     14.42 %       ±9.02% ±12.05% ±15.76%
buffers/buffer-from.js n=800000 len=2048 source='string-utf8'               ***     25.95 %       ±8.25% ±10.97% ±14.28%
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the buffer Issues and PRs related to the buffer subsystem. label Nov 26, 2019

@BridgeAR BridgeAR added performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version. labels Nov 26, 2019
@BridgeAR (Member, Author)

I marked this as semver-major due to the changed behavior (having a maximum pool size instead of a fixed value). It should not negatively impact any application, but this way we are on the safe side.

@addaleax (Member)

The first two commits LGTM, although they can probably be PR’ed on their own, as they seem largely unrelated to the rest and aren’t semver-major either?

@addaleax (Member) left a comment:

It starts with no pool and is
therefore smaller by default than the current default. That way it's
better for devices that have hard memory constraints.

That sounds … optimistic? It still increases the pool size to up to 2 MB, and the worst-case memory retention scenario still remains an issue this way.

Also, it seems dangerous to introduce “magic” timing-dependent behaviour into Node.js…

Right now, I’d prefer an actual pooling solution as suggested in #30611.

The `Buffer` module uses an internal `Buffer` instance as pool for the fast
allocation of `Buffer` instances created using [`Buffer.allocUnsafe()`][] and
the deprecated `new Buffer(size)` constructor, but only when `size` is less
than or equal to `Buffer.poolSize >> 1` (floor of [`Buffer.poolSize`][]
divided by two).
@addaleax (Member) suggested this change:
The `Buffer` module uses an internal `Buffer` instance as pool for the fast
The `Buffer` module uses an internal `Buffer` instance as a pool for the fast
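The threshold described in the quoted documentation can be checked against the real `Buffer.poolSize` property (the `isPooled` helper name is ours, purely illustrative):

```javascript
// Buffer.poolSize is a real, documented property; isPooled is a
// hypothetical helper that mirrors the documented threshold.
function isPooled(size) {
  // Allocations at or below half the pool size are served from the pool.
  return size <= (Buffer.poolSize >>> 1);
}

console.log(Buffer.poolSize);  // 8192 by default
console.log(isPooled(4096));   // true: 4096 <= 8192 >> 1
console.log(isPooled(4097));   // false: larger requests get their own memory
```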

@BridgeAR (Member, Author)

@addaleax the first three commits are patches and can be moved into a separate PR (it was just something I stumbled upon while working on the other part). The third commit is necessary for the fourth one, but it could land independently before this PR. It just makes sure that whatever we use as a static buffer is taken into account.

@BridgeAR (Member, Author)

@addaleax

That sounds … optimistic? It still increases the pool size to up to 2 MB, and the worst-case memory retention scenario still remains an issue this way.

It would still be possible for any application to change that to zero to prevent using any pool. I do not fear the memory usage in this case: 2 MB is small for most applications, and it is still possible to lower that for applications where it does not fit.
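Opting out already works through the existing public knob mentioned above (`Buffer.poolSize` is a real API; the `byteLength` checks are just one way to observe the effect, assuming the default pool size of 8192):

```javascript
// With Buffer.poolSize set to 0, no allocation can satisfy
// size <= Buffer.poolSize >> 1, so allocUnsafe never uses the pool.
Buffer.poolSize = 0;
const unpooled = Buffer.allocUnsafe(16);
console.log(unpooled.buffer.byteLength); // 16: the buffer owns its memory

Buffer.poolSize = 8192; // restore the documented default
const pooled = Buffer.allocUnsafe(16);
console.log(pooled.buffer.byteLength);   // 8192: backed by the shared pool
```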

Also, it seems dangerous to introduce “magic” timing-dependent behaviour into Node.js…

The pool size regulates itself quite well depending on how many buffers are required over time. I cannot see where this would be dangerous; it would be great if you could provide some examples to clarify that part.
If that would make you more comfortable about it: I am absolutely fine with introducing this with a comment that we might revert to the old behavior in case people trip over this.

@addaleax (Member) commented Nov 26, 2019

@addaleax

That sounds … optimistic? It still increases the pool size to up to 2 MB, and the worst-case memory retention scenario still remains an issue this way.

It would still be possible for any application to change that to zero to prevent using any pool. I do not fear the memory usage in this case: 2 MB is small for most applications, and it is still possible to lower that for applications where it does not fit.

I don’t have terribly strong opinions on this, but I’d personally think that for platform software like Node.js, it’s better to have defaults that work for almost all applications and can be tuned up to increase performance rather than defaults that work for most applications but sometimes need tuning down because otherwise it causes issues.

Also, it seems dangerous to introduce “magic” timing-dependent behaviour into Node.js…

The pool size regulates itself quite well depending on how many buffers are required over time. I cannot see where this would be dangerous; it would be great if you could provide some examples to clarify that part.

I mean, it’s made kind of okay by the fact that the pool size is an implementation detail, but this is going to make performance and memory retention dependent on timing, which in turn depends on things like hardware configuration and machine load, and that is usually a Bad Idea.

It might be a good idea if you could describe why you picked time as a factor in this and how it affects the performance here to have that 1 second cutoff. In my experience, in situations like this time is almost always a proxy for some other value (e.g. the number of some kind of operations), and figuring out what that is would be helpful.

If that would make you more comfortable about it: I am absolutely fine with introducing this with a comment that we might revert to the old behavior in case people trip over this.

I mean, let’s see what other people think of this approach.

@BridgeAR (Member, Author)

It might be a good idea if you could describe why you picked time as a factor in this and how it affects the performance here to have that 1 second cutoff. In my experience, in situations like this time is almost always a proxy for some other value (e.g. the number of some kind of operations), and figuring out what that is would be helpful.

The main point for me was that I did not want to increase the pool in general. I wanted to make sure that it will drop in size in case the application does not use buffers (even though that is pretty much impossible, since core uses them in multiple situations).
Applications that only infrequently receive requests should also not retain much memory that could be used by other applications in the meantime.

The main question is when is it good to increase the pool size and when is it good to decrease?

I used an approach where I anticipate a specific amount of data incoming or outgoing from the application over some protocols over time (including data transformations and small pauses).
Every heavily used protocol should see at least 2 MB of throughput per second. The pool grows while data is flowing, which improves the performance of dealing with that data. When the application becomes idle (e.g., fewer users during the night) or only small amounts of data are handled, the pool size decreases again.
One second might be generous, and we could lower that to e.g. 500 ms, since 4 MB of data transfer per second is still not a lot.

To conclude: I used a data-per-second approach to decide when to increase or decrease the size, and I cannot think of a better way. But I am definitely open to suggestions!
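The data-per-second heuristic described above could be sketched roughly like this (a sketch under assumptions; the names, the window length, and the sizing rule are illustrative, not the PR's actual code):

```javascript
// Hypothetical throughput-window heuristic: size the pool to the
// number of bytes allocated in the previous one-second window.
const kWindowMs = 1000;               // the 1 second discussed above
const kMaxPoolSize = 2 * 1024 * 1024; // 2 MB cap

let windowStart = Date.now();
let bytesThisWindow = 0;
let poolSize = 0;

function recordAllocation(bytes) {
  const now = Date.now();
  if (now - windowStart >= kWindowMs) {
    // Window elapsed: match the pool to the observed throughput,
    // so an idle application shrinks back toward zero.
    poolSize = Math.min(bytesThisWindow, kMaxPoolSize);
    bytesThisWindow = 0;
    windowStart = now;
  }
  bytesThisWindow += bytes;
}
```

This makes the trade-off in the thread concrete: the pool tracks recent demand rather than a fixed value, at the cost of being timing-dependent.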

@addaleax (Member)

The main question is when is it good to increase the pool size and when is it good to decrease?

I think my main question is not when, but why it is good to increase/decrease? As in, what benefit does dynamically adjusting the pool size provide over a fixed 2 MB pool size?

The worst-case memory retention issue remains the same between those two options. Any slice from the pool could still keep a 2 MB buffer alive, no matter how small it is, if it happens to live longer than it should.

On the other hand, what’s the benefit of reducing the pool size if the pool isn’t being used? A 2 MB allocation will, in most real-world cases, not result in using 2 MB of memory immediately; rather, the actual memory pages will only be allocated once they’re being accessed. So lowering the buffer size is not affecting actual memory usage immediately.

@BridgeAR (Member, Author)

The worst-case memory retention issue remains the same between those two options. Any slice from the pool could still keep a 2 MB buffer alive, no matter how small it is, if it happens to live longer than it should.

That does not seem to be a common case, though? For these cases it's also possible to use `Buffer.allocUnsafeSlow`. I'll add a comment to our docs that data allocated with `Buffer.allocUnsafe` should mostly not be held on to for a long duration.

lowering the buffer size is not affecting actual memory usage immediately.

It's definitely not happening right away. It should prevent applications from growing the pool size to 2 MB in the first place in case they do not use buffers a lot.

the actual memory pages will only be allocated once they’re being accessed

For applications that use buffers rarely the average memory usage would probably be more linear. If we use a 2 MB static pool, it would grow to up to 2 MB and then drop to a low value instead of only growing to e.g., 512 KB with the dynamic approach. This should be helpful, especially if someone keeps a small chunk alive.

@addaleax (Member)

The worst-case memory retention issue remains the same between those two options. Any slice from the pool could still keep a 2 MB buffer alive, no matter how small it is, if it happens to live longer than it should.

That does not seem to be a common case, though? For these cases it's also possible to use `Buffer.allocUnsafeSlow`. I'll add a comment to our docs that data allocated with `Buffer.allocUnsafe` should mostly not be held on to for a long duration.

I know that Buffer.allocUnsafeSlow() is the right thing for these use cases, but I’m honestly not confident that we can rely on our users also being aware of that :)
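The retention hazard under discussion is easy to observe with the two real APIs (assuming the default `Buffer.poolSize` of 8192):

```javascript
// A pooled slice is a view into the shared pool's ArrayBuffer, so even a
// 16-byte Buffer keeps the whole pool alive for as long as it is referenced.
// allocUnsafeSlow allocates outside the pool and owns its own memory.
const pooled = Buffer.allocUnsafe(16);
const unpooled = Buffer.allocUnsafeSlow(16);

console.log(pooled.buffer.byteLength);   // 8192: the shared pool
console.log(unpooled.buffer.byteLength); // 16: exactly its own memory
```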

lowering the buffer size is not affecting actual memory usage immediately.

It's definitely not happening right away. It should prevent applications from growing the pool size to 2 MB in the first place in case they do not use buffers a lot.

Right, but why? If we’re willing to accept the downsides of 2 MB pool allocations anyway, why not just go with that size always?

the actual memory pages will only be allocated once they’re being accessed

For applications that use buffers rarely the average memory usage would probably be more linear. If we use a 2 MB static pool, it would grow to up to 2 MB and then drop to a low value instead of only growing to e.g., 512 KB with the dynamic approach. This should be helpful, especially if someone keeps a small chunk alive.

Can you explain this in more detail? Why is the memory usage pattern more “linear” here?

@@ -132,16 +139,29 @@ function createUnsafeBuffer(size) {
}
}

function createPool() {
poolSize = Buffer.poolSize;
// This is faster than using the bigint version.
@ChALkeR (Member) commented Dec 26, 2019:

Is the difference measurable? If so, would just using `process.hrtime()[0]` and `now - last > 1` (approximately 1.5 s on average) make sense? Probably not, asking just in case.
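The alternative ChALkeR sketches — comparing whole seconds from `process.hrtime()` instead of millisecond timestamps — would look like this (both APIs are real; the helper name is illustrative):

```javascript
// process.hrtime() returns [seconds, nanoseconds]; index 0 is enough
// for a coarse "did this happen within the last second or so" check.
let last = process.hrtime()[0];

function allocatedRecently() {
  const now = process.hrtime()[0];
  // Working in whole seconds, now - last > 1 means "more than roughly
  // one to two wall-clock seconds ago" - the approximate cutoff above.
  const recent = now - last <= 1;
  last = now;
  return recent;
}

console.log(allocatedRecently()); // true when called immediately
```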

@BridgeAR BridgeAR force-pushed the master branch 2 times, most recently from 8ae28ff to 2935f72 Compare May 31, 2020 12:19
@jasnell jasnell added the stalled Issues and PRs that are stalled. label Jun 25, 2020
@puzpuzpuz (Member)

The first two commits LGTM, although they can probably be PR’ed on their own, as they seem largely unrelated to the rest and aren’t semver-major either?

I came across this PR and also think that the first two commits are valuable and deserve a separate PR. @BridgeAR, is it too much to ask you to submit a separate PR with those commits?

@aduh95 aduh95 added stalled Issues and PRs that are stalled. and removed stalled Issues and PRs that are stalled. labels Oct 19, 2020
@github-actions (Contributor)

This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open.

@github-actions (Contributor)

Closing this because it has stalled. Feel free to reopen if this PR is still relevant, or to ping the collaborator who labelled it stalled if you have any questions.

@github-actions github-actions bot closed this Nov 19, 2020
Labels
buffer Issues and PRs related to the buffer subsystem. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version. stalled Issues and PRs that are stalled.
Successfully merging this pull request may close these issues.

buffer: support multiple pools for Buffer.allocUnsafe
7 participants