
buffer: use dynamic buffer pool #30661

Closed
wants to merge 4 commits

Conversation

@BridgeAR (Member) commented Nov 26, 2019

  1. commit: buffer: use internal allocate function instead of public one
     This makes sure that `Buffer.concat` does not use exposed functions that might have been tampered with. Instead, it calls the internal `allocate` function directly and skips the size check.
  2. commit: buffer: only validate if necessary
     This moves a validation check and removes another one that is performed twice. The validation is obsolete in the case of `Buffer.concat`. For `Buffer.copy` only one check is required (one side is already guaranteed to be a `Buffer` instance).
  3. commit: test: harden test/parallel/test-zlib-unused-weak.js
     This makes sure the buffer pool size is taken into account while checking for the GC overhead. It would otherwise report a faulty result.
  4. commit: buffer: use a dynamic pool size instead of a fixed one
     This significantly improves performance when many smallish buffers are used. When they are allocated frequently, the pool grows up to 2 MB. It starts with no pool at all, so the default footprint is smaller than the current fixed default, which makes it better for devices with hard memory constraints.
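A rough sketch of the grow-when-busy, shrink-when-idle behavior the fourth commit describes (an illustration only; the constants, names, and exact rule are assumptions, not the actual patch):

```javascript
// Hypothetical sketch of a time-based dynamic pool size.
const kMinPoolSize = 8 * 1024;        // assumed smallest growth step
const kMaxPoolSize = 2 * 1024 * 1024; // the 2 MB cap named in the PR

let poolSize = 0;                     // "starts with no pool"
let lastAllocation = Date.now();

function nextPoolSize() {
  const now = Date.now();
  if (now - lastAllocation < 1000) {
    // Buffers are being allocated frequently: grow toward the cap.
    poolSize = Math.min(Math.max(poolSize * 2, kMinPoolSize), kMaxPoolSize);
  } else {
    // The application went idle: shrink so less memory is retained.
    poolSize = poolSize >>> 1;
  }
  lastAllocation = now;
  return poolSize;
}
```

Under this sketch, a burst of allocations ratchets the pool from zero up to the 2 MB ceiling, while a quiet period halves it again, which is the behavior the commit message claims for memory-constrained devices.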

Fixes: #30611 (at least partially)
Refs: #27121

Benchmark:
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/471/
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/472

Benchmark results (at least one star for accuracy):

                                                                     confidence improvement accuracy (*)    (**)    (***)
buffers/buffer-creation.js n=600000 len=1024 type='fast-allocUnsafe'        ***    293.90 %      ±24.37% ±32.82%  ±43.53%
buffers/buffer-creation.js n=600000 len=1024 type='slow-allocUnsafe'          *      9.16 %       ±9.04% ±12.04%  ±15.68%
buffers/buffer-creation.js n=600000 len=4096 type='fast-allocUnsafe'        ***   1288.00 %      ±66.61% ±89.72% ±118.99%
buffers/buffer-creation.js n=600000 len=8192 type='fast-allocUnsafe'        ***    846.83 %      ±38.02% ±51.23%  ±67.99%

buffers/buffer-from.js n=800000 len=100 source='buffer'                     ***      7.86 %       ±3.00%  ±3.99%  ±5.19%
buffers/buffer-from.js n=800000 len=100 source='string-base64'               **     10.76 %       ±7.04%  ±9.41% ±12.32%
buffers/buffer-from.js n=800000 len=100 source='string-utf8'                  *      9.05 %       ±8.76% ±11.70% ±15.32%
buffers/buffer-from.js n=800000 len=2048 source='array'                     ***     20.93 %       ±7.66% ±10.27% ±13.52%
buffers/buffer-from.js n=800000 len=2048 source='buffer'                    ***     11.19 %       ±5.99%  ±7.98% ±10.43%
buffers/buffer-from.js n=800000 len=2048 source='string'                    ***     22.33 %       ±7.46%  ±9.92% ±12.91%
buffers/buffer-from.js n=800000 len=2048 source='string-base64'              **     14.42 %       ±9.02% ±12.05% ±15.76%
buffers/buffer-from.js n=800000 len=2048 source='string-utf8'               ***     25.95 %       ±8.25% ±10.97% ±14.28%
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the buffer Issues and PRs related to the buffer subsystem. label Nov 26, 2019

@BridgeAR BridgeAR added performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version. labels Nov 26, 2019
@BridgeAR (Member, Author)

I marked this as semver-major due to the changed behavior (having a maximum pool size instead of a fixed value). It should not negatively impact any application, but this way we are on the safe side.

@addaleax (Member)

The first two commits LGTM, although they can probably be PR’ed on their own, as they seem largely unrelated to the rest and aren’t semver-major either?

@addaleax (Member) left a comment:

It starts with no pool and is
therefore smaller by default than the current default. That way it's
better for devices that have hard memory constraints.

That sounds … optimistic? It still increases the pool size to up to 2 MB, and the worst-case memory retention scenario still remains an issue this way.

Also, it seems dangerous to introduce “magic” timing-dependent behaviour into Node.js…

Right now, I’d prefer an actual pooling solution as suggested in #30611.

The `Buffer` module uses an internal `Buffer` instance as pool for the fast
allocation of `Buffer` instances created using [`Buffer.allocUnsafe()`][] and
the deprecated `new Buffer(size)` constructor, but only when `size` is less
than or equal to `Buffer.poolSize >> 1` (floor of [`Buffer.poolSize`][]
divided by two).
@addaleax (Member) suggested this change:
The `Buffer` module uses an internal `Buffer` instance as pool for the fast
The `Buffer` module uses an internal `Buffer` instance as a pool for the fast
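The threshold described in the quoted documentation can be checked against the real `Buffer.poolSize` property (the `isPooled` helper name is ours, purely illustrative):

```javascript
// Buffer.poolSize is a real, documented property; isPooled is a
// hypothetical helper that mirrors the documented threshold.
function isPooled(size) {
  // Allocations at or below half the pool size are served from the pool.
  return size <= (Buffer.poolSize >>> 1);
}

console.log(Buffer.poolSize);  // 8192 by default
console.log(isPooled(4096));   // true: 4096 <= 8192 >> 1
console.log(isPooled(4097));   // false: larger requests get their own memory
```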

@BridgeAR (Member, Author)

@addaleax the first three commits are patches and can be moved into a separate PR (it was just something I stumbled upon while working on the other part). The third commit is necessary for the fourth one, but it could land independently before this PR. It just makes sure that whatever we use as a static buffer is taken into account.

@BridgeAR (Member, Author)

@addaleax

That sounds … optimistic? It still increases the pool size to up to 2 MB, and the worst-case memory retention scenario still remains an issue this way.

It would still be possible for any application to change that to zero to prevent using any pool. I do not fear the memory usage in this case: 2 MB is small for most applications, and it is still possible to lower that for applications where it does not fit.
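Opting out already works through the existing public knob mentioned above (`Buffer.poolSize` is a real API; the `byteLength` checks are just one way to observe the effect, assuming the default pool size of 8192):

```javascript
// With Buffer.poolSize set to 0, no allocation can satisfy
// size <= Buffer.poolSize >> 1, so allocUnsafe never uses the pool.
Buffer.poolSize = 0;
const unpooled = Buffer.allocUnsafe(16);
console.log(unpooled.buffer.byteLength); // 16: the buffer owns its memory

Buffer.poolSize = 8192; // restore the documented default
const pooled = Buffer.allocUnsafe(16);
console.log(pooled.buffer.byteLength);   // 8192: backed by the shared pool
```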

Also, it seems dangerous to introduce “magic” timing-dependent behaviour into Node.js…

The pool size regulates itself quite well depending on how many buffers are required over time. I cannot see where this would be dangerous; it would be great if you could provide some examples to clarify that part.
If that would make you more comfortable about it: I am absolutely fine with introducing this with a comment that we might revert to the old behavior in case people trip over this.

@addaleax (Member) commented Nov 26, 2019

@addaleax

That sounds … optimistic? It still increases the pool size to up to 2 MB, and the worst-case memory retention scenario still remains an issue this way.

It would still be possible for any application to change that to zero to prevent using any pool. I do not fear the memory usage in this case: 2 MB is small for most applications, and it is still possible to lower that for applications where it does not fit.

I don’t have terribly strong opinions on this, but I’d personally think that for platform software like Node.js, it’s better to have defaults that work for almost all applications and can be tuned up to increase performance rather than defaults that work for most applications but sometimes need tuning down because otherwise it causes issues.

Also, it seems dangerous to introduce “magic” timing-dependent behaviour into Node.js…

The pool size regulates itself quite well depending on how many buffers are required over time. I cannot see where this would be dangerous; it would be great if you could provide some examples to clarify that part.

I mean, it’s made kind of okay by the fact that the pool size is an implementation detail, but this is going to make performance and memory retention dependent on timing, which in turn depends on things like hardware configuration and machine load, and that is usually a Bad Idea.

It might be a good idea if you could describe why you picked time as a factor in this and how it affects the performance here to have that 1 second cutoff. In my experience, in situations like this time is almost always a proxy for some other value (e.g. the number of some kind of operations), and figuring out what that is would be helpful.

If that would make you more comfortable about it: I am absolutely fine with introducing this with a comment that we might revert to the old behavior in case people trip over this.

I mean, let’s see what other people think of this approach.

@BridgeAR (Member, Author)

It might be a good idea if you could describe why you picked time as a factor in this and how it affects the performance here to have that 1 second cutoff. In my experience, in situations like this time is almost always a proxy for some other value (e.g. the number of some kind of operations), and figuring out what that is would be helpful.

The main point for me was that I did not want to increase the pool in general. I wanted to make sure that it will drop in size in case the application does not use buffers (even though that is pretty much impossible, since core uses them in multiple situations).
Applications that only infrequently receive requests should also not retain much memory that could be used by other applications in the meantime.

The main question is when is it good to increase the pool size and when is it good to decrease?

I used an approach where I anticipate a specific amount of data incoming or outgoing from the application over some protocols over time (including data transformations and small pauses).
Every heavily used protocol should see at least 2 MB of throughput per second. The pool grows while data is flowing, which improves the performance of dealing with that data. When the application becomes idle (e.g., fewer users during the night) or only small amounts of data are handled, the pool size decreases again.
One second might be generous, and we could lower that to e.g. 500 ms, since 4 MB of data transfer per second is still not a lot.

To conclude: I used a data-per-second approach to decide when to increase or decrease the size, and I cannot think of a better way. But I am definitely open to suggestions!
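The data-per-second heuristic described above could be sketched roughly like this (a sketch under assumptions; the names, the window length, and the sizing rule are illustrative, not the PR's actual code):

```javascript
// Hypothetical throughput-window heuristic: size the pool to the
// number of bytes allocated in the previous one-second window.
const kWindowMs = 1000;               // the 1 second discussed above
const kMaxPoolSize = 2 * 1024 * 1024; // 2 MB cap

let windowStart = Date.now();
let bytesThisWindow = 0;
let poolSize = 0;

function recordAllocation(bytes) {
  const now = Date.now();
  if (now - windowStart >= kWindowMs) {
    // Window elapsed: match the pool to the observed throughput,
    // so an idle application shrinks back toward zero.
    poolSize = Math.min(bytesThisWindow, kMaxPoolSize);
    bytesThisWindow = 0;
    windowStart = now;
  }
  bytesThisWindow += bytes;
}
```

This makes the trade-off in the thread concrete: the pool tracks recent demand rather than a fixed value, at the cost of being timing-dependent.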

@addaleax (Member)

The main question is when is it good to increase the pool size and when is it good to decrease?

I think my main question is not when, but why it is good to increase/decrease? As in, what benefit does dynamically adjusting the pool size provide over a fixed 2 MB pool size?

The worst-case memory retention issue remains the same between those two options. Any slice from the pool could still keep a 2 MB buffer alive, no matter how small it is, if it happens to live longer than it should.

On the other hand, what’s the benefit of reducing the pool size if the pool isn’t being used? A 2 MB allocation will, in most real-world cases, not result in using 2 MB of memory immediately; rather, the actual memory pages will only be allocated once they’re being accessed. So lowering the buffer size is not affecting actual memory usage immediately.

@BridgeAR (Member, Author)

The worst-case memory retention issue remains the same between those two options. Any slice from the pool could still keep a 2 MB buffer alive, no matter how small it is, if it happens to live longer than it should.

That does not seem to be a common case, though? For these cases it's also possible to use `Buffer.allocUnsafeSlow`. I'll add a comment to our docs that data allocated with `Buffer.allocUnsafe` should mostly not be held on to for a long duration.

lowering the buffer size is not affecting actual memory usage immediately.

It's definitely not happening right away. It should prevent applications from growing the pool size to 2 MB in the first place in case they do not use buffers a lot.

the actual memory pages will only be allocated once they’re being accessed

For applications that use buffers rarely the average memory usage would probably be more linear. If we use a 2 MB static pool, it would grow to up to 2 MB and then drop to a low value instead of only growing to e.g., 512 KB with the dynamic approach. This should be helpful, especially if someone keeps a small chunk alive.

@addaleax (Member)

The worst-case memory retention issue remains the same between those two options. Any slice from the pool could still keep a 2 MB buffer alive, no matter how small it is, if it happens to live longer than it should.

That does not seem to be a common case, though? For these cases it's also possible to use `Buffer.allocUnsafeSlow`. I'll add a comment to our docs that data allocated with `Buffer.allocUnsafe` should mostly not be held on to for a long duration.

I know that Buffer.allocUnsafeSlow() is the right thing for these use cases, but I’m honestly not confident that we can rely on our users also being aware of that :)
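The retention hazard under discussion is easy to observe with the two real APIs (assuming the default `Buffer.poolSize` of 8192):

```javascript
// A pooled slice is a view into the shared pool's ArrayBuffer, so even a
// 16-byte Buffer keeps the whole pool alive for as long as it is referenced.
// allocUnsafeSlow allocates outside the pool and owns its own memory.
const pooled = Buffer.allocUnsafe(16);
const unpooled = Buffer.allocUnsafeSlow(16);

console.log(pooled.buffer.byteLength);   // 8192: the shared pool
console.log(unpooled.buffer.byteLength); // 16: exactly its own memory
```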

lowering the buffer size is not affecting actual memory usage immediately.

It's definitely not happening right away. It should prevent applications from growing the pool size to 2 MB in the first place in case they do not use buffers a lot.

Right, but why? If we’re willing to accept the downsides of 2 MB pool allocations anyway, why not just go with that size always?

the actual memory pages will only be allocated once they’re being accessed

For applications that use buffers rarely the average memory usage would probably be more linear. If we use a 2 MB static pool, it would grow to up to 2 MB and then drop to a low value instead of only growing to e.g., 512 KB with the dynamic approach. This should be helpful, especially if someone keeps a small chunk alive.

Can you explain this in more detail? Why is the memory usage pattern more “linear” here?

@@ -132,16 +139,29 @@ function createUnsafeBuffer(size) {
}
}

function createPool() {
poolSize = Buffer.poolSize;
// This is faster than using the bigint version.
@ChALkeR (Member) commented Dec 26, 2019:

Is the difference measurable? If so, would just using `process.hrtime()[0]` and `now - last > 1` (approximately 1.5 s on average) make sense? Probably not, asking just in case.
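The alternative ChALkeR sketches — comparing whole seconds from `process.hrtime()` instead of millisecond timestamps — would look like this (both APIs are real; the helper name is illustrative):

```javascript
// process.hrtime() returns [seconds, nanoseconds]; index 0 is enough
// for a coarse "did this happen within the last second or so" check.
let last = process.hrtime()[0];

function allocatedRecently() {
  const now = process.hrtime()[0];
  // Working in whole seconds, now - last > 1 means "more than roughly
  // one to two wall-clock seconds ago" - the approximate cutoff above.
  const recent = now - last <= 1;
  last = now;
  return recent;
}

console.log(allocatedRecently()); // true when called immediately
```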

@BridgeAR BridgeAR force-pushed the master branch 2 times, most recently from 8ae28ff to 2935f72 Compare May 31, 2020 12:19
@jasnell jasnell added the stalled Issues and PRs that are stalled. label Jun 25, 2020
@puzpuzpuz (Member)

The first two commits LGTM, although they can probably be PR’ed on their own, as they seem largely unrelated to the rest and aren’t semver-major either?

I came across this PR and also think that the first two commits are valuable and deserve a separate PR. @BridgeAR, is it too much to ask you to submit a separate PR with those commits?

@aduh95 aduh95 added stalled Issues and PRs that are stalled. and removed stalled Issues and PRs that are stalled. labels Oct 19, 2020
@github-actions (Contributor)

This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open.

@github-actions (Contributor)

Closing this because it has stalled. Feel free to reopen if this PR is still relevant, or to ping the collaborator who labelled it stalled if you have any questions.

@github-actions github-actions bot closed this Nov 19, 2020
Labels
buffer Issues and PRs related to the buffer subsystem. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version. stalled Issues and PRs that are stalled.
Successfully merging this pull request may close these issues.

buffer: support multiple pools for Buffer.allocUnsafe
7 participants