buffer: use dynamic buffer pool #30661
Conversation
This makes sure that `Buffer.concat` does not use exposed functions that might have been tampered with. Instead, it calls the internal `allocate` function directly and skips the size check.
This moves a validation check and removes another one that is done twice. The validation is obsolete in case of `Buffer.concat`. For `Buffer.copy` only one check is required (one side is already guaranteed to be an instance of buffer).
This makes sure the buffer pool size is taken into account while checking for the GC overhead. It would otherwise indicate a faulty result.
This significantly improves the performance in case lots of smallish buffers are used. In case they are frequently allocated, the pool will increase to a size of up to 2 MB. It starts with no pool and is therefore smaller by default than the current default. That way it's better for devices that have hard memory constraints.
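The tamper-resistance point from the first commit can be illustrated with a small sketch (the monkey-patching below is hypothetical; whether a given core path actually picks up such an override depends on the Node.js version):

```javascript
// Hypothetical sketch: user code can reassign the public allocation
// functions. Core paths that call the public API directly would pick up
// the tampered version; calling the internal allocator sidesteps that.
const original = Buffer.allocUnsafe;
Buffer.allocUnsafe = function tampered(size) {
  // A malicious or buggy override could observe or corrupt allocations here.
  return original(size);
};

const joined = Buffer.concat([Buffer.from('ab'), Buffer.from('cd')]);
Buffer.allocUnsafe = original; // restore the real allocator

console.log(joined.toString()); // 'abcd' regardless of the override
```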
Force-pushed from 5556f03 to e783973
I marked this as semver-major due to the different behavior (having a maximum pool size instead of a fixed value). It should not negatively impact any application, but that way we are on the safe side.
The first two commits LGTM, although they can probably be PR’ed on their own, as they seem largely unrelated to the rest and aren’t semver-major either?
> It starts with no pool and is therefore smaller by default than the current default. That way it's better for devices that have hard memory constraints.
That sounds … optimistic? It still increases the pool size to up to 2 MB, and the worst-case memory retention scenario still remains an issue this way.
Also, it seems dangerous to introduce “magic” timing-dependent behaviour into Node.js…
Right now, I’d prefer an actual pooling solution as suggested in #30611.
> `Buffer` instances created using [`Buffer.allocUnsafe()`][] and the deprecated
> `new Buffer(size)` constructor only when `size` is less than or equal to
> `Buffer.poolSize >> 1` (floor of [`Buffer.poolSize`][] divided by two).
> The `Buffer` module uses an internal `Buffer` instance as pool for the fast
```diff
- The `Buffer` module uses an internal `Buffer` instance as pool for the fast
+ The `Buffer` module uses an internal `Buffer` instance as a pool for the fast
```
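The pooling threshold quoted above can be observed directly (default values assumed; exact behavior may differ between Node.js versions):

```javascript
// Sketch assuming the default Buffer.poolSize of 8192 bytes.
// Allocations at or below poolSize >> 1 are sliced from the shared pool,
// so their backing ArrayBuffer is the whole pool, not the requested size.
const pooled = Buffer.allocUnsafe(16);     // 16 <= 4096: comes from the pool
const unpooled = Buffer.allocUnsafe(5000); // 5000 > 4096: own allocation

console.log(Buffer.poolSize);              // 8192 by default
console.log(pooled.buffer.byteLength);     // 8192 (the shared pool)
console.log(unpooled.buffer.byteLength);   // 5000 (its own ArrayBuffer)
```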
@addaleax the first three commits are patches and can be moved into a separate PR (it was just something I stumbled upon while working on the other part). The third commit is necessary for the fourth one, but it could land independently before this PR. It just makes sure that whatever we use as a static buffer is taken into account.
It would still be possible for any application to change that to zero to prevent using any pool. I do not fear the memory usage in this case: 2 MB is small for most applications, and it's still possible to lower it in case it does not fit some applications.
The pool size regulates itself quite well depending on how many buffers are required over time. I cannot see where this would be dangerous, and it would be great if you could provide some examples to clarify that part.
I don’t have terribly strong opinions on this, but I’d personally think that for platform software like Node.js, it’s better to have defaults that work for almost all applications and can be tuned up to increase performance, rather than defaults that work for most applications but sometimes need tuning down because they otherwise cause issues.
I mean, it’s made kind of okay by the fact that the pool size is an implementation detail, but this is going to make performance and memory retention dependent on timing, which is dependent on things like hardware configuration, machine load, etc, and that is usually a Bad Idea. It might be a good idea to describe why you picked time as a factor here and how the 1 second cutoff affects performance. In my experience, in situations like this, time is almost always a proxy for some other value (e.g. the number of some kind of operations), and figuring out what that is would be helpful.
I mean, let’s see what other people think of this approach.
The main point for me was that I did not want to increase the pool in general. I wanted to make sure that it'll drop in size in case the application doesn't use buffers (even though that's pretty much impossible, since core uses them in multiple situations). The main question is: when is it good to increase the pool size, and when is it good to decrease it? I used an approach where I anticipate a specific amount of data incoming or outgoing from the application over some protocols over time (including data transformations and small pauses). To conclude: I used a
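A rough sketch of a time-based sizing heuristic of this kind (the constants and structure here are assumptions for illustration, not the actual patch):

```javascript
// Hypothetical dynamic pool sizing: grow when the pool is exhausted
// quickly, shrink when it drains slowly. Not the actual implementation.
const MIN_POOL = 8192;            // assumed floor for this sketch
const MAX_POOL = 2 * 1024 * 1024; // the 2 MB cap mentioned in the PR
let poolSize = 0;                 // start with no pool
let lastRefill = Date.now();

function onPoolExhausted() {
  const now = Date.now();
  if (now - lastRefill < 1000) {
    // Drained within a second: demand is high, double (up to the cap).
    poolSize = Math.min(Math.max(poolSize * 2, MIN_POOL), MAX_POOL);
  } else {
    // Pool lasted longer: demand is low, halve (down to the floor).
    poolSize = Math.max(poolSize >> 1, MIN_POOL);
  }
  lastRefill = now;
}

onPoolExhausted();
console.log(poolSize); // 8192 after the first refill under load
onPoolExhausted();
console.log(poolSize); // 16384 if refills keep coming quickly
```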
I think my main question is not when, but why it is good to increase/decrease? As in, what benefit does dynamically adjusting the pool size provide over a fixed 2 MB pool size? The worst-case memory retention issue remains the same between those two options. Any slice from the pool could still keep a 2 MB buffer alive, no matter how small it is, if it happens to live longer than it should. On the other hand, what’s the benefit of reducing the pool size if the pool isn’t being used? A 2 MB allocation will, in most real-world cases, not result in using 2 MB of memory immediately; rather, the actual memory pages will only be allocated once they’re being accessed. So lowering the pool size does not immediately affect actual memory usage.
That does not seem to be a common case though? For these cases it's also possible to use
It's definitely not happening right away. It should prevent applications from growing the pool size to 2 MB in the first place in case they do not use buffers a lot.
For applications that use buffers rarely the average memory usage would probably be more linear. If we use a 2 MB static pool, it would grow to up to 2 MB and then drop to a low value instead of only growing to e.g., 512 KB with the dynamic approach. This should be helpful, especially if someone keeps a small chunk alive.
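Node.js does provide an unpooled allocation API, `Buffer.allocUnsafeSlow`, which sidesteps the retention concern for long-lived small buffers (a small sketch):

```javascript
// Sketch: allocUnsafeSlow never uses the shared pool, so a long-lived
// small Buffer does not pin a large pool allocation.
const slow = Buffer.allocUnsafeSlow(8);
console.log(slow.buffer.byteLength); // 8: backed by its own ArrayBuffer
```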
I know that
Right, but why? If we’re willing to accept the downsides of 2 MB pool allocations anyway, why not just go with that size always?
Can you explain this in more detail? Why is the memory usage pattern more “linear” here?
```diff
@@ -132,16 +139,29 @@ function createUnsafeBuffer(size) {
   }
 }

 function createPool() {
   poolSize = Buffer.poolSize;
   // This is faster than using the bigint version.
```
Is the difference measurable? If so, would just using `process.hrtime()[0]` and `now - last > 1` (approx. ~1.5) make sense? Probably not, asking just in case.
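For reference, a quick sketch of the clock APIs being compared (the values printed depend on the machine):

```javascript
// Date.now(): wall-clock milliseconds; cheap, but not monotonic.
const ms = Date.now();

// process.hrtime(): monotonic [seconds, nanoseconds] tuple (legacy API).
const [sec, ns] = process.hrtime();

// process.hrtime.bigint(): monotonic nanoseconds as a BigInt -- the
// "bigint version" referred to in the code comment above.
const nanos = process.hrtime.bigint();

console.log(typeof ms, typeof sec, typeof nanos); // number number bigint
```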
Force-pushed from 8ae28ff to 2935f72
I came across this PR and also think that the first two commits are valuable and deserve a separate PR. @BridgeAR, is it too much to ask you to submit a separate PR with those commits?
This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open.
Closing this because it has stalled. Feel free to reopen if this PR is still relevant, or to ping the collaborator who labelled it stalled if you have any questions.
buffer: use internal allocate function instead of public one
buffer: only validate if necessary
test: harden test/parallel/test-zlib-unused-weak.js
buffer: use a dynamic pool size instead of a fixed one
Fixes: #30611 (at least partially)
Refs: #27121
Benchmark:
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/471/
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/472
Benchmark results (at least one star for accuracy):
Checklist
`make -j4 test` (UNIX) or `vcbuild test` (Windows) passes