
buffer: support multiple pools for Buffer.allocUnsafe #30611

Closed
puzpuzpuz opened this issue Nov 23, 2019 · 14 comments
Labels
buffer Issues and PRs related to the buffer subsystem.


@puzpuzpuz
Member

puzpuzpuz commented Nov 23, 2019

The problem

At the moment, Buffer.allocUnsafe uses a global pool (strictly speaking, it's not a "classical" object pool, but let's use this terminology), i.e. a pre-allocated internal Buffer that is sliced to serve newly allocated buffers, when possible. The size of the pool can be configured by setting the Buffer.poolSize property (8 KB by default). Once the current internal buffer fills up, the pool allocates a new one.

Here is the related fragment of the official documentation:

> The Buffer module pre-allocates an internal Buffer instance of size Buffer.poolSize that is used as a pool for the fast allocation of new Buffer instances created using Buffer.allocUnsafe() and the deprecated new Buffer(size) constructor only when size is less than or equal to Buffer.poolSize >> 1 (floor of Buffer.poolSize divided by two).
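The `poolSize >> 1` condition can be observed directly: a pooled allocation is a view into the shared internal `ArrayBuffer`, so its `buffer.byteLength` equals `Buffer.poolSize`, while a larger request gets a dedicated `ArrayBuffer` of exactly the requested size. A quick illustration, assuming the default `Buffer.poolSize` of 8192:

```js
// A small allocation (size <= Buffer.poolSize >> 1) is served from the
// shared pool, so its underlying ArrayBuffer spans the whole pool.
const small = Buffer.allocUnsafe(16);
console.log(small.buffer.byteLength); // 8192 with the default poolSize

// A larger allocation bypasses the pool and gets its own ArrayBuffer.
const large = Buffer.allocUnsafe(5000); // 5000 > 8192 >> 1
console.log(large.buffer.byteLength); // 5000
```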

The problem with this approach is that the pool is global, so its size can't be configured by a Node.js library. Of course, a library can change Buffer.poolSize, but that may clash with another library, or with the application, which may also change it. And if the application changes this setting to a sub-optimal value, it may hurt the performance of the libraries it uses.

Many libraries, especially client libraries, use Buffer.allocUnsafe on their hot path, so a predictable and optimal pool size is critical to their performance. It would therefore be great to have a way to isolate libraries from each other and from the application.

The solution

The idea is to introduce the concept of multiple identified pools for Buffer.allocUnsafe. See the snippet:

```js
// Create a new internal pool with a size of 32 KB.
// The function returns a pool id (number).
const poolId = Buffer.createUnsafePool(32 * 1024);

// Allocate an unsafe buffer from the internal pool.
// The second, optional argument is the pool id.
const buf = Buffer.allocUnsafe(16, poolId);
```

With this change libraries are free to use their own dedicated pool with a size that they find optimal.

Optionally, another function, Buffer.destroyUnsafePool, could be added. It would destroy a pool so that its internal buffer could be GCed. But I don't think it's a must-have, as the current API doesn't allow destroying the global pool either.

Note that this change is backward-compatible, so it won't affect the current API or its behavior.

I'd like to hear some positive feedback from Node.js collaborators first, so that I can proceed with the implementation.

Alternatives

This could be implemented as an npm module. But I'm certain it's a natural addition to the standard API, which already implements such a pool; it's just that the pool is global, and its configuration is really inconvenient for libraries. So having it in core would be a valuable addition.

Another thing to mention is that the default pool size doesn't seem large enough (at least for certain use cases) and could be increased (see #27121). But that won't help with isolation.

@addaleax addaleax added the buffer Issues and PRs related to the buffer subsystem. label Nov 23, 2019
@addaleax
Member

> Alternatives

I think this could be implemented relatively easily as an npm module? It wouldn’t really have to do more than allocate a large buffer using Buffer.allocUnsafe() and then hand out .slice()s of it?
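A minimal sketch of such a userland module, assuming nothing beyond the public `Buffer` API (the `UnsafePool` name and its interface are made up for illustration): it pre-allocates a source buffer with `Buffer.allocUnsafe()` and hands out views into it, starting a fresh source buffer once the current one fills up.

```js
// Hypothetical userland pool: pre-allocates one large unsafe Buffer and
// serves small allocations as views (slices) into it.
class UnsafePool {
  constructor(poolSize = 32 * 1024) {
    this.poolSize = poolSize;
    this.pool = Buffer.allocUnsafe(poolSize);
    this.offset = 0;
  }

  alloc(size) {
    // Large requests bypass the pool, mirroring core's poolSize >> 1 rule.
    if (size > this.poolSize >>> 1) return Buffer.allocUnsafe(size);
    if (this.offset + size > this.poolSize) {
      // The current source buffer is full; start a fresh one. The old one
      // stays alive until all views into it are GCed.
      this.pool = Buffer.allocUnsafe(this.poolSize);
      this.offset = 0;
    }
    const buf = this.pool.subarray(this.offset, this.offset + size);
    this.offset += size;
    return buf;
  }
}

// A library could create its own pool with whatever size it finds optimal.
const pool = new UnsafePool(32 * 1024);
const buf = pool.alloc(16);
```

`subarray()` is used here instead of the deprecated `slice()`; both return views over the same memory, so this gives the same zero-copy behavior as the internal pool.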

@puzpuzpuz
Member Author

puzpuzpuz commented Nov 23, 2019

> I think this could be implemented relatively easily as an npm module? It wouldn’t really have to do more than allocate a large buffer using Buffer.allocUnsafe() and then hand out .slice()s of it?

Yes, that could be an npm module. But I'm certain that it's a natural addition to the standard API, which already implements such pool, yet it's global and its configuration is really inconvenient for libraries. So, having it in the core will be a valuable addition.

Nevertheless, I'll add this option to the Alternatives section. Thanks!

@mcollina
Member

I know @BridgeAR has a proposal to improve this situation without introducing a new API.

@puzpuzpuz
Member Author

puzpuzpuz commented Nov 23, 2019

> I know @BridgeAR has a proposal to improve this situation without introducing a new API.

@mcollina
Hmm, I'm really curious how pool isolation and independent configuration can be achieved without API changes. I'd be glad to participate in the discussion of that proposal, if possible.

That said, I can imagine a totally different API that could be introduced in Node.js core: a "classical" pool with buffer reuse. Such an API could reduce the amount of litter, a problem that isn't solved by the mechanism we currently have inside Buffer.allocUnsafe. But that new API would have to deal with either manual reference counting (which is really unsafe) or integration with the GC for buffer reclamation (which is really complicated and may require changes in V8). So it seems to me that such an idea should live in npm modules, not in core.

@addaleax
Member

> Although, I can imagine a totally different API that could be introduced in node core. I'm speaking of a "classical" pool with buffer reuse. Such API could reduce amount of litter

I think WeakRefs and finalizers might help with that? V8 does currently come with an experimental implementation of that feature (behind --harmony-weak-refs), and I think this might actually be a really good use case.

@puzpuzpuz
Member Author

puzpuzpuz commented Nov 24, 2019

> I think WeakRefs and finalizers might help with that? V8 does currently come with an experimental implementation of that feature (behind --harmony-weak-refs), and I think this might actually be a really good use case.

That's an interesting suggestion. Thanks for bringing it up!

I've looked at the FinalizationGroup API, and it could be enough to implement a pool that reuses internal buffers once all slices pointing at them have been GCed. The pool could hold weak refs to internal buffers (let's call them source buffers) and register each slice (as the target whose lifetime we're concerned with), with its source buffer as the associated "holding", in a separate FinalizationGroup per source buffer. Once the finalizer function reports that all slices have been GCed, the pool could reclaim the corresponding source buffer and reuse it.
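To make the scheme concrete, here is a rough sketch along those lines. It uses FinalizationRegistry, the name under which FinalizationGroup was eventually standardized; all class and method names are made up for illustration, and the bookkeeping is deliberately simplified (e.g. the per-source registries are kept alive forever):

```js
// Hypothetical pool that reuses a source buffer once every slice handed out
// from it has been garbage-collected.
class ReclaimingPool {
  constructor(sourceSize = 8 * 1024) {
    this.sourceSize = sourceSize;
    this.free = [];       // reclaimed source buffers, ready for reuse
    this.registries = []; // kept alive so pending finalizers can still run
    this.state = null;
    this._newSource();
  }

  _newSource() {
    // Reuse a reclaimed source buffer if one is available.
    const source = this.free.pop() ?? Buffer.allocUnsafe(this.sourceSize);
    if (this.state !== null) this.state.retired = true; // old source is full
    const state = { source, live: 0, retired: false };
    this.state = state;
    this.source = source;
    this.offset = 0;
    // One registry per source buffer: when the last slice of a retired
    // source is GCed, the source itself becomes reusable.
    this.registry = new FinalizationRegistry(() => {
      if (--state.live === 0 && state.retired) this.free.push(state.source);
    });
    this.registries.push(this.registry);
  }

  alloc(size) {
    if (this.offset + size > this.sourceSize) this._newSource();
    const slice = this.source.subarray(this.offset, this.offset + size);
    this.offset += size;
    this.state.live++;
    this.registry.register(slice); // track this slice's lifetime
    return slice;
  }
}
```

Whether the finalization bookkeeping pays off is exactly the open performance question; finalizers also run at the engine's discretion, so reclamation is best-effort rather than prompt.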

I have some doubts about the performance of this approach. For large source buffers, the cost of bookkeeping all slices in the FinalizationGroup may be too high. For small source buffers (especially 8 KB ones, the default for the standard API's global pool), it's probably not worth it at all. So some PoC experiments and benchmarking have to be done.

Potentially, this pooling mechanism could be implemented in the existing core API (the global pool for Buffer.allocUnsafe), or it could be an npm module. In either case, it's orthogonal to the implementation & configuration isolation problem described by this issue.

If you think that this pool with WeakRefs and FinalizationGroups deserves a separate feature request in node core repo, please let me know and I'll create it.

@BridgeAR
Member

I just opened a PR with my approach. It switches the current static pool to a dynamic one: Buffer.poolSize is treated as an upper bound instead of the actually allocated size. The allocation size depends on the frequency and size of allocations during a specific time period. The pool starts at zero bytes and may grow up to 2 MB. That limit seemed high enough that almost all applications would profit from it significantly without having to handle individual pools.
If buffers are allocated infrequently or not at all, the pool shrinks below the currently fixed size, so it should also benefit applications that don't use Buffer. And even if they don't use buffers directly, Node.js core does, and the dynamic pool significantly improves performance in multiple use cases.
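For illustration only, a heavily simplified sketch of the growth side of that idea (the real heuristic is time-based and also shrinks the pool; the `DynamicPool` name and the doubling rule here are made up): the pool starts at zero bytes and grows with demand, clamped to an upper bound.

```js
// Simplified sketch of a demand-driven pool size: starts at zero, doubles
// whenever demand exhausts the current source buffer, capped at 2 MB.
// A real implementation would also shrink the pool when allocations
// become infrequent; that time-based part is omitted here.
class DynamicPool {
  constructor(maxPoolSize = 2 * 1024 * 1024) {
    this.maxPoolSize = maxPoolSize;
    this.poolSize = 0; // current source buffer size; grows with demand
    this.pool = null;
    this.offset = 0;
  }

  alloc(size) {
    if (this.pool === null || this.offset + size > this.pool.length) {
      // Demand exhausted the pool: double its size (at least `size`),
      // clamped to the upper bound.
      this.poolSize = Math.min(this.maxPoolSize, Math.max(size, this.poolSize * 2));
      this.pool = Buffer.allocUnsafe(this.poolSize);
      this.offset = 0;
    }
    const buf = this.pool.subarray(this.offset, this.offset + size);
    this.offset += size;
    return buf;
  }
}
```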

@puzpuzpuz
Member Author

puzpuzpuz commented Nov 26, 2019

@BridgeAR
It sounds to me like you're trying to solve a different problem with the dynamic internal buffer size (with low and high watermarks). As far as I understand, you want to increase the default value of Buffer.poolSize without changing the memory footprint of apps and libraries that don't use Buffer.allocUnsafe much.

In this issue, I'm advocating for library authors who want to make sure the performance of their libraries isn't hurt by a misbehaving library. This can also be combined with your dynamic pool approach; moreover, if both features became available in the standard API, Node.js could use a dedicated dynamic pool for its own purposes.

@BridgeAR
Member

It would indeed not be ideal if any third party library changes the default. That should only be done by the "actual application".

I advocate against multiple pools for the following reason: a single pool is taken advantage of as often as possible without increasing the overall memory footprint much. With multiple distinct pools, there is no simple way for the application to prevent pool usage entirely (e.g. because it has very tight memory constraints), nor is the memory actually used optimally: a good part of each pool sits unused whenever its code path is cold, and when multiple pools are in use at the same time, the overall memory usage increases.

We have multiple APIs whose global defaults are meant to be changed only by the application itself, not by library authors. This applies to pretty much all global default values. It would be great to fix these in general, but increasing the overall memory usage by defining multiple pools does not sound ideal to me.

@puzpuzpuz
Member Author

> It would indeed not be ideal if any third party library changes the default. That should only be done by the "actual application".

Sounds like an enhancement for the standard documentation.

> I advocate against multiple pools for the following reason

I get your point. In the end, it sounds like the npm ecosystem could benefit from a buffer pool library that not only reimplements Buffer.allocUnsafe-style slicing (and thus improves performance), but also acts as an object pool, i.e. reduces the allocation rate by reusing buffers.

Please feel free to close this issue.

@addaleax
Member

> Please feel free to close this issue.

Fwiw, I’m still a fan of the “actual pooling” approach laid out above. If we close this, I’d open a new issue about it, as it really seems like a good way forward here.

@BridgeAR
Member

I like the idea of having a way to reuse memory that won't be used anymore instead of allocating new chunks.
We might be able to expose a function that indicates that a specific part of a pool is no longer needed, and then reuse that part for new allocations.
It might be dangerous, though, if not used appropriately.

Is that going in the direction of what you originally had in mind (just by using a completely separate pool)?

@puzpuzpuz
Member Author

puzpuzpuz commented Nov 26, 2019

> Fwiw, I’m still a fan of the “actual pooling” approach laid out above. If we close this, I’d open a new issue about it, as it really seems like a good way forward here.

@addaleax
I also like the idea and would like to participate in design and implementation, if possible. Please mention me in that separate issue. Or I can create it, if it sounds like a better plan to you.

> We might be able to expose a function that indicates that a specific part of a pool is not needed anymore and then reuse that for new allocations. It might be dangerous though, if not used appropriately.

@BridgeAR
There is a potentially much safer way to achieve it. It was discussed right here. See: #30611 (comment)

@puzpuzpuz
Member Author

Closing this one. Created #30683 to follow up on the pooling mechanism.
