Lazy memory allocation in channel pushers. #394

Open · ryzhyk opened this issue Jun 11, 2021 · 6 comments

@ryzhyk (Contributor) commented Jun 11, 2021

We have an application that builds a very large dataflow graph (thousands of operators) and creates hundreds of concurrently running instances of this graph. Heap profiling showed that the application wastes a lot of memory in channel pushers, which each maintain a buffer of `Message::default_length` entries even when the system is idle with no outstanding messages.

I understand that this is a performance optimization that reduces the number of malloc calls, but the memory overhead was not acceptable for our use case. I created a patch that changes the allocation policy to allocate channel memory (in chunks of `Message::default_length` entries) only when there are messages to send and to deallocate it when there are none, reducing the memory footprint to zero in the idle state: ddlog-dev@a94e176
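
For illustration, here is a minimal sketch of the policy (not the actual timely pusher code; `BUFFER_LEN` stands in for `Message::default_length`, and all names here are made up):

```rust
/// Stand-in for `Message::default_length` (not timely's real constant).
const BUFFER_LEN: usize = 1024;

/// A toy buffer mimicking the lazy policy: no capacity while idle, one chunk
/// allocated on the first record of a burst, and released when the burst ends.
struct LazyBuffer<T> {
    buffer: Vec<T>, // capacity 0 whenever the channel is idle
}

impl<T> LazyBuffer<T> {
    fn new() -> Self {
        LazyBuffer { buffer: Vec::new() } // nothing allocated up front
    }

    /// Accept one record; returns a full chunk to ship downstream, if any.
    fn push(&mut self, item: T) -> Option<Vec<T>> {
        if self.buffer.capacity() == 0 {
            // First record since the last idle period: allocate one chunk now.
            self.buffer.reserve_exact(BUFFER_LEN);
        }
        self.buffer.push(item);
        if self.buffer.len() == BUFFER_LEN {
            // Ship the full chunk; `take` leaves behind an unallocated Vec.
            Some(std::mem::take(&mut self.buffer))
        } else {
            None
        }
    }

    /// End of a burst: flush anything pending and drop spare capacity,
    /// so the idle footprint returns to zero.
    fn flush(&mut self) -> Option<Vec<T>> {
        if self.buffer.is_empty() {
            self.buffer = Vec::new(); // release any reserved capacity
            None
        } else {
            Some(std::mem::take(&mut self.buffer))
        }
    }
}
```

A real pusher would hand the full chunk to the next pusher in the chain and recycle returned buffers; the point of the sketch is only that nothing is held while the channel is idle.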

@frankmcsherry, any chance you'd accept a cleaned-up version of this patch as a PR? I realize that it caters to a somewhat idiosyncratic use case, but it doesn't seem to cause a measurable slowdown in my experiments and it does save memory.

@Kixiron (Contributor) commented Jun 12, 2021

If this being the default behavior is a deal breaker, it can always be gated either behind a feature flag or behind a timely `Config` option.
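
As a rough sketch of the feature-flag variant (the feature name `lazy_pusher_buffers` is made up, it would be declared under `[features]` in Cargo.toml, and the real decision point in timely's pushers may look different):

```rust
/// Choose the initial buffer for a pusher depending on a hypothetical
/// `lazy_pusher_buffers` Cargo feature.
fn initial_buffer<T>(default_len: usize) -> Vec<T> {
    if cfg!(feature = "lazy_pusher_buffers") {
        // Lazy policy: start unallocated; capacity is acquired on first push
        // and released when the channel goes idle.
        Vec::new()
    } else {
        // Current policy: preallocate the full chunk up front.
        Vec::with_capacity(default_len)
    }
}
```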

@frankmcsherry (Member) commented:

Sorry for the delay, I was out / am overwhelmed. :)

The thing you describe sounds like an issue we should fix. If you have a patch, we could totally look at it. It would be a performance regression for some use cases (those that fill these buffers in steady state). Do you have a take on whether your graphs 1. alternate between filling the buffers and then idling on empty buffers, or 2. don't fill the buffers, so that a smaller default / adaptive approach would be better, or maybe a bit of both?
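
To make option 2 concrete, an "adaptive" policy could start small and grow the chunk size only under sustained load; a rough sketch, with all names and numbers purely illustrative:

```rust
const MAX_LEN: usize = 1024; // stand-in for `Message::default_length`
const MIN_LEN: usize = 16;   // small initial allocation

/// A toy buffer whose capacity target grows only when buffers actually fill,
/// so lightly used channels never pay for a full default-sized chunk.
struct AdaptiveBuffer<T> {
    buffer: Vec<T>,
    target: usize, // current chunk size; doubles up to MAX_LEN
}

impl<T> AdaptiveBuffer<T> {
    fn new() -> Self {
        AdaptiveBuffer { buffer: Vec::with_capacity(MIN_LEN), target: MIN_LEN }
    }

    /// Accept one record; ship the buffer when it reaches the current target
    /// and double the target for the next chunk.
    fn push(&mut self, item: T) -> Option<Vec<T>> {
        self.buffer.push(item);
        if self.buffer.len() >= self.target {
            self.target = (self.target * 2).min(MAX_LEN);
            Some(std::mem::replace(&mut self.buffer, Vec::with_capacity(self.target)))
        } else {
            None
        }
    }
}
```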

@frankmcsherry (Member) commented:

Another option, though idk for certain: this could just be a different pact implementation. `Exchange` has an implementation, but it doesn't have to be the pact that you use (we could have both implementations, and users could choose between them as appropriate). That only really makes sense if the exchange channel is where the empty buffers languish, rather than at other moments in the dataflow.
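
For context on where that choice would surface: the pact is the first argument when building a generic operator, so a hypothetical lazily allocating pact could be passed there in place of `Exchange` without touching the rest of the dataflow. A sketch against the 2021-era operator API (the operator itself just passes data through):

```rust
use timely::dataflow::channels::pact::Exchange;
use timely::dataflow::operators::generic::operator::Operator;
use timely::dataflow::operators::{Inspect, ToStream};

fn main() {
    timely::example(|scope| {
        (0u64..10)
            .to_stream(scope)
            // The pact is chosen here; a lazily allocating alternative could be
            // passed in place of `Exchange::new(...)` with no other changes.
            .unary(Exchange::new(|x: &u64| *x), "PassThrough", |_cap, _info| {
                let mut buffer = Vec::new();
                move |input, output| {
                    while let Some((time, data)) = input.next() {
                        data.swap(&mut buffer);
                        let mut session = output.session(&time);
                        for datum in buffer.drain(..) {
                            session.give(datum);
                        }
                    }
                }
            })
            .inspect(|x| println!("saw: {}", x));
    });
}
```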

@frankmcsherry (Member) commented:

There's also, and I'm not sure I want to be that guy, but the possibility that making hundreds of independent copies of a large dataflow graph is just an anti-pattern in TD. I apologize if you've explained this before, but can you explain why this can't be one copy of the large dataflow graph, with keys on the data to indicate which logical instance they belong to?

@ryzhyk (Contributor, author) commented Jun 23, 2021

> It would be a performance regression for some use cases (those that fill these buffers in steady state)

I understand this is true in theory, but I wonder if this actually happens in practice.

> If you have a patch

I do. I'll clean it up and submit as PR, thanks!

I can also try implementing @Kixiron's suggestion and make this a configuration option (which will break backwards compatibility for existing timely applications and add some extra forks in the code).

> 1. alternate between filling the buffers and then idling on empty buffers, or 2. don't fill the buffers and so a smaller default / adaptive approach would be better, or maybe a bit of both?

It's definitely option 1. And I did try a smaller default, which helps, but a lot of memory still gets wasted.

> but can you explain why this can't be one copy of the large dataflow graph, with keys on the data to indicate which logical instance they belong to?

There are several reasons. The most important one is programmability/security. Our users write hundreds of (ddlog) rules and it would be a pain for them to use the extra key in all rules. And even if we provided some idioms to help them get this right, a small typo would cause data to flow across different logical instances, which in the real world translates to sensitive data leaking across different external customers, which is the absolute worst nightmare for the folks we're working with. In fact, we are planning to implement even stronger isolation by encapsulating each dataflow in a separate process.

Aside from this, scalability is another concern. @Kixiron managed to improve the scalability of DDlog-generated code with the number of workers, but it's still nowhere near as good as the near-linear scaling we get when using multiple separate dataflows.

ryzhyk added a commit to ddlog-dev/timely-dataflow that referenced this issue on Jun 25, 2021:

The current implementation of channel pushers preallocates
`Message::default_length` entries per pusher.  For very large graphs this
adds up to a lot of memory being allocated even when the system is idle
with no outstanding messages.  This patch changes the allocation policy
to only allocate channel memory when there are messages to send and to
deallocate it when there are no messages, reducing the memory footprint
to 0 in idle state at the cost of some potential slow-down due to a
larger number of allocations.

See TimelyDataflow#394.
@ryzhyk (Contributor, author) commented Jun 25, 2021

Created a draft PR: #397

ryzhyk added a commit to ddlog-dev/timely-dataflow that referenced this issue on Jun 28, 2021:

The current implementation of channel pushers preallocates
`Message::default_length` entries per pusher.  For very large graphs this
adds up to a lot of memory being allocated even when the system is idle
with no outstanding messages.  This patch changes the allocation policy
to only allocate channel memory when there are messages to send and to
deallocate it at the end of a burst of messages (signaled by pushing
a `None` message), reducing the memory footprint to 0 in idle state at
the cost of some potential slow-down due to a larger number of allocations.

See TimelyDataflow#394.