Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy-allocate pusher memory. #397

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ryzhyk
Copy link
Contributor

@ryzhyk ryzhyk commented Jun 25, 2021

The current implementation of channel pushers preallocates
Message::default_length entries per pusher. For very large graphs this
adds up to a lot of memory being allocated even when the system is idle
with no outstanding messages. This patch changes the allocation policy
to only allocate channel memory when there are messages to send and to
deallocate it when there are no messages, reducing the memory footprint
to 0 in idle state at the cost of some potential slow-down due to a
larger number of allocations.

See #394.

@frankmcsherry
Copy link
Member

A few quick thoughts:

  1. Lazy allocation seems great.
  2. The approach to de-allocation will de-allocate on every call to Message::push_at, rather than at the end of a stream of messages. The Push trait that gets used everywhere communicates "I have no more data" with a None message, and de-allocating on that seems better. So, once every burst of data, rather than once for every message.
  3. The above approach might mean that this can be implemented by e.g. Exchange and then just be another pact rather than a system-wide change to resource management. If nothing else, it's probably worth tracking down where these buffers get stashed.

@ryzhyk
Copy link
Contributor Author

ryzhyk commented Jun 26, 2021

Thanks for the feedback!

So, once every burst of data, rather than once for every message.

Makes sense! Instead of always deallocating inside push_at, I will modify pusher implementations that maintain internal buffers (Exchange and Tee) to deallocate the buffer on a None message.

The above approach might mean that this can be implemented by e.g. Exchange and then just be another pact rather than a system-wide change to resource management.

Sorry, I'm not sure how to implement this. Are you suggesting that pushers::Exchange should be configurable to optionally deallocate at the end of a burst, and then we create a pact similar to ExchangePact but with this option enabled? If so, how do I get timely to use this new pact when constructing the dataflow?

The current implementation of channel pushers preallocates
`Message::default_length` entries per pusher.  For very large graphs this
adds up to a lot of memory being allocated even when the system is idle
with no outstanding messages.  This patch changes the allocation policy
to only allocate channel memory when there are messages to send and to
deallocate it at the end of a burst of messages (signaled by pushing
a `None` message), reducing the memory footprint to 0 in idle state at
the cost of some potential slow-down due to a larger number of allocations.

See TimelyDataflow#394.
@frankmcsherry
Copy link
Member

Re-reading things, I think the Exchange implementation is missing an opportunity to blank out its vector on lines 44-46, where for single-worker instantiations it just passes the message through (including a None indicating a flush).

@ryzhyk
Copy link
Contributor Author

ryzhyk commented Jun 30, 2021

I thought in this case the buffer would never be allocated in the first place?

@frankmcsherry
Copy link
Member

Ah good point. That may be true!

@ryzhyk
Copy link
Contributor Author

ryzhyk commented Jun 30, 2021

I confirmed using out internal benchmarks that the fixed up version that only deallocates on None yields memory savings comparable to the version that aggressively deallocated on every push.

@frankmcsherry, do you see this being merged as is or do you feel we should work on making this an optional pact?

@frankmcsherry
Copy link
Member

I'd need to think a bit before merging it as is, due to the impact on others. If it were an optional pact, it would be easy to accept. On the other hand, I'm not sure that the Tee stuff is as easy to make optional, which is too bad. I do like the idea, and we did something like this in Naiad, but I do want to shake out the performance penalty before landing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants