[Pubsub] reduce memory usage for channels that do not require total memory cap #23985
src/ray/pubsub/publisher.h
/// State for an entity that streams published messages to subscribers, with total size
/// cap.
class StreamEntityState : public EntityState {
I think "Stream" is not really the right word here since they both send streams to the subscribers. Maybe something like "CappedEntityState" vs "BufferedEntityState"?
Also, could you update the comment to explain what happens when we exceed the cap and to explain when Basic vs Streamed should be used/why we have two different kinds?
Good point. Renamed to CappedEntityState and added a comment on their differences.
Nice find! It looks good, I just left some comments about naming and documentation.
@@ -90,24 +101,32 @@ const absl::flat_hash_map<SubscriberID, SubscriberState *> &EntityState::Subscri
  return subscribers_;
}

SubscriptionIndex::SubscriptionIndex(rpc::ChannelType channel_type)
nit: I feel it's a little bit weird to put channel_type into SubscriptionIndex. It seems that it's only used to construct EntityState, so we don't need to store it in channel_type_.
Another thing I feel bad about is that it's an application layer decision, but here it's hardcoded in the infra layer. I'm wondering whether we can make it better?
synced offline. I'm good with this right now since it's still maintainable.
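For illustration only, here is one way the concern above could be addressed later: the index receives a factory for entity states, so it never needs to store or inspect the channel type. This is a minimal sketch, not the code in this PR. The `EntityStateFactory` alias, `GetOrCreate`, `capped()` and `size()` are hypothetical names, and the classes are simplified stand-ins for the real ones in src/ray/pubsub/publisher.h.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified stand-ins for the real classes (assumption: no actual
// subscriber bookkeeping is modelled here).
struct EntityState {
  virtual ~EntityState() = default;
  virtual bool capped() const { return false; }
};

struct CappedEntityState : EntityState {
  bool capped() const override { return true; }
};

class SubscriptionIndex {
 public:
  // Hypothetical: the caller (application layer) decides which kind of
  // state each channel uses and hands that decision over as a factory.
  using EntityStateFactory = std::function<std::unique_ptr<EntityState>()>;

  explicit SubscriptionIndex(EntityStateFactory factory)
      : factory_(std::move(factory)) {}

  // Returns the state for `key`, creating it with the injected factory
  // on first use.
  EntityState &GetOrCreate(const std::string &key) {
    auto it = entities_.find(key);
    if (it == entities_.end()) {
      it = entities_.emplace(key, factory_()).first;
    }
    return *it->second;
  }

  std::size_t size() const { return entities_.size(); }

 private:
  EntityStateFactory factory_;
  std::unordered_map<std::string, std::unique_ptr<EntityState>> entities_;
};
```

With this shape, the code that sets up the publisher would pass a factory returning `CappedEntityState` for the log and error channels and a plain `EntityState` for everything else. The trade-off is one `std::function` per index instead of one enum value.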
Why are these changes needed?
In a1e06f6, a memory bound was added for each subscribed entity in the publisher. It adds two extra std::deque per subscribed entity, which turns out to cost a lot more memory when there are a large number of ObjectRefs: #23853 (comment)
This PR avoids the extra memory usage for entities in channels unlikely to grow too large, i.e. all channels except those for logs and error info. Subscribed entity memory usage no longer shows up in the memory profile when there are 1M object refs:
Raw data: profile006.pb.gz
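The split described above can be sketched roughly as follows. This is a simplified illustration, not the actual publisher code. The `ChannelType` members, `MakeEntityState`, `BufferedBytes` and the contents of the two deques are assumptions, as is the policy of evicting the oldest messages once the cap is exceeded.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <string>

// Hypothetical stand-in for rpc::ChannelType; the real enum differs.
enum class ChannelType { kObjectLocations, kLogs, kErrorInfo };

// Uncapped state: keeps no per-entity buffers, so it costs no extra
// deques per subscribed entity.
class EntityState {
 public:
  virtual ~EntityState() = default;
  virtual void Publish(const std::string &msg) { (void)msg; }
  virtual std::size_t BufferedBytes() const { return 0; }
};

// Capped state: tracks pending messages and their sizes in two deques
// (what the deques hold here is an assumption).
class CappedEntityState : public EntityState {
 public:
  explicit CappedEntityState(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  void Publish(const std::string &msg) override {
    pending_.push_back(msg);
    sizes_.push_back(msg.size());
    total_bytes_ += msg.size();
    // Assumed policy: drop the oldest messages until the total is back
    // under the cap, always keeping the newest message.
    while (total_bytes_ > max_bytes_ && pending_.size() > 1) {
      total_bytes_ -= sizes_.front();
      sizes_.pop_front();
      pending_.pop_front();
    }
  }

  std::size_t BufferedBytes() const override { return total_bytes_; }

 private:
  std::size_t max_bytes_;
  std::size_t total_bytes_ = 0;
  std::deque<std::string> pending_;
  std::deque<std::size_t> sizes_;
};

// Only channels that may grow large (logs, error info) pay for the cap.
std::unique_ptr<EntityState> MakeEntityState(ChannelType type,
                                             std::size_t max_bytes) {
  switch (type) {
    case ChannelType::kLogs:
    case ChannelType::kErrorInfo:
      return std::make_unique<CappedEntityState>(max_bytes);
    default:
      return std::make_unique<EntityState>();
  }
}
```

With 1M object refs, every entity on an object channel would be a plain `EntityState` in this sketch, which is where the memory saving comes from.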
Related issue number
#23604
Checks
scripts/format.sh to lint the changes in this PR.