
[scheduling][5] Refactor resource syncer. #23270

Merged
merged 79 commits into from
Mar 30, 2022

Conversation


@fishbone fishbone commented Mar 17, 2022

Why are these changes needed?

This PR refactors the resource syncer to decouple it from GCS and the raylet. GCS and the raylet will use the same module to sync data. The integration will happen in the next PR.

Several new components are introduced:

  • RaySyncer: the place where remote and local information sits. It's a coordination layer.
  • NodeState: keeps track of the local status, similar to NodeSyncConnection.
  • NodeSyncConnection: keeps track of the information sent and received, and makes sure information the remote node already knows is not re-sent.

The core protocol is that each node sends {what it has} - {what the target has} to the target.
For example, consider nodes A <-> B: A sends everything A has, excluding what B already has, to B.
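The rule above can be sketched with versioned per-component state. This is a minimal illustration under assumed names (`View` and `ComponentsToSend` are hypothetical, not Ray's actual types): A only sends components whose version is strictly newer than what it knows B already has.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of the "{what it has} - {what the target has}" rule:
// each component's state carries a version, and node A only sends entries
// strictly newer than what it already knows the target has.
using View = std::unordered_map<int, uint64_t>;  // component_id -> version

std::vector<int> ComponentsToSend(const View &local, const View &target_known) {
  std::vector<int> to_send;
  for (const auto &[component_id, version] : local) {
    auto it = target_known.find(component_id);
    if (it == target_known.end() || it->second < version) {
      to_send.push_back(component_id);
    }
  }
  return to_send;
}
```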

Whenever there is new information (from NodeState or NodeSyncConnection), it is passed to RaySyncer's BroadcastMessage to be broadcast.

NodeSyncConnection is the communication layer. It has two implementations, Client and Server:

  • Server => Client: the client sends a long-polling request, and the server responds every 100ms if there is data to be sent.
  • Client => Server: the client checks every 100ms whether there is new data to send. If there is, it uses an RPC call to send the data.
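The server-side direction can be sketched as follows. This is an illustrative model only (`LongPollServer` and its methods are hypothetical, and the 100ms batching interval is omitted for brevity): the server parks the client's long-polling request and answers it once data is available, while the client direction would instead poll its own queue on a timer and issue an RPC when it is nonempty.

```cpp
#include <functional>
#include <optional>
#include <queue>
#include <string>

// Illustrative sketch, not Ray's actual classes.
class LongPollServer {
 public:
  using Reply = std::function<void(const std::string &)>;

  // Called when the client's long-polling request arrives.
  void OnLongPoll(Reply reply) {
    if (!pending_.empty()) {
      reply(pending_.front());
      pending_.pop();
    } else {
      waiting_ = std::move(reply);  // park the request until data shows up
    }
  }

  // Called when new sync data is generated locally.
  void PushData(std::string data) {
    if (waiting_) {
      (*waiting_)(data);  // answer the parked long-poll immediately
      waiting_.reset();
    } else {
      pending_.push(std::move(data));
    }
  }

 private:
  std::queue<std::string> pending_;
  std::optional<Reply> waiting_;
};
```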

Here is one example:

flowchart LR;
    A-->B;
    B-->C;
    B-->D;

It means A initializes the connection to B, and B initializes the connections to C and D.

Now C generates a message M:

  1. [C] RaySyncer checks whether a new message has been generated in C and gets M.
  2. [C] RaySyncer pushes M to the local NodeSyncConnection (the one for B).
  3. [C] The ServerSyncConnection waits until B sends a long-polling request, then sends the data to B.
  4. [B] B receives the message from C and pushes it to its local sync connections (C, A, D).
  5. [B] The ClientSyncConnection of C will not push it to its local queue, since the message was received through this channel.
  6. [B] The ClientSyncConnection of D sends this message to D.
  7. [B] The ServerSyncConnection of A is used to send this message to A (long-polling here).
  8. [B] B updates NodeState (the local component) with message M.
  9. [D] D's pipeline is similar to 5) (with ServerSyncConnection) and 8).
  10. [A] A's pipeline is similar to 5) and 8).
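Step 5's rule, never echo a message back over the channel it arrived on, can be sketched with a hypothetical helper (illustrative only, not Ray's code):

```cpp
#include <string>
#include <vector>

// When B receives M from C, it forwards M to every connected peer except the
// node the message came from, and separately applies M to its local NodeState.
std::vector<std::string> FanOutTargets(const std::string &origin,
                                       const std::vector<std::string> &peers) {
  std::vector<std::string> targets;
  for (const auto &peer : peers) {
    if (peer != origin) targets.push_back(peer);  // step 5: don't echo back
  }
  return targets;
}
```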

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@fishbone fishbone changed the title Refactor syncer [wip][scheduling][5] Refactor resource syncer. Mar 17, 2022
@fishbone fishbone marked this pull request as ready for review March 17, 2022 07:42
@fishbone
Contributor Author

Still working on writing the test cases and a toy example. Since this PR is already too big, the actual integration will happen in the next PR.

Contributor Author

@fishbone fishbone left a comment


Synced offline with @scv119; the following concerns should be addressed in the updates:

  • We should put the local node status into a separate class and provide Update methods to update the local status. (cluster_view_/reporters_/receivers_)
  • Consider using an upstream/downstream model.

@fishbone
Contributor Author

fishbone commented Mar 18, 2022

Several improvements to address the comments:

  • Local node info is put into a separate structure.
  • Hide all role-related behavior in the communication layer.
  • Move internal APIs to another header file to keep the public API clean.

The next steps:

  1. Re-iterate the API with @scv119
  2. Make the toy example work
  3. More unit tests
  4. Make the PR ready for formal review and collect feedback from more people
  5. (follow-up PR) Integrate with GCS and raylet

// The component this message is for.
RayComponentId component_id = 2;
// The actual payload.
bytes sync_message = 3;
Contributor

@scv119 scv119 Mar 18, 2022


Will this incur one more string copy? We can probably benchmark it. If it's a concern, we can see whether a union makes it better.

Contributor Author


We have one extra copy to avoid one extra serialization when sending data.

Furthermore, I think the component could use FlatBuffers in the future to optimize performance if needed.
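To illustrate the copy being discussed (with a stand-in struct, not the generated proto class): the component serializes its payload, and that serialized string is then copied into the outer message's `sync_message` bytes field.

```cpp
#include <string>

// Hypothetical stand-in for the generated proto message; with a real protobuf
// class, set_sync_message(std::move(payload)) could avoid the second copy.
struct FakeSyncMessage {
  int component_id = 0;
  std::string sync_message;  // the `bytes sync_message = 3;` field
};

FakeSyncMessage PackPayload(int component_id, const std::string &payload) {
  FakeSyncMessage msg;
  msg.component_id = component_id;
  msg.sync_message = payload;  // one extra copy of the serialized payload
  return msg;
}
```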

Collaborator

@jjyao jjyao left a comment


Didn't finish, will continue.


RaySyncer &instance_;
instrumented_io_context &io_context_;
std::string node_id_;
Collaborator


Is this the local_node_id or the target_node_id?

Contributor Author


This is the node id of the connection. Do you think target node makes sense here?


Comment on lines 161 to 162
/// It'll send nothing unless there is a request from the remote node
/// for the sending request.
Collaborator


Got it. Maybe make the comment a bit clearer, something like "send nothing unless there is a long-polling request ..."?


absl::flat_hash_set<std::shared_ptr<const RaySyncMessage>, _MessageHash, _MessageEq>
    sending_queue_;
// Keep track of the versions of components in this node.
Collaborator


I'm still a bit confused: is this about the latest version the local node has received or sent?

@fishbone
Contributor Author

Could you update the comment then? The comment currently says it handles updates sent from this node.

Here, "this node" means the node associated with the connection.

@fishbone
Contributor Author

fishbone commented Mar 28, 2022

@jjyao I think you mixed up the connection and the local node here. The connection represents the node connected to the local node; every node_id in the connection refers to that remote node.

We went through the code in the morning. Do you mind revisiting your comments to see which ones still apply?

if (node_versions[message.component_id()] < message.version()) {
  node_versions[message.component_id()] = message.version();
}
instance_.BroadcastMessage(std::make_shared<RaySyncMessage>(std::move(message)));
Collaborator


Do we need to broadcast in the else case? Doesn't that mean we received an old message that should be discarded?

Contributor Author


I agree with you.
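A minimal sketch of the agreed fix, with illustrative names rather than the exact Ray code: gate on a strictly newer version, and drop stale messages instead of re-broadcasting them.

```cpp
#include <cstdint>
#include <unordered_map>

struct Message {
  int component_id;
  uint64_t version;
};

// Only record and broadcast a message whose version is strictly newer than
// what has already been seen for that component; stale messages are dropped.
bool ShouldBroadcast(std::unordered_map<int, uint64_t> &node_versions,
                     const Message &message) {
  uint64_t &seen = node_versions[message.component_id];
  if (seen < message.version) {
    seen = message.version;
    return true;  // fresh: forward to the broadcast path
  }
  return false;  // stale: discard instead of re-broadcasting
}
```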

@fishbone
Contributor Author

Thanks, @jjyao for the comments. I'll update the PR.

Collaborator

@jjyao jjyao left a comment


LGTM

if (!message) {
  continue;
}
if (upward_only_[message->component_id()] && !is_upward_conn) {
Collaborator


What's an example of an upward_only component?

Contributor Author


I put it in the Register comments.

}

bool NodeState::ConsumeMessage(std::shared_ptr<const RaySyncMessage> message) {
  auto &current = cluster_view_[message->node_id()][message->component_id()];
Collaborator


Do we need to check whether message->node_id() == node_id and skip if so?

Contributor Author


I do it on purpose: 1) if there is a new connection, sending the snapshot is easy to do; 2) the receiver might want to handle local updates, and this gives the local reporter a way to communicate with the local receiver.

cluster_view_;
};

class NodeSyncConnection {
Contributor


Think again about this NodeSyncConnection: if we are using it to simulate streaming, should it have the following APIs?

class NodeSyncConnection {
  // Receives stream of responses from remote node.
  using ResponseObserver = std::function<void(Response)>;

 public:
  NodeSyncConnection(ResponseObserver response_observer);
  // Sends requests to remote node.
  bool SendRequest(Request request);
};

Contributor Author


@scv119 even with streaming, we still need the Server and Client concepts here. The one that initiates the connection is the client.

Here the differences are: 1) initialization; 2) how to send; 3) how to receive. All of these are communication-layer concerns.

Putting them into NodeSyncConnection gives me the feeling that it's not a good abstraction.

Now it's like:

  • NodeSyncConnection -> application-layer filtering
  • Server/ClientSyncConnection -> handles how the communication is done

Contributor

@scv119 scv119 Mar 29, 2022


Hmm, could you hide them under the NodeSyncConnection interface/abstract class?

class NodeSyncConnection {
 public:
  // Receives stream of responses from remote node.
  using ResponseObserver = std::function<void(Response)>;

  NodeSyncConnection(ResponseObserver response_observer);
  // Sends requests to remote node.
  virtual bool SendRequest(Request request);
};

class ServerSyncConnection : public NodeSyncConnection {
 public:
  ServerSyncConnection(ResponseObserver response_observer);
  bool SendRequest(Request request) override;
};

class ClientSyncConnection : public NodeSyncConnection {
 public:
  ClientSyncConnection(ResponseObserver response_observer);
  bool SendRequest(Request request) override;
};

Contributor Author


I think it's not changing anything:

  • PushToSendingQueue is SendRequest
  • ReceiveUpdates is ResponseObserver

Btw, the only part that differs is the communication; what to do when we receive the updates is the same.

And we still end up having two subclasses of NodeSyncConnection.

One thing I think we could do is make DoSend a callback. But is this better than the current implementation?

Contributor Author


I just updated it. I think it looks better than before.

@fishbone
Contributor Author

@scv119 @rkooo567 could you give another review for this PR?

@fishbone
Contributor Author

Discussed with @scv119 offline; I'll move the ray syncer out of NodeSyncConnection.

@fishbone
Contributor Author

The Mac build is hanging forever ... and the failure is not related.

@fishbone fishbone merged commit 781c46a into ray-project:master Mar 30, 2022
@fishbone
Contributor Author

Thanks @scv119 @jjyao @rkooo567 for helping review this PR. Through the discussion and iteration, the PR has become much better than the first version. Really appreciate your time!
