Introduce Follower Replication #33

Open
wants to merge 9 commits into
base: master

Member

@Fullstop000 Fullstop000 commented Nov 13, 2019

Relates to tikv/raft-rs#249
Signed-off-by: Fullstop000 [email protected]

# Follower Replication

## Summary
This RFC introduces a new mechanism in the Raft protocol that allows a follower to send Raft log entries to other followers and learners. The goal of this feature is to reduce network transmission costs between data centers during log replication.
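
To make the cost argument concrete, here is a minimal sketch that counts append messages crossing a data-center boundary in one replication round. The `Peer` type and both function names are hypothetical and are not part of raft-rs. The sketch also assumes that every remote group has exactly one eligible delegate, which the RFC does not guarantee.

```rust
use std::collections::HashSet;

/// Hypothetical peer description: its id and the group (data center) it belongs to.
#[derive(Debug, Clone, Copy)]
pub struct Peer {
    pub id: u64,
    pub group_id: u64,
}

/// Cross-group appends per round when the leader sends to every peer itself:
/// one message for each peer outside the leader's group.
pub fn cross_group_appends_leader_only(leader: Peer, peers: &[Peer]) -> usize {
    peers
        .iter()
        .filter(|p| p.id != leader.id && p.group_id != leader.group_id)
        .count()
}

/// Cross-group appends per round when one delegate per remote group forwards
/// entries to the rest of its group (assumption: a delegate is always available):
/// one message for each distinct remote group.
pub fn cross_group_appends_with_delegates(leader: Peer, peers: &[Peer]) -> usize {
    let remote_groups: HashSet<u64> = peers
        .iter()
        .filter(|p| p.id != leader.id && p.group_id != leader.group_id)
        .map(|p| p.group_id)
        .collect();
    remote_groups.len()
}
```

Under these assumptions, a leader in group 1 with three peers in group 2 sends three cross-group appends on its own, and one when a delegate in group 2 forwards to the other two.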
Contributor

I think this can also reduce the load on the leader when it has many followers or learners.

Signed-off-by: Fullstop000 <[email protected]>
Signed-off-by: Fullstop000 <[email protected]>
@siddontang
Contributor

Can you paste your previous benchmark results here so that others can see the benefit directly?

Signed-off-by: Fullstop000 <[email protected]>
@Fullstop000
Member Author

@siddontang I'll put together a more concrete benchmark for this. Some of the previous results are hard to explain, but I'm OK with posting them here :).

Signed-off-by: Fullstop000 <[email protected]>
@Hoverbear added the Initial Comment Period label Nov 15, 2019
2. The progress state should be `Replicate` but not `paused`
3. The progress has the smallest `match_index`

3. If no delegate is picked, the leader does log replication itself. In particular, if a group contains the leader itself, no delegate will be set by default except in some cases such as a massively large group, which is able to be controlled by upper layer.
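
The selection rules visible in this excerpt can be sketched roughly as follows. This is only an illustration: `Progress`, `ProgressState`, and `pick_delegate` are simplified stand-ins rather than the raft-rs types, the `allow_in_leader_group` flag is a hypothetical name for the upper-layer switch, and any criteria from the part of the list not shown here are left out.

```rust
/// Simplified stand-in for a follower's replication state (not the raft-rs type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProgressState {
    Probe,
    Replicate,
    Snapshot,
}

/// Simplified stand-in for the leader's view of one group member.
#[derive(Debug, Clone)]
pub struct Progress {
    pub id: u64,
    pub state: ProgressState,
    pub paused: bool,
    pub match_index: u64,
}

/// Pick a delegate for one group, or return `None` so that the leader
/// replicates to the group's members itself.
///
/// `allow_in_leader_group` is a hypothetical upper-layer switch: by default
/// no delegate is chosen for the group that contains the leader.
pub fn pick_delegate(
    members: &[Progress],
    group_has_leader: bool,
    allow_in_leader_group: bool,
) -> Option<u64> {
    if group_has_leader && !allow_in_leader_group {
        return None;
    }
    members
        .iter()
        // The progress state should be `Replicate` but not paused.
        .filter(|p| p.state == ProgressState::Replicate && !p.paused)
        // The progress has the smallest `match_index`, as stated in the excerpt.
        .min_by_key(|p| p.match_index)
        .map(|p| p.id)
}
```

Returning `None` covers both the "no eligible member" case and the default for the leader's own group, so the caller has a single fallback path to ordinary log replication.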
Contributor

How does the upper layer control this for the group that contains the leader?

Member Author

@Fullstop000 Fullstop000 Nov 18, 2019

The upper layer doesn't need to know which group contains the leader, because only the leader can choose delegates, and the leader itself must know which group it belongs to.

The description here may be somewhat confusing. "which is able to be controlled by upper layer" means that the upper layer can decide whether the leader may choose a delegate in the group it belongs to. I'll make that clearer.

Signed-off-by: Fullstop000 <[email protected]>
* update for the implementation is changed

Signed-off-by: qupeng <[email protected]>

* address comments

Signed-off-by: qupeng <[email protected]>
@Hoverbear added the Final Comment Period label and removed the Initial Comment Period label Jan 20, 2020
@abbccdda abbccdda left a comment

Thanks for the detailed design. Could we add a compatibility section that discusses cases such as some nodes in a group still running an old version and being unable to recognize a delegate's request for entry replication?

@Fullstop000
Member Author

@abbccdda A node in a group sends its group_id in its messages, and the receiver updates the sender's group info based on it. The leader picks a delegate only when it has enough group info. In a rolling upgrade or downgrade, this leads to several cases:

Upgrade

  • If only the leader uses follower replication, it cannot learn the group info until the followers finish upgrading and send their group_id, so the leader uses the original log replication until then
  • If only some or all of the followers use follower replication, the leader simply ignores the group_id in their messages, so no delegate is picked and the original log replication continues

Downgrade

  • If only the leader uses the original log replication, the cluster behaves like an ordinary Raft cluster and nothing special (picking a delegate, broadcasting appends) happens
  • If a follower is downgraded and stops informing the leader of its group_id, the leader removes it from the group system and sends entries to it directly

With this design, compatibility is guaranteed when nodes in the cluster use either the original log replication or follower replication.

This description seems to be missing from the RFC. I'll add it soon.
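
The leader-side bookkeeping described above might look roughly like the sketch below. `GroupTable` and its methods are hypothetical names, not raft-rs APIs. A message without a group_id is modeled as `None`, which stands for a node that is still on the old version or has been downgraded. The comment does not say how much group info counts as "enough", so that check is not modeled here.

```rust
use std::collections::HashMap;

/// Hypothetical leader-side table mapping peer id to the group_id it last reported.
#[derive(Debug, Default)]
pub struct GroupTable {
    groups: HashMap<u64, u64>,
}

impl GroupTable {
    /// Record the group_id carried by a message from `peer`.
    /// `None` models a node that does not send a group_id (old version or
    /// downgraded); such a peer is removed from the group system.
    pub fn observe(&mut self, peer: u64, group_id: Option<u64>) {
        match group_id {
            Some(g) => {
                self.groups.insert(peer, g);
            }
            None => {
                self.groups.remove(&peer);
            }
        }
    }

    /// A peer with no known group must receive entries directly from the leader.
    pub fn needs_direct_append(&self, peer: u64) -> bool {
        !self.groups.contains_key(&peer)
    }

    /// Known members of a group, sorted by id for deterministic output.
    pub fn members_of(&self, group_id: u64) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .groups
            .iter()
            .filter(|(_, g)| **g == group_id)
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

Because an unknown peer always falls back to direct appends, a mixed-version cluster degrades to ordinary log replication instead of failing.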

@abbccdda
@Fullstop000 Thanks for the reply. It would be good to add these details to the RFC for sure :)


The design has four key concepts:

- Every peer in a Raft group is associated with a `group_id`.

I was wondering whether we could support delegation groups natively in the first version. If we want to form another delegation group at runtime, we would need to change the static configs and do a rolling bounce of the cluster.

While it makes sense for the first version to focus on static configs, we could also propose runtime formation of delegation groups through a third-party controller such as PD. When the leader's load goes up, it makes sense to distribute that load to other up-to-date followers. We could briefly discuss how to keep the door open for such a future improvement, with the first version supporting only a static group_id.

Member Author

Good idea! Actually, that was the first design for how the leader manages all the group info.

Signed-off-by: Fullstop000 <[email protected]>
Labels: Final Comment Period
5 participants