*: support dynamic tso service #8517

Open: wants to merge 24 commits into master

rleungx
Member

@rleungx rleungx commented Aug 12, 2024

What problem does this PR solve?

Issue Number: ref #8477

What is changed and how does it work?

Check List

Tests

  • Unit test

Release note

None.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 12, 2024
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 15, 2024
@@ -660,7 +660,7 @@ func (c *client) Close() {
}
}

-func (c *client) setServiceMode(newMode pdpb.ServiceMode) {
+func (c *client) setServiceMode(newMode pdpb.ServiceMode, skipSameMode bool) {
Member

Prefer using a more straightforward word.

Suggested change
func (c *client) setServiceMode(newMode pdpb.ServiceMode, skipSameMode bool) {
func (c *client) setServiceMode(newMode pdpb.ServiceMode, force bool) {

Member Author

It's not the same as force.

Member

Why not always use skipSameMode?

}
errMsg := err.Error()
return strings.Contains(errMsg, "not found tso address") ||
strings.Contains(errMsg, "maximum number of retries exceeded")
Member

This error would also occur when the leadership cannot be elected. In which cases would this be a misjudgment?

Member Author

We don't check this error on the client side. Do you know the reason?

@@ -406,6 +406,8 @@ func TestTSOFollowerProxyWithTSOService(t *testing.T) {
backendEndpoints := pdLeaderServer.GetAddr()
tsoCluster, err := tests.NewTestTSOCluster(ctx, 2, backendEndpoints)
re.NoError(err)
// let service discovery know the TSO service
time.Sleep(500 * time.Millisecond)
Member

Can it be replaced with an Eventually?

codecov bot commented Aug 21, 2024

Codecov Report

Attention: Patch coverage is 67.07317% with 54 lines in your changes missing coverage. Please review.

Project coverage is 77.48%. Comparing base (5ba4db7) to head (63c78f8).
Report is 3 commits behind head on master.

@@            Coverage Diff             @@
##           master    #8517      +/-   ##
==========================================
- Coverage   77.53%   77.48%   -0.05%     
==========================================
  Files         474      474              
  Lines       62355    62469     +114     
==========================================
+ Hits        48345    48404      +59     
- Misses      10437    10485      +48     
- Partials     3573     3580       +7     
Flag Coverage Δ
unittests 77.48% <67.07%> (-0.05%) ⬇️

@rleungx rleungx force-pushed the dynamic-switch branch 2 times, most recently from a36ad5d to f2c0c14 on August 23, 2024 07:38
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 26, 2024
@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2024
@@ -38,6 +38,16 @@ func IsLeaderChange(err error) bool {
strings.Contains(errMsg, NotPrimaryErr)
}

// IsServiceModeChange will determine whether there is a service mode change.
Member

Suggested change
// IsServiceModeChange will determine whether there is a service mode change.
// IsServiceModeChange determines whether there is a service mode change.

if err != nil {
if needRetry := handleStreamError(err); needRetry {
continue
if s.forwardToTSOService() {
Member

We can reduce some of the indentation:

Suggested change
if s.forwardToTSOService() {
if !s.forwardToTSOService() {
return s.tsoAllocatorManager.HandleRequest(ctx, tso.GlobalDCLocation, 1)
}
request := xxxx
.....
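The suggestion is the usual guard-clause refactor. A minimal sketch with hypothetical names (request, allocateLocally, forwardToTSO), not the actual GrpcServer.Tso code: the non-forwarding branch returns first, so the forwarding logic no longer sits inside an if block.

```go
package main

import (
	"errors"
	"fmt"
)

// request is a hypothetical stand-in for a TSO request.
type request struct{ count uint32 }

// allocateLocally is a hypothetical placeholder for the local allocator path.
func allocateLocally(r request) (string, error) {
	return fmt.Sprintf("local:%d", r.count), nil
}

// forwardToTSO is a hypothetical placeholder for the forwarding path.
func forwardToTSO(r request) (string, error) {
	if r.count == 0 {
		return "", errors.New("empty request")
	}
	return fmt.Sprintf("forwarded:%d", r.count), nil
}

// handle returns early for the non-forwarding case, so the forwarding
// logic below stays at the top indentation level instead of inside an if.
func handle(forward bool, r request) (string, error) {
	if !forward {
		return allocateLocally(r)
	}
	// Forwarding-specific setup (building the request, etc.) would go here.
	return forwardToTSO(r)
}

func main() {
	fmt.Println(handle(false, request{count: 1}))
	fmt.Println(handle(true, request{count: 2}))
}
```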

@@ -569,6 +590,72 @@ func (s *GrpcServer) Tso(stream pdpb.PD_TsoServer) error {
continue
}

if s.forwardToTSOService() {
Member

Maybe wrap the three situations (local TSO / ms TSO / normal) into three functions. Just a suggestion.
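One way to read this suggestion, sketched with hypothetical names: the stream loop picks one of the three paths (local TSO, ms TSO, normal) and delegates to a dedicated function for each. How the real handler chooses the path is not shown in this excerpt, so the tsoPath enum and the handler bodies are assumptions.

```go
package main

import "fmt"

// tsoPath enumerates the three situations mentioned in the review.
// The names are hypothetical; the real handler derives the path from server state.
type tsoPath int

const (
	pathLocal        tsoPath = iota // local TSO allocator
	pathMicroservice                // forward to the independent TSO service
	pathNormal                      // PD serves TSO itself
)

// Hypothetical per-path handlers; each would hold the logic for one situation.
func handleLocal(count uint32) string        { return fmt.Sprintf("local:%d", count) }
func handleMicroservice(count uint32) string { return fmt.Sprintf("ms:%d", count) }
func handleNormal(count uint32) string       { return fmt.Sprintf("normal:%d", count) }

// dispatch keeps the stream loop small: it only picks a path and delegates.
func dispatch(p tsoPath, count uint32) (string, error) {
	switch p {
	case pathLocal:
		return handleLocal(count), nil
	case pathMicroservice:
		return handleMicroservice(count), nil
	case pathNormal:
		return handleNormal(count), nil
	default:
		return "", fmt.Errorf("unknown TSO path %d", p)
	}
}

func main() {
	for _, p := range []tsoPath{pathLocal, pathMicroservice, pathNormal} {
		s, _ := dispatch(p, 1)
		fmt.Println(s)
	}
}
```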

Comment on lines 410 to 419
if !c.IsServiceIndependent(constant.TSOServiceName) {
// leader tso service exit, tso independent service provide tso
c.tsoAllocator.ResetAllocatorGroup(tso.GlobalDCLocation, true)
}
if !c.IsServiceIndependent(constant.TSOServiceName) {
log.Info("TSO server starts to provide timestamp")
}
c.SetServiceIndependent(constant.TSOServiceName)
Member

Suggested change
if !c.IsServiceIndependent(constant.TSOServiceName) {
// leader tso service exit, tso independent service provide tso
c.tsoAllocator.ResetAllocatorGroup(tso.GlobalDCLocation, true)
}
if !c.IsServiceIndependent(constant.TSOServiceName) {
log.Info("TSO server starts to provide timestamp")
}
c.SetServiceIndependent(constant.TSOServiceName)
if !c.IsServiceIndependent(constant.TSOServiceName) {
// leader tso service exit, tso independent service provide tso
c.tsoAllocator.ResetAllocatorGroup(tso.GlobalDCLocation, true)
log.Info("TSO server starts to provide timestamp")
}
c.SetServiceIndependent(constant.TSOServiceName)
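The suggested change folds the two identical IsServiceIndependent checks into one. If the check and the SetServiceIndependent call could ever race, an alternative (not what the PR or the suggestion does) is an atomic compare-and-swap, so the reset and the log line run exactly once. A sketch with a hypothetical cluster type:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// cluster is a hypothetical stand-in that records whether the TSO service
// runs independently of the PD leader.
type cluster struct {
	tsoIndependent atomic.Bool
	resets         atomic.Int32 // counts allocator resets, for illustration only
}

// markTSOIndependent combines the check and the set: CompareAndSwap succeeds
// only for the first false->true transition, so the reset and the log line
// run once even when several goroutines call it concurrently.
func (c *cluster) markTSOIndependent() {
	if c.tsoIndependent.CompareAndSwap(false, true) {
		c.resets.Add(1) // stands in for resetting the global allocator group
		fmt.Println("TSO server starts to provide timestamp")
	}
}

func main() {
	c := &cluster{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.markTSOIndependent()
		}()
	}
	wg.Wait()
	fmt.Println(c.resets.Load()) // 1
}
```

One trade-off: here the flag flips before the reset runs, whereas the original code sets it afterwards, so other readers could briefly observe the independent state before the allocator has been reset.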

@@ -390,24 +397,84 @@ func (c *RaftCluster) checkServices() {
}
}

// checkTSOService checks the TSO service.
func (c *RaftCluster) checkTSOService() {
Member

Can you add more comments to this function, both above it and inside it? It seems to handle too many situations.


ctx, cancel := context.WithCancel(c.ctx)
defer cancel()
ticker := time.NewTicker(serviceModeUpdateInterval)
failpoint.Inject("fastUpdateServiceMode", func() {
ticker.Stop()
ticker = time.NewTicker(10 * time.Millisecond)
Member

Could ticker.Reset() be used here instead?

client/client.go (outdated)
@@ -713,6 +712,7 @@ func (c *client) resetTSOClientLocked(mode pdpb.ServiceMode) {
log.Warn("[pd] intend to switch to unknown service mode, just return")
return
}
// Replace the old TSO client.
Member

duplicate

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 2, 2024
Contributor

ti-chi-bot bot commented Sep 6, 2024

PR needs rebase.

Signed-off-by: Ryan Leung <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/needs-linked-issue labels Sep 29, 2024
@rleungx rleungx removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2024
Contributor

ti-chi-bot bot commented Sep 29, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rleungx and additionally assign yudongusa for approval (please ensure that each of them provides their approval before proceeding). For more information, see the Code Review Process.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 29, 2024
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 30, 2024
Labels
dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
3 participants