Support receiving health feedback #1153

Merged
merged 23 commits into from
Feb 28, 2024

Conversation

MyonKeminta
Contributor

Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
c.forEachStore(func(store *Store) {
	store.updateSlowScoreStat()
	slowScoreMetrics[store.storeID] = float64(store.getSlowScore())
	store.healthStatus.update()
Contributor

We need to make sure the update runs as fast as possible; otherwise storeMu will be held for a long time. Expensive operations need to be processed outside the lock (maybe by iterating over the stores in batches in the future). It's OK for now.
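A minimal sketch of the pattern this comment asks for: copy the cheap values while the lock is held, then do the slower work (metrics, logging) after releasing it. The `store`, `cache`, `snapshotSlowScores`, and `publish` names are hypothetical stand-ins for illustration, not the types or functions in this PR; `mu` only plays the role of storeMu.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical, simplified stand-in for the Store type.
type store struct {
	id        uint64
	slowScore float64
}

// cache is a hypothetical stand-in for the region cache; mu plays the role of storeMu.
type cache struct {
	mu     sync.RWMutex
	stores map[uint64]*store
}

// snapshotSlowScores holds the lock only long enough to copy one number per store.
func (c *cache) snapshotSlowScores() map[uint64]float64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	scores := make(map[uint64]float64, len(c.stores))
	for id, s := range c.stores {
		scores[id] = s.slowScore
	}
	return scores
}

// publish stands in for the slower work (metrics, logging) and runs without the lock.
func publish(scores map[uint64]float64, sink func(id uint64, score float64)) {
	for id, score := range scores {
		sink(id, score)
	}
}

func main() {
	c := &cache{stores: map[uint64]*store{
		1: {id: 1, slowScore: 12},
		2: {id: 2, slowScore: 85},
	}}
	publish(c.snapshotSlowScores(), func(id uint64, score float64) {
		fmt.Printf("store %d slow score %.0f\n", id, score)
	})
}
```

The diff above already follows roughly this shape: `slowScoreMetrics` is filled inside `forEachStore`, and the gauge is set in a separate loop afterwards.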

})
for store, score := range slowScoreMetrics {
metrics.TiKVStoreSlowScoreGauge.WithLabelValues(strconv.FormatUint(store, 10)).Set(score)
logutil.BgLogger().Info("checkAndUpdateStoreHealthStats: get health details", zap.Reflect("details", healthDetails))
Contributor

Would this log be annoying?

Contributor Author

Sorry, I added this for debugging.

@@ -108,6 +108,13 @@ type Client interface {
	CloseAddr(addr string) error
	// SendRequest sends Request.
	SendRequest(ctx context.Context, addr string, req *tikvrpc.Request, timeout time.Duration) (*tikvrpc.Response, error)
	// SetEventListener registers an event listener for the Client instance. If called more than once, the previously
	// set one will be replaced.
	SetEventListener(listener ClientEventListener)
Contributor

Can we just declare that the method is not thread-safe and must be called before SendRequest, so that we can get rid of atomic.Pointer and make the related code simpler?

Contributor Author

Yes, but I'm afraid that the assumption might be broken easily by mistake in the future...
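For readers following this thread, here is a rough sketch of the trade-off under discussion: storing the listener in an `atomic.Pointer` lets `SetEventListener` be called safely at any time, even concurrently with requests. The `eventListener` interface, its `OnHealthFeedback` method, `rpcClient`, and `recorder` are simplified assumptions made for this example and are not the real `ClientEventListener` definition.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// eventListener is a hypothetical, simplified stand-in for ClientEventListener.
type eventListener interface {
	OnHealthFeedback(slowScore int32)
}

// rpcClient is a hypothetical client that keeps its listener in an atomic pointer.
type rpcClient struct {
	listener atomic.Pointer[eventListener]
}

// SetEventListener replaces any previously registered listener; safe for concurrent use.
func (c *rpcClient) SetEventListener(l eventListener) {
	c.listener.Store(&l)
}

// notify delivers an event to the current listener, if one is registered.
func (c *rpcClient) notify(slowScore int32) {
	if l := c.listener.Load(); l != nil {
		(*l).OnHealthFeedback(slowScore)
	}
}

// recorder is a test listener that remembers the scores it receives.
type recorder struct{ seen []int32 }

func (r *recorder) OnHealthFeedback(slowScore int32) { r.seen = append(r.seen, slowScore) }

func main() {
	c := &rpcClient{}
	first, second := &recorder{}, &recorder{}
	c.notify(5) // no listener yet: silently dropped
	c.SetEventListener(first)
	c.notify(10)
	c.SetEventListener(second) // replaces first
	c.notify(20)
	fmt.Println(first.seen, second.seen)
}
```

The simpler alternative proposed by the reviewer would be a plain field plus a documented "call before SendRequest" rule; that removes the atomic load on each event but relies on every caller honoring the rule.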

store.recordHealthFeedback(feedback)
}

func (c *RegionCache) GetClientEventListener() client.ClientEventListener {
Contributor

Maybe document that it will be registered to the internal client of KVStore automatically when calling tikv.NewKVStore.

@MyonKeminta MyonKeminta marked this pull request as ready for review February 27, 2024 03:41
@cfzjywxk
Contributor

@crazycs520 PTAL

Signed-off-by: MyonKeminta <[email protected]>
@MyonKeminta
Contributor Author

I noticed that the interface compatibility problem with unistore can be worked around with unistoreClientWrapper, so it's actually not necessary to split this into multiple PRs. cc @zyguan

Comment on lines 2582 to 2583
tikvSlowScoreDecayRate = 20. / 60. // s^(-1), linear decaying
tikvSlowScoreSlowThreshold = 80.
Contributor

Suggested change
- tikvSlowScoreDecayRate = 20. / 60. // s^(-1), linear decaying
- tikvSlowScoreSlowThreshold = 80.
+ tikvSlowScoreDecayRate float64 = 20.0 / 60.0 // s^(-1), linear decaying
+ tikvSlowScoreSlowThreshold float64 = 80.0

@@ -3164,6 +3339,10 @@ func (s *Store) recordReplicaFlowsStats(destType replicaFlowsType) {
	atomic.AddUint64(&s.replicaFlowsStats[destType], 1)
}

func (s *Store) recordHealthFeedback(feedback *tikvpb.HealthFeedback) {
	s.healthStatus.updateTiKVServerSideSlowScore(int64(feedback.GetSlowScore()), time.Now())
Contributor

feedback.FeedbackSeqNo is never used now?

Contributor Author

Yes, it's not checked for now. Maybe I'd better add a comment to note it.

Contributor

@crazycs520 crazycs520 left a comment

REST LGTM

@zyguan zyguan mentioned this pull request Feb 28, 2024
8 tasks
return
}

// TODO: Try to get store status from PD here. But it's not mandatory.
Contributor

This function would be executed inside batchRecvLoop. Would it affect performance if we call pdClient.func here?

Contributor Author

No. This function is not expected to be run in batchRecvLoop, but should be a periodic task executed in the background.
The function that might be called inside batchRecvLoop is updateTiKVServerSideSlowScore.

What we need to be careful about here is the mutex.
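The split described in this reply can be sketched as follows: the function reachable from the receive loop does a single atomic store, while anything slower runs from a background ticker. All names here (`healthStatus`, `updateServerSideSlowScore`, `tick`, `runPeriodically`) are hypothetical simplifications, and the `>=` comparison against the threshold is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// slowThreshold mirrors tikvSlowScoreSlowThreshold from this PR.
const slowThreshold = 80

// healthStatus is a hypothetical, simplified stand-in for the store health state.
type healthStatus struct {
	serverSideSlowScore atomic.Int64
	slow                atomic.Bool
}

// updateServerSideSlowScore may run on the hot path: one atomic store, no locks, no I/O.
func (h *healthStatus) updateServerSideSlowScore(score int64) {
	h.serverSideSlowScore.Store(score)
}

// tick is the periodic part; slower work (such as a PD lookup) would belong here.
func (h *healthStatus) tick() {
	h.slow.Store(h.serverSideSlowScore.Load() >= slowThreshold)
}

// runPeriodically calls task on every tick until ctx is cancelled.
func runPeriodically(ctx context.Context, interval time.Duration, task func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			task()
		}
	}
}

func main() {
	h := &healthStatus{}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	h.updateServerSideSlowScore(90) // what the receive loop would do
	runPeriodically(ctx, 10*time.Millisecond, h.tick)
	fmt.Println("slow:", h.slow.Load())
}
```

Using atomics on the hot path sidesteps the mutex concern raised above; if a mutex is needed instead, the critical section on the hot path should stay just as short.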

Contributor

Got it. It would be better to add comments to functions that are used on the performance-critical path.

@cfzjywxk cfzjywxk merged commit 03bbadb into tikv:master Feb 28, 2024
9 of 10 checks passed