[DNM] support GetRegion via CSE sync_region API #745

Open
wants to merge 59 commits into
base: master

Conversation

iosmanthus
Member

No description provided.

internal/locate/cse.go (outdated, resolved)
@iosmanthus iosmanthus changed the title WIP: support GetRegion via CSE sync_region API support GetRegion via CSE sync_region API Apr 6, 2023
Signed-off-by: iosmanthus <[email protected]>

func ifMostFailures(counts gobreaker.Counts) bool {
	failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
	return counts.Requests >= 5 && failureRatio >= 0.4
Contributor

5 and 0.4 should be constants to improve readability. BTW, is it necessary to make them configurable?

Member Author

I could provide an option for Fallback and CSEClient, with a default ReadyToTrip.
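
A minimal sketch of what the named constants and a configurable trip function could look like. The `Counts` struct is a local stand-in for the two `gobreaker.Counts` fields used in the snippet, and `NewReadyToTrip`, `defaultMinRequests`, and `defaultFailureRatio` are hypothetical names, not identifiers from this PR.

```go
package main

import "fmt"

// Counts is a local stand-in for the gobreaker.Counts fields used here,
// so the sketch compiles without the third-party module.
type Counts struct {
	Requests      uint32
	TotalFailures uint32
}

// Defaults taken from the literals 5 and 0.4 in the reviewed snippet.
const (
	defaultMinRequests  uint32  = 5
	defaultFailureRatio float64 = 0.4
)

// NewReadyToTrip (hypothetical) builds a trip predicate from the two thresholds,
// so callers can override the defaults through an option.
func NewReadyToTrip(minRequests uint32, failureRatio float64) func(Counts) bool {
	return func(c Counts) bool {
		// Guard against too few samples, and against dividing by zero.
		if c.Requests < minRequests || c.Requests == 0 {
			return false
		}
		return float64(c.TotalFailures)/float64(c.Requests) >= failureRatio
	}
}

func main() {
	trip := NewReadyToTrip(defaultMinRequests, defaultFailureRatio)
	fmt.Println(trip(Counts{Requests: 4, TotalFailures: 4}))  // below the minimum request count
	fmt.Println(trip(Counts{Requests: 10, TotalFailures: 5})) // failure ratio above the threshold
}
```

Checking the request count before dividing also keeps the ratio well defined when no requests have been recorded yet.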

	_ pd.Client = &Fallback{}
)

type Fallback struct {
Contributor

How about changing Fallback to ClientWithFallback?

Member Author

Makes sense.

func probePD(name string, client pd.Client, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	_, err := client.GetRegionByID(ctx, 1)
Contributor

The PD client has a goroutine that checks leader status. Maybe we can use it, like this.

Member Author

I probe PD with GetRegionByID because the overall health status of PD is hard to define: there will be more services in the PD microservice architecture. So we only probe PD through the API we care about, i.e. the region meta service.


var (
	StoresRefreshInterval = time.Second * 5
	SyncRegionTimeout     = time.Second * 2
Contributor

How about time.Second * 10?

Member Author

This duration might be too long for a newly created cluster.

// Close the idle connections.
c.httpClient.CloseIdleConnections()
// Close the stores refresh goroutine.
c.done <- struct{}{}
Contributor

How about close(c.done)? That will not block the current goroutine.

Member Author

This will not block the current goroutine, since the channel is initialized with done: make(chan struct{}, 1).
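
For comparison, here is a sketch of the close-based variant. A buffered send stays non-blocking only while the buffer has room and wakes a single receiver, whereas `close` wakes every receiver but panics if called twice, so this sketch guards it with `sync.Once`. The `worker` type and its fields are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a client that owns a background goroutine.
type worker struct {
	done      chan struct{}
	closeOnce sync.Once
	wg        sync.WaitGroup
}

func newWorker() *worker {
	w := &worker{done: make(chan struct{})}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		<-w.done // unblocks as soon as done is closed
	}()
	return w
}

// Close signals the goroutine to stop and waits for it to exit.
// sync.Once makes repeated calls safe, since closing a closed channel panics.
func (w *worker) Close() {
	w.closeOnce.Do(func() { close(w.done) })
	w.wg.Wait()
}

func main() {
	w := newWorker()
	w.Close()
	w.Close() // second call is a no-op
	fmt.Println("closed twice without blocking or panicking")
}
```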

Name: fmt.Sprintf("store-%d", s.GetId()),
Interval: 5 * time.Second,
Timeout: 1 * time.Second,
ProbeInterval: 1 * time.Second,
Contributor

5, 1 and 1 should be constants to improve readability.


type ClientWithFallback struct {
	pd.Client
	cse pd.Client
Contributor

Why is it called cse?

Member Author

> Why is it called cse?

This field is intended to hold the CSEClient, which is an implementation of the PD client interface (pd.Client).
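
A small sketch of how an embedded interface plus a named field can work together. `RegionClient` is a hypothetical two-method stand-in for `pd.Client`, `fakeClient` is a test double, and the ordering (try `cse` first, then fall back to the embedded client) is an assumption; the PR may wire this differently.

```go
package main

import (
	"errors"
	"fmt"
)

// RegionClient is a hypothetical, much smaller stand-in for pd.Client.
type RegionClient interface {
	GetRegion(key string) (string, error)
	GetClusterID() uint64
}

// fakeClient is a test double that can be made to fail on GetRegion.
type fakeClient struct {
	name string
	id   uint64
	fail bool
}

func (f fakeClient) GetRegion(key string) (string, error) {
	if f.fail {
		return "", errors.New(f.name + " unavailable")
	}
	return f.name + ":" + key, nil
}

func (f fakeClient) GetClusterID() uint64 { return f.id }

// ClientWithFallback embeds the fallback client, so every method it does not
// override is promoted from the embedded value.
type ClientWithFallback struct {
	RegionClient
	cse RegionClient // assumed to be tried first for region lookups
}

// GetRegion overrides the promoted method: try cse, then the embedded client.
func (c *ClientWithFallback) GetRegion(key string) (string, error) {
	if r, err := c.cse.GetRegion(key); err == nil {
		return r, nil
	}
	return c.RegionClient.GetRegion(key)
}

// Compile-time check that the wrapper still satisfies the interface.
var _ RegionClient = &ClientWithFallback{}

func main() {
	c := &ClientWithFallback{
		RegionClient: fakeClient{name: "pd", id: 7},
		cse:          fakeClient{name: "cse", fail: true},
	}
	r, err := c.GetRegion("k")
	fmt.Println(r, err, c.GetClusterID())
}
```

Embedding keeps the wrapper small: only the methods that need fallback behavior are written out, and the rest are served by the embedded client.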

Contributor

@zeminzhou zeminzhou left a comment

LGTM~

rleungx and others added 30 commits May 17, 2023 12:07
Co-authored-by: MyonKeminta <[email protected]>
Co-authored-by: disksing <[email protected]>
Co-authored-by: MyonKeminta <[email protected]>
Co-authored-by: Violin <[email protected]>
Co-authored-by: Smilencer <[email protected]>
Co-authored-by: you06 <[email protected]>
Co-authored-by: Hu# <[email protected]>
Co-authored-by: Connor <[email protected]>
Co-authored-by: zyguan <[email protected]>
fix case typo in comment. (#778)
fix goroutine leak (#784)
fix TestRURuntimeStatsCleanUp (#787)
Fix wrong resource group name for some requests (#788)
resolver: support verifying primary for check_txn_status (#777)
resolver: handle pessimistic locks in BatchResolveLocks (#794)
resolved ts  (#793)
ResolveLocks for unistore (#807)
* support remote coprocessor
Co-authored-by: disksing <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: Hu# <[email protected]>
Co-authored-by: Weizhen Wang <[email protected]>
Co-authored-by: you06 <[email protected]>
Co-authored-by: ShuNing <[email protected]>
Co-authored-by: Hu# <[email protected]>
Co-authored-by: zzm <[email protected]>
Co-authored-by: Yongbo Jiang <[email protected]>
Co-authored-by: crazycs <[email protected]>
Co-authored-by: glorv <[email protected]>
Co-authored-by: zyguan <[email protected]>
ResolvedTS error just write in debug log (#814)
ResolvedTS error just write in debug log (#825)
fix ci (#835)
fix rpc interceptor data race (#845)
resolver: let getTxnStatusFromLock return error when backoff timeout (#847)
* client-go: add some key range info to error when PD returned no region (#862)

Signed-off-by: Chao Wang <[email protected]>

* *: refine non-global stale-read request retry logic (#863)

Signed-off-by: crazycs520 <[email protected]>

* Fix the issue that primary pessimistic lock may be left not cleared after GC (#866)

* Fix the issue that primary pessimistic lock may be left not cleared after GC

Signed-off-by: MyonKeminta <[email protected]>

* Fix mysteriously shown up thing that makes compilation failed

Signed-off-by: MyonKeminta <[email protected]>

* Fix test effectiveness (forgot to set txn2 to pessimistic txn); add more strict checks

Signed-off-by: MyonKeminta <[email protected]>

* Address comments

Signed-off-by: MyonKeminta <[email protected]>

---------

Signed-off-by: MyonKeminta <[email protected]>
Co-authored-by: MyonKeminta <[email protected]>

* add explicit request source type to label the external request like lightning/br (#868)

Signed-off-by: nolouch <[email protected]>

* use '%d' instead of '%q' for some int values in error message (#875)

Signed-off-by: Chao Wang <[email protected]>

* format key in error message in method `scanRegions` (#876)

Signed-off-by: Chao Wang <[email protected]>

* make cop request timeout a config paramter (#865)

* update

Signed-off-by: Spade A <[email protected]>

* update

Signed-off-by: Spade A <[email protected]>

* update

Signed-off-by: Spade A <[email protected]>

* update

Signed-off-by: Spade A <[email protected]>

---------

Signed-off-by: Spade A <[email protected]>

* region_cache: support check pending tiflash peer (#821)

Signed-off-by: guo-shaoge <[email protected]>
Co-authored-by: disksing <[email protected]>

* *: add `SnapshotIterReverse` and make `iterReverse` supports `lowerBound` (#883)

Signed-off-by: Jason Mo <[email protected]>

* *: fix stale read ops metric (#878) (#889)

Signed-off-by: crazycs520 <[email protected]>
Co-authored-by: disksing <[email protected]>

* add gc options (#828)

Signed-off-by: weedge <[email protected]>
Co-authored-by: disksing <[email protected]>

* reload region cache when store is resolved from invalid status (#843)

Signed-off-by: you06 <[email protected]>
Co-authored-by: disksing <[email protected]>

* ci: update setup-go action (#904)

Signed-off-by: disksing <[email protected]>

* fix unexpected slow query during GC running after stop 1 tikv-server (#899) (#909)

* fix unexpected slow query during GC running after stop 1 tikv-server

Signed-off-by: crazycs520 <[email protected]>

* fix test

Signed-off-by: crazycs520 <[email protected]>

---------

Signed-off-by: crazycs520 <[email protected]>

* resource_manager: ignore ru metrics for background request (#872)

Signed-off-by: husharp <[email protected]>
Co-authored-by: disksing <[email protected]>

* add more log for diagnose (#915)

* add more log for diagnose

Signed-off-by: crazycs520 <[email protected]>

* fix

Signed-off-by: crazycs520 <[email protected]>

* add more log for diagnose

Signed-off-by: crazycs520 <[email protected]>

* add more log

Signed-off-by: crazycs520 <[email protected]>

* address comment

Signed-off-by: crazycs520 <[email protected]>

---------

Signed-off-by: crazycs520 <[email protected]>

* use context logger as much as possible (#908)

* use context logger as much as possible

Signed-off-by: crazycs520 <[email protected]>

* refine

Signed-off-by: crazycs520 <[email protected]>

---------

Signed-off-by: crazycs520 <[email protected]>

* Resume max retry time check for stale read retry with leader option(#903) (#911)

* Resume max retry time check for stale read retry with leader option

Signed-off-by: cfzjywxk <[email protected]>

* add cancel

Signed-off-by: cfzjywxk <[email protected]>

---------

Signed-off-by: cfzjywxk <[email protected]>

* request_source: remove default label (#890)

* request_source: remove default label

Signed-off-by: nolouch <[email protected]>

* add a function to set request source task type (#925)

* add a function to set request source task type

Signed-off-by: glorv <[email protected]>

* ci: update go version (#936)

* ci: update go version

Signed-off-by: crazycs520 <[email protected]>

* fix test

Signed-off-by: crazycs520 <[email protected]>

---------

Signed-off-by: crazycs520 <[email protected]>

* use tidb_kv_read_timeout as first kv request timeout (#919)

* support tidb_kv_read_timeout as first round kv request timeout

Signed-off-by: crazycs520 <[email protected]>

* fix ci

Signed-off-by: crazycs520 <[email protected]>

* fix ci

Signed-off-by: crazycs520 <[email protected]>

* fix ci

Signed-off-by: crazycs520 <[email protected]>

* fix ci

Signed-off-by: crazycs520 <[email protected]>

* fix ci

Signed-off-by: crazycs520 <[email protected]>

* update comment

Signed-off-by: crazycs520 <[email protected]>

* refine test

Signed-off-by: crazycs520 <[email protected]>

---------

Signed-off-by: crazycs520 <[email protected]>

* [pick] resource_control: bypass some internal urgent request (#938)

* resource_control: bypass some internal urgent request (#884)

Signed-off-by: nolouch <[email protected]>

* resourcecontrol: fix nil pointer (#900)

Signed-off-by: nolouch <[email protected]>

---------

Signed-off-by: nolouch <[email protected]>

---------

Signed-off-by: Chao Wang <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: nolouch <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: Jason Mo <[email protected]>
Signed-off-by: weedge <[email protected]>
Signed-off-by: you06 <[email protected]>
Signed-off-by: disksing <[email protected]>
Signed-off-by: husharp <[email protected]>
Signed-off-by: cfzjywxk <[email protected]>
Signed-off-by: glorv <[email protected]>
Signed-off-by: iosmanthus <[email protected]>
Co-authored-by: 王超 <[email protected]>
Co-authored-by: crazycs <[email protected]>
Co-authored-by: MyonKeminta <[email protected]>
Co-authored-by: MyonKeminta <[email protected]>
Co-authored-by: ShuNing <[email protected]>
Co-authored-by: Spade  A <[email protected]>
Co-authored-by: guo-shaoge <[email protected]>
Co-authored-by: disksing <[email protected]>
Co-authored-by: Hangjie Mo <[email protected]>
Co-authored-by: weedge <[email protected]>
Co-authored-by: you06 <[email protected]>
Co-authored-by: Hu# <[email protected]>
Co-authored-by: cfzjywxk <[email protected]>
Co-authored-by: glorv <[email protected]>
…1032)

* ru detail

Signed-off-by: zzm <[email protected]>

* remove unused code

Signed-off-by: zzm <[email protected]>

* reduce waitgroup

Signed-off-by: zzm <[email protected]>

* fix ut

Signed-off-by: zzm <[email protected]>

* make lint

Signed-off-by: zzm <[email protected]>

* fix ci

Signed-off-by: zzm <[email protected]>

---------

Signed-off-by: zzm <[email protected]>
cloud-storage-engine has a larger region size, so we don't want to split regions into 32MB pieces on large transaction writes.

Signed-off-by: Evan Zhou <[email protected]>
Co-authored-by: cfzjywxk <[email protected]>
Co-authored-by: cfzjywxk <[email protected]>
Co-authored-by: disksing <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: zzm <[email protected]>
Co-authored-by: husharp <[email protected]>
Co-authored-by: you06 <[email protected]>
Co-authored-by: buffer <[email protected]>
Co-authored-by: 3pointer <[email protected]>
Co-authored-by: buffer <[email protected]>
Co-authored-by: husharp <[email protected]>
Co-authored-by: crazycs520 <[email protected]>
Co-authored-by: Smilencer <[email protected]>
Co-authored-by: ShuNing <[email protected]>
Co-authored-by: zyguan <[email protected]>
Co-authored-by: Jack Yu <[email protected]>
Co-authored-by: Weizhen Wang <[email protected]>
Co-authored-by: lucasliang <[email protected]>
Co-authored-by: healthwaite <[email protected]>
Co-authored-by: xufei <[email protected]>
Co-authored-by: JmPotato <[email protected]>
Co-authored-by: ekexium <[email protected]>
Co-authored-by: 山岚 <[email protected]>
Co-authored-by: glorv <[email protected]>
Co-authored-by: Yongbo Jiang <[email protected]>
resolve locks interface for tidb gc_worker (#945)
fix some issues of replica selector (#910)  (#942)
fix some issues of replica selector (#910)
fix issue of configure kv timeout not work when disable batch client (#980)
fix batch-client wait too long and add some metrics (#973)
fix batch-client wait too long and add some metrics (#973)" (#984)
fix data race at the aggressiveLockingDirty (#913)
fix MinSafeTS might be set to MaxUint64 permanently (#994)
fix: fix invalid nil pointer when trying to record Store.SlownessStat. (#1017)
Fix batch client batchSendLoop panic (#1021)
fix request source tag unset (#1025)
Fix comment of `SuspendTime` (#1057)
10 participants