
Refactor logging in tso service to separate OP and Cloud #6514

Merged
merged 2 commits into tikv:master from print-group-id-conditionally
May 25, 2023


binshi-bing
Contributor

@binshi-bing binshi-bing commented May 25, 2023

What problem does this PR solve?

Issue Number: Ref #5895

What is changed and how does it work?

To keep on-premises logs clean, we print the keyspace-group-id zap field only for
non-default keyspace groups.
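
A minimal, dependency-free sketch of the conditional-field idea. It assumes the default keyspace group ID is 0 (consistent with the `00000` default primary path mentioned later on this page). The `field`, `condUint32`, `groupIDField`, and `formatLine` names are hypothetical stand-ins for zap fields and the console encoder; they are not PD's actual logutil API.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumption: the default keyspace group ID is 0.
const defaultKeyspaceGroupID uint32 = 0

// field is a minimal stand-in for zap.Field so the sketch has no dependencies.
type field struct {
	key, value string
	skip       bool // true means "encode nothing", similar in spirit to zap.Skip()
}

// condUint32 returns a real field only when cond is true; otherwise it
// returns a no-op field that formatLine drops.
func condUint32(key string, val uint32, cond bool) field {
	if !cond {
		return field{skip: true}
	}
	return field{key: key, value: fmt.Sprint(val)}
}

// groupIDField emits keyspace-group-id only for non-default keyspace groups.
func groupIDField(groupID uint32) field {
	return condUint32("keyspace-group-id", groupID, groupID != defaultKeyspaceGroupID)
}

// formatLine roughly mimics the bracketed message/field layout of the sample logs.
func formatLine(msg string, fields ...field) string {
	var kvs []string
	for _, f := range fields {
		if !f.skip {
			kvs = append(kvs, f.key+"="+f.value)
		}
	}
	return fmt.Sprintf("[%q] [%s]", msg, strings.Join(kvs, "] ["))
}

func main() {
	for _, id := range []uint32{0, 1, 2} {
		fmt.Println(formatLine("entering into allocator daemon", groupIDField(id)))
	}
}
```

Returning a no-op field, rather than branching at every log call site, keeps each call a single expression regardless of which group is logging.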

Check List

Tests

- No code (logging change only)

Expected log output:

For the default keyspace group:
[2023/05/24 17:48:03.942 -07:00] [INFO] [allocator_manager.go:727] ["entering into allocator daemon"] []
[2023/05/24 17:48:04.005 -07:00] [INFO] [allocator_manager.go:727] ["entering into allocator daemon"] []
[2023/05/24 17:48:04.037 -07:00] [INFO] [allocator_manager.go:727] ["entering into allocator daemon"] []
...

For non-default keyspace groups:
[2023/05/24 17:48:04.065 -07:00] [INFO] [allocator_manager.go:727] ["entering into allocator daemon"] [keyspace-group-id=1]
[2023/05/24 17:48:04.065 -07:00] [INFO] [allocator_manager.go:727] ["entering into allocator daemon"] [keyspace-group-id=1]
[2023/05/24 17:48:04.065 -07:00] [INFO] [allocator_manager.go:727] ["entering into allocator daemon"] [keyspace-group-id=1]
[2023/05/24 17:48:04.127 -07:00] [INFO] [allocator_manager.go:727] ["entering into allocator daemon"] [keyspace-group-id=2]
[2023/05/24 17:48:04.128 -07:00] [INFO] [allocator_manager.go:727] ["entering into allocator daemon"] [keyspace-group-id=2]
...

Release note

None.

@ti-chi-bot
Contributor

ti-chi-bot bot commented May 25, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • JmPotato
  • rleungx


@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels May 25, 2023
@ti-chi-bot ti-chi-bot bot requested review from JmPotato and lhy1024 May 25, 2023 00:51
@binshi-bing binshi-bing requested a review from rleungx May 25, 2023 00:51
To keep the logging info in on-premises clean, we only print keyspace-group-id zap field for the non-default keyspace group id.

Signed-off-by: Bin Shi <[email protected]>
@binshi-bing binshi-bing force-pushed the print-group-id-conditionally branch from e971505 to d437764 Compare May 25, 2023 00:55
@codecov

codecov bot commented May 25, 2023

Codecov Report

Patch coverage: 48.71% and project coverage change: -0.19 ⚠️

Comparison is base (78bc7c1) 75.07% compared to head (d437764) 74.89%.

❗ Current head d437764 differs from pull request most recent head 79c8b21. Consider uploading reports for the commit 79c8b21 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6514      +/-   ##
==========================================
- Coverage   75.07%   74.89%   -0.19%     
==========================================
  Files         410      410              
  Lines       41909    41982      +73     
==========================================
- Hits        31465    31444      -21     
- Misses       7689     7774      +85     
- Partials     2755     2764       +9     
Flag Coverage Δ
unittests 74.89% <48.71%> (-0.19%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/tso/global_allocator.go 62.65% <47.16%> (-7.49%) ⬇️
pkg/tso/allocator_manager.go 63.17% <47.54%> (-2.65%) ⬇️
pkg/utils/logutil/log.go 82.92% <100.00%> (+1.34%) ⬆️

... and 33 files with indirect coverage changes


@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 25, 2023
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 25, 2023
@JmPotato
Member

/merge

@ti-chi-bot
Contributor

ti-chi-bot bot commented May 25, 2023

@JmPotato: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests


@ti-chi-bot
Contributor

ti-chi-bot bot commented May 25, 2023

This pull request has been accepted and is ready to merge.

Commit hash: d437764

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 25, 2023
@ti-chi-bot
Contributor

ti-chi-bot bot commented May 25, 2023

@binshi-bing: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.


@ti-chi-bot ti-chi-bot bot merged commit 73b91b0 into tikv:master May 25, 2023
@binshi-bing binshi-bing changed the title Print keyspace-group-id zap field in the tso log conditionally Refactor logging in tso service to separate OP and Cloud May 25, 2023
rleungx pushed a commit to rleungx/pd that referenced this pull request Jun 5, 2023
* metrics: add tso events metrics (tikv#6501)

close tikv#6502

Signed-off-by: bufferflies <[email protected]>

* keyspace: add benchmarks for keyspace assignment patrol (tikv#6507)

ref tikv#5895

Add benchmarks for keyspace assignment patrol.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Print keyspace-group-id zap field in the tso log conditionally (tikv#6514)

ref tikv#5895

To keep the logging info in on-premises clean, we only print keyspace-group-id zap field
for the non-default keyspace group id.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Improve logging when tso keyspace group meta is updated. (tikv#6513)

close tikv#6512

Improve logging when tso keyspace group meta is updated.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tso: log tso service discovery info on the client side only when the primary is changed (tikv#6511)

close tikv#6508

Skip logging of the tso service discovery info when only the secondary list changes,
because tso servers currently don't have a consistent view of the member list due to
the remote etcd used by the tso service, which results in a changing member list when
the client queries the tso servers in round-robin. We need to improve the server side
so that all tso servers can return a globally consistent view of the keyspace groups'
serving or membership info.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
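
A sketch of the "log only when the primary changes" rule, using a hypothetical `discoveryLogger` type. It does not reproduce the PD client's actual discovery state; it only shows why ignoring the secondary list suppresses the round-robin noise described above. The addresses are made up.

```go
package main

import (
	"fmt"
	"sync"
)

// discoveryLogger is a hypothetical helper that remembers the last primary
// and reports whether a new discovery result is worth logging.
type discoveryLogger struct {
	mu          sync.Mutex
	lastPrimary string
}

// shouldLog returns true only when the primary differs from the previous one.
// The secondaries argument is deliberately ignored: different tso servers may
// report different member lists when queried in round-robin.
func (d *discoveryLogger) shouldLog(primary string, secondaries []string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if primary == d.lastPrimary {
		return false
	}
	d.lastPrimary = primary
	return true
}

func main() {
	var d discoveryLogger
	fmt.Println(d.shouldLog("tso-1:3379", []string{"tso-2:3379"})) // first primary seen
	fmt.Println(d.shouldLog("tso-1:3379", []string{"tso-3:3379"})) // only secondaries changed
	fmt.Println(d.shouldLog("tso-2:3379", nil))                     // primary changed
}
```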

---------

Signed-off-by: bufferflies <[email protected]>
Co-authored-by: buffer <[email protected]>
Co-authored-by: JmPotato <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
rleungx pushed a commit to rleungx/pd that referenced this pull request Aug 2, 2023
…6514)

ref tikv#5895

To keep the logging info in on-premises clean, we only print keyspace-group-id zap field
for the non-default keyspace group id.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
rleungx pushed a commit to rleungx/pd that referenced this pull request Aug 2, 2023
…6514)

ref tikv#5895

To keep the logging info in on-premises clean, we only print keyspace-group-id zap field
for the non-default keyspace group id.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
rleungx added a commit to rleungx/pd that referenced this pull request Dec 1, 2023
* Fixed bugs in tso service registry watching loop. (tikv#6346)

ref tikv#6343

Fixed the following two bugs:
1. When re-watching a range, to continue from where the last watch left off, the revision should be wresp.Header.Revision + 1 instead of wresp.Header.Revision, where wresp.Header.Revision is the revision indicated in the response of the last watch. Because of this bug, the loop processed the same event endlessly.
2. In the tso service watch loop in pkg/keyspace/tso_keyspace_group.go, if the event is a delete event, json.Unmarshal(event.Kv.Value, s) fails with the error "unexpected end of JSON input", so there is no way to get s.serviceAddr from the result of json.Unmarshal.

Signed-off-by: Bin Shi <[email protected]>
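
A sketch of both fixes using simplified stand-ins for etcd watch events, so it runs without an etcd client. The `event` type, the `registryEntry` JSON shape (including the `service-addr` tag), and the key names are hypothetical. Resolving deletes through a key-to-address cache is one possible approach; this commit does not show the actual fix.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type eventType int

const (
	eventPut eventType = iota
	eventDelete
)

// event is a simplified stand-in for an etcd watch event.
type event struct {
	typ   eventType
	key   string
	value []byte // empty for delete events, as with etcd
}

// registryEntry is a hypothetical JSON payload stored under each registry key.
type registryEntry struct {
	ServiceAddr string `json:"service-addr"`
}

// nextWatchRevision: after handling a response whose header revision is rev,
// the next watch must start at rev+1, or the last event is replayed forever.
func nextWatchRevision(headerRevision int64) int64 {
	return headerRevision + 1
}

// applyEvents updates addrs (key -> service address) and returns the
// addresses removed by delete events.
func applyEvents(addrs map[string]string, events []event) ([]string, error) {
	var removed []string
	for _, ev := range events {
		switch ev.typ {
		case eventPut:
			var s registryEntry
			if err := json.Unmarshal(ev.value, &s); err != nil {
				return removed, err
			}
			addrs[ev.key] = s.ServiceAddr
		case eventDelete:
			// ev.value is empty: json.Unmarshal would fail with
			// "unexpected end of JSON input", so use the cached address.
			if addr, ok := addrs[ev.key]; ok {
				removed = append(removed, addr)
				delete(addrs, ev.key)
			}
		}
	}
	return removed, nil
}

func main() {
	addrs := map[string]string{}
	removed, err := applyEvents(addrs, []event{
		{typ: eventPut, key: "/registry/tso/1", value: []byte(`{"service-addr":"http://127.0.0.1:3379"}`)},
		{typ: eventDelete, key: "/registry/tso/1"},
	})
	fmt.Println(removed, err, len(addrs), nextWatchRevision(42))
}
```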

* mcs: fix double compression of prom handler (tikv#6339)

ref prometheus/client_golang#622, ref tikv#5895

Signed-off-by: Ryan Leung <[email protected]>

* tests, tso: add more TSO split tests (tikv#6338)

ref tikv#6232

Add more TSO split tests.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>

* keyspace, tso: fix next revision to watch after watch/Get/RangeScan (tikv#6353)

ref tikv#6232

The next revision to watch should always be Header.Revision + 1, where Header is the response header of watch/Get/RangeScan.

Signed-off-by: Bin Shi <[email protected]>

* mcs, tests: use TSO cluster to do the failover test (tikv#6356)

ref tikv#5895

Use TSO cluster to do the failover test.

Signed-off-by: JmPotato <[email protected]>

* fix startWatchLoop leak (tikv#6352)

Signed-off-by: Ryan Leung <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs: update client when meet transport is closing (tikv#6341)

* mcs: update client when meet transport is closing

Signed-off-by: lhy1024 <[email protected]>

* address comments

Signed-off-by: lhy1024 <[email protected]>

* add retry

Signed-off-by: lhy1024 <[email protected]>

---------

Signed-off-by: lhy1024 <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* add bootstrap test (tikv#6347)

Signed-off-by: Ryan Leung <[email protected]>
Co-authored-by: Ti Chi Robot <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs, tso: fix ts fallback caused by multi-primary of the same keyspace group  (tikv#6362)

* Change participant election-prefix from listen-addr to advertise-listen-addr to guarantee uniqueness.

Signed-off-by: Bin Shi <[email protected]>

* Add TestPariticipantStartWithAdvertiseListenAddr

Signed-off-by: Bin Shi <[email protected]>

* Add comments to fix go fmt errors

Signed-off-by: Bin Shi <[email protected]>

---------

Signed-off-by: Bin Shi <[email protected]>
Co-authored-by: Ryan Leung <[email protected]>

* fix log output (tikv#6364)

Signed-off-by: Ryan Leung <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* fix data race issue in the store limit test (tikv#6370)

* fix data race issue in the store limit test

Signed-off-by: Bin Shi <[email protected]>

* fix gmt error

Signed-off-by: Bin Shi <[email protected]>

---------

Signed-off-by: Bin Shi <[email protected]>

* mcs: fix duplicate start of RaftCluster. (tikv#6358)

* Using double-checked locking to avoid duplicate start of RaftCluster.

Signed-off-by: Bin Shi <[email protected]>

* Handle feedback

Signed-off-by: Bin Shi <[email protected]>

* improve locking

Signed-off-by: Bin Shi <[email protected]>

* handle feedback

Signed-off-by: Bin Shi <[email protected]>

---------

Signed-off-by: Bin Shi <[email protected]>
Co-authored-by: Ryan Leung <[email protected]>

* Add retry mechanism for updating keyspace group (tikv#6372)

Signed-off-by: JmPotato <[email protected]>

* etcdutil: revert etcd client without multi endpoint (tikv#6374)

ref tikv#6124

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs: add set handler for balancer and alloc node for default keyspace group (tikv#6342)

ref tikv#6233

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs, tso: fix nil pointer dereference in (*AllocatorManager).GetMember (tikv#6383)

close tikv#6381

If the desired keyspace group falls back to the default keyspace group and the AM isn't initialized, return a not-served error.

Signed-off-by: Bin Shi <[email protected]>

* mcs, tso: support multi-keyspace-group and its service discovery in E2E path (tikv#6321)

ref tikv#6232

Support multi-keyspace-group in PD(TSO) client

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* client: add `NewClientWithKeyspaceName` for client (tikv#6380)

ref tikv#5895

Signed-off-by: Ryan Leung <[email protected]>

* keyspace, tso: check the replica count before the split (tikv#6382)

ref tikv#6233

Check the replica count before the split.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: lhy1024 <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tso: fix bugs to make split test case to pass (tikv#6389)

ref tikv#6232

fix bugs to make split test case to pass

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* keyspace: patrol keyspace assignment before the first split (tikv#6388)

ref tikv#6232

Patrol the keyspace assignment before the first split to make sure every keyspace has its group assignment.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tests: fix Flaky TestMicroserviceTSOServer/TestConcurrentlyReset (tikv#6396)

close tikv#6385

Copy now into base and then call base.add, because now is shared by all goroutines and now.add() mutates now itself, which is neither atomic nor safe for concurrent use.

Signed-off-by: Bin Shi <[email protected]>
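
A sketch of the copy-before-mutate pattern described in this fix. The `ts` type and `advanceAll` helper are hypothetical; the point is that `now` is only ever read, while each goroutine mutates its private copy `base`.

```go
package main

import (
	"fmt"
	"sync"
)

// ts is a hypothetical timestamp type whose add method mutates its receiver,
// so calling it on a value shared across goroutines would be a data race.
type ts struct{ physical, logical int64 }

func (t *ts) add(physical int64) { t.physical += physical }

// advanceAll derives n timestamps from now concurrently. Each goroutine copies
// now into base first and only mutates that private copy.
func advanceAll(now ts, n int) []ts {
	results := make([]ts, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			base := now // value copy; now itself is only read
			base.add(int64(i))
			results[i] = base // each goroutine writes a distinct index
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(advanceAll(ts{physical: 1000}, 3))
}
```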

* keyspace, slice: improve code efficiency in membership ops (tikv#6392)

ref tikv#6231

Improve code efficiency in membership ops

Signed-off-by: Bin Shi <[email protected]>

* tests: enable TestTSOKeyspaceGroupSplitClient (tikv#6398)

ref tikv#6232

Enable `TestTSOKeyspaceGroupSplitClient`.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tests: add more tests for multiple keyspace groups (tikv#6395)

ref tikv#5895

Add CheckMultiKeyspacesTSO() and WaitForMultiKeyspacesTSOAvailable in test utility. Add TestTSOKeyspaceGroupManager/TestKeyspacesServedByNonDefaultKeyspaceGroup. Cover TestGetTS, TestGetTSAsync, TestUpdateAfterResetTSO in TestMicroserviceTSOClient for multiple keyspace groups.

Signed-off-by: Bin Shi <[email protected]>

* tests: fix failpoint disable (tikv#6401)

ref tikv#4399

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* client: retry load keyspace meta when creating a new client (tikv#6402)

ref tikv#5895

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Fix test issue in TestRandomResignLeader. (tikv#6410)

close tikv#6404

We need to make sure the selected keyspaces are from different keyspace groups, otherwise multiple goroutines below could try to resign the primary of the same keyspace group and cause race condition.

Signed-off-by: Bin Shi <[email protected]>

* keyspace, api2: fix the keyspace assignment patrol consistency (tikv#6397)

ref tikv#6232

Fix the keyspace assignment patrol consistency.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: Ryan Leung <[email protected]>

* election, tso: fix data race in lease.go (tikv#6379)

close tikv#6378

fix data race in lease.go

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs: fix forward test with pd mode client (tikv#6290)

ref tikv#5895, ref tikv#6279, close tikv#6289

Signed-off-by: lhy1024 <[email protected]>

* keyspace: patrol the keyspace assignment in batch (tikv#6411)

ref tikv#6232

Patrol the keyspace assignment in batch.

Signed-off-by: JmPotato <[email protected]>

* etcdutil: add watch loop (tikv#6390)

close tikv#6391

Signed-off-by: lhy1024 <[email protected]>

* mcs, tso: add API interface to obtain the TSO keyspace group member info (tikv#6373)

ref tikv#6232

Add API interface to obtain the TSO keyspace group member info.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* keyspace: wait region split when creating keyspace (tikv#6414)

ref tikv#6231

Signed-off-by: zeminzhou <[email protected]>
Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: zzm <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs: use getClusterInfo to check whether api service is ready (tikv#6422)

ref tikv#5836

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* pd-ctl, tests: add the keyspace group commands (tikv#6423)

ref tikv#6232

Add the keyspace group commands to show and split keyspace groups.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Handle compatibility issue in GetClusterInfo RPC (tikv#6434)

ref tikv#5895, close tikv#6448

Handle the compatibility issue in the GetClusterInfo RPC

Signed-off-by: Bin Shi <[email protected]>

* Provide GetMinTS API to solve the compatibility issue brought by multi-timeline tso (tikv#6421)

ref tikv#6142

1. Import kvproto change to introduce GetMinTS rpc in the TSO service.
2. Add server side implementation for GetMinTS rpc.
3. Add client side implementation for GetMinTS rpc.
4. Add unit test

Signed-off-by: Bin Shi <[email protected]>

* Disable TestGetMinTS since it's not stable (tikv#6459)

ref tikv#6453

Disable TestGetMinTS due to tikv#6453

Signed-off-by: Bin Shi <[email protected]>

* tso: use less interval when waiting api service (tikv#6451)

close tikv#6449

Signed-off-by: lhy1024 <[email protected]>

* etcdutil: fix ctx in watch loop (tikv#6445)

close tikv#6439

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* *: fix disable failpoint  (tikv#6412)

ref tikv#4399

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Fix "non-default keyspace groups use the same timestamp path by mistake" (tikv#6457)

close tikv#6453, close tikv#6465

The tso servers load keyspace groups asynchronously. To make sure all keyspace groups
are available for serving tso requests from their corresponding keyspaces, query
IsKeyspaceServing(keyspaceID, the desired KeyspaceGroupID). If the default keyspace group ID
is used in the query, it always returns true, because the keyspace is served by the default
keyspace group before the keyspace groups are loaded.

Signed-off-by: Bin Shi <[email protected]>

* TSO microservice discovery fallback path shouldn't call FindGroupByKeyspaceID (tikv#6473)

close tikv#6472

TSO microservice discovery fallback path shouldn't call FindGroupByKeyspaceID

Signed-off-by: Bin Shi <[email protected]>

* mcs, tso: handle null keyspace (tikv#6476)

ref tikv#5895

For API V1 and the legacy path (NewClientWithContext w/o keyspace id/name),
use the Null Keyspace ID (uint32max) instead of the default keyspace id and
make sure it can be served by the default keyspace group's timeline. Tests are modified accordingly.

Signed-off-by: Bin Shi <[email protected]>
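
A sketch of how a null keyspace could be routed to the default keyspace group's timeline. Only the "uint32 max means null keyspace" rule comes from the commit; the default group ID of 0, the `groupForKeyspace` resolver, and falling back to the default group for unassigned keyspaces are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

const (
	// nullKeyspaceID: uint32 max marks the "null keyspace" used by API V1
	// and legacy clients, per the commit message.
	nullKeyspaceID uint32 = math.MaxUint32
	// defaultKeyspaceGroupID is assumed to be 0.
	defaultKeyspaceGroupID uint32 = 0
)

// groupForKeyspace is a hypothetical resolver: the null keyspace is always
// served by the default keyspace group; other keyspaces use their assignment
// and (as an assumption) fall back to the default group when unassigned.
func groupForKeyspace(keyspaceID uint32, assignment map[uint32]uint32) uint32 {
	if keyspaceID == nullKeyspaceID {
		return defaultKeyspaceGroupID
	}
	if g, ok := assignment[keyspaceID]; ok {
		return g
	}
	return defaultKeyspaceGroupID
}

func main() {
	assignment := map[uint32]uint32{7: 2} // hypothetical: keyspace 7 lives in group 2
	fmt.Println(groupForKeyspace(nullKeyspaceID, assignment))
	fmt.Println(groupForKeyspace(7, assignment))
	fmt.Println(groupForKeyspace(9, assignment))
}
```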

* mcs, tso: print TSO service discovery fallback log just once (tikv#6478)

ref tikv#5895

Print TSO service discovery fallback log just once

Signed-off-by: Bin Shi <[email protected]>

* client: return error if the keyspace meta cannot be found (tikv#6479)

ref tikv#6142

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* client: support use API context to create client (tikv#6482)

ref tikv#6142

Signed-off-by: Ryan Leung <[email protected]>

* client: fix the leak of the keyspace watch channel (tikv#6494)

ref tikv#4399

Signed-off-by: Ryan Leung <[email protected]>

* mcs, tso: implement GetMinTS gRPC on both API leader and PD client  (tikv#6488)

ref tikv#5895

implement GetMinTS gRPC on both the API leader and the PD client.

Signed-off-by: Bin Shi <[email protected]>

* keyspace: add benchmarks for keyspace assignment patrol (tikv#6507)

ref tikv#5895

Add benchmarks for keyspace assignment patrol.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Print keyspace-group-id zap field in the tso log conditionally (tikv#6514)

ref tikv#5895

To keep the logging info in on-premises clean, we only print keyspace-group-id zap field
for the non-default keyspace group id.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Improve logging when tso keyspace group meta is updated. (tikv#6513)

close tikv#6512

Improve logging when tso keyspace group meta is updated.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tso: log tso service discovery info on the client side only when the primary is changed (tikv#6511)

close tikv#6508

Skip logging of the tso service discovery info when only the secondary list changes,
because tso servers currently don't have a consistent view of the member list due to
the remote etcd used by the tso service, which results in a changing member list when
the client queries the tso servers in round-robin. We need to improve the server side
so that all tso servers can return a globally consistent view of the keyspace groups'
serving or membership info.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs: fix the members field is null (tikv#6518)

close tikv#6519

Signed-off-by: Ryan Leung <[email protected]>

* mcs, tso: remove unnecessary "create tso forwarding stream" log on the common happy path (tikv#6524)

close tikv#6517

Remove unnecessary "create tso forwarding stream" on the common happy path

Signed-off-by: Bin Shi <[email protected]>

* fix watch keyspace (tikv#6528)

close tikv#6527

Signed-off-by: AmoebaProtozoa <[email protected]>

* Fix tso server close stuck issue (tikv#6529)

ref tikv#5895, close tikv#6304

Rewrite TSO gRPC/HTTP server Close().

Signed-off-by: Bin Shi <[email protected]>

* mcs, tso: change keyspace group primary path. (tikv#6526)

ref tikv#5895

mcs, tso: change keyspace group primary path.

The path for non-default keyspace group primary election changes
from  "/ms/{cluster_id}/tso/{group}/primary" to "/ms/{cluster_id}/tso/keyspace_groups/election/{group}/primary".
Default keyspace group keeps /ms/{cluster_id}/tso/00000/primary.

Signed-off-by: Bin Shi <[email protected]>
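
A sketch of the primary-path scheme from this commit message. The cluster ID value is hypothetical, and zero-padding non-default group IDs to five digits (like the default `00000` path) is an assumption; the message only shows `{group}` for the non-default case.

```go
package main

import "fmt"

// defaultKeyspaceGroupID is assumed to be 0, matching the "00000" path.
const defaultKeyspaceGroupID uint32 = 0

// primaryPath returns the election path for a keyspace group's primary.
// The default group keeps the legacy layout; non-default groups move under
// keyspace_groups/election. %05d padding for non-default groups is assumed.
func primaryPath(clusterID uint64, groupID uint32) string {
	if groupID == defaultKeyspaceGroupID {
		return fmt.Sprintf("/ms/%d/tso/%05d/primary", clusterID, groupID)
	}
	return fmt.Sprintf("/ms/%d/tso/keyspace_groups/election/%05d/primary", clusterID, groupID)
}

func main() {
	fmt.Println(primaryPath(7, 0))
	fmt.Println(primaryPath(7, 3))
}
```

Keeping the default group on its old path lets existing deployments upgrade without moving the default group's election key.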

* tso: fix flaky test TestHandleTSORequestWithWrongMembership (tikv#6533)

close tikv#6532

Add loop to wait for primary election to complete so that the tso service availability is deterministic.

Signed-off-by: Bin Shi <[email protected]>

* Add TestUpgradingAPIandTSOClusters (tikv#6534)

ref tikv#5895

Add TestUpgradingAPIandTSOClusters to test the scenario that after we restart the API cluster
then restart the TSO cluster, the TSO service can still serve TSO requests normally.

Signed-off-by: Bin Shi <[email protected]>

* mcs: add a test for starting tso server first (tikv#6535)

ref tikv#5895

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs: support pprof for microservices (tikv#6541)

ref tikv#5895

Signed-off-by: Ryan Leung <[email protected]>

* mcs, tso: delete the tso ms discovery when switching away from API mode. (tikv#6544)

close tikv#6543

When switching from the API mode to the PD mode, the old tso microservice discovery isn't needed anymore,
and all its resources can be released to avoid noisy error logs when a component tries to discover
the non-existent tso microservice.

Signed-off-by: Bin Shi <[email protected]>

* keyspace: adjust `keyspacePatrolBatchSize` to avoid too many operations in txn request (tikv#6562)

close tikv#6561

Signed-off-by: lhy1024 <[email protected]>

* *: fix some log issues (tikv#6568)

ref tikv#5895, ref tikv#6390

Signed-off-by: Ryan Leung <[email protected]>

* Simplify TSO Proxy implementation by using one forward stream for one gRPC stream (tikv#6572)

close tikv#6549, ref tikv#6565

Simplify tso proxy implementation by using one forward stream for one grpc.ServerStream.
tikv#6565 is a longer-term solution for both follower batching and the tso microservice.
It's well implemented but needs more time to bake, and we need a short-term workable solution for now.

Signed-off-by: Bin Shi <[email protected]>

* Improve tso proxy reliability (tikv#6585)

ref tikv#5895

Improve tso proxy reliability.

1. Add protection mechanisms to TSO Proxy.
    a. Throttle the concurrency of TSO Proxy streamings. Default 5000.
    b. If TSO Proxy didn't receive the TSO request from the client for 1 hour, close the stream.
2. Optimize forceLoad lock with RW lock.
3. Enable stress test.
4. Add deadline for API leader forwarding request to TSO service.
5. Make the tso response channel safer.
6. Move tso proxy stress test away from the test suite as it has impact on other test cases.
7. Fix grpc client connection pool (server side) resource leak problem.
8. Make MaxConcurrentTSOProxyStreamings (5000 as default) and TSOProxyClientRecvTimeout (1 hour as default) configurable.
9. Add metrics tsoProxyHandleDuration, tsoProxyBatchSize and tsoProxyForwardTimeoutCounter.

Signed-off-by: Bin Shi <[email protected]>
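
A sketch of item 1a (throttling concurrent TSO Proxy streams) using a buffered channel as a counting semaphore. The `streamLimiter` type and `errTooManyStreams` error are hypothetical; rejecting immediately instead of blocking is a design assumption, and the 1-hour receive timeout from item 1b is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooManyStreams is a hypothetical error returned when the limit is reached.
var errTooManyStreams = errors.New("too many concurrent tso proxy streams")

// streamLimiter caps concurrent proxy streams. The capacity would come from a
// setting like MaxConcurrentTSOProxyStreamings (default 5000 per the commit).
type streamLimiter struct{ slots chan struct{} }

func newStreamLimiter(max int) *streamLimiter {
	return &streamLimiter{slots: make(chan struct{}, max)}
}

// tryAcquire reserves a slot without blocking and fails when all are taken.
func (l *streamLimiter) tryAcquire() error {
	select {
	case l.slots <- struct{}{}:
		return nil
	default:
		return errTooManyStreams
	}
}

// release frees a slot; callers would defer it when the stream ends.
func (l *streamLimiter) release() { <-l.slots }

func main() {
	l := newStreamLimiter(2)
	fmt.Println(l.tryAcquire(), l.tryAcquire(), l.tryAcquire())
	l.release()
	fmt.Println(l.tryAcquire())
}
```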

* tests: make TestSplitKeyspaceGroup stable (tikv#6584)

close tikv#6571

Signed-off-by: lhy1024 <[email protected]>

* Add keyspace and keyspace group info to the time fallback log. (tikv#6581)

ref tikv#5895

Add keyspace and keyspace group info to the time fallback log to help debug time fallback issues in the multi-timeline scenario.

Signed-off-by: Bin Shi <[email protected]>

* config, server: fault injection in TSO Proxy (tikv#6588)

ref tikv#5895

Add failure test cases.

Signed-off-by: Bin Shi <[email protected]>

* mcs, tso: fix stream Send() and CloseSend() data race issue in TSO proxy (tikv#6591)

close tikv#6590

No need to call CloseSend(), because TSO proxy cancels the stream context, which causes the corresponding gRPC stream on the server side to exit.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs, tso: fix potential inconsistency caused by non-atomic applying keyspace movement state change in the persistent store (tikv#6596)

ref tikv#5895

fix potential inconsistency caused by non-atomically applying the state change in the persistent store in the following cases:
1. Keyspace group split/merge
2. Keyspace movement across keyspace groups.

Signed-off-by: Bin Shi <[email protected]>

* keyspace, apiv2: implement the keyspace group merging API (tikv#6594)

ref tikv#6589

Implement the keyspace group merging API.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* keyspace: prohibit merging the default keyspace group (tikv#6606)

ref tikv#6589

Prohibit merging the default keyspace group.

Signed-off-by: JmPotato <[email protected]>

* client: fix keyspace update in `tsoSvcDiscovery` (tikv#6612)

close tikv#6611

Signed-off-by: lhy1024 <[email protected]>

* keyspace: enhance LockGroup with RemoveEntryOnUnlock LockOption (tikv#6629)

close tikv#6628

Enhance LockGroup with RemoveEntryOnUnlock.
On unlock, remove the given key's lock from the lock group to keep the working set minimal. This suits low-QPS, non-time-critical, and non-consecutive large key space scenarios. One example of the last use case is keyspace group split, which loads non-consecutive keyspace meta in batches and locks all loaded keyspace meta within a batch at the same time.

Signed-off-by: Bin Shi <[email protected]>

* keyspace: add priority of tso node for the keyspace group (tikv#6602)

ref tikv#6599

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tests: reduce unnecessary time.sleep in keyspace group (tikv#6632)

ref tikv#6599

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Add keyspace group info in the timestamp fallback log in the client. (tikv#6654)

ref tikv#5895

Add keyspace group info in the timestamp fallback log in the client.

Signed-off-by: Bin Shi <[email protected]>

* tso: fix checkTSOSplit to finish split correctly (tikv#6652)

ref tikv#6232

Fix `checkTSOSplit` to finish split correctly.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tests: fix TestTSOKeyspaceGroupSplitClient to avoid unexpected panic (tikv#6655)

close tikv#6634

Fix `TestTSOKeyspaceGroupSplitClient` to avoid unexpected panic

Signed-off-by: JmPotato <[email protected]>

* mcs: add log for finishing split (tikv#6656)

ref tikv#5895

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* client: fix the keyspace ID RW race inside tsoServiceDiscovery (tikv#6657)

ref tikv#5895

Fix the keyspace ID RW race inside `tsoServiceDiscovery`.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tso, tests: implement the keyspace group merge checker (tikv#6625)

ref tikv#6589

Implement the keyspace group merge checker.

Signed-off-by: JmPotato <[email protected]>

* keyspace, apiv2: support to split keyspace group with the keyspace ID range (tikv#6646)

ref tikv#6232

Support to split keyspace group with the keyspace ID range.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs, tso: fix expensive async forwardTSORequest() and its timeout mechanism. (tikv#6664)

ref tikv#6659

Fix expensive async forwardTSORequest() and its timeout mechanism.

To handle timeouts on forwardStream send/recv, the existing logic creates a
context.WithTimeout(forwardCtx,...) for every request and then starts a new goroutine, "forwardTSORequest",
which is very expensive, as shown by the profiling in tikv#6659.

This change creates one watchDeadline routine per forward stream and reuses it for all forward requests,
with forwardTSORequest now called synchronously. Compared to the existing logic, the new approach
is much cheaper and its latency is much more stable.

Signed-off-by: Bin Shi <[email protected]>

* mcs, tso: support weighted-election for TSO keyspace group primary election (tikv#6617)

close tikv#6616

Add the TSO server registry watch loop to the TSO keyspace group manager, and
redistribute TSO keyspace group primaries according to their replica priorities.
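A minimal sketch of priority-based primary selection, assuming (hypothetically) that the watch loop yields the live TSO servers serving a group together with each one's priority. The highest-priority replica is the expected primary, and a primary that is not the expected one should step down so the election can move to it. The `member` type, the helper names, and the address tie-break are illustrative assumptions, not PD's code.

```go
package main

import "fmt"

// member is a hypothetical live replica of a keyspace group and its priority.
type member struct {
	Address  string
	Priority int
}

// expectedPrimary returns the address of the highest-priority replica.
// Ties are broken by the smallest address so every node computes the same answer.
func expectedPrimary(members []member) string {
	best := -1
	for i, m := range members {
		if best < 0 || m.Priority > members[best].Priority ||
			(m.Priority == members[best].Priority && m.Address < members[best].Address) {
			best = i
		}
	}
	if best < 0 {
		return "" // no live replica known
	}
	return members[best].Address
}

// shouldResign reports whether the current primary should give up primaryship
// so that a higher-priority replica can take over.
func shouldResign(current string, members []member) bool {
	want := expectedPrimary(members)
	return want != "" && want != current
}

func main() {
	ms := []member{{"tso-1:3379", 0}, {"tso-2:3379", 10}, {"tso-3:3379", 5}}
	fmt.Println(expectedPrimary(ms))            // tso-2:3379
	fmt.Println(shouldResign("tso-1:3379", ms)) // true
}
```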

Signed-off-by: Bin Shi <[email protected]>

* tools: add merge commands for pd-ctl (tikv#6675)

ref tikv#6589

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Add split-range cmd and fix duplicate keyspaces (tikv#6689)

close tikv#6687, close tikv#6688

Add split-range cmd and fix duplicate keyspaces
    
    1. Add the split-range cmd to support the StartKeyspaceID and EndKeyspaceID parameters.
    2. Fix "split 0 2 2 2" generating duplicate keyspaces in the keyspace list of the group.

Signed-off-by: Bin Shi <[email protected]>

* *: add group id to error logs (tikv#6695)

close tikv#6685

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tools: add keepalive for pd-tso-bench (tikv#6699)

close tikv#6681

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Add more debugging info to time fallback log. (tikv#6700)

ref tikv#5895

Add more debugging info to time fallback log.
[2023/06/27 10:50:54.196 -07:00] [PANIC] [tso_dispatcher.go:764] ["[tso] timestamp fallback"] 
[dc-location=global] [keyspace=4294967295]
[last-ts="(1687888254152, 1)"] [cur-ts="(1687888254052, 2)"] 
[last-tso-server=127.0.0.1:3380] [cur-tso-server=127.0.0.1:3380]
[last-keyspace-group-in-request=0] [cur-keyspace-group-in-request=0] 
[last-keyspace-group-in-response=0] [cur-keyspace-group-in-response=0] 
[last-response-received-at=2023/06/27 10:50:54.195 -07:00]
[cur-response-received-at=2023/06/27 10:50:54.196 -07:00]
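The `last-ts` and `cur-ts` fields above are (physical, logical) pairs, and the logged case is a fallback because the physical part moved backwards by 100 ms even though the logical counter grew. A minimal check, assuming a hypothetical `tso` struct, compares the pair lexicographically and treats a non-increasing timestamp as a fallback, since timestamps handed to a client are expected to increase strictly.

```go
package main

import "fmt"

// tso is a hypothetical (physical, logical) pair as printed in the log.
type tso struct {
	Physical int64 // milliseconds
	Logical  int64
}

// less reports whether a is strictly older than b: physical first, then logical.
func less(a, b tso) bool {
	if a.Physical != b.Physical {
		return a.Physical < b.Physical
	}
	return a.Logical < b.Logical
}

// isFallback reports whether cur is not strictly newer than last.
func isFallback(last, cur tso) bool {
	return !less(last, cur)
}

func main() {
	// Values from the log line above.
	fmt.Println(isFallback(tso{1687888254152, 1}, tso{1687888254052, 2})) // true
}
```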

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* keyspace: refine the split and merge details (tikv#6707)

close tikv#6686, close tikv#6698

- Adjust the merge operation order.
- Add some logs.
- Refine the code.

Signed-off-by: JmPotato <[email protected]>

* tools: support querying keyspace group by state (tikv#6706)

ref tikv#5895

Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tools: support get all groups (tikv#6714)

ref tikv#5895, ref tikv#6706

Signed-off-by: Ryan Leung <[email protected]>

* Fix data race between read APIs and finishSplit/finishMerge in keyspace group manager (tikv#6723)

close tikv#6721

checkTSOMerge and checkTSOSplit read from kgm.getKeyspaceGroupMeta, while
finishMergeKeyspaceGroup and finishSplitKeyspaceGroup update kgm, so
getKeyspaceGroupMeta now returns a copy to avoid the data race.
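A minimal sketch of the copy-on-read approach, with hypothetical `manager` and `groupMeta` types standing in for the keyspace group manager and its metadata. Readers receive a deep copy taken under the read lock, while the finish-split path mutates the shared value under the write lock. The slice must be copied as well; a shallow struct copy would still share its backing array with the writer.

```go
package main

import (
	"fmt"
	"sync"
)

// groupMeta is a hypothetical stand-in for a keyspace group's metadata.
type groupMeta struct {
	ID        uint32
	Keyspaces []uint32
	Splitting bool
}

// manager is a hypothetical stand-in for the keyspace group manager.
type manager struct {
	mu     sync.RWMutex
	groups map[uint32]*groupMeta
}

// getGroupMeta returns a deep copy taken under the read lock, so callers such
// as split/merge checkers never touch memory that a writer may modify.
func (m *manager) getGroupMeta(id uint32) (groupMeta, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	g, ok := m.groups[id]
	if !ok {
		return groupMeta{}, false
	}
	cp := *g
	cp.Keyspaces = append([]uint32(nil), g.Keyspaces...) // copy the slice too
	return cp, true
}

// finishSplit mutates the shared metadata under the write lock.
func (m *manager) finishSplit(id uint32) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if g, ok := m.groups[id]; ok {
		g.Splitting = false
	}
}

func main() {
	m := &manager{groups: map[uint32]*groupMeta{
		1: {ID: 1, Keyspaces: []uint32{10, 11}, Splitting: true},
	}}
	snap, _ := m.getGroupMeta(1)
	m.finishSplit(1)
	cur, _ := m.getGroupMeta(1)
	fmt.Println(snap.Splitting, cur.Splitting) // true false
}
```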

Signed-off-by: Bin Shi <[email protected]>

* tso: fix memory leak introduced by time.After (tikv#6730)

close tikv#6719, ref tikv#6720

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs, tso: fix split and split-range command bugs. (tikv#6732)

close tikv#6687, close tikv#6731

Fix split and split-range command bugs.

Signed-off-by: Bin Shi <[email protected]>

* mcs: use patch method in keyspace group (tikv#6713)

ref tikv#6233

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* *: fix memory leak introduced by time.After (tikv#6720)

close tikv#6719

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tso: implement groupSplitPatroller to speed up the split process (tikv#6736)

ref tikv#5895, close tikv#6696

Implement `groupSplitPatroller` to speed up the split process.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* client: fix tso service discovery at the first time for NewClientWithAPIContext (tikv#6749)

close tikv#6748

After NewClientWithAPIContextV2 returns, the keyspace group for the passed keyspace name should be discovered immediately.

Signed-off-by: Bin Shi <[email protected]>

* *: add test for misusing keyspace ID when creating the client (tikv#6754)

ref tikv#6747, ref tikv#6748, ref tikv#6749

Signed-off-by: Ryan Leung <[email protected]>

* tso: support multi-keyspace, fault injection and keyspace-name in pd-tso-bench (tikv#6608)

ref tikv#5895

Support multiple keyspaces, fault injection, and keyspace names in pd-tso-bench.

Signed-off-by: Bin Shi <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* *: make test great again (tikv#6767)

close tikv#6761

Signed-off-by: Ryan Leung <[email protected]>

* tso: implement deletedGroupCleaner to clean up the legacy TSO key (tikv#6745)

close tikv#6589

- Implement `deletedGroupCleaner` to clean up the legacy TSO key.
- Extract the timestamp key path constructor.

Signed-off-by: JmPotato <[email protected]>

* client, tests: allow TSO fallback happens in TestMixedTSODeployment (tikv#6740)

close tikv#6634

Introduce `WithAllowTSOFallback` client option to bypass the panic in `TestMixedTSODeployment`.
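An option like this typically follows Go's functional-options pattern. A minimal sketch with hypothetical names (`clientConfig`, `option`, `withAllowTSOFallback`, `checkTS`); it is not the PD client's actual API, and for brevity it compares only the physical part of the timestamp.

```go
package main

import "fmt"

// clientConfig is a hypothetical client configuration.
type clientConfig struct {
	allowTSOFallback bool
}

// option is a hypothetical functional option that mutates the config.
type option func(*clientConfig)

// withAllowTSOFallback is a hypothetical option that disables the fallback panic.
func withAllowTSOFallback() option {
	return func(c *clientConfig) { c.allowTSOFallback = true }
}

// newConfig applies the given options over the defaults.
func newConfig(opts ...option) *clientConfig {
	c := &clientConfig{}
	for _, o := range opts {
		o(c)
	}
	return c
}

// checkTS panics on a fallback by default; with the option set it returns an
// error that the caller can log instead.
func (c *clientConfig) checkTS(lastPhysical, curPhysical int64) error {
	if curPhysical >= lastPhysical {
		return nil
	}
	if !c.allowTSOFallback {
		panic("timestamp fallback")
	}
	return fmt.Errorf("timestamp fallback: last=%d cur=%d", lastPhysical, curPhysical)
}

func main() {
	c := newConfig(withAllowTSOFallback())
	fmt.Println(c.checkTS(200, 100)) // timestamp fallback: last=200 cur=100
}
```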

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* tso: allow mergedTS to be zero in mergingChecker (tikv#6758)

ref tikv#6589

Since a keyspace group may be deleted and merged before its TSO is initialized,
we should allow `mergedTS` to be zero in `mergingChecker`. This PR allows this case and only
blocks the merging when loading the TSO returns an error.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* *: move keyspace group primary path code to key_path.go (tikv#6755)

ref tikv#5895

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs: add log flags (tikv#6777)

ref tikv#5766

Signed-off-by: Ryan Leung <[email protected]>

* keyspace, tso, apiv2: impl the interface to merge all keyspace groups into the default (tikv#6757)

ref tikv#6756

Implement the interface to merge all keyspace groups into the default keyspace group.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* pdctl: support show keyspace group primary (tikv#6747)

close tikv#6746

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* pd-ctl, tests: impl the merge all keyspace groups command (tikv#6782)

close tikv#6756

- Implement the command to merge all keyspace groups.
- Reuse more of the TSO cluster-related test code.

Signed-off-by: JmPotato <[email protected]>

* *: fix test suite race (tikv#6784)

close tikv#6772

Signed-off-by: Ryan Leung <[email protected]>

* keyspace: fix data race (tikv#6797)

Signed-off-by: lhy1024 <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* pdctl: support show keyspace meta with refresh group id (tikv#6751)

close tikv#6746

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* mcs: Refactor ServicePath to make caller's life easier (tikv#6799)

close tikv#6800

Signed-off-by: Xiaoguang Sun <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* Remove the lastPhysical check in dispatchClient (tikv#6812)

* *: fix `TestGetTSOImmediately` (tikv#6811)

close tikv#6795

Signed-off-by: Ryan Leung <[email protected]>

* etcdutil, leadership: improve high availability (tikv#6577)

close tikv#6554

Signed-off-by: lhy1024 <[email protected]>

* keyspace: some cherry-pick from pd-cse (tikv#6477)

ref tikv#4399

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* *: cherry pick some keyspace related things (tikv#6840)

ref tikv#4399

Signed-off-by: disksing <[email protected]>
Signed-off-by: Evan Zhou <[email protected]>
Signed-off-by: AmoebaProtozoa <[email protected]>
Signed-off-by: Ryan Leung <[email protected]>

Co-authored-by: disksing <[email protected]>
Co-authored-by: Evan Zhou <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

* *: fix the split problem caused by not enough replicas (tikv#6555)

close tikv#6550

Signed-off-by: Ryan Leung <[email protected]>

* resolve conflicts

Signed-off-by: Ryan Leung <[email protected]>

---------

Signed-off-by: Bin Shi <[email protected]>
Signed-off-by: Ryan Leung <[email protected]>
Signed-off-by: JmPotato <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: AmoebaProtozoa <[email protected]>
Co-authored-by: Bin Shi <[email protected]>
Co-authored-by: JmPotato <[email protected]>
Co-authored-by: Ti Chi Robot <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: lhy1024 <[email protected]>
Co-authored-by: zzm <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: Xiaoguang Sun <[email protected]>
Co-authored-by: disksing <[email protected]>
Co-authored-by: Evan Zhou <[email protected]>
Labels
release-note-none Denotes a PR that doesn't merit a release note. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.