Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mcs, tso: timestamp fallback after merging #6686

Closed
binshi-bing opened this issue Jun 26, 2023 · 4 comments · Fixed by #6707
Closed

mcs, tso: timestamp fallback after merging #6686

binshi-bing opened this issue Jun 26, 2023 · 4 comments · Fixed by #6707
Assignees
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@binshi-bing
Copy link
Contributor

binshi-bing commented Jun 26, 2023

Enhancement Task

// split keyspace 1 from group 0 to group 1
root@serverless-cluster-pd-0:/# ./pd-ctl keyspace-group split 0 1 1
Success!
root@serverless-cluster-pd-0:/# ./pd-ctl keyspace-group merge 0 1
Failed! [500] "keyspace group is in split state"

// Send traffic to keyspace 1 to trigger finish split for group 1
root@tso-bench:/# ./pd-tso-bench -v -duration 250000s -pd "http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2379" -client 10 -c 3 -interval 30s -keyspace 1

// Now we can merge 1 to 0
root@serverless-cluster-pd-0:/# ./pd-ctl keyspace-group merge 0 1
Success!

// But pd-tso-bench reported time fallback immediately after above merging.
panic: global timestamp fallback, new ts (1687812834954, 1003) <= the last one (1687812834975, 50). keyspace: 1, keyspace group: 0

goroutine 693 [running]:
github.com/tikv/pd/client.(*tsoClient).compareAndSwapTS(0x4000439d40, {0x93f24b, 0x6}, 0x188f97c068a, 0x3eb, 0x0, 0x2)
/go/src/github.com/tikv/pd/client/tso_dispatcher.go:742 +0x244
github.com/tikv/pd/client.(*tsoClient).processRequests(0x4000439d40, {0xa6e3e0, 0x4000f94bd0}, {0x93f24b, 0x6}, 0x40000c50e0, {0x0?, 0x0, 0x4000160280?})
/go/src/github.com/tikv/pd/client/tso_dispatcher.go:720 +0x250
github.com/tikv/pd/client.(*tsoClient).handleDispatcher(0x4000439d40, {0xa79090?, 0x4000538040}, {0x93f24b, 0x6}, 0x40000c50e0)
/go/src/github.com/tikv/pd/client/tso_dispatcher.go:445 +0xde4
created by github.com/tikv/pd/client.(*tsoClient).createTSODispatcher
/go/src/github.com/tikv/pd/client/tso_dispatcher.go:290 +0x264

Here is the PD client log.
https://gist.githubusercontent.com/binshi-bing/68fadbf399dc5991e9c65341d7375636/raw/05de99538b8a5dc9d65ace1ddf8462928541c05a/gistfile1.txt

All Client/TSO log can be found https://gist.github.com/binshi-bing/68fadbf399dc5991e9c65341d7375636/edit

@binshi-bing binshi-bing added the type/enhancement The issue or PR belongs to an enhancement. label Jun 26, 2023
@lhy1024
Copy link
Contributor

lhy1024 commented Jun 27, 2023

I also meet the same problem, only stop and restart server.

[2023/06/27 09:16:58.476 +00:00] [INFO] [tso_dispatcher.go:746] ["[tso] keyspace group changed"] [dc-location=global] [old-group-id=1] [new-group-id=0]
[2023/06/27 09:16:58.476 +00:00] [INFO] [tso_dispatcher.go:313] ["[tso] exit tso dispatcher"] [dc-location=global]
panic: global timestamp fallback, new ts (1687857418453, 1) <= the last one (1687857418453, 57). last keyspace group: 1, keyspace in request: 0, keyspace group in request: 1, keyspace group in response: 0

goroutine 71 [running]:
github.com/tikv/pd/client.(*tsoClient).compareAndSwapTS(0x4000000120, {0x93f68b, 0x6}, 0x0, 0x188fc2450d5, 0x1, 0x0, 0x1)
	/go/src/github.com/tikv/pd/client/tso_dispatcher.go:756 +0x454
github.com/tikv/pd/client.(*tsoClient).processRequests(0x4000000120, {0xa6ea40, 0x4000cfb210}, {0x93f68b, 0x6}, 0x40004fa050, {0x0?, 0x0, 0x400099ee40?})
	/go/src/github.com/tikv/pd/client/tso_dispatcher.go:721 +0x254
github.com/tikv/pd/client.(*tsoClient).handleDispatcher(0x4000000120, {0xa796f0?, 0x4000265600}, {0x93f68b, 0x6}, 0x40004fa050)
	/go/src/github.com/tikv/pd/client/tso_dispatcher.go:446 +0xde4
created by github.com/tikv/pd/client.(*tsoClient).createTSODispatcher
	/go/src/github.com/tikv/pd/client/tso_dispatcher.go:291 +0x264

tso-0.log
tso-1.log

There is a lot of log about ["campaign tso primary meets error due to pre-check campaign failed, the tso keyspace group may be in split"] [keyspace-group-id=6]

@lhy1024
Copy link
Contributor

lhy1024 commented Jun 27, 2023

I also meet the same problem, only stop and restart server.

[2023/06/27 09:16:58.476 +00:00] [INFO] [tso_dispatcher.go:746] ["[tso] keyspace group changed"] [dc-location=global] [old-group-id=1] [new-group-id=0]
[2023/06/27 09:16:58.476 +00:00] [INFO] [tso_dispatcher.go:313] ["[tso] exit tso dispatcher"] [dc-location=global]
panic: global timestamp fallback, new ts (1687857418453, 1) <= the last one (1687857418453, 57). last keyspace group: 1, keyspace in request: 0, keyspace group in request: 1, keyspace group in response: 0

goroutine 71 [running]:
github.com/tikv/pd/client.(*tsoClient).compareAndSwapTS(0x4000000120, {0x93f68b, 0x6}, 0x0, 0x188fc2450d5, 0x1, 0x0, 0x1)
	/go/src/github.com/tikv/pd/client/tso_dispatcher.go:756 +0x454
github.com/tikv/pd/client.(*tsoClient).processRequests(0x4000000120, {0xa6ea40, 0x4000cfb210}, {0x93f68b, 0x6}, 0x40004fa050, {0x0?, 0x0, 0x400099ee40?})
	/go/src/github.com/tikv/pd/client/tso_dispatcher.go:721 +0x254
github.com/tikv/pd/client.(*tsoClient).handleDispatcher(0x4000000120, {0xa796f0?, 0x4000265600}, {0x93f68b, 0x6}, 0x40004fa050)
	/go/src/github.com/tikv/pd/client/tso_dispatcher.go:446 +0xde4
created by github.com/tikv/pd/client.(*tsoClient).createTSODispatcher
	/go/src/github.com/tikv/pd/client/tso_dispatcher.go:291 +0x264

tso-0.log tso-1.log

There is a lot of log about ["campaign tso primary meets error due to pre-check campaign failed, the tso keyspace group may be in split"] [keyspace-group-id=6]

There are two new problem #6698 #6696

@binshi-bing
Copy link
Contributor Author

For the timestamp fallback issue, I know what's going on now.
In short words, before we finish the merge process and pull forward target group's timeline with the merged TS, we can't let the target group to serve the keyspaces in the source groups.
image

@binshi-bing
Copy link
Contributor Author

there is a race condition. Target group was set merge state after handling target group's the put event at 2023/06/26 20:53:54.985. The entry "keyspace 1 -> group 1" was removed from lookup table at 20:53:54.978.
Between these two events, if the client query tso with keyspace group 0, getKeyspaceGroupMetaWithCheck will return the am of group 0 and pass checkTSOMerge since the group 0's put event hasn't been processed.
The keyspace group: 0 in the panic info shows it's exactly what happened.
panic: global timestamp fallback, new ts (1687812834954, 1003) <= the last one (1687812834975, 50). keyspace: 1, keyspace group: 0

The client (tsoServiceDiscovery.updateMember) periodically checks if a keyspace's group has changed with keyspace group 0 in the FindGroupByKeyspaceID request, the API will return keyspace group 0 which will be used in the next tso request.

In short words, client could send tso requests with keyspace 1 and default group 0 to server before the server side processes the put event and set merge state.

ti-chi-bot bot pushed a commit that referenced this issue Jun 28, 2023
close #6686, close #6698

- Adjust the merge operation order.
- Add some logs.
- Refine the code.

Signed-off-by: JmPotato <[email protected]>
@binshi-bing binshi-bing changed the title timestamp fallback after spliting followed by merging immediately mcs, tso: timestamp fallback after merging immediately Jun 29, 2023
@binshi-bing binshi-bing changed the title mcs, tso: timestamp fallback after merging immediately mcs, tso: timestamp fallback after merging Jun 29, 2023
rleungx pushed a commit to rleungx/pd that referenced this issue Aug 2, 2023
close tikv#6686, close tikv#6698

- Adjust the merge operation order.
- Add some logs.
- Refine the code.

Signed-off-by: JmPotato <[email protected]>
rleungx pushed a commit to rleungx/pd that referenced this issue Aug 2, 2023
close tikv#6686, close tikv#6698

- Adjust the merge operation order.
- Add some logs.
- Refine the code.

Signed-off-by: JmPotato <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants