Capture list unexpectedly increased by 1 after restarting all PDs #2388

Closed

Tammyxia opened this issue Jul 27, 2021 · 7 comments

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.

  • 2x capture:

    Starting component cdc: /root/.tiup/components/cdc/v5.1.0/cdc cli capture list --pd=http://172.16.6.24:2379
    [
      {
        "id": "3378b726-26cb-4963-8280-3ee679024a76",
        "is-owner": false,
        "address": "172.16.6.32:8300"
      },
      {
        "id": "d2600da4-cdc4-4420-84a4-e57e826ffbc7",
        "is-owner": true,
        "address": "172.16.6.31:8300"
      }
    ]

  • Restart all PD: $ tiup cluster restart 360UP -R pd

  • Check the capture list (a small polling sketch for this check follows the report).

  2. What did you expect to see?

  • 2x capture, and their status is normal.

  3. What did you see instead?
  • The capture list unexpectedly showed 3 captures for a short time after the PD restart:

    Starting component cdc: /root/.tiup/components/cdc/v5.1.0/cdc cli capture list --pd=http://172.16.6.24:2379
    [
      {
        "id": "2b0211bd-f3fc-4551-b4d1-a5c6bba5818e",
        "is-owner": false,
        "address": "172.16.6.32:8300"
      },
      {
        "id": "3378b726-26cb-4963-8280-3ee679024a76",
        "is-owner": false,
        "address": "172.16.6.32:8300"
      },
      {
        "id": "d2600da4-cdc4-4420-84a4-e57e826ffbc7",
        "is-owner": true,
        "address": "172.16.6.31:8300"
      }
    ]

    After waiting several seconds, the capture list showed 2 captures, as expected:

    Starting component cdc: /root/.tiup/components/cdc/v5.1.0/cdc cli capture list --pd=http://172.16.6.24:2379
    [
      {
        "id": "2b0211bd-f3fc-4551-b4d1-a5c6bba5818e",
        "is-owner": true,
        "address": "172.16.6.32:8300"
      },
      {
        "id": "4b8e6f29-847d-4caf-bc7a-ea8cba317a28",
        "is-owner": false,
        "address": "172.16.6.31:8300"
      }
    ]

  4. Versions of the cluster

  • Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

    4.0.14

  • TiCDC version (execute cdc version):

    [release-version=v4.0.14] [git-hash=5a7851967f686da896b45acd3f3e968bfe53d6bd] [git-branch=heads/refs/tags/v4.0.14]
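
For reference, a minimal sketch of how the capture-count check above could be automated during the reproduction. It only re-runs the cdc cli capture list command shown in the report and counts the returned entries; the binary path and PD address come from the report, while the poll interval and the program itself are assumptions, not part of the original reproduction.

// watch_captures.go: repeatedly run `cdc cli capture list` and print how many
// captures are registered, to catch the short window with 3 entries.
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// capture mirrors the fields printed by `cdc cli capture list`.
type capture struct {
	ID      string `json:"id"`
	IsOwner bool   `json:"is-owner"`
	Address string `json:"address"`
}

func listCaptures() ([]capture, error) {
	// Binary path and PD endpoint are the ones from the report above.
	out, err := exec.Command(
		"/root/.tiup/components/cdc/v5.1.0/cdc",
		"cli", "capture", "list", "--pd=http://172.16.6.24:2379",
	).Output()
	if err != nil {
		return nil, err
	}
	var captures []capture
	if err := json.Unmarshal(out, &captures); err != nil {
		return nil, err
	}
	return captures, nil
}

func main() {
	for {
		captures, err := listCaptures()
		if err != nil {
			fmt.Println("capture list failed:", err)
		} else {
			fmt.Printf("%s: %d capture(s)\n", time.Now().Format(time.RFC3339), len(captures))
		}
		time.Sleep(2 * time.Second) // assumed poll interval
	}
}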
      
@Tammyxia Tammyxia added type/bug The issue is confirmed as a bug. severity/minor labels Jul 27, 2021
@asddongmen asddongmen added bug-from-internal-test Bugs found by internal testing. component/status-server Status server component. difficulty/easy Easy task. labels Jul 28, 2021
@3AceShowHand (Contributor) commented:

  • Before the restart: 3378b726-26cb-4963-8280-3ee679024a76_172.16.6.32:8300_false / d2600da4-cdc4-4420-84a4-e57e826ffbc7_172.16.6.31:8300_true
  • Right after the restart: 2b0211bd-f3fc-4551-b4d1-a5c6bba5818e_172.16.6.32:8300_false / 3378b726-26cb-4963-8280-3ee679024a76_172.16.6.32:8300_false / d2600da4-cdc4-4420-84a4-e57e826ffbc7_172.16.6.31:8300_true
  • A few seconds later: 2b0211bd-f3fc-4551-b4d1-a5c6bba5818e_172.16.6.32:8300_true / 4b8e6f29-847d-4caf-bc7a-ea8cba317a28_172.16.6.31:8300_false

@3AceShowHand (Contributor) commented:

  1. A new capture (2b02) was assigned to 6.32.
  2. The old captures (3378 / d260) were dropped, and the new capture on 6.32 became the owner.
  3. A new capture (4b8e) was assigned to 6.31.

  • When and how is the capture ID generated and assigned?
  • Is there any mechanism to prevent two captures from being assigned to the same server? (A sketch of one possible registration scheme follows this list.)
    • If there is, we should make sure it still holds after the old capture gets dropped.
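
To make the duplicate-capture question concrete, here is a minimal illustrative sketch of a lease-based registration scheme; the key prefix, the 10-second TTL, the helper names, and the use of google/uuid are assumptions for illustration, not TiCDC's actual code. The point it shows: each process start generates a fresh capture ID, and if the previous process could not delete its key (for example, because etcd was unreachable), the stale entry stays visible until its lease expires, which would match the short window with 3 captures.

// Illustrative only, not TiCDC's actual code: register a capture's info in
// etcd under a lease. A fresh ID is generated on every process start, so a
// restarted capture on the same address shows up as a new entry, and a key
// the old process failed to delete lingers until its lease TTL expires.
package main

import (
	"context"
	"encoding/json"
	"time"

	"github.com/google/uuid"
	clientv3 "go.etcd.io/etcd/client/v3"
)

type captureInfo struct {
	ID      string `json:"id"`
	Address string `json:"address"`
}

func registerCapture(ctx context.Context, cli *clientv3.Client, addr string) (string, error) {
	info := captureInfo{ID: uuid.New().String(), Address: addr}
	value, err := json.Marshal(info)
	if err != nil {
		return "", err
	}
	// Hypothetical 10-second TTL: a stale entry lingers at most this long.
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		return "", err
	}
	// Key layout is illustrative.
	key := "/tidb/cdc/capture/" + info.ID
	if _, err := cli.Put(ctx, key, string(value), clientv3.WithLease(lease.ID)); err != nil {
		return "", err
	}
	return info.ID, nil
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://172.16.6.24:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if _, err := registerCapture(ctx, cli, "172.16.6.32:8300"); err != nil {
		panic(err)
	}
}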

@3AceShowHand (Contributor) commented:

// Cleanup on capture exit: best-effort deletion of this capture's info from etcd.
defer func() {
	// Only 5 seconds are allowed for the delete; if etcd is unreachable
	// (e.g. while all PDs are restarting), this times out and the stale
	// capture info is left behind.
	timeoutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	if err := ctx.GlobalVars().EtcdClient.DeleteCaptureInfo(timeoutCtx, c.info.ID); err != nil {
		log.Warn("failed to delete capture info when capture exited", zap.Error(err))
	}
	cancel()
}()

When all PDs restart, CDC also loses contact with etcd, so the 5-second timeout is too small for the delete to succeed; setting a larger timeout could be a workaround.
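
A minimal sketch of that workaround, assuming a helper extracted from the defer above; the 30-second deadline, the retry loop, and the helper and interface names are assumptions, not the actual fix.

// Sketch of the suggested workaround: give the exit-time cleanup a larger
// deadline and retry the delete, so a short PD/etcd outage does not leave a
// stale capture entry behind.
package capture // illustrative package name

import (
	"context"
	"time"

	"github.com/pingcap/log"
	"go.uber.org/zap"
)

// captureInfoDeleter is the subset of the etcd client used here, taken from
// the DeleteCaptureInfo call quoted above.
type captureInfoDeleter interface {
	DeleteCaptureInfo(ctx context.Context, captureID string) error
}

// cleanupTimeout is an assumed value, deliberately larger than the current 5s.
const cleanupTimeout = 30 * time.Second

func deleteCaptureInfoWithRetry(client captureInfoDeleter, captureID string) {
	ctx, cancel := context.WithTimeout(context.Background(), cleanupTimeout)
	defer cancel()
	for {
		err := client.DeleteCaptureInfo(ctx, captureID)
		if err == nil {
			return
		}
		log.Warn("failed to delete capture info, retrying", zap.Error(err))
		select {
		case <-ctx.Done():
			log.Warn("gave up deleting capture info when capture exited", zap.Error(ctx.Err()))
			return
		case <-time.After(time.Second):
		}
	}
}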

@3AceShowHand (Contributor) commented:

When the CDC servers cannot get in touch with PD, they hit PD-related errors, fail to run, and then get dropped.

@3AceShowHand (Contributor) commented:

Dropping the old capture info and putting the new capture info should be atomic.
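
A minimal sketch of what that could look like, assuming direct use of etcd clientv3; the key prefix, the helper name, and the example IDs (taken from this report) are for illustration only, not TiCDC's actual implementation.

// Illustrative sketch: replace the old capture info with the new one in a
// single etcd transaction, so readers of `capture list` never observe both
// entries at once.
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// swapCaptureInfo atomically deletes the stale capture key and writes the new
// one. The "/tidb/cdc/capture/" prefix is illustrative.
func swapCaptureInfo(ctx context.Context, cli *clientv3.Client, oldID, newID, newInfoJSON string) error {
	oldKey := "/tidb/cdc/capture/" + oldID
	newKey := "/tidb/cdc/capture/" + newID

	// Both operations commit together or not at all.
	resp, err := cli.Txn(ctx).Then(
		clientv3.OpDelete(oldKey),
		clientv3.OpPut(newKey, newInfoJSON),
	).Commit()
	if err != nil {
		return err
	}
	if !resp.Succeeded {
		return fmt.Errorf("transaction did not commit")
	}
	return nil
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://172.16.6.24:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	err = swapCaptureInfo(ctx, cli,
		"3378b726-26cb-4963-8280-3ee679024a76", // stale ID from the report
		"2b0211bd-f3fc-4551-b4d1-a5c6bba5818e", // new ID from the report
		`{"id":"2b0211bd-f3fc-4551-b4d1-a5c6bba5818e","address":"172.16.6.32:8300"}`)
	if err != nil {
		panic(err)
	}
}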

@3AceShowHand (Contributor) commented:

the problem only happens in v4.0.14

@3AceShowHand (Contributor) commented:

close with #2388

@AkiraXie AkiraXie added the area/ticdc Issues or PRs related to TiCDC. label Mar 9, 2022