Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always failed with error "rpc error: code = Canceled desc = CANCELLED" #1120

Closed
pingyu opened this issue Jan 17, 2024 · 1 comment
Closed

Comments

@pingyu
Copy link
Contributor

pingyu commented Jan 17, 2024

Summary

TiDB always failed with error: "rpc error: code = Canceled desc = CANCELLED"

[2024/01/15 10:09:35.346 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:35.444 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:35.586 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:35.881 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:36.407 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:37.524 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:39.281 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:40.417 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:42.013 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
...
...
[2024/01/15 10:09:43.875 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:45.700 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:47.357 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:48.798 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:50.696 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:52.686 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:54.047 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:55.050 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:56.344 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:57.729 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:09:59.091 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/15 10:10:00.986 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"]

Detail

When TiDB requested split on region 57:

Met "NotLeader", "EpochNotMatch" as region 57 was also a newly splitted region:

[2024/01/16 05:02:13.628 +00:00] [DEBUG] [region_request.go:2002] ["tikv reports `NotLeader` retry later"] [keyspaceName=ks1] [keyspaceID=1] [notLeader="region_id:57 "] [ctx="region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv"]
[2024/01/16 05:02:13.629 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=2] [sleep=2] [attempts=0]
[2024/01/16 05:02:13.631 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="no leader, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv"] [totalSleep=4] [excludedSleep=0] [maxSleep=40000] [type=regionScheduling] [txnStartTS=null]
[2024/01/16 05:02:13.634 +00:00] [DEBUG] [region_request.go:2002] ["tikv reports `NotLeader` retry later"] [keyspaceName=ks1] [keyspaceID=1] [notLeader="region_id:57 "] [ctx="region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:59 store_id:4 , addr: 127.0.0.1:21002, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv"]
[2024/01/16 05:02:13.636 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=2] [sleep=4] [attempts=1]
[2024/01/16 05:02:13.640 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="no leader, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:59 store_id:4 , addr: 127.0.0.1:21002, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv"] [totalSleep=8] [excludedSleep=0] [maxSleep=40000] [type=regionScheduling] [txnStartTS=null]
[2024/01/16 05:02:13.656 +00:00] [DEBUG] [region_request.go:2002] ["tikv reports `NotLeader` retry later"] [keyspaceName=ks1] [keyspaceID=1] [notLeader="region_id:57 "] [ctx="region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:60 store_id:5 , addr: 127.0.0.1:21003, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv"]
[2024/01/16 05:02:13.659 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=2] [sleep=8] [attempts=2]
[2024/01/16 05:02:13.667 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="no leader, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:60 store_id:5 , addr: 127.0.0.1:21003, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv"] [totalSleep=16] [excludedSleep=0] [maxSleep=40000] [type=regionScheduling] [txnStartTS=null]
[2024/01/16 05:02:13.667 +00:00] [INFO] [region_request.go:1533] ["throwing pseudo region error due to no replica available"] [keyspaceName=ks1] [keyspaceID=1] [req-ts=0] [req-type=SplitRegion] [region="{ region id: 57, ver: 12, confVer: 5 }"] [replica-read-type=leader] [stale-read=false] [request-sender="{rpcError:<nil>,replicaSelector: replicaSelector{selectorStateStr: tryFollower, cacheRegionIsValid: false, replicaStatus: [peer: 58, store: 1, isEpochStale: false, attempts: 1, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable peer: 59, store: 4, isEpochStale: false, attempts: 1, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable peer: 60, store: 5, isEpochStale: false, attempts: 1, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable]}}"] [retry-times=3] [total-backoff-ms=16] [total-backoff-times=4] [max-exec-timeout-ms=30000] [total-region-errors="58-not_leader-nil:1, 59-not_leader-nil:1, 60-not_leader-nil:1"]
[2024/01/16 05:02:13.667 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=2] [sleep=4] [attempts=1]
[2024/01/16 05:02:13.671 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="epoch_not_match:<> "] [totalSleep=20] [excludedSleep=0] [maxSleep=40000] [type=regionMiss] [txnStartTS=null]
[2024/01/16 05:02:13.685 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=2000] [sleep=1801] [attempts=0]
[2024/01/16 05:02:15.487 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="server is busy, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv"] [totalSleep=1821] [excludedSleep=1801] [maxSleep=40000] [type=tikvServerBusy] [txnStartTS=null]

(The "server is busy" was not an exception, which was caused by the region 57 was not initial flush yet at that time).

Then met the first "receive a grpc cancel signal from remote", "rpc error: code = Canceled desc = CANCELLED":

[2024/01/16 05:02:15.487 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:15.488 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=53] [attempts=0]

After that, it kept failed on the same error. Also note that the interval from "retry later" and "receive a grpc cancel signal from remote" was short (about 1ms):

[2024/01/16 05:02:15.542 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=1874] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:15.542 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:15.543 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=178] [attempts=1]
[2024/01/16 05:02:15.724 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=2052] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:15.725 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:15.727 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=344] [attempts=2]
[2024/01/16 05:02:16.072 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=2396] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:16.073 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:16.076 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=735] [attempts=3]
[2024/01/16 05:02:16.791 +00:00] [DEBUG] [pd.go:108] ["min resolved ts"] [keyspaceName=ks1] [keyspaceID=1] [resp="{\n  \"is_real_time\": true,\n  \"min_resolved_ts\": 0,\n  \"persist_interval\": \"1s\"\n}\n"]
[2024/01/16 05:02:16.791 +00:00] [DEBUG] [pd.go:108] ["min resolved ts"] [keyspaceName=ks1] [keyspaceID=1] [resp="{\n  \"is_real_time\": true,\n  \"min_resolved_ts\": 0,\n  \"persist_interval\": \"1s\"\n}\n"]
[2024/01/16 05:02:16.815 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=3131] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:16.817 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:16.819 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=1255] [attempts=4]
[2024/01/16 05:02:18.075 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=4386] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:18.107 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:18.108 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=1515] [attempts=5]
[2024/01/16 05:02:18.791 +00:00] [DEBUG] [pd.go:108] ["min resolved ts"] [keyspaceName=ks1] [keyspaceID=1] [resp="{\n  \"is_real_time\": true,\n  \"min_resolved_ts\": 0,\n  \"persist_interval\": \"1s\"\n}\n"]
[2024/01/16 05:02:18.807 +00:00] [DEBUG] [pd.go:108] ["min resolved ts"] [keyspaceName=ks1] [keyspaceID=1] [resp="{\n  \"is_real_time\": true,\n  \"min_resolved_ts\": 0,\n  \"persist_interval\": \"1s\"\n}\n"]
[2024/01/16 05:02:19.623 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=5901] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:19.624 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:19.625 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=1174] [attempts=6]
[2024/01/16 05:02:20.791 +00:00] [DEBUG] [pd.go:108] ["min resolved ts"] [keyspaceName=ks1] [keyspaceID=1] [resp="{\n  \"is_real_time\": true,\n  \"min_resolved_ts\": 0,\n  \"persist_interval\": \"1s\"\n}\n"]
[2024/01/16 05:02:20.801 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=7075] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:20.807 +00:00] [DEBUG] [pd.go:108] ["min resolved ts"] [keyspaceName=ks1] [keyspaceID=1] [resp="{\n  \"is_real_time\": true,\n  \"min_resolved_ts\": 0,\n  \"persist_interval\": \"1s\"\n}\n"]
[2024/01/16 05:02:20.807 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:20.808 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=1646] [attempts=7]
[2024/01/16 05:02:22.455 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=8721] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:22.457 +00:00] [WARN] [region_request.go:1838] ["receive a grpc cancel signal from remote"] [keyspaceName=ks1] [keyspaceID=1] [error="rpc error: code = Canceled desc = CANCELLED"] [error="rpc error: code = Canceled desc = CANCELLED"]
[2024/01/16 05:02:22.462 +00:00] [DEBUG] [config.go:180] [backoff] [keyspaceName=ks1] [keyspaceID=1] [base=100] [sleep=1138] [attempts=8]
[2024/01/16 05:02:22.791 +00:00] [DEBUG] [pd.go:108] ["min resolved ts"] [keyspaceName=ks1] [keyspaceID=1] [resp="{\n  \"is_real_time\": true,\n  \"min_resolved_ts\": 0,\n  \"persist_interval\": \"1s\"\n}\n"]
[2024/01/16 05:02:22.807 +00:00] [DEBUG] [pd.go:108] ["min resolved ts"] [keyspaceName=ks1] [keyspaceID=1] [resp="{\n  \"is_real_time\": true,\n  \"min_resolved_ts\": 0,\n  \"persist_interval\": \"1s\"\n}\n"]
[2024/01/16 05:02:23.603 +00:00] [DEBUG] [backoff.go:230] ["retry later"] [keyspaceName=ks1] [keyspaceID=1] [error="send tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:58 store_id:1 , addr: 127.0.0.1:21001, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later"] [totalSleep=9859] [excludedSleep=1801] [maxSleep=40000] [type=tikvRPC] [txnStartTS=null]
[2024/01/16 05:02:23.615 +00:00] [DEBUG] [region_request.go:2119] ["tikv reports `EpochNotMatch` retry later"] [keyspaceName=ks1] [keyspaceID=1] [EpochNotMatch="current_regions:<id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:12 version:13 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > peers:<id:65 store_id:6 > > "] [ctx="region ID: 57, meta: id:57 end_key:\"t\\200\\000\\377\\377\\377\\377\\377\\372\" region_epoch:<conf_ver:5 version:12 > peers:<id:58 store_id:1 > peers:<id:59 store_id:4 > peers:<id:60 store_id:5 > , peer: id:59 store_id:4 , addr: 127.0.0.1:21002, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv"]
[2024/01/16 05:02:23.619 +00:00] [INFO] [region_cache.go:2682] ["change store resolve state"] [keyspaceName=ks1] [keyspaceID=1] [store=6] [addr=127.0.0.1:21004] [from=unresolved] [to=resolved] [liveness-state=reachable]

Then the request was finally failed due to retry exhausted:

[2024/01/16 05:02:23.619 +00:00] [INFO] [region_request.go:1533] ["send request meet region error without retry"] [keyspaceName=ks1] [keyspaceID=1] [req-ts=0] [req-type=SplitRegion] [region="{ region id: 57, ver: 12, confVer: 5 }"] [replica-read-type=leader] [stale-read=false] [request-sender="{rpcError:rpc error: code = Canceled desc = CANCELLED,replicaSelector: replicaSelector{selectorStateStr: tryFollower, cacheRegionIsValid: false, replicaStatus: [peer: 58, store: 1, isEpochStale: false, attempts: 10, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable peer: 59, store: 4, isEpochStale: false, attempts: 1, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable peer: 60, store: 5, isEpochStale: false, attempts: 0, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable]}}"] [retry-times=10] [total-backoff-ms=9859] [total-backoff-times=15] [max-exec-timeout-ms=30000] [total-region-errors="58-server_is_busy:1, 59-epoch_not_match:1"]

At TiKV side, the region split was actually completed, and no other warning or error except the "server is busy" due to the region had not initially flushed.

So it seems that the canceled was not initiated by TiKV side.

After investigate some codes, along with the behavior that TiDB kept failed on the same error, and the duration of request was short, the cancel seems to be caused by keep alive timeout.

When keepalive timeout (see http2Client.keepalive), grpc-go will close the connection, and cancel the http2Client.ctx. From then on, when invoking RPC requests, as ctx.Done() is closed (see http2Client.NewStream), we will always get the error as Canceled.

@pingyu
Copy link
Contributor Author

pingyu commented Jan 19, 2024

Close by #1121

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant