This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

br restore fails while splitting ranges if PD is unavailable for 3-5s, which is not expected #1305

Open
Tammyxia opened this issue Jul 1, 2021 · 0 comments

Tammyxia commented Jul 1, 2021

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
  • br restore full to S3
  • Run tiup cluster restart xxx -R pd. The TiDB cluster has only one PD, so PD is unavailable for only 3-5s.
  • br restore failed.
  2. What did you expect to see?
    br restore should tolerate PD being unavailable for 1-3 minutes while splitting ranges.

  3. What did you see instead?
    br log:
    [2021/07/01 14:11:26.245 +08:00] [INFO] [base_client.go:296] ["[pd] cannot update member from this address"] [address=http://172.16.6.6:12379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused" target:172.16.6.6:12379 status:TRANSIENT_FAILURE"]
    [2021/07/01 14:11:26.245 +08:00] [ERROR] [base_client.go:166] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetLeader]get leader from [http://172.16.6.6:12379] error"] [stack="github.com/tikv/pd/client.(*baseClient).memberLoop\n\tgithub.com/tikv/[email protected]/client/base_client.go:166"]

...
[2021/07/01 14:11:26.855 +08:00] [ERROR] [base_client.go:166] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetLeader]get leader from [http://172.16.6.6:12379] error"] [stack="github.com/tikv/pd/client.(*baseClient).memberLoop\n\tgithub.com/tikv/[email protected].0.20210323121136-78679e5e209d/client/base_client.go:166"]
[2021/07/01 14:11:26.855 +08:00] [ERROR] [pipeline_items.go:236] ["failed on split range"] [ranges="{total=178,ranges="[\"[7480000000000014855F69800000000000000300, 7480000000000014855F698000000000000003FB)\",\"(skip 176)\",\"[74800000000000F6075F720000000000000000, 74800000000000F6075F72FFFFFFFFFFFFFFFF00)\"]",totalFiles=205,totalKVs=5309510,totalBytes=737344350,totalSize=737344350}"] [error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused""] [errorVerbose="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused"\ngithub.com/tikv/pd/client.(*client).ScanRegions\n\tgithub.com/tikv/[email protected]/client/client.go:1100\ngithub.com/pingcap/br/pkg/restore.(*pdClient).ScanRegions\n\tgithub.com/pingcap/br/pkg/restore/split_client.go:385\ngithub.com/pingcap/br/pkg/restore.PaginateScanRegion\n\tgithub.com/pingcap/br/pkg/restore/split.go:298\ngithub.com/pingcap/br/pkg/restore.(*RegionSplitter).Split\n\tgithub.com/pingcap/br/pkg/restore/split.go:113\ngithub.com/pingcap/br/pkg/restore.SplitRanges\n\tgithub.com/pingcap/br/pkg/restore/util.go:390\ngithub.com/pingcap/br/pkg/restore.(*tikvSender).splitWorker\n\tgithub.com/pingcap/br/pkg/restore/pipeline_items.go:235\nruntime.goexit\n\truntime/asm_amd64.s:1371"] [stack="github.com/pingcap/br/pkg/restore.(*tikvSender).splitWorker\n\tgithub.com/pingcap/br/pkg/restore/pipeline_items.go:236"]
...

[2021/07/01 14:11:29.487 +08:00] [ERROR] [restore.go:35] ["failed to restore"] [error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused""] [errorVerbose="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused\

  4. What version of BR and TiDB/TiKV/PD are you using?
  5. Operation logs

    • Please upload br.log for BR if possible
    • Please upload tidb-lightning.log for TiDB-Lightning if possible
    • Please upload tikv-importer.log from TiKV-Importer if possible
    • Other interesting logs
  6. Configuration of the cluster and the task

    • tidb-lightning.toml for TiDB-Lightning if possible
    • tikv-importer.toml for TiKV-Importer if possible
    • topology.yml if deployed by TiUP
  7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible
