This repository has been archived by the owner on Aug 21, 2023. It is now read-only.

The speed of dumping data from TiDB to S3 drops from 2+ Gbit/s to 200 Mbit/s #223

Closed
cyliu0 opened this issue Dec 26, 2020 · 3 comments
Labels
priority/P2 Medium priority issue

Comments


cyliu0 commented Dec 26, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
    Dump 4.2 TB of TPC-C data from TiDB to S3 with Dumpling.

  2. What did you expect to see?
    The speed is stable.

  3. What did you see instead?
    The speed drops from 2 Gbit/s to 200 Mbit/s.

  4. Versions of the cluster

    • Dumpling version (run dumpling -V):

      # ./bin/dumpling -V
      Release version: v4.0.9-2-g34903c7
      Git commit hash: 34903c7475cfbfc715c03e0a4e5029c640f33234
      Git branch:      master
      Build timestamp: 2020-12-25 12:35:36Z
      Go version:      go version go1.15.6 linux/amd64
      
    • Source database version (execute SELECT version(); in a MySQL client):

      TiDB v4.0.9
      
    • Other interesting information (system version, hardware config, etc):

      The Dumpling host is an AWS EC2 c5.9xlarge instance running Ubuntu.

A flame graph captured over 60 seconds during the 200 Mbit/s period: flame-graph.tar.gz

  1. Operation logs
    • Please upload dumpling.log for Dumpling if possible
root@tidb-lightning-ln57s:~/dumpling# ./bin/dumpling -h xxxx -P 4000 -u root -p 'xxxx' -o "s3://xxxxx" -t 32 -r 20000 -F 256M | tee dumpling.log
Release version: v4.0.9-2-g34903c7
Git commit hash: 34903c7475cfbfc715c03e0a4e5029c640f33234
Git branch:      master
Build timestamp: 2020-12-25 12:35:36Z
Go version:      go version go1.15.6 linux/amd64
[2020/12/25 14:47:55.485 +08:00] [INFO] [config.go:598] ["detect server type"] [type=TiDB]
[2020/12/25 14:47:55.485 +08:00] [INFO] [config.go:617] ["detect server version"] [version=4.0.9]
{"level":"warn","ts":"2020-12-25T14:48:05.702+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"xxxxxxxxxxxxxxxxxxxxxxxxxxxx","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection closed"}
[2020/12/25 14:48:05.702 +08:00] [WARN] [dump.go:699] ["meet error while check whether fetched pd addr and TiDB belongs to one cluster"] [error="context deadline exceeded"] [errorVerbose="context deadline exceeded\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/dumpling/v4/export.getPdDDLIDs\n\tgithub.com/pingcap/dumpling/v4/export/util.go:24\ngithub.com/pingcap/dumpling/v4/export.checkSameCluster\n\tgithub.com/pingcap/dumpling/v4/export/util.go:46\ngithub.com/pingcap/dumpling/v4/export.tidbSetPDClientForGC\n\tgithub.com/pingcap/dumpling/v4/export/dump.go:697\ngithub.com/pingcap/dumpling/v4/export.runSteps\n\tgithub.com/pingcap/dumpling/v4/export/dump.go:607\ngithub.com/pingcap/dumpling/v4/export.NewDumper\n\tgithub.com/pingcap/dumpling/v4/export/dump.go:55\nmain.main\n\tcommand-line-arguments/main.go:65\nruntime.main\n\truntime/proc.go:204\nruntime.goexit\n\truntime/asm_amd64.s:1374"] [pdAddrs="[db-pd-0.db-pd-peer.tidb1341928121892343808.svc:2379,db-pd-1.db-pd-peer.tidb1341928121892343808.svc:2379,db-pd-2.db-pd-peer.tidb1341928121892343808.svc:2379]"]
[2020/12/25 14:48:05.704 +08:00] [WARN] [dump.go:745] ["If the amount of data to dump is large, criteria: (data more than 60GB or dumped time more than 10 minutes)\nyou'd better adjust the tikv_gc_life_time to avoid export failure due to TiDB GC during the dump process.\nBefore dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\nAfter dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\n"]
[2020/12/25 14:48:06.779 +08:00] [WARN] [dump.go:419] ["no data to dump"] [schema=test] [table=test]
[2020/12/25 14:48:06.780 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=10000]
[2020/12/25 14:48:19.540 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=1800000000]
[2020/12/25 15:18:12.837 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=600000]
[2020/12/25 15:18:17.244 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=1800000000]
[2020/12/25 15:26:48.026 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=100000]
[2020/12/25 15:26:50.009 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=540000000]
[2020/12/25 15:28:59.700 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=10000]
[2020/12/25 15:29:02.412 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=1800000000]
[2020/12/25 15:37:28.753 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=6000000000]
[2020/12/25 16:40:00.101 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=60000]
[2020/12/26 08:27:25.005 +08:00] [INFO] [collector.go:212] ["backup Success summary: total backup ranges: 597051, total success: 597051, total failed: 0, total take(backup time): 17h39m18.282210068s, total take(real time): 17h39m18.282233836s, total size(MB): 4649601.89, avg speed(MB/s): 73.15, total rows: 29940972129"]
[2020/12/26 08:27:25.123 +08:00] [INFO] [main.go:78] ["dump data successfully, dumpling will exit now"]
  • Other interesting logs
  1. Configuration of the cluster and the task
    • running command for Dumpling if possible
@lichunzhu
Contributor

After checking the dumped data, I found that nearly 1.5 TB of it was dumped sequentially. That is why the speed of dumping data from TiDB to S3 drops from 2+ Gbit/s to 200 Mbit/s.

In a local test, I inserted 200 records without running ANALYZE and got a similarly wrong estimated count:
[2020/12/28 14:59:34.669 +08:00] [INFO] [dump.go:337] ["get estimated rows count"] [estimateCount=10000]
Since the parameter is -r 20000 and the estimated count (10000) falls below that threshold, this table is dumped sequentially instead of being split into concurrent chunks.

In a nutshell, an incorrect estimated row count causes this result.
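
A minimal way to check and refresh the statistics behind that estimate, assuming a TiDB cluster and a table named tpcc.order_line (the table name here is only illustrative):

-- Show the optimizer's estimated row count (the estRows column in TiDB's EXPLAIN output).
EXPLAIN SELECT * FROM tpcc.order_line;

-- Refresh table statistics; without this, a freshly loaded table can keep a stale
-- default estimate (such as the 10000 seen above) regardless of its real size.
ANALYZE TABLE tpcc.order_line;

With up-to-date statistics, a large table's estimated count should exceed the -r threshold, so Dumpling can split it into concurrent chunks again.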

@IANTHEREAL
Collaborator

Can we close it now?

@lichunzhu
Contributor

lichunzhu commented Jul 22, 2021

tablesample and tableregion will solve this problem for TiDB v3.0+.
But for TiDB < 3.0 and MySQL, running select count(*) instead of using estimateCount would cost too much, so we won't add further support for this.
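
A rough sketch of the trade-off described above, again using the illustrative tpcc.order_line table (that estimateCount reads exactly this metadata is an assumption, not confirmed here):

-- Exact count: always accurate, but it requires a full scan, which is what makes it
-- too expensive for very large tables on MySQL and TiDB < 3.0.
SELECT COUNT(*) FROM tpcc.order_line;

-- Cheap estimate from table metadata/statistics: returns almost instantly, but it can
-- be badly stale if statistics have not been refreshed.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'tpcc' AND TABLE_NAME = 'order_line';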
