Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lightning panic when host ec2 is stopped #33524

Closed
fubinzh opened this issue Mar 29, 2022 · 3 comments · Fixed by #33738
Closed

Lightning panic when host ec2 is stopped #33524

fubinzh opened this issue Mar 29, 2022 · 3 comments · Fixed by #33738
Assignees
Labels
component/lightning This issue is related to Lightning of TiDB. severity/moderate type/bug The issue is confirmed as a bug.

Comments

@fubinzh
Copy link

fubinzh commented Mar 29, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Trigger a lightning import in DBaaS
  2. While import is in progress, stop the host ec2 of the lighting pod

2. What did you expect to see? (Required)

Lightning should not panic

3. What did you see instead (Required)

Lightning panic, and it results in lightning pod can't be started after host ec2 is started.

lightning received terminated signal when ec2 was stopped

[2022/03/28 08:04:24.699 +00:00] [INFO] [main.go:49] ["got signal to exit"] [signal=terminated]
[2022/03/28 08:04:24.700 +00:00] [WARN] [restore.go:1123] ["stopping periodic actions"] [error="context canceled"]

then lightning panic

[2022/03/28 08:04:24.729 +00:00] [ERROR] [local.go:2078] ["split & scatter ranges failed"] [uuid=132c73ad-f7fe-550a-bf5f-6270674024c4] [error="context canceled"]
[2022/03/28 08:04:24.729 +00:00] [INFO] [pd.go:462] ["resume scheduler"] [schedulers="[balance-leader-scheduler,balance-hot-region-scheduler,balance-region-scheduler]"]
[2022/03/28 08:04:24.730 +00:00] [INFO] [pd.go:482] ["resume scheduler successful"] [scheduler=balance-leader-scheduler]
[2022/03/28 08:04:24.730 +00:00] [INFO] [pd.go:482] ["resume scheduler successful"] [scheduler=balance-hot-region-scheduler]
[2022/03/28 08:04:24.731 +00:00] [INFO] [pd.go:482] ["resume scheduler successful"] [scheduler=balance-region-scheduler]
[2022/03/28 08:04:24.731 +00:00] [INFO] [pd.go:573] ["restoring config"] [config="{\"enable-location-replacement\":\"true\",\"leader-schedule-limit\":4,\"max-merge-region-keys\":200000,\"max-merge-region-size\":20,\"max-pending-peer-count\":64,\"max-snapshot-count\":64,\"region-schedule-limit\":2048}"]
[2022/03/28 08:04:24.736 +00:00] [INFO] [restore.go:1375] ["add back PD leader&region schedulers"]
[2022/03/28 08:04:24.736 +00:00] [INFO] [restore.go:1378] ["cleanup task metas"]
[2022/03/28 08:04:24.736 +00:00] [INFO] [restore.go:1227] ["cancel periodic actions"] [do=true]
[2022/03/28 08:04:24.737 +00:00] [INFO] [restore.go:1681] ["switch import mode"] [mode=Normal]
[2022/03/28 08:04:24.740 +00:00] [INFO] [restore.go:459] ["task canceled"] [step=4] [error="restore table `gotpc3000`.`order_line` failed: context canceled"]
[2022/03/28 08:04:24.740 +00:00] [INFO] [restore.go:474] ["the whole procedure completed"] [takeTime=14m54.044151364s] []
[2022/03/28 08:04:25.348 +00:00] [INFO] [local.go:3271] ["compact sst"] [fileCount=6] [size=668405745] [count=10609615] [cost=7.652965611s] [file=/etc/endpoint/feace5ee-67f2-55c0-81b9-8990a337bc8c.sst/40d8e331-2577-4503-a59f-a441404be79d.sst]
[2022/03/28 08:04:25.348 +00:00] [INFO] [local.go:689] ["write data to local DB"] [size=668405745] [kvs=10609615] [files=1] [sstFileSize=70829138] [file=/etc/endpoint/feace5ee-67f2-55c0-81b9-8990a337bc8c.sst/40d8e331-2577-4503-a59f-a441404be79d.sst] [firstKey=7480000000000000495F698000000000000001038000000000000580038000000000000004038000000000000001038000000000000001] [lastKey=7480000000000000495F6980000000000000010380000000000006AD038000000000000006038000000000000BB803800000000000000B]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x160 pc=0x33a225a]

goroutine 11489 [running]:
github.com/cockroachdb/pebble.(*DB).Ingest(0x0, 0xc002803650, 0x1, 0x1, 0x0, 0xc0e0f03a88)
	/nfs/cache/mod/github.com/cockroachdb/[email protected]/ingest.go:505 +0x3a
github.com/pingcap/tidb/br/pkg/lightning/backend/local.dbSSTIngester.ingest(0xc000708820, 0xc0005a4880, 0x1, 0x1, 0x7, 0x7)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:3298 +0x14c
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*File).ingestSSTs(0xc000708820, 0xc0005a4880, 0x1, 0x1, 0x0, 0x0)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:697 +0x6ec
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*File).batchIngestSSTs(0xc000708820, 0xc0005a4870, 0x1, 0x1, 0xc0415328c0, 0x36)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:667 +0x45a
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*File).ingestSSTLoop.func1(0xc000708820, 0xc0090d22d0, 0xc0014860a8, 0xc0008c8000, 0xc0090d22c8, 0xc0014860c0)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:478 +0x3bb
created by github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*File).ingestSSTLoop
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:446 +0x259

lightning.log

4. What is your TiDB version? (Required)

[release-version=v5.3.1] [git-hash=459917c6f83df155edcc03f6ebde24942ff73b0e] [git-branch=heads/refs/tags/v5.3.1] [go-version=go1.16.4] [utc-build-time="2022-03-02 08:34:21"]

@fubinzh fubinzh added type/bug The issue is confirmed as a bug. severity/moderate component/lightning This issue is related to Lightning of TiDB. labels Mar 29, 2022
@niubell
Copy link
Contributor

niubell commented Apr 6, 2022

/unassign gozssky

@niubell
Copy link
Contributor

niubell commented Apr 6, 2022

/assign lichunzhu

@MimeLyc
Copy link

MimeLyc commented May 17, 2022

test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/lightning This issue is related to Lightning of TiDB. severity/moderate type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants