v2 backup can not be restored for etcd 3.1.7 #8331

Closed
r7vme opened this issue Jul 28, 2017 · 9 comments
@r7vme

r7vme commented Jul 28, 2017

I have etcd 3.1.7 (etcdctl 3.2.0) and I'm taking a v2 backup like this:

./etcdctl backup --data-dir /var/lib/etcd3/ --backup-dir etcd-backup

When I try to start etcd from this backup, I get:

tmp|⇒ /tmp/test-etcd/etcd -debug -data-dir etcd-backup --force-new-cluster 
2017-07-28 19:55:05.529861 I | etcdmain: etcd Version: 3.1.7
2017-07-28 19:55:05.529996 I | etcdmain: Git SHA: 43b7507
2017-07-28 19:55:05.530022 I | etcdmain: Go Version: go1.7.5
2017-07-28 19:55:05.530047 I | etcdmain: Go OS/Arch: linux/amd64
2017-07-28 19:55:05.530075 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-07-28 19:55:05.530267 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-07-28 19:55:05.531525 I | embed: listening for peers on http://localhost:2380
2017-07-28 19:55:05.533281 I | embed: listening for client requests on localhost:2379
2017-07-28 19:55:05.596536 I | etcdserver: recovered store from snapshot at index 8224743
2017-07-28 19:55:05.596561 I | etcdserver: name = default
2017-07-28 19:55:05.596572 I | etcdserver: force new cluster
2017-07-28 19:55:05.596580 I | etcdserver: data dir = etcd-backup
2017-07-28 19:55:05.596591 I | etcdserver: member dir = etcd-backup/member
2017-07-28 19:55:05.596597 I | etcdserver: heartbeat = 100ms
2017-07-28 19:55:05.596605 I | etcdserver: election = 1000ms
2017-07-28 19:55:05.596613 I | etcdserver: snapshot count = 10000
2017-07-28 19:55:05.596628 I | etcdserver: advertise client URLs = http://localhost:2379
2017-07-28 19:55:05.613536 I | etcdserver: discarding 1 uncommitted WAL entries 
2017-07-28 19:55:05.618943 I | etcdserver: forcing restart of member 5d8a45c49301 in cluster 5d8a45c49302 at commit index 8228058
2017-07-28 19:55:05.619056 I | raft: 5d8a45c49301 became follower at term 289
2017-07-28 19:55:05.619077 I | raft: newRaft 5d8a45c49301 [peers: [991e06b2af9c0a9,3df5a87582ac865c,cf5b12a1cc6d4c01], term: 289, commit: 8228058, applied: 8224743, lastindex: 8228058, lastterm: 289]
2017-07-28 19:55:05.619254 I | etcdserver/api: enabled capabilities for version 3.1
2017-07-28 19:55:05.619274 I | etcdserver/membership: added member 3df5a87582ac865c [https://172.16.238.103:2380] to cluster 5d8a45c49302 from store
2017-07-28 19:55:05.619281 I | etcdserver/membership: added member 991e06b2af9c0a9 [https://172.16.238.102:2380] to cluster 5d8a45c49302 from store
2017-07-28 19:55:05.619288 I | etcdserver/membership: added member cf5b12a1cc6d4c01 [https://172.16.238.101:2380] to cluster 5d8a45c49302 from store
2017-07-28 19:55:05.619294 I | etcdserver/membership: set the cluster version to 3.1 from store
2017-07-28 19:55:05.627904 C | etcdmain: database file (etcd-backup/member/snap/db) of the backend is missing

This issue has already been reproduced in two independent environments.

@gyuho
Contributor

gyuho commented Jul 28, 2017

Do you store only v2 keys? Related #7615.

@r7vme
Author

r7vme commented Jul 29, 2017

We store both v2 and v3.

@r7vme
Author

r7vme commented Jul 29, 2017

I was able to restore by pre-creating the db file (touch $BACKUP_DIR/member/snap/db), as recommended in #7615.

@fanminshi
Member

fanminshi commented Aug 4, 2017

@r7vme Regarding "I was able to restore with precreating db file touch $BACKUP_DIR/member/snap/db as was recommended in #7615": please read the comment #7615 (comment). That method is not safe, as explained here: #7615 (comment).

@fanminshi
Member

@r7vme does your restored cluster need both v2 and v3 keys? The ./etcdctl backup --data-dir /var/lib/etcd3/ --backup-dir etcd-backup command is intended to restore the v2 key space only; it will not restore any v3 keys.

@heyitsanthony
Contributor

@fanminshi the complaint is that etcd refuses to start because the backup command isn't creating a db file. The e2e test TestCtlV2Backup already checks restoring from a v2 backup, so why isn't it working here?

@fanminshi
Member

fanminshi commented Aug 30, 2017

Steps to reproduce:

etcd 3.1.7 43b7507

func TestCtlV2BackupFromExistingSnapshot(t *testing.T) { 
	defer testutil.AfterTest(t)

	backupDir, err := ioutil.TempDir("", "testbackup0.etcd")
	if err != nil {
		t.Fatal(err)
	}
	defer os.RemoveAll(backupDir)
	cfg := configNoTLS
	cfg.snapCount = 5
	epc1 := setupEtcdctlTest(t, &cfg, false)

	// trigger snapshot
	for i := 0; i < cfg.snapCount; i++ {
		if err := etcdctlSet(epc1, "foo", "bar"); err != nil {
			t.Fatal(err)
		}
	}

	if err := etcdctlBackup(epc1, epc1.procs[0].cfg.dataDirPath, backupDir); err != nil {
		t.Fatal(err)
	}

	if err := epc1.Close(); err != nil {
		t.Fatalf("error closing etcd processes (%v)", err)
	}

	// restart from the backup directory
	cfg2 := configNoTLS
	cfg2.dataDirPath = backupDir
	cfg2.keepDataDir = true
	cfg2.forceNewCluster = true
	epc2 := setupEtcdctlTest(t, &cfg2, false)

	// check if backup went through correctly
	if err := etcdctlGet(epc2, "foo", "bar", false); err != nil {
		t.Fatal(err)
	}

	if err := epc2.Close(); err != nil {
		t.Fatalf("error closing etcd processes (%v)", err)
	}
}

$ EXPECT_DEBUG=true go test -run TestCtlV2BackupFromExistingSnapshot

output:

../bin/etcd-24520: 2017-08-30 16:37:16.539271 C | etcdmain: database file (/var/folders/bk/ph52lmzd5hxcx_0jvqr3rh5h0000gn/T/testbackup0.etcd227233542/member/snap/db) of the backend is missing
--- FAIL: TestCtlV2BackupFromExistingSnapshot (1.05s)
	ctl_v2_test.go:514: could not start etcd process cluster (EOF)
FAIL
exit status 1

@fanminshi
Member

fanminshi commented Aug 31, 2017

When running TestCtlV2Backup:

The backup code creates a /backup directory that contains only a wal file.

etcd starts from the /backup dir without a cluster version; cl.Version() == nil.

Even though the backend file doesn't exist (beExist == false), the if condition evaluates to false because cl.Version() == nil, so the cluster starts without any issue.

https://github.com/coreos/etcd/blob/43b75072bfaca5a7c35c718179defbcabd9a0886/etcdserver/server.go#L391-L393

However, when running TestCtlV2BackupFromExistingSnapshot:

The backup code creates a /backup directory that contains only wal and .snap files.

etcd starts and recovers the cluster version from the snapshot.

https://github.com/coreos/etcd/blob/43b75072bfaca5a7c35c718179defbcabd9a0886/etcdserver/server.go#L376-L380

Since the backend file doesn't exist and cl.Version() == 3.1.0, the error "database file (%v) of the backend is missing" is returned.

https://github.com/coreos/etcd/blob/43b75072bfaca5a7c35c718179defbcabd9a0886/etcdserver/server.go#L391-L393
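The two cases above can be condensed into a small, self-contained sketch in Go. This is an illustration of the decision as described in this comment, not the code at the linked lines: the `version` type and the `checkBackend` function are hypothetical, and the `Major >= 3` threshold is an assumption (the comment only states that a nil version passes and 3.1.0 fails).

```go
package main

import "fmt"

// version is a hypothetical stand-in for a parsed semantic version;
// it is not the semver type etcd itself uses.
type version struct {
	Major, Minor, Patch int
}

// checkBackend sketches the startup check described above. A missing
// backend file is fatal only when a cluster version was recovered and
// that version is 3.x or later (the threshold is an assumption).
func checkBackend(clusterVersion *version, beExist bool, bePath string) error {
	if clusterVersion != nil && clusterVersion.Major >= 3 && !beExist {
		return fmt.Errorf("database file (%v) of the backend is missing", bePath)
	}
	return nil
}
```

Under these assumptions, `checkBackend(nil, false, path)` returns nil, which corresponds to the wal-only backup in TestCtlV2Backup, while `checkBackend(&version{Major: 3, Minor: 1}, false, path)` returns the error, which corresponds to the backup containing a .snap file in TestCtlV2BackupFromExistingSnapshot.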

@heyitsanthony
Contributor

Fixed by #8479
