Draft: etcdctl: backup and restore #2366
The previous force-new command cannot inject new configuration into the backup cluster. So @jedsmith, @barakmich and I propose a new workflow to restore a cluster directly from a backup-without-configuration. |
This workflow seems pretty odd. Ideally it would flow like this:
|
@kelseyhightower That is a chicken-and-egg problem. Basically, you will not be able to have a cluster without a data dir. |
@kelseyhightower As a thought experiment, it also wouldn't be much of a restored backup; it'd be a replay. Writing with a little inspiration from The Part-Time Parliament, the records in the log wouldn't quite be the same, even if they were in the same order (but they also may not be, if some other message were to come in in between -- but let's assume full operator control). This log: By persisting the history, it's much easier to discuss what happens and when. It also means that backups of things restored from backups share a piece of history with the original backup, which is both true and kind of what you want. The former case doesn't have that property, full stop. |
	return snap, nil
}

func purgeConfInEnts(ents []raftpb.Entry) []raftpb.Entry {
this is doing the right thing, but could it make sense to make the entries that we skip into Noops instead of removing them?
sure.
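A minimal sketch of the no-op suggestion above, for illustration only. It is not the code in this pull request: `Entry` and `EntryType` are simplified stand-ins for the `raftpb` types, `noopConfInEnts` is a hypothetical name, and the sketch assumes that a normal entry with empty data is applied as a no-op. Rewriting the skipped entries in place, instead of dropping them, keeps every remaining entry at its original term and index.

```go
package main

import "fmt"

// EntryType and Entry are simplified stand-ins for the raftpb types
// (assumption: only the fields needed for this illustration).
type EntryType int

const (
	EntryNormal EntryType = iota
	EntryConfChange
)

type Entry struct {
	Term  uint64
	Index uint64
	Type  EntryType
	Data  []byte
}

// noopConfInEnts (hypothetical name) returns a copy of ents in which every
// configuration-change entry is replaced by an empty normal entry with the
// same term and index, so the log stays contiguous.
func noopConfInEnts(ents []Entry) []Entry {
	out := make([]Entry, len(ents))
	for i, e := range ents {
		if e.Type == EntryConfChange {
			// Keep the position in the log, drop the payload.
			e = Entry{Term: e.Term, Index: e.Index, Type: EntryNormal}
		}
		out[i] = e
	}
	return out
}

func main() {
	ents := []Entry{
		{Term: 1, Index: 1, Type: EntryConfChange, Data: []byte("add member")},
		{Term: 1, Index: 2, Type: EntryNormal, Data: []byte("put foo bar")},
	}
	for _, e := range noopConfInEnts(ents) {
		fmt.Printf("index=%d type=%d len(data)=%d\n", e.Index, e.Type, len(e.Data))
	}
}
```

The trade-off in this sketch is a slightly larger backup log in exchange for indexes that still line up with the original history.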
@xiang90 I thought we were going to replace the existing etcdctl backup with this, not add a |
@philips Correct. We need to kill |
@philips @kelseyhightower @barakmich @yichengq @jedsmith I want to push forward this pull request a little bit. So our decision is:
Sound good? |
On Mon, Apr 6, 2015 at 2:50 PM, Xiang Li wrote:
How does this change the existing backup command? The current command
This means that etcdctl would have to take all of the configuration |
When we deprecate the
Sure. I like that more. Then we need to rethink the etcd init process and that will lead to more changes. |
@philips @xiang90 I think when we restore from a backup, we just need to support static bootstrap for 100% correctness. It only needs to take |
My concern is adding WAL and configuration logic to etcdctl takes us down a road where etcdctl needs to know more about the internal details of etcd than it has before. With the exception of the Any thoughts on this @barakmich @kelseyhightower ? |
👍 - I've been orchestrating the recovery process and this sounds like a much better process. |
I've got a PR up which automates the backup and restore procedures for etcd2 clusters. As this procedure is a last-line-of-defense type of thing, I've opted for a simple, robust approach.
Requires the operator to pick the most recent backup, but removes any risk of the restore failing due to disagreements between nodes on ordering of events. Here's my take on what should be improved:
This means that the operator can restore a failed cluster by:
The bootstrapping should look very similar to how things work now when none of the nodes has a data directory. The only difference is that the restore node needs to be the founding member of the cluster.
As the desired cluster topology is already expressed in the |
@xiang90 and I had a conversation this morning about the backup situation for v3. Here are my notes on the discussion.
Restoring a new cluster from backup The A new command, provisionally named
After |
@colhom Great summary! Thanks. |
/cc @jedsmith
freeze.etcd is a 5 member cluster.