Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib/raftexample: save snapshot order #8082

Closed
jbowens opened this issue Jun 12, 2017 · 1 comment
Closed

contrib/raftexample: save snapshot order #8082

jbowens opened this issue Jun 12, 2017 · 1 comment

Comments

@jbowens
Copy link
Contributor

jbowens commented Jun 12, 2017

The documentation for the etcd WAL package says of wal.Open's snapshot argument:

The snapshot must have been written to the WAL.

While looking over the example Raft use in contrib/raftexample, I noticed these lines for saving a new snapshot:

func (rc *raftNode) saveSnap(snap raftpb.Snapshot) error {
	if err := rc.snapshotter.SaveSnap(snap); err != nil {
		return err
	}
	walSnap := walpb.Snapshot{
		Index: snap.Metadata.Index,
		Term:  snap.Metadata.Term,
	}
	if err := rc.wal.SaveSnapshot(walSnap); err != nil {
		return err
	}
	return rc.wal.ReleaseLockTo(snap.Metadata.Index)
}

https://github.com/coreos/etcd/blob/933aa09b73ef0b3556a1c663f9bd2bd92b714be9/contrib/raftexample/raft.go#L109-L121

For my understanding of the code, the wal.Open invariant can be violated if the process crashes between saving the snapshot to the file system and saving the snapshot index to the WAL. On the next startup, the snapshotter would load the most recent snapshot which was never saved to the WAL.

Looking at etcdserver's code, etcd seems to avoid this by writing the index to the WAL before persisting the snapshot. Is that the correct sequence of persisting a snapshot? If so, I'd be happy to submit a PR updating the raft example.

@jbowens
Copy link
Contributor Author

jbowens commented Jun 12, 2017

The ordering in etcdserver does seem to be deliberate given 1231f82 (from #2597).

jbowens added a commit to Onyx-Protocol/Onyx that referenced this issue Jun 12, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

We might want to wait until the CoreOS folks confirm this in
etcd-io/etcd#8082. Also, it's debatable whether this meets the
requirements for a point release given that this bug has a very low
probablity of surfacing.
jbowens added a commit to Onyx-Protocol/Onyx that referenced this issue Jun 12, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

We might want to wait until the CoreOS folks confirm this in
etcd-io/etcd#8082. Also, it's debatable whether this meets the
requirements for a point release given that this bug has a very low
probablity of surfacing. WDYT?
jbowens added a commit to Onyx-Protocol/Onyx that referenced this issue Jun 12, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

We might want to wait until the CoreOS folks confirm this in
etcd-io/etcd#8082.
jbowens added a commit to Onyx-Protocol/Onyx that referenced this issue Jun 12, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

We might want to wait until the CoreOS folks confirm this in
etcd-io/etcd#8082.
iampogo pushed a commit to Onyx-Protocol/Onyx that referenced this issue Jun 13, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

We might want to wait until the CoreOS folks confirm this in
etcd-io/etcd#8082. Also, it's debatable whether this meets the
requirements for a point release given that this bug has a very low
probablity of surfacing. WDYT?
jbowens added a commit to Onyx-Protocol/Onyx that referenced this issue Jun 13, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

We might want to wait until the CoreOS folks confirm this in
etcd-io/etcd#8082. Also, it's debatable whether this meets the
requirements for a point release given that this bug has a very low
probablity of surfacing. WDYT?
iampogo pushed a commit to Onyx-Protocol/Onyx that referenced this issue Jun 13, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

This is a fix for the 1.2.x release line.

See etcd-io/etcd#8082, etcd-io/etcd#8088.

Closes #1310
jbowens added a commit to Onyx-Protocol/Onyx that referenced this issue Jun 13, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

We might want to wait until the CoreOS folks confirm this in
etcd-io/etcd#8082.
jbowens added a commit to Onyx-Protocol/Onyx that referenced this issue Jun 14, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

We might want to wait until the CoreOS folks confirm this in
etcd-io/etcd#8082.
iampogo pushed a commit to Onyx-Protocol/Onyx that referenced this issue Jun 14, 2017
Save the snapshot position to the WAL before saving the snapshot data to
disk. On boot, we open the WAL at the most recently saved snapshot's
position. If we don't save the position to the WAL first, we might open
the WAL at a snapshot position that was never saved to the WAL.

See etcd-io/etcd#8082, etcd-io/etcd#8088.

Closes #1312
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant