-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure node leave intents are handled correctly #510
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although you probably want another review from someone more experienced in serf.
serf/config.go
Outdated
@@ -255,6 +262,7 @@ func DefaultConfig() *Config { | |||
return &Config{ | |||
NodeName: hostname, | |||
BroadcastTimeout: 5 * time.Second, | |||
LeavePropagateDelay: 1 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just had a question 👍
@@ -174,6 +174,9 @@ func TestSerf_eventsLeave(t *testing.T) { | |||
eventCh := make(chan Event, 4) | |||
s1Config := testConfig() | |||
s1Config.EventCh = eventCh | |||
// Make the reap interval longer in this test | |||
// so that the leave does not also cause a reap | |||
s1Config.ReapInterval = 30 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this hiding the issue with the leave not being propagated in this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly, default reap interval in the test is too small, and the leave now waits before the method returns so it was causing this test to fail because the left node also got reaped.
testMember(t, s1.Members(), s2Config.NodeName, StatusLeft) | ||
// Verify that s2 is not in the memberlist | ||
// The join intent is ignored because the leave sleeps long enough for the leave intent to broadcast | ||
testMember(t, s1.Members(), s2Config.NodeName, StatusNone) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@preetapan This doesn't seem right to me. After S2 initiates a leave, it shouldn't disappear as a member until it is reaped. Seems like we expect it to be Left state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was addressed by changing the test setup to increase ReapInterval in #511
This PR adds a configurable leave timeout to allow leave intent broadcasts to propagate before returning from the Leave method.
It also fixes an edge case in anti entropy sync where a left member in the push/pull message's payload would not get processed because its lamport time was incorrectly set.
Brings in work done in an experimental branch by @slackpad