Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure node leave intents are handled correctly #510

Merged
merged 3 commits into from
May 4, 2018
Merged

Conversation

preetapan
Copy link

This PR adds a configurable leave timeout to allow leave intent broadcasts to propagate before returning from the Leave method.

It also fixes an edge case in anti entropy sync where a left member in the push/pull message's payload would not get processed because its lamport time was incorrectly set.

Brings in work done in an experimental branch by @slackpad

@preetapan preetapan requested review from kyhavlov and armon May 3, 2018 01:35
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although you probably want another review from someone more experienced in serf.

serf/config.go Outdated
@@ -255,6 +262,7 @@ func DefaultConfig() *Config {
return &Config{
NodeName: hostname,
BroadcastTimeout: 5 * time.Second,
LeavePropagateDelay: 1 * time.Second,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting

Copy link
Contributor

@kyhavlov kyhavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just had a question 👍

@@ -174,6 +174,9 @@ func TestSerf_eventsLeave(t *testing.T) {
eventCh := make(chan Event, 4)
s1Config := testConfig()
s1Config.EventCh = eventCh
// Make the reap interval longer in this test
// so that the leave does not also cause a reap
s1Config.ReapInterval = 30 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this hiding the issue with the leave not being propagated in this test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, default reap interval in the test is too small, and the leave now waits before the method returns so it was causing this test to fail because the left node also got reaped.

@preetapan preetapan merged commit 4b67f2c into master May 4, 2018
testMember(t, s1.Members(), s2Config.NodeName, StatusLeft)
// Verify that s2 is not in the memberlist
// The join intent is ignored because the leave sleeps long enough for the leave intent to broadcast
testMember(t, s1.Members(), s2Config.NodeName, StatusNone)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@preetapan This doesn't seem right to me. After S2 initiates a leave, it shouldn't disappear as a member until it is reaped. Seems like we expect it to be Left state?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was addressed by changing the test setup to increase ReapInterval in #511

@banks banks deleted the b-clean-leave branch July 20, 2020 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants