Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffle seed brokers so we don't always connect to the first one provided #441

Merged
merged 1 commit into from
May 1, 2015

Conversation

wvanbergen
Copy link
Contributor

I stole this idea from the Zookeeper library because is such a simple yet powerful way to load balance the metadata requests in the cluster over the different brokers. After the initial seed broker shuffle, seed broker handling is deterministic, and will work the exact same way before.

The way I fixed the "resurrect dead brokers" test is not great, but that is primarily because of the limitation of the mock broker implementation. If you have ideas to do this in a better way, tell me :)

@Shopify/kafka

@@ -720,3 +723,11 @@ func (client *client) getConsumerMetadata(consumerGroup string, attemptsRemainin
client.resurrectDeadBrokers()
return retry(ErrOutOfBrokers)
}

// shuffleSeedBrokers performs a Fisher-Yates shuffle on the seed broker list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler just to use rand.Perm()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@eapache
Copy link
Contributor

eapache commented May 1, 2015

Just cast to *client briefly in the test and read the order out of the private member, then the rest of the test can proceed exactly as it was.

@wvanbergen
Copy link
Contributor Author

Yup, that's what I am doing right now, but I have to do that after the client is initialized.

for _, addr := range addrs {
client.seedBrokers = append(client.seedBrokers, NewBroker(addr))

for _, index := range rand.Perm(len(addrs)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to seed the random number generator; to avoid destroying our parent program (if it has already seeded with a specific value for some reason) we shouldn't use the global but instantiate a local source instead... not sure if it makes sense to create a package-local in sarama.go or just a real local right here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a package-local sarama.Random, which I seed using time.Now().UnixNano().

@eapache
Copy link
Contributor

eapache commented May 1, 2015

Did you intend for this to be in 1.4 or not?

@wvanbergen
Copy link
Contributor Author

I think this would be a nice addition to 1.4. It's such a minimal change that it doesn't require an extra batch of stress testing.

@@ -22,6 +24,8 @@ import (
// but you can set it to redirect wherever you want.
var Logger StdLogger = log.New(ioutil.Discard, "[Sarama] ", log.LstdFlags)

var Random = rand.New(rand.NewSource(time.Now().UnixNano()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this really be public? I'm not sure why anyone would want to override it. If you want to leave it public, it needs godoc

addr3 := seed3.Addr()

// Overwrite the seed brokers with a fixed ordering to make this test deterministic.
client.seedBrokers = []*Broker{NewBroker(addr1), NewBroker(addr2), NewBroker(addr3)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than completely overriding it, just append the two new ones is simpler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is simpler to understand, because now you can basically ignore what happened before; the seed broker state is complete defined here. YMMV.

@eapache
Copy link
Contributor

eapache commented May 1, 2015

This version LGTM.

@shivnagarajan
Copy link

Looks good. How often is the NewClient called, generating a new permutation? More specifically, how large can the list of brokers be? Generating a permutation often can be expensive, but if the list is small, it should not matter.

Shiv

@wvanbergen
Copy link
Contributor Author

NewClient is only called once for an application (usually), so I am not too worried about the cost of this.

The number of brokers in a cluster may be hundreds for installations that are significantly larger than ours, but especially in those environments I would like to prevent all my clients from connecting to the same broker :)

@shivnagarajan
Copy link

looks good. :shipit:

wvanbergen added a commit that referenced this pull request May 1, 2015
Shuffle seed brokers so we don't always connect to the first one provided
@wvanbergen wvanbergen merged commit a205be2 into master May 1, 2015
@wvanbergen wvanbergen deleted the shuffle_seeds branch May 1, 2015 14:28
@eapache
Copy link
Contributor

eapache commented May 1, 2015

Also, for what it's worth: building the permutation appears to be O(n) in the current implementation: https://golang.org/src/math/rand/rand.go?s=4735:4767#L136

@shivnagarajan
Copy link

Nice.. I should have looked at that before my claims :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants