Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make thriftbp.NewBaseplateClientPool() behave more like the grpcbp one #602

Merged
merged 4 commits into from
Feb 27, 2023

Conversation

nanassito
Copy link
Contributor

@nanassito nanassito commented Feb 16, 2023

💸 TL;DR

Changing the behavior of the pool initialization to that it retries to open connections until the required number is opened instead of failing on the first issue.

📜 Details

Prior to this PR the behavior of the thriftbp pool initialization was as followed:

  • Try up to initialConnections times to open a connection, but bail out at the first error.
  • If the number of connections that ended up being opened is lower than requiredInitialConnections then crash.
  • Otherwise proceed despite having less than initialConnections connections opened.

This lead to reliability issues for services as the pods sometimes end up starting with very few connections ready. These connections are opened inline with the requests but that adds extra latency and opportunity for failures.
In my opinion this also is a confusing behavior since we end up returning a lower number than configured by the user. It is explicit in the documentation but I still found more than a few people surprised by this behavior.

With this PR applied the new behavior is as followed:

  • Open up to requiredInitialConnections and retry if necessary until either success or context error
  • Open the rest of the initialConnections using the same behavior as before.

While not my ideal approach, this allows the user to easily increase the effectively-usable number on requiredInitialConnections while leaving the initialConnection behavior unchanged.

Rollout:

  • Once merged I will need to work with econ-be to update their calls as they are the only ones directly impacted by the behavior change.
  • Services will be able to start setting requiredInitialConnections to higher values than currently.

🧪 Testing Steps / Validation

✅ Checks

  • CI tests (if present) are passing
  • Adheres to code style for repo
  • Contributor License Agreement (CLA) completed if not a Reddit employee

@nanassito nanassito force-pushed the thriftbp.init branch 7 times, most recently from 842275c to 242517c Compare February 16, 2023 23:56
@nanassito nanassito marked this pull request as ready for review February 23, 2023 22:37
@nanassito nanassito requested a review from a team as a code owner February 23, 2023 22:37
@nanassito nanassito requested review from fishy, kylelemons and mterwill and removed request for a team February 23, 2023 22:37
@mterwill mterwill removed their request for review February 24, 2023 14:57
i++
} else {
lastAttemptErr = err
log.Warnf("clientpool: error creating required client (will retry): %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that I would add a log.Warn here, I would worry about this causing a lot of log spam if there's an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fishy really wanted a way to make the error clear to users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can limit the log spam with a rate limiter:

https://sourcegraph.build.ue1.snooguts.net/github.snooguts.net/reddit-go/httpbp/-/blob/httpbp/middleware/helpers.go?L17:12

(it should be at whatever scope you want to collect together, so global if you want to limit the spam at a per process level)

Copy link
Contributor Author

@nanassito nanassito Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added RL at 2s

Comment on lines +108 to +118
addrGen: func() thriftbp.AddressGenerator {
i := 0
return func() (string, error) {
i += 1
var err error
if i == 1 {
err = fmt.Errorf("something broke")
}
return ln.Addr().String(), err
}
}(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔕 Is there any other way for us to fake these errors other than hacking them into the AddressGenerator? This isn't really the kind of error we would expect to get since most of these just return a static string (we added it to be compatible with some old ads code).

Not a blocker, but it might be nice to have the errors be closer to errors that we would actually expect/not rely on something that is really just a backwards compatibility hack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, that's how every other test I found does it so I used the same approach

clientpool/channel.go Outdated Show resolved Hide resolved
Copy link
Contributor

@kylelemons kylelemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been swamped, from a cursory review this looks fine. We might want to discuss always trying at least initialConnection times (i.e. not bailing on the first failure) as well, since I think that is an innocuous behavior change, but we can do that separately.

i++
} else {
lastAttemptErr = err
log.Warnf("clientpool: error creating required client (will retry): %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can limit the log spam with a rate limiter:

https://sourcegraph.build.ue1.snooguts.net/github.snooguts.net/reddit-go/httpbp/-/blob/httpbp/middleware/helpers.go?L17:12

(it should be at whatever scope you want to collect together, so global if you want to limit the spam at a per process level)

@kylelemons kylelemons merged commit 6f97656 into reddit:master Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants