-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: correct exponential backoff ranges #11529
fix: correct exponential backoff ranges #11529
Conversation
d1cf191
to
1e77ece
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @alevenberg and @devbww)
google/cloud/internal/backoff_policy.h
line 131 at r4 (raw file):
: initial_delay_(std::chrono::duration_cast<std::chrono::microseconds>( initial_delay)), current_delay_range_(2 * initial_delay_),
Part of the problem is that we should not be using 2
here. We should be using scaling
. (We should also use scaling
in the range calculation).
I should be able to write the following test:
TEST(ExponentialBackoffPolicy, RangeDeterminedByScalingFactor) {
ExponentialBackoffPolicy tested(ms(10), ms(20), 1.1);
auto delay = tested.OnCompletion();
EXPECT_LE(ms(10), delay);
EXPECT_GE(ms(11), delay);
}
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #11529 +/- ##
=======================================
Coverage 93.80% 93.80%
=======================================
Files 1818 1818
Lines 163671 163689 +18
=======================================
+ Hits 153531 153550 +19
+ Misses 10140 10139 -1
☔ View full report in Codecov by Sentry. |
- Use () around min and max to avoid macro expansion - Add a test for floating point numbers - Clarify the naming of current_delay_range_ by adding two parameters (one for the start and one for the end). In the long term we can remove this since we want to implement min jitter in issue googleapis#8755. Then the current_delay_start_ will always equal the initial_delay_.
b485054
to
38c007e
Compare
Discussed with @dbolduc Will be rewriting to catch some edge cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)
google/cloud/internal/backoff_policy.h
line 137 at r3 (raw file):
Previously, devbww (Bradley White) wrote…
Could we move this into the initialization (and move "current" down in the member list) with:
current_delay_range_((std::min)(2 * initial_delay_, maximum_delay_))
And then (unrelated to this PR, but perhaps addressable here), it seems like just
current_delay_
would be preferable tocurrent_delay_range_
.
ack, addressed with the changes
google/cloud/internal/backoff_policy.h
line 131 at r4 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
Part of the problem is that we should not be using
2
here. We should be usingscaling
. (We should also usescaling
in the range calculation).I should be able to write the following test:
TEST(ExponentialBackoffPolicy, RangeDeterminedByScalingFactor) { ExponentialBackoffPolicy tested(ms(10), ms(20), 1.1); auto delay = tested.OnCompletion(); EXPECT_LE(ms(10), delay); EXPECT_GE(ms(11), delay); }
Done
google/cloud/internal/backoff_policy.cc
line 45 at r3 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
nit: on some platforms1
max()
is a macro. We protect against macro expansion using(std::max)(...)
.
done
Footnotes
-
Windows really, depending on what set of
#include
directives you have, including indirect ones ↩
Changes are made. Ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r2, 1 of 1 files at r7, 2 of 3 files at r9, all commit messages.
Reviewable status: 4 of 5 files reviewed, 7 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)
google/cloud/internal/backoff_policy.cc
line 29 at r9 (raw file):
std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() { using std::chrono::duration_cast; using std::chrono::microseconds;
I think you want something like std::duration<double, std::micro>
instead?
google/cloud/internal/backoff_policy.cc
line 50 at r9 (raw file):
} std::uniform_int_distribution<microseconds::rep> rng_distribution(
Since you changed current_delay_start
to be std::duration<double, ..>
you want this to be std::uniform_real_distribution<...>
.
…on, modify using to the double granularity for micros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 5 files reviewed, 7 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)
google/cloud/internal/backoff_policy.cc
line 51 at r6 (raw file):
Previously, devbww (Bradley White) wrote…
I'm all for keeping values with the min:max range, but starting back at the initial delay after the start hits the end seems contrary to "backoff".
Ack. Modified the code so it will either return to the initial delay or the max divided by the scaling factor, whichever is higher.
Do you think it is worth throwing an error when we can't "backoff"?
google/cloud/internal/backoff_policy.cc
line 56 at r6 (raw file):
Previously, devbww (Bradley White) wrote…
Consider these diffs. I believe that clarify the intention, and reduce casting.
diff --git a/google/cloud/internal/backoff_policy.cc b/google/cloud/internal/backoff_policy.cc index 61bbe4df56..d4c21acd00 100644 --- a/google/cloud/internal/backoff_policy.cc +++ b/google/cloud/internal/backoff_policy.cc @@ -25,9 +25,6 @@ std::unique_ptr<BackoffPolicy> ExponentialBackoffPolicy::clone() const { } std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() { - using std::chrono::duration_cast; - using std::chrono::microseconds; - using std::chrono::milliseconds; // We do not want to copy the seed in `clone()` because then all operations // will have the same sequence of backoffs. Nor do we want to use a shared // PRNG because that would require locking and some more complicated lifecycle @@ -41,21 +38,17 @@ std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() { if (!generator_) { generator_ = google::cloud::internal::MakeDefaultPRNG(); } - std::uniform_int_distribution<microseconds::rep> rng_distribution( + std::uniform_real_distribution<DoubleMicroseconds::rep> rng_distribution( current_delay_start_.count(), current_delay_end_.count()); + // Randomized sleep period because it is possible that after some time all // client have same sleep period if we use only exponential backoff policy. - auto delay = microseconds(rng_distribution(*generator_)); + auto delay = DoubleMicroseconds(rng_distribution(*generator_)); - current_delay_start_ = current_delay_start_ == current_delay_end_ - ? initial_delay_ - : current_delay_end_; - current_delay_end_ = microseconds(static_cast<microseconds::rep>( - static_cast<double>(current_delay_end_.count()) * scaling_)); - if (current_delay_end_ >= maximum_delay_) { - current_delay_end_ = maximum_delay_; - } - return duration_cast<milliseconds>(delay); + current_delay_start_ = current_delay_end_; + current_delay_end_ = CurrentDelayEnd(); + + return std::chrono::duration_cast<std::chrono::milliseconds>(delay); } } // namespace internal diff --git a/google/cloud/internal/backoff_policy.h b/google/cloud/internal/backoff_policy.h index 19b4ddebde..b5721ec7c7 100644 --- a/google/cloud/internal/backoff_policy.h +++ b/google/cloud/internal/backoff_policy.h @@ -132,10 +132,7 @@ class ExponentialBackoffPolicy : public BackoffPolicy { maximum_delay)), scaling_(scaling), current_delay_start_(initial_delay_), - current_delay_end_( - (std::min)(std::chrono::duration_cast<std::chrono::microseconds>( - scaling_ * initial_delay_), - maximum_delay_)) { + current_delay_end_(CurrentDelayEnd()) { if (scaling_ <= 1.0) { google::cloud::internal::ThrowInvalidArgument( "scaling factor must be > 1.0"); @@ -157,12 +154,22 @@ class ExponentialBackoffPolicy : public BackoffPolicy { std::chrono::milliseconds OnCompletion() override; private: + using DoubleMicroseconds = + std::chrono::duration<double, std::chrono::milliseconds::period>; + + DoubleMicroseconds CurrentDelayEnd() { + return std::min<DoubleMicroseconds>(current_delay_start_ * scaling_, + maximum_delay_); + } + std::chrono::microseconds initial_delay_; std::chrono::microseconds maximum_delay_; double scaling_; + // Stores both ends of the current delay range. - std::chrono::microseconds current_delay_start_; - std::chrono::microseconds current_delay_end_; + DoubleMicroseconds current_delay_start_; + DoubleMicroseconds current_delay_end_; + absl::optional<DefaultPRNG> generator_; };
Yes agreed the intention is much clearer.
google/cloud/internal/backoff_policy.cc
line 29 at r9 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
I think you want something like
std::duration<double, std::micro>
instead?
What is the purpose of the using decl inside the function body?
google/cloud/internal/backoff_policy.cc
line 50 at r9 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
Since you changed
current_delay_start
to bestd::duration<double, ..>
you want this to bestd::uniform_real_distribution<...>
.
Done
google/cloud/internal/backoff_policy_test.cc
line 49 at r9 (raw file):
Previously, devbww (Bradley White) wrote…
We probably want to
#include "google/cloud/testing_util/chrono_output.h"
here so that test failures are intelligible.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r10, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @dbolduc and @devbww)
google/cloud/internal/backoff_policy.cc
line 29 at r9 (raw file):
Previously, alevenberg (Anna Levenberg) wrote…
What is the purpose of the using decl inside the function body?
Probably just to save typing / make the code easier to grok. You can reasonably argue that it serves no purpose anymore. Or even that it never did. I won't feel bad if you remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is worth throwing an error when we can't "backoff"?
I think the case you are talking about is when we only use the min/max delays and ignore the scaling factor. That is fine, and we should not throw an error which could unnecessarily break existing code.
current_delay_start_ = current_delay_end_; | ||
current_delay_end_ = current_delay_end_ * scaling_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I might have written
current_delay_start_ *= scaling_;
current_delay_end_ *= scaling_;
current_delay_end_((std::min)( | ||
std::chrono::duration_cast< | ||
std::chrono::duration<double, std::micro>>(scaling_ * | ||
initial_delay_), | ||
maximum_delay_)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: now that we do the bounds checking before generating the random number, we don't need to do a min here. We can just say it = initial_delay_ * scaling_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @dbolduc and @devbww)
google/cloud/internal/backoff_policy.h
line 104 at r10 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
optional: rename
initial_delay
tominimum_delay
.(It is fine to say: "I will do this in a later PR". FWIW, I think we will want it for #8755)
Ack. Let's wait.
google/cloud/internal/backoff_policy.h
line 141 at r10 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
nit: now that we do the bounds checking before generating the random number, we don't need to do a min here. We can just say it =
initial_delay_ * scaling_
Good catch. Fixed.
google/cloud/internal/backoff_policy.cc
line 29 at r9 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
Probably just to save typing / make the code easier to grok. You can reasonably argue that it serves no purpose anymore. Or even that it never did. I won't feel bad if you remove it.
I had never seen a using inside a function body before, I think it is fine to keep.
google/cloud/internal/backoff_policy.cc
line 57 at r10 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
nit: I might have written
current_delay_start_ *= scaling_; current_delay_end_ *= scaling_;
Done for the second.
I think since the eventual implementation is going to be setting the start to the initial I'll leave as is for now.
…g-11107-fixing-exponential-backoff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @dbolduc and @devbww from 12 discussions.
Reviewable status: 3 of 5 files reviewed, all discussions resolved (waiting on @coryan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
Reviewable status: 3 of 5 files reviewed, all discussions resolved (waiting on @coryan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r14, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @alevenberg)
/gcbrun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @alevenberg)
Fixes #11107
This change is