fix: correct exponential backoff ranges #11529

alevenberg · 2023-05-08T19:04:08Z

Fixes #11107

This change is

google/cloud/internal/backoff_policy.cc

google/cloud/internal/backoff_policy.h

google/cloud/internal/backoff_policy.cc

dbolduc

Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @alevenberg and @devbww)

google/cloud/internal/backoff_policy.h line 131 at r4 (raw file):

      : initial_delay_(std::chrono::duration_cast<std::chrono::microseconds>(
            initial_delay)),
        current_delay_range_(2 * initial_delay_),

Part of the problem is that we should not be using 2 here. We should be using scaling. (We should also use scaling in the range calculation).

I should be able to write the following test:

TEST(ExponentialBackoffPolicy, RangeDeterminedByScalingFactor) {
  ExponentialBackoffPolicy tested(ms(10), ms(20), 1.1);

  auto delay = tested.OnCompletion();
  EXPECT_LE(ms(10), delay);
  EXPECT_GE(ms(11), delay);
}

codecov · 2023-05-08T20:51:55Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (67e51d2) 93.80% compared to head (1515f5b) 93.80%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #11529   +/-   ##
=======================================
  Coverage   93.80%   93.80%           
=======================================
  Files        1818     1818           
  Lines      163671   163689   +18     
=======================================
+ Hits       153531   153550   +19     
+ Misses      10140    10139    -1

Impacted Files	Coverage Δ
google/cloud/internal/backoff_policy.cc	`100.00% <100.00%> (ø)`
google/cloud/internal/backoff_policy.h	`100.00% <100.00%> (ø)`
google/cloud/internal/backoff_policy_test.cc	`100.00% <100.00%> (ø)`

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

google/cloud/internal/backoff_policy.cc

ci/install-cloud-sdk.sh

google/cloud/internal/backoff_policy.h

stale

- Use () around min and max to avoid macro expansion - Add a test for floating point numbers - Clarify the naming of current_delay_range_ by adding two parameters (one for the start and one for the end). In the long term we can remove this since we want to implement min jitter in issue googleapis#8755. Then the current_delay_start_ will always equal the initial_delay_.

alevenberg · 2023-05-09T21:09:36Z

Discussed with @dbolduc

Will be rewriting to catch some edge cases

alevenberg

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)

google/cloud/internal/backoff_policy.h line 137 at r3 (raw file):

Previously, devbww (Bradley White) wrote…

Could we move this into the initialization (and move "current" down in the member list) with:
  current_delay_range_((std::min)(2 * initial_delay_, maximum_delay_))
And then (unrelated to this PR, but perhaps addressable here), it seems like just current_delay_ would be preferable to current_delay_range_.

ack, addressed with the changes

google/cloud/internal/backoff_policy.h line 131 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Part of the problem is that we should not be using 2 here. We should be using scaling. (We should also use scaling in the range calculation).

I should be able to write the following test:
TEST(ExponentialBackoffPolicy, RangeDeterminedByScalingFactor) {
  ExponentialBackoffPolicy tested(ms(10), ms(20), 1.1);

  auto delay = tested.OnCompletion();
  EXPECT_LE(ms(10), delay);
  EXPECT_GE(ms(11), delay);
}

Done

google/cloud/internal/backoff_policy.cc line 45 at r3 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

nit: on some platforms¹ max() is a macro. We protect against macro expansion using (std::max)(...).

done

Windows really, depending on what set of #include directives you have, including indirect ones ↩

alevenberg · 2023-05-09T21:26:44Z

Changes are made. Ready for review.

google/cloud/internal/backoff_policy_test.cc

coryan

Reviewed 1 of 2 files at r2, 1 of 1 files at r7, 2 of 3 files at r9, all commit messages.
Reviewable status: 4 of 5 files reviewed, 7 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)

google/cloud/internal/backoff_policy.cc line 29 at r9 (raw file):

std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() {
  using std::chrono::duration_cast;
  using std::chrono::microseconds;

I think you want something like std::duration<double, std::micro> instead?

google/cloud/internal/backoff_policy.cc line 50 at r9 (raw file):

  }

  std::uniform_int_distribution<microseconds::rep> rng_distribution(

Since you changed current_delay_start to be std::duration<double, ..> you want this to be std::uniform_real_distribution<...>.

…on, modify using to the double granularity for micros

alevenberg

Reviewable status: 4 of 5 files reviewed, 7 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)

google/cloud/internal/backoff_policy.cc line 51 at r6 (raw file):

Previously, devbww (Bradley White) wrote…

I'm all for keeping values with the min:max range, but starting back at the initial delay after the start hits the end seems contrary to "backoff".

Ack. Modified the code so it will either return to the initial delay or the max divided by the scaling factor, whichever is higher.

Do you think it is worth throwing an error when we can't "backoff"?

google/cloud/internal/backoff_policy.cc line 56 at r6 (raw file):

Previously, devbww (Bradley White) wrote…

Consider these diffs. I believe that clarify the intention, and reduce casting.

diff --git a/google/cloud/internal/backoff_policy.cc b/google/cloud/internal/backoff_policy.cc
index 61bbe4df56..d4c21acd00 100644
--- a/google/cloud/internal/backoff_policy.cc
+++ b/google/cloud/internal/backoff_policy.cc
@@ -25,9 +25,6 @@ std::unique_ptr<BackoffPolicy> ExponentialBackoffPolicy::clone() const {
 }
 
 std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() {
-  using std::chrono::duration_cast;
-  using std::chrono::microseconds;
-  using std::chrono::milliseconds;
   // We do not want to copy the seed in `clone()` because then all operations
   // will have the same sequence of backoffs. Nor do we want to use a shared
   // PRNG because that would require locking and some more complicated lifecycle
@@ -41,21 +38,17 @@ std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() {
   if (!generator_) {
     generator_ = google::cloud::internal::MakeDefaultPRNG();
   }
-  std::uniform_int_distribution<microseconds::rep> rng_distribution(
+  std::uniform_real_distribution<DoubleMicroseconds::rep> rng_distribution(
       current_delay_start_.count(), current_delay_end_.count());
+
   // Randomized sleep period because it is possible that after some time all
   // client have same sleep period if we use only exponential backoff policy.
-  auto delay = microseconds(rng_distribution(*generator_));
+  auto delay = DoubleMicroseconds(rng_distribution(*generator_));
 
-  current_delay_start_ = current_delay_start_ == current_delay_end_
-                             ? initial_delay_
-                             : current_delay_end_;
-  current_delay_end_ = microseconds(static_cast<microseconds::rep>(
-      static_cast<double>(current_delay_end_.count()) * scaling_));
-  if (current_delay_end_ >= maximum_delay_) {
-    current_delay_end_ = maximum_delay_;
-  }
-  return duration_cast<milliseconds>(delay);
+  current_delay_start_ = current_delay_end_;
+  current_delay_end_ = CurrentDelayEnd();
+
+  return std::chrono::duration_cast<std::chrono::milliseconds>(delay);
 }
 
 }  // namespace internal
diff --git a/google/cloud/internal/backoff_policy.h b/google/cloud/internal/backoff_policy.h
index 19b4ddebde..b5721ec7c7 100644
--- a/google/cloud/internal/backoff_policy.h
+++ b/google/cloud/internal/backoff_policy.h
@@ -132,10 +132,7 @@ class ExponentialBackoffPolicy : public BackoffPolicy {
             maximum_delay)),
         scaling_(scaling),
         current_delay_start_(initial_delay_),
-        current_delay_end_(
-            (std::min)(std::chrono::duration_cast<std::chrono::microseconds>(
-                           scaling_ * initial_delay_),
-                       maximum_delay_)) {
+        current_delay_end_(CurrentDelayEnd()) {
     if (scaling_ <= 1.0) {
       google::cloud::internal::ThrowInvalidArgument(
           "scaling factor must be > 1.0");
@@ -157,12 +154,22 @@ class ExponentialBackoffPolicy : public BackoffPolicy {
   std::chrono::milliseconds OnCompletion() override;
 
  private:
+  using DoubleMicroseconds =
+      std::chrono::duration<double, std::chrono::milliseconds::period>;
+
+  DoubleMicroseconds CurrentDelayEnd() {
+    return std::min<DoubleMicroseconds>(current_delay_start_ * scaling_,
+                                        maximum_delay_);
+  }
+
   std::chrono::microseconds initial_delay_;
   std::chrono::microseconds maximum_delay_;
   double scaling_;
+
   // Stores both ends of the current delay range.
-  std::chrono::microseconds current_delay_start_;
-  std::chrono::microseconds current_delay_end_;
+  DoubleMicroseconds current_delay_start_;
+  DoubleMicroseconds current_delay_end_;
+
   absl::optional<DefaultPRNG> generator_;
 };

Yes agreed the intention is much clearer.

google/cloud/internal/backoff_policy.cc line 29 at r9 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I think you want something like std::duration<double, std::micro> instead?

What is the purpose of the using decl inside the function body?

google/cloud/internal/backoff_policy.cc line 50 at r9 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Since you changed current_delay_start to be std::duration<double, ..> you want this to be std::uniform_real_distribution<...>.

Done

google/cloud/internal/backoff_policy_test.cc line 49 at r9 (raw file):

Previously, devbww (Bradley White) wrote…

We probably want to #include "google/cloud/testing_util/chrono_output.h" here so that test failures are intelligible.

Done.

coryan

Reviewed 2 of 2 files at r10, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @dbolduc and @devbww)

google/cloud/internal/backoff_policy.cc line 29 at r9 (raw file):

Previously, alevenberg (Anna Levenberg) wrote…

What is the purpose of the using decl inside the function body?

Probably just to save typing / make the code easier to grok. You can reasonably argue that it serves no purpose anymore. Or even that it never did. I won't feel bad if you remove it.

dbolduc

Do you think it is worth throwing an error when we can't "backoff"?

I think the case you are talking about is when we only use the min/max delays and ignore the scaling factor. That is fine, and we should not throw an error which could unnecessarily break existing code.

dbolduc · 2023-05-10T14:41:15Z

google/cloud/internal/backoff_policy.cc

+  current_delay_start_ = current_delay_end_;
+  current_delay_end_ = current_delay_end_ * scaling_;


nit: I might have written

current_delay_start_ *= scaling_; current_delay_end_ *= scaling_;

dbolduc · 2023-05-10T14:44:06Z

google/cloud/internal/backoff_policy.h

+        current_delay_end_((std::min)(
+            std::chrono::duration_cast<
+                std::chrono::duration<double, std::micro>>(scaling_ *
+                                                           initial_delay_),
+            maximum_delay_)) {


nit: now that we do the bounds checking before generating the random number, we don't need to do a min here. We can just say it = initial_delay_ * scaling_

google/cloud/internal/backoff_policy.h

alevenberg

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @dbolduc and @devbww)

google/cloud/internal/backoff_policy.h line 104 at r10 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

optional: rename initial_delay to minimum_delay.

(It is fine to say: "I will do this in a later PR". FWIW, I think we will want it for #8755)

Ack. Let's wait.

google/cloud/internal/backoff_policy.h line 141 at r10 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

nit: now that we do the bounds checking before generating the random number, we don't need to do a min here. We can just say it = initial_delay_ * scaling_

Good catch. Fixed.

google/cloud/internal/backoff_policy.cc line 29 at r9 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Probably just to save typing / make the code easier to grok. You can reasonably argue that it serves no purpose anymore. Or even that it never did. I won't feel bad if you remove it.

I had never seen a using inside a function body before, I think it is fine to keep.

google/cloud/internal/backoff_policy.cc line 57 at r10 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

nit: I might have written

  current_delay_start_ *= scaling_;
  current_delay_end_ *= scaling_;

Done for the second.

I think since the eventual implementation is going to be setting the start to the initial I'll leave as is for now.

…tructor

google/cloud/internal/backoff_policy.h

…g-11107-fixing-exponential-backoff

google/cloud/internal/backoff_policy.cc

alevenberg

Dismissed @dbolduc and @devbww from 12 discussions.
Reviewable status: 3 of 5 files reviewed, all discussions resolved (waiting on @coryan)

alevenberg

Ack

Reviewable status: 3 of 5 files reviewed, all discussions resolved (waiting on @coryan)

alevenberg

Reviewed 2 of 2 files at r14, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @alevenberg)

coryan · 2023-05-11T12:13:52Z

/gcbrun

coryan

Reviewed all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @alevenberg)

alevenberg requested a review from a team as a code owner May 8, 2023 19:04

alevenberg linked an issue May 8, 2023 that may be closed by this pull request

Exponential Backoff ranges are incorrect #11107

Closed

alevenberg self-assigned this May 8, 2023

coryan reviewed May 8, 2023

View reviewed changes

google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved

alevenberg force-pushed the bug-11107-fixing-exponential-backoff branch from d1cf191 to 1e77ece Compare May 8, 2023 19:56

devbww previously approved these changes May 8, 2023

View reviewed changes

google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved

google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved

devbww self-requested a review May 8, 2023 20:00

dbolduc requested changes May 8, 2023

View reviewed changes

devbww reviewed May 9, 2023

View reviewed changes

google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved

google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved

ci/install-cloud-sdk.sh Outdated Show resolved Hide resolved

google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved

alevenberg added a commit to alevenberg/google-cloud-cpp that referenced this pull request May 9, 2023

fix(exp-backoff): Correct exponential backoff ranges (googleapis#11529)

6f18585

alevenberg added 2 commits May 9, 2023 18:54

fix(exp-backoff): Correct exponential backoff ranges (googleapis#11529)

0725e85

fix: address comments for exp backoff changes

38c007e

alevenberg force-pushed the bug-11107-fixing-exponential-backoff branch from b485054 to 38c007e Compare May 9, 2023 18:57

fix: add backoff changes moving scaling

b11c8b7

fix: rewrite code so it is cleaner

429c412

alevenberg commented May 9, 2023

View reviewed changes

devbww reviewed May 9, 2023

View reviewed changes

google/cloud/internal/backoff_policy_test.cc Show resolved Hide resolved

coryan reviewed May 9, 2023

View reviewed changes

fix: address comments (add header, change from int to real distributi…

afe1723

…on, modify using to the double granularity for micros

alevenberg commented May 10, 2023

View reviewed changes

coryan reviewed May 10, 2023

View reviewed changes

dbolduc approved these changes May 10, 2023

View reviewed changes

alevenberg commented May 10, 2023

View reviewed changes

fix: condense the line about scaling, remove unnecesary min from cons…

0004ea9

…tructor

dbolduc reviewed May 10, 2023

View reviewed changes

google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved

alevenberg added 2 commits May 10, 2023 15:48

Merge branch 'main' of github.com:googleapis/google-cloud-cpp into bu…

73ee63c

…g-11107-fixing-exponential-backoff

fix: remove unnecessary cast

1a863a8

devbww reviewed May 10, 2023

View reviewed changes

google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved

fix: remove castings and rename microsecond using to double microsecond

52f66dd

devbww reviewed May 10, 2023

View reviewed changes

google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved

fix: move double microsecond alias to a private class decl

1515f5b

devbww approved these changes May 10, 2023

View reviewed changes

dbolduc approved these changes May 10, 2023

View reviewed changes

alevenberg commented May 11, 2023

View reviewed changes

coryan approved these changes May 11, 2023

View reviewed changes

alevenberg merged commit 2ada1b9 into googleapis:main May 11, 2023

alevenberg added a commit to alevenberg/google-cloud-cpp that referenced this pull request May 11, 2023

fix: Correct exponential backoff ranges (googleapis#11529)

67efd5c

alevenberg deleted the bug-11107-fixing-exponential-backoff branch May 11, 2023 18:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: correct exponential backoff ranges #11529

fix: correct exponential backoff ranges #11529

alevenberg commented May 8, 2023 •

edited

Loading

dbolduc left a comment •

edited

Loading

codecov bot commented May 8, 2023 •

edited

Loading

alevenberg commented May 9, 2023

alevenberg left a comment

alevenberg commented May 9, 2023

coryan left a comment

alevenberg left a comment

coryan left a comment

dbolduc left a comment

dbolduc May 10, 2023

dbolduc May 10, 2023

alevenberg left a comment

alevenberg left a comment

alevenberg left a comment

alevenberg left a comment

coryan commented May 11, 2023

coryan left a comment

		current_delay_start_ = current_delay_end_;
		current_delay_end_ = current_delay_end_ * scaling_;

fix: correct exponential backoff ranges #11529

fix: correct exponential backoff ranges #11529

Conversation

alevenberg commented May 8, 2023 • edited Loading

dbolduc left a comment • edited Loading

Choose a reason for hiding this comment

codecov bot commented May 8, 2023 • edited Loading

Codecov Report

alevenberg commented May 9, 2023

alevenberg left a comment

Choose a reason for hiding this comment

Footnotes

alevenberg commented May 9, 2023

coryan left a comment

Choose a reason for hiding this comment

alevenberg left a comment

Choose a reason for hiding this comment

coryan left a comment

Choose a reason for hiding this comment

dbolduc left a comment

Choose a reason for hiding this comment

dbolduc May 10, 2023

Choose a reason for hiding this comment

dbolduc May 10, 2023

Choose a reason for hiding this comment

alevenberg left a comment

Choose a reason for hiding this comment

alevenberg left a comment

Choose a reason for hiding this comment

alevenberg left a comment

Choose a reason for hiding this comment

alevenberg left a comment

Choose a reason for hiding this comment

coryan commented May 11, 2023

coryan left a comment

Choose a reason for hiding this comment

alevenberg commented May 8, 2023 •

edited

Loading

dbolduc left a comment •

edited

Loading

codecov bot commented May 8, 2023 •

edited

Loading