Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correct exponential backoff ranges #11529

Merged

Conversation

alevenberg
Copy link
Member

@alevenberg alevenberg commented May 8, 2023

Fixes #11107

This change is Reviewable

@alevenberg alevenberg requested a review from a team as a code owner May 8, 2023 19:04
@alevenberg alevenberg linked an issue May 8, 2023 that may be closed by this pull request
@alevenberg alevenberg self-assigned this May 8, 2023
@alevenberg alevenberg force-pushed the bug-11107-fixing-exponential-backoff branch from d1cf191 to 1e77ece Compare May 8, 2023 19:56
devbww
devbww previously approved these changes May 8, 2023
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved
@devbww devbww self-requested a review May 8, 2023 20:00
Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @alevenberg and @devbww)


google/cloud/internal/backoff_policy.h line 131 at r4 (raw file):

      : initial_delay_(std::chrono::duration_cast<std::chrono::microseconds>(
            initial_delay)),
        current_delay_range_(2 * initial_delay_),

Part of the problem is that we should not be using 2 here. We should be using scaling. (We should also use scaling in the range calculation).

I should be able to write the following test:

TEST(ExponentialBackoffPolicy, RangeDeterminedByScalingFactor) {
  ExponentialBackoffPolicy tested(ms(10), ms(20), 1.1);

  auto delay = tested.OnCompletion();
  EXPECT_LE(ms(10), delay);
  EXPECT_GE(ms(11), delay);
}

@codecov
Copy link

codecov bot commented May 8, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (67e51d2) 93.80% compared to head (1515f5b) 93.80%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #11529   +/-   ##
=======================================
  Coverage   93.80%   93.80%           
=======================================
  Files        1818     1818           
  Lines      163671   163689   +18     
=======================================
+ Hits       153531   153550   +19     
+ Misses      10140    10139    -1     
Impacted Files Coverage Δ
google/cloud/internal/backoff_policy.cc 100.00% <100.00%> (ø)
google/cloud/internal/backoff_policy.h 100.00% <100.00%> (ø)
google/cloud/internal/backoff_policy_test.cc 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved
ci/install-cloud-sdk.sh Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
alevenberg added a commit to alevenberg/google-cloud-cpp that referenced this pull request May 9, 2023
alevenberg added a commit to alevenberg/google-cloud-cpp that referenced this pull request May 9, 2023
 - Use () around min and max to avoid macro expansion
 - Add a test for floating point numbers
 - Clarify the naming of current_delay_range_ by adding two parameters (one for the start and one for the end). In the long term we can remove this since we want to implement min jitter in issue googleapis#8755. Then the current_delay_start_ will always equal the initial_delay_.
@alevenberg alevenberg force-pushed the bug-11107-fixing-exponential-backoff branch from b485054 to 38c007e Compare May 9, 2023 18:57
@alevenberg
Copy link
Member Author

Discussed with @dbolduc

Will be rewriting to catch some edge cases

Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)


google/cloud/internal/backoff_policy.h line 137 at r3 (raw file):

Previously, devbww (Bradley White) wrote…

Could we move this into the initialization (and move "current" down in the member list) with:

  current_delay_range_((std::min)(2 * initial_delay_, maximum_delay_))

And then (unrelated to this PR, but perhaps addressable here), it seems like just current_delay_ would be preferable to current_delay_range_.

ack, addressed with the changes


google/cloud/internal/backoff_policy.h line 131 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Part of the problem is that we should not be using 2 here. We should be using scaling. (We should also use scaling in the range calculation).

I should be able to write the following test:

TEST(ExponentialBackoffPolicy, RangeDeterminedByScalingFactor) {
  ExponentialBackoffPolicy tested(ms(10), ms(20), 1.1);

  auto delay = tested.OnCompletion();
  EXPECT_LE(ms(10), delay);
  EXPECT_GE(ms(11), delay);
}

Done


google/cloud/internal/backoff_policy.cc line 45 at r3 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

nit: on some platforms1 max() is a macro. We protect against macro expansion using (std::max)(...).

done

Footnotes

  1. Windows really, depending on what set of #include directives you have, including indirect ones

@alevenberg
Copy link
Member Author

Changes are made. Ready for review.

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r2, 1 of 1 files at r7, 2 of 3 files at r9, all commit messages.
Reviewable status: 4 of 5 files reviewed, 7 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)


google/cloud/internal/backoff_policy.cc line 29 at r9 (raw file):

std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() {
  using std::chrono::duration_cast;
  using std::chrono::microseconds;

I think you want something like std::duration<double, std::micro> instead?


google/cloud/internal/backoff_policy.cc line 50 at r9 (raw file):

  }

  std::uniform_int_distribution<microseconds::rep> rng_distribution(

Since you changed current_delay_start to be std::duration<double, ..> you want this to be std::uniform_real_distribution<...>.

…on, modify using to the double granularity for micros
Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 5 files reviewed, 7 unresolved discussions (waiting on @alevenberg, @dbolduc, and @devbww)


google/cloud/internal/backoff_policy.cc line 51 at r6 (raw file):

Previously, devbww (Bradley White) wrote…

I'm all for keeping values with the min:max range, but starting back at the initial delay after the start hits the end seems contrary to "backoff".

Ack. Modified the code so it will either return to the initial delay or the max divided by the scaling factor, whichever is higher.

Do you think it is worth throwing an error when we can't "backoff"?


google/cloud/internal/backoff_policy.cc line 56 at r6 (raw file):

Previously, devbww (Bradley White) wrote…

Consider these diffs. I believe that clarify the intention, and reduce casting.

diff --git a/google/cloud/internal/backoff_policy.cc b/google/cloud/internal/backoff_policy.cc
index 61bbe4df56..d4c21acd00 100644
--- a/google/cloud/internal/backoff_policy.cc
+++ b/google/cloud/internal/backoff_policy.cc
@@ -25,9 +25,6 @@ std::unique_ptr<BackoffPolicy> ExponentialBackoffPolicy::clone() const {
 }
 
 std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() {
-  using std::chrono::duration_cast;
-  using std::chrono::microseconds;
-  using std::chrono::milliseconds;
   // We do not want to copy the seed in `clone()` because then all operations
   // will have the same sequence of backoffs. Nor do we want to use a shared
   // PRNG because that would require locking and some more complicated lifecycle
@@ -41,21 +38,17 @@ std::chrono::milliseconds ExponentialBackoffPolicy::OnCompletion() {
   if (!generator_) {
     generator_ = google::cloud::internal::MakeDefaultPRNG();
   }
-  std::uniform_int_distribution<microseconds::rep> rng_distribution(
+  std::uniform_real_distribution<DoubleMicroseconds::rep> rng_distribution(
       current_delay_start_.count(), current_delay_end_.count());
+
   // Randomized sleep period because it is possible that after some time all
   // client have same sleep period if we use only exponential backoff policy.
-  auto delay = microseconds(rng_distribution(*generator_));
+  auto delay = DoubleMicroseconds(rng_distribution(*generator_));
 
-  current_delay_start_ = current_delay_start_ == current_delay_end_
-                             ? initial_delay_
-                             : current_delay_end_;
-  current_delay_end_ = microseconds(static_cast<microseconds::rep>(
-      static_cast<double>(current_delay_end_.count()) * scaling_));
-  if (current_delay_end_ >= maximum_delay_) {
-    current_delay_end_ = maximum_delay_;
-  }
-  return duration_cast<milliseconds>(delay);
+  current_delay_start_ = current_delay_end_;
+  current_delay_end_ = CurrentDelayEnd();
+
+  return std::chrono::duration_cast<std::chrono::milliseconds>(delay);
 }
 
 }  // namespace internal
diff --git a/google/cloud/internal/backoff_policy.h b/google/cloud/internal/backoff_policy.h
index 19b4ddebde..b5721ec7c7 100644
--- a/google/cloud/internal/backoff_policy.h
+++ b/google/cloud/internal/backoff_policy.h
@@ -132,10 +132,7 @@ class ExponentialBackoffPolicy : public BackoffPolicy {
             maximum_delay)),
         scaling_(scaling),
         current_delay_start_(initial_delay_),
-        current_delay_end_(
-            (std::min)(std::chrono::duration_cast<std::chrono::microseconds>(
-                           scaling_ * initial_delay_),
-                       maximum_delay_)) {
+        current_delay_end_(CurrentDelayEnd()) {
     if (scaling_ <= 1.0) {
       google::cloud::internal::ThrowInvalidArgument(
           "scaling factor must be > 1.0");
@@ -157,12 +154,22 @@ class ExponentialBackoffPolicy : public BackoffPolicy {
   std::chrono::milliseconds OnCompletion() override;
 
  private:
+  using DoubleMicroseconds =
+      std::chrono::duration<double, std::chrono::milliseconds::period>;
+
+  DoubleMicroseconds CurrentDelayEnd() {
+    return std::min<DoubleMicroseconds>(current_delay_start_ * scaling_,
+                                        maximum_delay_);
+  }
+
   std::chrono::microseconds initial_delay_;
   std::chrono::microseconds maximum_delay_;
   double scaling_;
+
   // Stores both ends of the current delay range.
-  std::chrono::microseconds current_delay_start_;
-  std::chrono::microseconds current_delay_end_;
+  DoubleMicroseconds current_delay_start_;
+  DoubleMicroseconds current_delay_end_;
+
   absl::optional<DefaultPRNG> generator_;
 };
 

Yes agreed the intention is much clearer.


google/cloud/internal/backoff_policy.cc line 29 at r9 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I think you want something like std::duration<double, std::micro> instead?

What is the purpose of the using decl inside the function body?


google/cloud/internal/backoff_policy.cc line 50 at r9 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Since you changed current_delay_start to be std::duration<double, ..> you want this to be std::uniform_real_distribution<...>.

Done


google/cloud/internal/backoff_policy_test.cc line 49 at r9 (raw file):

Previously, devbww (Bradley White) wrote…

We probably want to #include "google/cloud/testing_util/chrono_output.h" here so that test failures are intelligible.

Done.

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r10, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @dbolduc and @devbww)


google/cloud/internal/backoff_policy.cc line 29 at r9 (raw file):

Previously, alevenberg (Anna Levenberg) wrote…

What is the purpose of the using decl inside the function body?

Probably just to save typing / make the code easier to grok. You can reasonably argue that it serves no purpose anymore. Or even that it never did. I won't feel bad if you remove it.

Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is worth throwing an error when we can't "backoff"?

I think the case you are talking about is when we only use the min/max delays and ignore the scaling factor. That is fine, and we should not throw an error which could unnecessarily break existing code.

Comment on lines 56 to 57
current_delay_start_ = current_delay_end_;
current_delay_end_ = current_delay_end_ * scaling_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I might have written

  current_delay_start_ *= scaling_;
  current_delay_end_ *= scaling_;

Comment on lines 137 to 141
current_delay_end_((std::min)(
std::chrono::duration_cast<
std::chrono::duration<double, std::micro>>(scaling_ *
initial_delay_),
maximum_delay_)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: now that we do the bounds checking before generating the random number, we don't need to do a min here. We can just say it = initial_delay_ * scaling_

google/cloud/internal/backoff_policy.h Show resolved Hide resolved
Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @dbolduc and @devbww)


google/cloud/internal/backoff_policy.h line 104 at r10 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

optional: rename initial_delay to minimum_delay.

(It is fine to say: "I will do this in a later PR". FWIW, I think we will want it for #8755)

Ack. Let's wait.


google/cloud/internal/backoff_policy.h line 141 at r10 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

nit: now that we do the bounds checking before generating the random number, we don't need to do a min here. We can just say it = initial_delay_ * scaling_

Good catch. Fixed.


google/cloud/internal/backoff_policy.cc line 29 at r9 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Probably just to save typing / make the code easier to grok. You can reasonably argue that it serves no purpose anymore. Or even that it never did. I won't feel bad if you remove it.

I had never seen a using inside a function body before, I think it is fine to keep.


google/cloud/internal/backoff_policy.cc line 57 at r10 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

nit: I might have written

  current_delay_start_ *= scaling_;
  current_delay_end_ *= scaling_;

Done for the second.

I think since the eventual implementation is going to be setting the start to the initial I'll leave as is for now.

Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @dbolduc and @devbww from 12 discussions.
Reviewable status: 3 of 5 files reviewed, all discussions resolved (waiting on @coryan)

Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Reviewable status: 3 of 5 files reviewed, all discussions resolved (waiting on @coryan)

Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r14, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @alevenberg)

@coryan
Copy link
Contributor

coryan commented May 11, 2023

/gcbrun

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @alevenberg)

@alevenberg alevenberg merged commit 2ada1b9 into googleapis:main May 11, 2023
alevenberg added a commit to alevenberg/google-cloud-cpp that referenced this pull request May 11, 2023
@alevenberg alevenberg deleted the bug-11107-fixing-exponential-backoff branch May 11, 2023 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exponential Backoff ranges are incorrect
4 participants