Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add new constructor for exponential backoff policy #11650

Merged
merged 15 commits into from
May 26, 2023

Conversation

alevenberg
Copy link
Member

@alevenberg alevenberg commented May 18, 2023

Add a new constructor to enable a full jitter policy implementation by providing a minimum delay and initial delay range (part of work for #8755)


This change is Reviewable

@alevenberg alevenberg requested a review from a team as a code owner May 18, 2023 20:23
@alevenberg alevenberg assigned alevenberg and unassigned alevenberg May 18, 2023
Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have nothing useful to contribute here, except some nits:

conventional commits do not start with an uppercase.

google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
@alevenberg alevenberg changed the title fix: Add new constructor for exponential backoff policy fix: add new constructor for exponential backoff policy May 18, 2023
Copy link
Contributor

@devbww devbww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments about comments/naming.

google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
* auto r2 = ExponentialBackoffPolicy<S,T>(0min, 10min, 10min + 2s, 1.002);
* @endcode
*
* @param initial_delay_range how long to wait after the first (unsuccessful)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"initial_delay_range" implies (to me) two delay values that form a range, e.g., [d1 ... d2], which isn't the case here. Might there be a better name?

This is most jarring when the previous contructor's "initial_delay" is used to set this constructor's "initial_delay_range".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is not ideal.

An initial change is /s/initial_delay_range/initial_delay

google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing is that we should not think in terms of ranges. Let's think in terms of the endpoints of the range. We will fix the minimum delay. Then grow the upper end of the range by the scaling factor, until it hits the overall maximum delay.

The only modifications to existing tests should be the lower bounds changing to equal the minimum/initial delay. (We should still add new tests for the new feature)

Reviewed all commit messages.
Reviewable status: 0 of 4 files reviewed, 26 unresolved discussions (waiting on @alevenberg, @coryan, and @devbww)


-- commits line 2 at r4:
nit: I would call this a feat: .... You are adding something new to the library that it could not do before. And the interface is user-facing.


google/cloud/internal/backoff_policy.h line 129 at r4 (raw file):

   */
  template <typename Rep1, typename Period1, typename Rep2, typename Period2>
  ExponentialBackoffPolicy(std::chrono::duration<Rep1, Period1> initial_delay,

I would change the name from initial_delay to minimum_delay now.


google/cloud/internal/backoff_policy.h line 148 at r4 (raw file):

   * auto r2 = ExponentialBackoffPolicy<S,T>(0min, 10min, 10min + 2s, 1.002);

I am not sure what this example is demonstrating. I would just remove it.


google/cloud/internal/backoff_policy.h line 178 at r4 (raw file):

  template <typename Rep1, typename Period1, typename Rep2, typename Period2,
            typename Rep3, typename Period3>
  ExponentialBackoffPolicy(std::chrono::duration<Rep1, Period1> initial_delay,

I am very confused about the difference between the initial_delay and minimum_delay. And the examples do not help me.

I think we can make the API clearer:

ExponentialBackoffPolicy(minimum_delay, initial_delay_upper_bound, maximum_delay, scaling);

I think this is more intuitive because minimum_delay <= initial_delay_upper_bound <= maximum_delay.


google/cloud/internal/backoff_policy.h line 186 at r4 (raw file):

        maximum_delay_(maximum_delay),
        scaling_(scaling),
        current_delay_range_(initial_delay_),

We shouldn't think in terms of the range. We should delete this.


google/cloud/internal/backoff_policy.h line 189 at r4 (raw file):

        current_delay_start_(minimum_delay_),
        current_delay_end_(
            (std::min)(minimum_delay_ + current_delay_range_, maximum_delay_)) {

We don't need to do this arithmetic if we ask for an initial_delay_upper_bound instead of an initial_delay.

I do not understand the arithmetic either. Doubling the initial/minimum delay is why tests likeDetermineRangeUsingScalingFactor started failing.


google/cloud/internal/backoff_policy.h line 191 at r4 (raw file):

            (std::min)(minimum_delay_ + current_delay_range_, maximum_delay_)) {
    if (scaling_ <= 1.0) {
      google::cloud::internal::ThrowInvalidArgument(

We might want to add argument checking for the durations. Especially if our code makes assumptions about them. Something like:

if (minimum_delay_ > initial_delay_upper_bound_) {
  ThrowInvalidArgument("Initial delay upper bound must be greater than or equal to the minimum delay");
}

Note that no existing code would be broken by adding this check.


google/cloud/internal/backoff_policy.h line 219 at r4 (raw file):

  double scaling_;
  DoubleMicroseconds current_delay_range_;
  DoubleMicroseconds current_delay_start_;

current_delay_start_ should always be minimum_delay_. So we don't need it.


google/cloud/internal/backoff_policy.cc line 24 at r4 (raw file):

std::unique_ptr<BackoffPolicy> ExponentialBackoffPolicy::clone() const {
  return std::make_unique<ExponentialBackoffPolicy>(
      initial_delay_, minimum_delay_, maximum_delay_, scaling_);

Lol, we flipped the order of the parameters.

Now you need to add a test where initial_delay_ != minimum_delay_ to catch it.


google/cloud/internal/backoff_policy.cc line 42 at r4 (raw file):

  }

  if (current_delay_end_ >= maximum_delay_) {

I liked doing this check before generating the random number. We could also write it as:

current_delay_end_ = (std::min)(current_delay_end_, maximum_delay_);

google/cloud/internal/backoff_policy.cc line 48 at r4 (raw file):

  auto delay = DoubleMicroseconds(rng_distribution(*generator_));

  if (current_delay_end_ < maximum_delay_) {

If we do bound checking before random generation, this can just be:

current_delay_end_ *= scaling_;

google/cloud/internal/backoff_policy_test.cc line 36 at r4 (raw file):

  delay = tested.OnCompletion();
  EXPECT_LE(ms(10), delay);
  EXPECT_GE(ms(30), delay);

why is this changing? Only the lower bounds should change.


google/cloud/internal/backoff_policy_test.cc line 63 at r4 (raw file):

  auto delay = tested.OnCompletion();
  EXPECT_LE(ms(1000), delay);
  EXPECT_GE(ms(1001), delay);

This test was fine. Why did it change?


google/cloud/internal/backoff_policy_test.cc line 111 at r4 (raw file):

  auto delay = tested.OnCompletion();
  EXPECT_LE(ms(100), delay) << "delay=" << delay.count() << "ms";
  EXPECT_GE(ms(200), delay) << "delay=" << delay.count() << "ms";

I do not think upper bound expectations should change.


google/cloud/internal/backoff_policy_test.cc line 130 at r4 (raw file):

  delay = tested->OnCompletion();
  EXPECT_LE(ms(10), delay);
  EXPECT_GE(ms(30), delay);

same.


google/cloud/internal/backoff_policy_test.cc line 175 at r4 (raw file):

  // whatever the C++ library uses for entropy (typically /dev/random and/or the
  // RND instruction) manage to produce the same 20 numbers. If that happens,
  // my apologies.... and remember to buy yourself a lottery ticket today.

Lol


google/cloud/storage/examples/storage_object_samples.cc line 583 at r4 (raw file):

  options.set<gcs::IdempotencyPolicyOption>(
      gcs::StrictIdempotencyPolicy().clone());
  // On error, it backs off for 2 seconds, then 5 seconds, then 14 seconds, etc.

"On error, it backs off for 1-3 seconds, then 1-9 seconds, then 1-27 seconds, etc." ?

Or just revert the change. I think the reader will get the point.


google/cloud/storage/examples/storage_object_samples.cc line 589 at r4 (raw file):

      gcs::ExponentialBackoffPolicy(
          /*initial_delay=*/std::chrono::seconds(1),
          /*minimum_delay=*/std::chrono::seconds(1),

I would revert this. It is equivalent to the existing code. So why change it?

The point of adding the other argument to the constructor is to support ranges like (0, N). Storage suggests using a minimum backoff of 1 second, so the new constructor is less applicable.

Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, made changes

Reviewable status: 0 of 4 files reviewed, 25 unresolved discussions (waiting on @coryan, @dbolduc, and @devbww)


google/cloud/internal/backoff_policy.h line 83 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

s/factor.The/factor. The/

Done.


google/cloud/internal/backoff_policy.h line 84 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

s/delay, if/delay. If/

Done.


google/cloud/internal/backoff_policy.h line 84 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

Re "default contructor": See below.

Done.


google/cloud/internal/backoff_policy.h line 85 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

s/begin //

Done.


google/cloud/internal/backoff_policy.h line 86 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

s/wait //

Done.


google/cloud/internal/backoff_policy.h line 87 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

Inherited problem, but s/avoid/avoid the/

Done.


google/cloud/internal/backoff_policy.h line 129 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I would change the name from initial_delay to minimum_delay now.

Done.


google/cloud/internal/backoff_policy.h line 148 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I am not sure what this example is demonstrating. I would just remove it.

Done.


google/cloud/internal/backoff_policy.h line 178 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I am very confused about the difference between the initial_delay and minimum_delay. And the examples do not help me.

I think we can make the API clearer:

ExponentialBackoffPolicy(minimum_delay, initial_delay_upper_bound, maximum_delay, scaling);

I think this is more intuitive because minimum_delay <= initial_delay_upper_bound <= maximum_delay.

Done.


google/cloud/internal/backoff_policy.h line 186 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

We shouldn't think in terms of the range. We should delete this.

Agreed. Done.


google/cloud/internal/backoff_policy.h line 189 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

We don't need to do this arithmetic if we ask for an initial_delay_upper_bound instead of an initial_delay.

I do not understand the arithmetic either. Doubling the initial/minimum delay is why tests likeDetermineRangeUsingScalingFactor started failing.

Done.


google/cloud/internal/backoff_policy.h line 191 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

We might want to add argument checking for the durations. Especially if our code makes assumptions about them. Something like:

if (minimum_delay_ > initial_delay_upper_bound_) {
  ThrowInvalidArgument("Initial delay upper bound must be greater than or equal to the minimum delay");
}

Note that no existing code would be broken by adding this check.

Done.


google/cloud/internal/backoff_policy.h line 219 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

current_delay_start_ should always be minimum_delay_. So we don't need it.

Done.


google/cloud/internal/backoff_policy.cc line 24 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Lol, we flipped the order of the parameters.

Now you need to add a test where initial_delay_ != minimum_delay_ to catch it.

I think the order of the params is correct and the check initial_delay_upper_bound_ < minimum_delay_ should catch the issue


google/cloud/internal/backoff_policy.cc line 42 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I liked doing this check before generating the random number. We could also write it as:

current_delay_end_ = (std::min)(current_delay_end_, maximum_delay_);

Done.


google/cloud/internal/backoff_policy.cc line 48 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

If we do bound checking before random generation, this can just be:

current_delay_end_ *= scaling_;

Done.


google/cloud/internal/backoff_policy_test.cc line 36 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

why is this changing? Only the lower bounds should change.

Done.


google/cloud/internal/backoff_policy_test.cc line 63 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

This test was fine. Why did it change?

discussed offline. fixed tests.


google/cloud/internal/backoff_policy_test.cc line 111 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I do not think upper bound expectations should change.

Done.


google/cloud/internal/backoff_policy_test.cc line 130 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

same.

Done.


google/cloud/storage/examples/storage_object_samples.cc line 583 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

"On error, it backs off for 1-3 seconds, then 1-9 seconds, then 1-27 seconds, etc." ?

Or just revert the change. I think the reader will get the point.

I think it's not obvious what the behavior is with this description. Updated it to include the intervals.


google/cloud/storage/examples/storage_object_samples.cc line 589 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I would revert this. It is equivalent to the existing code. So why change it?

The point of adding the other argument to the constructor is to support ranges like (0, N). Storage suggests using a minimum backoff of 1 second, so the new constructor is less applicable.

Reverted this change. Leaving the change to the comment to clarify the behavior.

@alevenberg alevenberg changed the title fix: add new constructor for exponential backoff policy feat: add new constructor for exponential backoff policy May 19, 2023
google/cloud/storage/examples/storage_object_samples.cc Outdated Show resolved Hide resolved
google/cloud/storage/examples/storage_object_samples.cc Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented May 22, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.02 🎉

Comparison is base (21e4a27) 93.76% compared to head (7b24a78) 93.78%.

❗ Current head 7b24a78 differs from pull request most recent head ed34ffc. Consider uploading reports for the commit ed34ffc to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11650      +/-   ##
==========================================
+ Coverage   93.76%   93.78%   +0.02%     
==========================================
  Files        1831     1829       -2     
  Lines      165012   164970      -42     
==========================================
- Hits       154729   154723       -6     
+ Misses      10283    10247      -36     
Impacted Files Coverage Δ
...e/cloud/storage/examples/storage_object_samples.cc 93.52% <ø> (ø)
google/cloud/internal/backoff_policy.cc 100.00% <100.00%> (ø)
google/cloud/internal/backoff_policy.h 96.42% <100.00%> (-3.58%) ⬇️
google/cloud/internal/backoff_policy_test.cc 100.00% <100.00%> (ø)

... and 26 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@alevenberg alevenberg force-pushed the bug-8755 branch 2 times, most recently from 02743e7 to 141655a Compare May 23, 2023 15:18
Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 25 files reviewed, 30 unresolved discussions (waiting on @coryan, @dbolduc, and @devbww)


google/cloud/internal/backoff_policy.h line 129 at r5 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Thanks for updating the name. Can we also update the @p initial_delays in the documentation?

Done


google/cloud/internal/backoff_policy.h line 151 at r5 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

s/how long to wait/the longest possible delay/

Done.


google/cloud/internal/backoff_policy.h line 180 at r5 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I think it is way more intuitive to switch the order of these two parameters.

Agreed. Done.


google/cloud/storage/examples/storage_object_samples.cc line 583 at r5 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

s/the following intervals//

Done.


google/cloud/storage/examples/storage_object_samples.cc line 586 at r5 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

We can remove this sentence because "introduces jitter" means the same thing as "random delay"

Done.

Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 25 files reviewed, 34 unresolved discussions (waiting on @alevenberg, @coryan, and @devbww)


google/cloud/internal/backoff_policy.h line 129 at r4 (raw file):

Previously, alevenberg (Anna Levenberg) wrote…

Done.

I hate to say it, but initial_delay is a better name again. Sorry!!!


google/cloud/internal/backoff_policy.cc line 43 at r7 (raw file):

  if (current_delay_end_ >= maximum_delay_) {
    current_delay_start_ = std::max(maximum_delay_ / scaling_, initial_delay_);

I think this would become:

current_delay_start_ = (std::min)(current_delay_start_, (std::max)(maximum_delay_ / scaling_, initial_delay_));

to support this case:

delay = tested.OnCompletion();
EXPECT_LE(ms(0), delay);
EXPECT_GE(ms(50), delay);


google/cloud/internal/backoff_policy.cc line 53 at r7 (raw file):

  auto delay = DoubleMicroseconds(rng_distribution(*generator_));

  current_delay_start_ = current_delay_end_;

I think we should restore the current_delay_start_ logic, except changing this line to be current_delay_start_ *= scaling_;


google/cloud/internal/backoff_policy_test.cc line 35 at r7 (raw file):

  EXPECT_GE(ms(20), delay);
  delay = tested.OnCompletion();
  EXPECT_LE(ms(10), delay);

Here, and elsewhere, we should restore the lower bounds in tests of the old constructor.


google/cloud/internal/backoff_policy_test.cc line 43 at r7 (raw file):

/// @test Verify a full jitter policy, where the minimum delay is set to 0.
TEST(ExponentialBackoffPolicy, VerifyFullJitterPolicy) {

comment: This remains a good test 👍


google/cloud/storage/examples/storage_object_samples.cc line 583 at r4 (raw file):

Previously, alevenberg (Anna Levenberg) wrote…

I think it's not obvious what the behavior is with this description. Updated it to include the intervals.

Argh, this is wrong again. Now its [1, 3], [3, 9], [9, 27]

Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 25 files reviewed, 34 unresolved discussions (waiting on @coryan, @dbolduc, and @devbww)


google/cloud/internal/backoff_policy.h line 129 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I hate to say it, but initial_delay is a better name again. Sorry!!!

Done.

Copy link
Member Author

@alevenberg alevenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified so that there is an extra parameter (scaling_lower_bound) that can move if set. The constructor can support all 3 behaviors discussed (FullJitter, MinJitter, and the initial ExponentialBackoff)

Reviewable status: 0 of 25 files reviewed, 34 unresolved discussions (waiting on @coryan, @dbolduc, and @devbww)


google/cloud/internal/backoff_policy.cc line 43 at r7 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I think this would become:

current_delay_start_ = (std::min)(current_delay_start_, (std::max)(maximum_delay_ / scaling_, initial_delay_));

to support this case:

delay = tested.OnCompletion();
EXPECT_LE(ms(0), delay);
EXPECT_GE(ms(50), delay);

Done.


google/cloud/internal/backoff_policy.cc line 53 at r7 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I think we should restore the current_delay_start_ logic, except changing this line to be current_delay_start_ *= scaling_;

Done.


google/cloud/internal/backoff_policy_test.cc line 35 at r7 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Here, and elsewhere, we should restore the lower bounds in tests of the old constructor.

Done.


google/cloud/internal/backoff_policy_test.cc line 43 at r7 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

comment: This remains a good test 👍

Done.


google/cloud/storage/examples/storage_object_samples.cc line 583 at r4 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Argh, this is wrong again. Now its [1, 3], [3, 9], [9, 27]

Done.

@dbolduc
Copy link
Member

dbolduc commented May 24, 2023

Modified ...

I do not see any new changes.

@alevenberg alevenberg force-pushed the bug-8755 branch 3 times, most recently from fee7b6f to f112d24 Compare May 25, 2023 17:22
Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very close. I have mostly documentation nits. I don't quite agree with the lower bound logic.

google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.h Outdated Show resolved Hide resolved
google/cloud/internal/backoff_policy.cc Outdated Show resolved Hide resolved
Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 25 files reviewed, 41 unresolved discussions (waiting on @alevenberg, @coryan, and @devbww)

@alevenberg alevenberg merged commit 7784ba5 into googleapis:main May 26, 2023
@alevenberg alevenberg deleted the bug-8755 branch May 26, 2023 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants