Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry: Default MaxBackoff is unintuitive #140

Open
ryanmcnamara opened this issue Jan 16, 2019 · 10 comments
Open

retry: Default MaxBackoff is unintuitive #140

ryanmcnamara opened this issue Jan 16, 2019 · 10 comments

Comments

@ryanmcnamara
Copy link

ryanmcnamara commented Jan 16, 2019

Issue

I would expect the following

func Test(t *testing.T) {
	err:=retry.Do(
		context.Background(),
		func() error {
			fmt.Println(time.Now())
			return errors.New("some errors")
		},
		retry.WithInitialBackoff(10*time.Second), retry.WithMultiplier(1), retry.WithMaxAttempts(5),
	)
	require.NoError(t,err)
}

To output one log line every 10 seconds. However what this actually outputs is

2019-01-16 14:51:16.025567248 -0800 PST m=+0.011095106
2019-01-16 14:51:18.092852159 -0800 PST m=+2.078377531
2019-01-16 14:51:20.357685855 -0800 PST m=+4.343208502
2019-01-16 14:51:22.459165577 -0800 PST m=+6.444685696
2019-01-16 14:51:24.425426817 -0800 PST m=+8.410944570

The reason for this output is the default for MaxBackoff is as follows:

// WithMaxBackoff sets upper limit on backoff duration.
//
// Max backoff of 0 indicates no limit.
//
// If max backoff option is not used, then default value of 2 seconds is used.

While the documentation is good, this caused us to write bugs that affected us in production as I think it's unintuitive.

Proposal

Set the default value of max backoff to 0 instead of 2 seconds, so that the max backoff will be unlimited.

I realize this is a breaking change, but I would believe that most people using this library would expect it to work this way

@ashrayjain
Copy link
Contributor

@ryanmcnamara I don't quite follow the problem above. Maybe you have a typo here? To output one log line every 2 seconds? The output is infact a log line every 2 seconds right?

Could you clarify what you would have expected to happen with the code snippet above?

@ryanmcnamara
Copy link
Author

Fixed my typo
Expected that it outputs a lot every 10 seconds since the initial backoff is set to 10 seconds

@ashrayjain
Copy link
Contributor

Cool, maybe a "fix" for this is instead of making MaxBackoff 0, we set it to the initial backoff value if that is configured?

@nmiyake
Copy link
Contributor

nmiyake commented Jan 17, 2019

Yes, although this is working as documented, agree that it's unintuitive given the starting parameters.

If the default value isn't a concrete value, I feel like 0 (no default limit) makes the most sense.

@ashrayjain if MaxBackoff is set to the initial backoff value by default, then whenever initial back-off is specified then the retry is effectively not exponential, right? Since back-off is defined as min(initialBackoff * pow(multiplier, $retryAttempt), maxBackoff == 0 ? +Inf : maxBackoff) * (1.0 - randomizationFactor + 2 * rand(0, randomizationFactor)), if maxBackoff == initialBackoff, then the retry behavior just becomes "wait for the initial back-off time and then add jitter".

@ashrayjain
Copy link
Contributor

Cool, "0" works then. I really think we should make sure nobody gets burned by this change though.

Can we excavate all places that use retry.Do currently without setting a MaxBackoff and explicitly add a MaxBackoff(2) in those places to preserve behavior?

@nmiyake
Copy link
Contributor

nmiyake commented Jan 17, 2019

Yeah it's hard to say for sure because Mateusz isn't around, but I believe the intent of the default value for max backoff is that most "simple" calls probably don't want to back off for more than 2 seconds at the upper bound. However, this is obviously completely subjective, and does result in very confusing behavior when the initial back-off is set to be above the max.

It sounds like the basic possible paths are:

  • Change default value and attempt to excavate/programmatically convert old calls to maintain compatibility
  • Keep the current behavior, but improve documentation/flag to users that they will likely need to explicitly configure the max back-off if they set an initial back-off
  • Create a v2 version of the library with the new default behavior (we would use the "major version package" approach so that v1 and v2 can coexist in the same codebase)

@ryanmcnamara
Copy link
Author

I would much prefer making the break to the api (first option @nmiyake suggested above). I'm not that worried about excavating the changes, because I believe that all usages of this that don't explicitly set MaxBackoff intended for it to be zero (no limit).

@ryanmcnamara
Copy link
Author

I think adding a release note and letting consumers audit their own code base on upgrade is sufficient

@ashrayjain
Copy link
Contributor

my vote is for approach 3 which is the cleanest and imo, the "right" way to handle this. We can choose to deprecate v1 and remove it entirely after some time if we'd like.

or we can make it a compile break to force people to explicitly pick a value somehow.

@ryanmcnamara all usages of this that don't explicitly set MaxBackoff intended for it to be zero (no limit). is not true in at least a few places that I know of. It doesn't really feel right to hide such a break in release notes which are arguably easy to miss when upgrading (do you really read the all the release nodes for every single dependency bump, especially when not even upgrading past major versions?)

@bmoylan
Copy link
Contributor

bmoylan commented Jun 9, 2020

@willdeuschle submitted #186 before either of us saw this ticket. Would folks be ok with that change (using initialBackoff as the max if it's larger than maxBackoff)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants