
*.Retry.BackoffFunc #1160

Merged: 3 commits into IBM:master on Feb 15, 2019

Conversation

@thomaslee (Contributor)

This is just a proof of concept to convey the idea; I intend to flesh things out based on feedback if this seems like something that is likely to land.

The Retry.Backoff config settings as they currently stand can force users to make some hard choices. Take Config.Producer.Retry.Backoff for example: using the default configuration, a 300ms+ hiccup in the cluster could lead to data loss without careful error handling in user code. Say you want to survive up to a 10s hiccup without data loss -- you're left with a few bad options:

  1. You increase Producer.Retry.Backoff to 4-5s+ and live with the occasional large latency spike (this covers things like partition leadership changes too); or
  2. You increase Producer.Retry.Max to 100, keep Producer.Retry.Backoff in the low hundreds of milliseconds, and live with the risk of increased duplicates; or
  3. You fiddle endlessly with these numbers trying to find the right balance between "do not lose data" and "do not be slow".

This change offers a better middle ground: compute the backoff using an (optional) function. This opens the door for more sophisticated strategies (e.g. exponential backoff) without breaking existing code.
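
To make it concrete, here is a rough sketch of how this might look from user code. The field name, signature (`func(retries, maxRetries int) time.Duration`), and durations are illustrative, not final; the import path shown is the current one, the repo lived at github.com/Shopify/sarama at the time of this PR:

```go
package main

import (
	"time"

	"github.com/IBM/sarama" // was github.com/Shopify/sarama at the time of this PR
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Retry.Max = 10

	// Exponential backoff: 100ms, 200ms, 400ms, ... capped at 10s.
	cfg.Producer.Retry.BackoffFunc = func(retries, maxRetries int) time.Duration {
		backoff := time.Duration(100<<uint(retries)) * time.Millisecond
		if backoff > 10*time.Second {
			backoff = 10 * time.Second
		}
		return backoff
	}

	_ = cfg // pass cfg to sarama.NewAsyncProducer as usual
}
```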

Seems like if we do this for producers, we might as well do it for Metadata, Consumer, etc. too.

I see exponential backoff was discussed over in #782, but I don't think the suggested patch ever materialized, so I thought I'd have a shot at it.

@eapache (Contributor) commented Aug 29, 2018

@thomaslee (Contributor, Author)

Yeah, the actual retry logic already exists, so it seems less risky to just give users a little flexibility in how the retry backoff itself is calculated. Maybe a retrier would make more sense once you're keen to make more sweeping changes to the innards of AsyncProducer et al.?

FWIW, the specific case we're running into is that on some very low-volume topics we run afoul of connections.max.idle.ms. We get an EOF, then the retry logic kicks in and puts us to sleep for several seconds (we went with "option 1" above), so the first send after a long hiatus is super, duper slow. Other ephemeral stuff like partition leadership changes is just as likely to bite us, though. I'd be happy with anything that gives us a little more flexibility to work around the smaller, expected hiccups.

@thomaslee (Contributor, Author)

Cleaned up the change a bit & added a test. I'll hook up similar changes for Metadata.Retry.Backoff & Consumer.Retry.Backoff tomorrow if I don't hear any complaints between now and then.

@thomaslee (Contributor, Author)

Added Config.{Consumer,Metadata}.Retry.BackoffFunc too, plus tests for all three. The only thing that feels a bit iffy here is that the BackoffFunc for consumers has no upper bound on retries. I think that's okay, but it's sort of a shame they don't all have the same signature.
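
For reference, the rough shape of the three hooks as described above (illustrative only, not the actual sarama source):

```go
package sketch

import "time"

// Approximate signatures of the three proposed hooks; note the consumer
// variant has no maxRetries argument to bound against.
type retryHooks struct {
	ProducerBackoffFunc func(retries, maxRetries int) time.Duration
	MetadataBackoffFunc func(retries, maxRetries int) time.Duration
	ConsumerBackoffFunc func(retries int) time.Duration
}
```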

Holler out if anything else looks suspicious.

@thomaslee (Contributor, Author)

Just giving this a bump -- any thoughts?

@varun06 (Contributor) commented Feb 13, 2019

This is something I was researching today; let's bring it back and merge if things look okay.

@sam-obeid left a comment

Looks good overall, and I see how this is useful! As long as we keep it optional as mentioned by @eapache, this is good to merge!

@varun06 merged commit 6a7bac8 into IBM:master on Feb 15, 2019