
Set a NopCloser request body with retry middleware #1016

Merged
merged 1 commit into traefik:master from the issue-1008 branch on Feb 2, 2017

Conversation

@bamarni (Contributor) commented Jan 4, 2017

fixes #1008

@SantoDE (Collaborator) commented Jan 12, 2017

Hey @bamarni ,

first of all, thanks for your contribution 👍

I think it's fine for the moment. If we run into an issue where such errors are sent by upstream, we can still iterate on it.

Could you please rebase your branch with the current master?

LGTM 👼

/cc @containous/traefik

@bamarni (Contributor, Author) commented Jan 12, 2017

Hi @SantoDE , you're welcome 😸

Maybe a cleaner solution would be to pass an explicit error handler to the oxy forwarder which, instead of encoding the dial error in the HTTP response, would use the request context for example. Then we would only retry for this specific error. What do you think?
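For illustration, a rough sketch of that idea, assuming Go's request context (Go 1.7+) and oxy's forward.ErrorHandler option with a utils.ErrorHandlerFunc adapter; the type and function names below are made up for this sketch and are not the PR's actual code:

package middlewares

import (
	"context"
	"net/http"
)

// retryErrorKey and retryError are illustrative names, not part of oxy or traefik.
type retryErrorKey struct{}

type retryError struct{ err error }

// withRetryError seeds the request context with a holder that the forwarder's
// error handler can fill in; the retry middleware inspects it after each attempt.
func withRetryError(r *http.Request) (*http.Request, *retryError) {
	holder := &retryError{}
	ctx := context.WithValue(r.Context(), retryErrorKey{}, holder)
	return r.WithContext(ctx), holder
}

// errorHandler would be registered on the oxy forwarder, e.g. via
// forward.ErrorHandler(utils.ErrorHandlerFunc(errorHandler)). Instead of only
// translating the dial error into a status code, it records it for the caller.
func errorHandler(w http.ResponseWriter, req *http.Request, err error) {
	if holder, ok := req.Context().Value(retryErrorKey{}).(*retryError); ok {
		holder.err = err
	}
	w.WriteHeader(http.StatusBadGateway)
}

The retry loop would then call withRetryError before each attempt and retry only while holder.err is non-nil and attempts remain.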

@SantoDE (Collaborator) commented Jan 12, 2017

Hi @bamarni ,

that would probably be even better! The oxy forwarder already supports passing it an error handler so why not? :-)

@bamarni (Contributor, Author) commented Jan 12, 2017

Agreed, I'll check this out soon.

@bamarni bamarni changed the title use a seekable body in retry middleware update retry middleware logic Jan 14, 2017
@bamarni (Contributor, Author) commented Jan 14, 2017

@SantoDE: I've updated the PR and split it into 2 commits, please check their commit messages. The first one fixes #1008, the second changes the retry logic to only catch network errors.

@bamarni bamarni force-pushed the issue-1008 branch 3 times, most recently from 5e53ade to 03c025c Compare January 14, 2017 12:46
@SantoDE (Collaborator) commented Jan 16, 2017

Hey @bamarni ,

sadly, you have to rebase again :(

As for your commits, I'm okay with splitting into 2, as long as the fixing commit references what it fixes in its message :-)

ping @containous/traefik

@bamarni (Contributor, Author) commented Jan 17, 2017

sure, just rebased

@SantoDE (Collaborator) commented Jan 17, 2017

Thanks. Now it looks really LGTM 👼

/ping @containous/traefik

@trecloux (Contributor) left a comment:

LGTM 👍

	attempts := 1
	for {
		recorder := NewRecorder()
		recorder.responseWriter = rw
		retry.next.ServeHTTP(recorder, r)
-		if !isNetworkError(recorder.Code) || attempts >= retry.attempts {
+		if recorder.Error == nil || attempts >= retry.attempts {
@emilevauge (Member) commented Jan 19, 2017

@bamarni are you sure you only have network errors in there? Maybe I'm wrong but I don't see the filter anymore (even in the ErrorHandler).

@bamarni (Contributor, Author) replied:

Thanks for double checking. The previous filter was looking at the response status because that's how the default oxy error handler reports errors; my error handler instead passes the error as a new field in the recorder struct. If this field is not nil, the error comes from the transport's RoundTrip call, meaning it is a network error.

However, what I'm not sure about is whether the next attempt would be forwarded to a different backend rather than the same one; I guess this depends on the load balancer configuration? With sticky sessions, for instance, retrying wouldn't really make sense.
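For reference, a minimal sketch of the recorder-based variant described above; the field and handler names are illustrative, not the exact PR code:

package middlewares

import "net/http"

// Recorder is a trimmed-down sketch: the real recorder also buffers the
// response so it can be flushed or discarded between attempts.
type Recorder struct {
	responseWriter http.ResponseWriter
	Code           int
	Error          error // filled in by the custom error handler below
}

// errorHandler is what would be passed to the oxy forwarder: any error it
// receives comes from the transport's RoundTrip, i.e. a network error.
func (rec *Recorder) errorHandler(w http.ResponseWriter, req *http.Request, err error) {
	rec.Error = err
	rec.Code = http.StatusBadGateway
	w.WriteHeader(http.StatusBadGateway)
}

With this, the loop condition shown in the diff above (recorder.Error == nil || attempts >= retry.attempts) stops retrying as soon as an attempt completes without a transport error.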

@emilevauge (Member) replied.

@bamarni (Contributor, Author) replied:

Would you consider these as network errors too? At least from my point of view, if the forwarder doesn't manage to send the request or get a proper response from upstream for whatever reason, I'd see that as a network error, as opposed to a valid HTTP response from upstream with a 5XX status code, which currently would also be retried.

Shall I stick to the first commit and open another issue to discuss the exact behaviour we want for the retry feature?

@emilevauge (Member) replied:

> Would you consider these as network errors too?

Not all these errors are network errors. For example https://github.com/containous/oxy/blob/master/forward/fwd.go#L278.

Therefore, I think it would be better to revert to the old test if !isNetworkError(recorder.Code) || attempts >= retry.attempts { :)
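For context, the status-based test referred to here counts the codes the default oxy error handler emits on transport failures as network errors; roughly (a sketch, not necessarily the exact function in the codebase):

package middlewares

import "net/http"

// isNetworkError reports whether the recorded status was set by the
// forwarder's error handler (502 on dial/transport errors, 504 on timeouts)
// rather than returned by the upstream itself.
func isNetworkError(status int) bool {
	return status == http.StatusBadGateway || status == http.StatusGatewayTimeout
}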

@bamarni (Contributor, Author) replied:

I've reverted it; however, please note that this doesn't change the behaviour for the errors you're mentioning, they'd still lead to retries.

The errors I was tackling with this change are instead explicit HTTP errors from upstream. Comparing with nginx's proxy_next_upstream semantics, the current behaviour is:

error timeout http_502 http_504 non_idempotent

And with my changes it would be :

error timeout non_idempotent

The motivation behind this is that, imo, http_5XX + non_idempotent is a bit aggressive for a default. If upstream sends an HTTP 504 for a POST request, it might still have altered a resource but failed to serve the response in a timely manner, and retrying might not be the best option; at least that's hard to assume. Maybe even non_idempotent shouldn't be on by default? Or the retry behaviour should be configurable?

@Russell-IO (Contributor) left a comment:

LGTM 👍

@bamarni (Contributor, Author) commented Feb 2, 2017

It seems like there are random failures in the build 😿 For example, mine failed on TestAccessLog, which contains this code: https://github.com/containous/traefik/blob/009057cb87db4a0991909c9c813c0819a3d63e18/integration/access_log_test.go#L37

There are multiple places in the test suite with this kind of arbitrary wait period, which makes them prone to random failures. How about introducing generic helpers, for example utils.WaitTcp(addr, timeout), instead of hardcoding a time period?
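A possible shape for such a helper, using the utils.WaitTcp(addr, timeout) signature proposed above (the function does not exist yet; this is only a sketch):

package utils

import (
	"fmt"
	"net"
	"time"
)

// WaitTcp polls addr until a TCP connection succeeds or the timeout elapses,
// so integration tests can wait for a listener instead of sleeping a fixed time.
func WaitTcp(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, 500*time.Millisecond)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s not reachable after %s: %v", addr, timeout, err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}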

@emilevauge (Member) commented:

@bamarni
Yeah we know... We will refactor tests in the next release.

@bamarni (Contributor, Author) commented Feb 2, 2017

@emilevauge: cool, let me know if you need extra help. As for this PR, I've rebased it.

As the http client always closes the request body,
this makes sure the request can be retried if needed.

Fixes traefik#1008
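The gist of that change, sketched against the retry middleware's ServeHTTP (type and field names are simplified here; see the merged commit for the exact code):

package middlewares

import (
	"io/ioutil"
	"net/http"
)

// Retry is a trimmed-down stand-in for the middleware type.
type Retry struct {
	attempts int
	next     http.Handler
}

func (retry *Retry) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
	if retry.attempts > 1 {
		// The forwarder's HTTP transport closes r.Body after an attempt.
		// Wrapping it in a NopCloser keeps it open across attempts so a
		// retried request can still send it; the underlying body is closed
		// exactly once when this handler returns.
		body := r.Body
		defer body.Close()
		r.Body = ioutil.NopCloser(body)
	}
	// ... the retry loop from the review snippet above wraps this call ...
	retry.next.ServeHTTP(rw, r)
}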
@bamarni bamarni mentioned this pull request Feb 2, 2017
@bamarni bamarni changed the title update retry middleware logic Set a NopCloser request body with retry middleware Feb 2, 2017
@emilevauge emilevauge merged commit d0e2349 into traefik:master Feb 2, 2017
@ldez ldez added this to the 1.2 milestone Oct 1, 2017
Successfully merging this pull request may close these issues.

Traefik failing on POST request
6 participants