-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry only on real network errors #1549
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. :-)
middlewares/retry.go
Outdated
@@ -42,10 +43,16 @@ func (retry *Retry) ServeHTTP(rw http.ResponseWriter, r *http.Request) { | |||
} | |||
attempts := 1 | |||
for { | |||
netErrorOccured := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: occurred
In the comment as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
middlewares/retry.go
Outdated
@@ -42,10 +43,16 @@ func (retry *Retry) ServeHTTP(rw http.ResponseWriter, r *http.Request) { | |||
} | |||
attempts := 1 | |||
for { | |||
netErrorOccured := false | |||
// We pass in a pointer to netErrorOccured so that we can set it to true on network errors | |||
// that happen when proxying the http requests to the backends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention that it's a custom error handler which will set the flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I extended the comment.
middlewares/retry.go
Outdated
if !isNetworkError(recorder.Code) || attempts >= retry.attempts { | ||
|
||
retry.next.ServeHTTP(recorder, r.WithContext(newCtx)) | ||
if netErrorOccured == false || attempts >= retry.attempts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify: !netErrorOccurred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
middlewares/retry.go
Outdated
return status == http.StatusBadGateway || status == http.StatusGatewayTimeout | ||
} | ||
// NetErrorCtxKey is a custom type that is used as key for the request context. | ||
// Using an own type we avoid potential collisions with other keys that may have the same string name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, didn't know that was possible! 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually its even a best practice to not export the type if it is not necessary, to have real safety. Though I decided to have the error handler in the server package still, as for me it makes more sense there, which requires to export the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the implementation now, so that the knowledge about how to interact with the context completely to the retry middleware and therefore even can use a private type.
middlewares/retry_test.go
Outdated
httpHandler = NewRetry(tc.retryCount, httpHandler) | ||
|
||
recorder := httptest.NewRecorder() | ||
req, err := http.NewRequest("GET", "http://localhost:3000/ok", ioutil.NopCloser(nil)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could s/"GET"/http.MethodGet/
here.
In some other parts of the PR too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed at all places.
server/errorhandler.go
Outdated
func recordNetError(req *http.Request) { | ||
val := req.Context().Value(middlewares.DefaultNetErrCtxKey) | ||
|
||
if netErrorOccured, ok := val.(*bool); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this explode if val
is nil
(i.e., the value does not exist), or will Go handle this fine with ok
being false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ctx.Value()
guarantees a return value of type interface{}
, so we can do a type assertion without danger.
middlewares/retry_test.go
Outdated
|
||
for _, failAtCall := range handler.failAtCalls { | ||
if handler.callNumber == failAtCall { | ||
val := r.Context().Value(DefaultNetErrCtxKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about exporting server.recordNetError
and calling it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said before, I refactored this implementation, so the implementation for recording the network error is now in the middleware package and the tests make more sense.
server/errorhandler_test.go
Outdated
netErrorOccured := false | ||
req, err := http.NewRequest("GET", "http://localhost:3000/any", nil) | ||
if err != nil { | ||
panic(fmt.Sprintf("NewRequest(): %s", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A MustNewRequest
helper function might become convenient at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an util function in httptest.NewRequest
that will internally panic if the creation of the NewRequest
fails. I think this would be good to use for the tests.
server/errorhandler.go
Outdated
// RecordingErrorHandler is an error handler, implementing the vulcand/oxy | ||
// error handler interface, which is recording network errors by setting the | ||
// request context value with the key middlewares.DefaultNetErrCtxKey to true. | ||
// This enables the retry middleware to only retry on actual network errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO we should also document the type of errors it maps to 502 and 504, respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a general comment that it will map to the proper status codes. I fear that being to explicit in the comment will bring up the danger, that the implementation and the comment will drift apart at some time. And when people read the comment, they probably also can read the code.
err: errors.New("any error"), | ||
wantHTTPStatus: http.StatusInternalServerError, | ||
wantNetErrRecorded: false, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a test case for err == nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, better to also add a test for this. Doing.
middlewares/retry.go
Outdated
} | ||
// NetErrorCtxKey is a custom type that is used as key for the request context. | ||
// Using an own type we avoid potential collisions with other keys that may have the same string name. | ||
type NetErrorCtxKey string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my learnings about context I remember that the key should be a custom type as you do here, but it should also be non exported, since it's something that should be package internal.
Obviously this is not straightforward here because the server package needs to signal something to the middleware package.
But instead of exporting low level package stuff, should the retry middlewares package maybe export some functions to deal with this and keep the context key type private?
Looks like you already have some kind of function in the server package called: recordNetError
.
This should probably be moved to the middlewares package.
Also recordNetError
doesn't need to get passed a request, a context would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An internal package might be helpful here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint, I refactored it now in this way. Now the logic for mutating the context is solely in the retry middleware and therefore we can use a private type 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timoreimann I am not sure how this should look like to be honest. Also I am hesitant to change the middleware structure in this PR, as imho it would be separate concern. Anyway I think for a long term it could be beneficial to bring more structure into the middleware package.
@timoreimann and @gottwald thanks for the review, I worked your feedback into the PR. |
middlewares/retry_test.go
Outdated
} | ||
|
||
func TestRecordNetError(t *testing.T) { | ||
boolNetErrorOccurred := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small hint for reviewers, I tried to build this as a table driven test at first, but I was not able to get it working in a straightforward way, as it would require me to specify one type for the initial context value I am passing in. This would have to be an interface{}
in my case, but then the type assertion on a *bool
in the DefaultRecordNetError
does not work anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's have three separate test functions so that we can easily see which test failed, can easily rerun them, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some smallish things left.
middlewares/retry.go
Outdated
|
||
// NetErrorRecorder is an interface to record net errors. | ||
type NetErrorRecorder interface { | ||
// RecordNetError can be used to signal the retry middleware, that an network error happened |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No comma before that
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
middlewares/retry_test.go
Outdated
} | ||
|
||
func TestRecordNetError(t *testing.T) { | ||
boolNetErrorOccurred := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's have three separate test functions so that we can easily see which test failed, can easily rerun them, etc.
middlewares/retry.go
Outdated
@@ -42,10 +43,16 @@ func (retry *Retry) ServeHTTP(rw http.ResponseWriter, r *http.Request) { | |||
} | |||
attempts := 1 | |||
for { | |||
netErrorOccurred := false | |||
// We pass in a pointer to netErrorOccurred so that we can set it to true on network errors | |||
// when proxying the http requests to the backends. This happens in the custom RecordingErrorHandler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/http/HTTP/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
server/errorhandler.go
Outdated
|
||
// RecordingErrorHandler is an error handler, implementing the vulcand/oxy | ||
// error handler interface, which is recording network errors by using the netErrorRecorder. | ||
// In addition it sets a proper http status code and body, depending on the type of error occurred. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/http/HTTP/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
Ok, worked in feedback again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
server/errorhandler.go
Outdated
statusCode := http.StatusInternalServerError | ||
|
||
if e, ok := err.(net.Error); ok { | ||
eh.netErrorRecorder.RecordNetError(req.Context()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit-pick: This stutters a bit, maybe call the method Record
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out, changed it.
6b413f7
to
5a0d6ee
Compare
@emilevauge or @vdemeester can you have a look? |
Due to SemaphoreCI, I close and reopen the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @marco-jantke, thanks for your contribution :)
I made few comments, but all in all, seems good !
middlewares/retry.go
Outdated
} | ||
|
||
// NetErrorRecorderFunc is an util to be able to implement the NetErrorRecorder without the need of a struct. | ||
type NetErrorRecorderFunc func(ctx context.Context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the need of doing this ? Why don't you want to use a struct
here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation has no need of any field. I originally had a struct
here without fields, but got a hint in a previous review here that a function would suffice. Therefore I added this type in order to able to implement the NetErrorRecorder
interface on a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This said I am fine with both implementations. An empty struct
is a bit less code, but in some way unnecessary. As you prefer :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's OK for you, I would prefer an empty struct, which is easier to read, and with the same result :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I changed it to a struct now. See 6e291db. Though to add good documentation to the struct though, its a bit like Captain Obvious, but adhering to the linting :)
middlewares/retry.go
Outdated
func DefaultRecordNetError(ctx context.Context) { | ||
val := ctx.Value(defaultNetErrCtxKey) | ||
|
||
if netErrorOccurred, ok := val.(*bool); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not easy to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed ok
now to isBoolPointer
. Not sure how to improve readability in another way.
|
||
"github.com/containous/traefik/middlewares" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will probably get issues with websockets here.
You should add
var (
_ Stateful = &RecordingErrorHandler{}
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm it has none of the capabilities the Stateful
interface expects. Maybe the naming is rather not perfect then? It is not an extension of ResponseWriter
or whatsoever, its an implementation of the oxy ErrorHandler
interface. Its called Recording
because it calls the NetErrorRecorder
which in our case will manipulate the request context. So I think this is unnecessary here. Any suggestion for naming improvements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, successfully tested your code with websocket 👍
@emilevauge please have a look again, incorporated/answered your questions. |
} else { | ||
statusCode = http.StatusBadGateway | ||
} | ||
} else if err == io.EOF { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this part again, not sure this is the right way to go. We could get an EOF error not due to network right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About this one I am not 100% sure whether its right to retry here. I am currently trying to investigate use-cases when an io.EOF can happen. For sure it can be due to a network connection problem, but also when the server closes the connection by intend. Maybe the server would do so because of too much load and we should rather try another one? Anyway I think there is no way to vary in between those cases and thats why in general I would rather also retry those request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilevauge WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timoreimann any thoughts on this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilevauge @marco-jantke the net package does distinguish between an an ErrUnexpectedEOF
and EOF
state. Unfortunately, it's not used by net/http.
Even if we considered this to be changeworthy, IMHO we should not do so along this PR which maintains the current oxy error handler behavior except for the response code logic. We can have a separate issue to drive a potential improvement in this focused regard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marco-jantke I agree with @timoreimann on this, we should deal with this issue in another PR. The fact that it is not used in the default go http client leads to discussions :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, so we handled all the questions/feedback in this PR I think :)
@timoreimann @emilevauge you are ok with me doing a rebase already to resolve the conflicts? |
@marco-jantke I think it's okay to do that now. |
fd565f7
to
70df8ab
Compare
Ok, everything should be ready now. |
@emilevauge ping :) |
70df8ab
to
1d0b4ba
Compare
@marco-jantke We browse all PRs regularly, you don't need to ping someone. |
8154f75
to
3b3b046
Compare
3b3b046
to
a04d509
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @marco-jantke !
LGTM
Now retries only happen when actual network errors occur and not only anymore based on the HTTP status code. This is because the backend could also send this status codes as their normal interface and in that case we don't want to retry.
a04d509
to
9f17ce1
Compare
Now retries only happen when actual network errors occur and not only
anymore based on the HTTP status code. This is because the backend could
also send this status codes as their normal interface and in that case
we don't want to retry.
This PR should solve the issue 1104