Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry only on real network errors #1549

Merged
merged 1 commit into from
Jun 19, 2017

Conversation

m3co-code
Copy link
Contributor

Now retries only happen when actual network errors occur and not only
anymore based on the HTTP status code. This is because the backend could
also send this status codes as their normal interface and in that case
we don't want to retry.

This PR should solve the issue 1104

@ldez ldez added the kind/enhancement a new or improved feature. label May 4, 2017
Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. :-)

@@ -42,10 +43,16 @@ func (retry *Retry) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
}
attempts := 1
for {
netErrorOccured := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: occurred

In the comment as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

@@ -42,10 +43,16 @@ func (retry *Retry) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
}
attempts := 1
for {
netErrorOccured := false
// We pass in a pointer to netErrorOccured so that we can set it to true on network errors
// that happen when proxying the http requests to the backends.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that it's a custom error handler which will set the flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I extended the comment.

if !isNetworkError(recorder.Code) || attempts >= retry.attempts {

retry.next.ServeHTTP(recorder, r.WithContext(newCtx))
if netErrorOccured == false || attempts >= retry.attempts {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify: !netErrorOccurred

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

return status == http.StatusBadGateway || status == http.StatusGatewayTimeout
}
// NetErrorCtxKey is a custom type that is used as key for the request context.
// Using an own type we avoid potential collisions with other keys that may have the same string name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, didn't know that was possible! 🎉

Copy link
Contributor Author

@m3co-code m3co-code May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually its even a best practice to not export the type if it is not necessary, to have real safety. Though I decided to have the error handler in the server package still, as for me it makes more sense there, which requires to export the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the implementation now, so that the knowledge about how to interact with the context completely to the retry middleware and therefore even can use a private type.

httpHandler = NewRetry(tc.retryCount, httpHandler)

recorder := httptest.NewRecorder()
req, err := http.NewRequest("GET", "http://localhost:3000/ok", ioutil.NopCloser(nil))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could s/"GET"/http.MethodGet/ here.

In some other parts of the PR too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed at all places.

func recordNetError(req *http.Request) {
val := req.Context().Value(middlewares.DefaultNetErrCtxKey)

if netErrorOccured, ok := val.(*bool); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this explode if val is nil (i.e., the value does not exist), or will Go handle this fine with ok being false?

Copy link
Contributor Author

@m3co-code m3co-code May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ctx.Value() guarantees a return value of type interface{}, so we can do a type assertion without danger.


for _, failAtCall := range handler.failAtCalls {
if handler.callNumber == failAtCall {
val := r.Context().Value(DefaultNetErrCtxKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about exporting server.recordNetError and calling it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said before, I refactored this implementation, so the implementation for recording the network error is now in the middleware package and the tests make more sense.

netErrorOccured := false
req, err := http.NewRequest("GET", "http://localhost:3000/any", nil)
if err != nil {
panic(fmt.Sprintf("NewRequest(): %s", err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A MustNewRequest helper function might become convenient at this point.

Copy link
Contributor Author

@m3co-code m3co-code May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an util function in httptest.NewRequest that will internally panic if the creation of the NewRequest fails. I think this would be good to use for the tests.

// RecordingErrorHandler is an error handler, implementing the vulcand/oxy
// error handler interface, which is recording network errors by setting the
// request context value with the key middlewares.DefaultNetErrCtxKey to true.
// This enables the retry middleware to only retry on actual network errors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO we should also document the type of errors it maps to 502 and 504, respectively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a general comment that it will map to the proper status codes. I fear that being to explicit in the comment will bring up the danger, that the implementation and the comment will drift apart at some time. And when people read the comment, they probably also can read the code.

err: errors.New("any error"),
wantHTTPStatus: http.StatusInternalServerError,
wantNetErrRecorded: false,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing a test case for err == nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, better to also add a test for this. Doing.

}
// NetErrorCtxKey is a custom type that is used as key for the request context.
// Using an own type we avoid potential collisions with other keys that may have the same string name.
type NetErrorCtxKey string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my learnings about context I remember that the key should be a custom type as you do here, but it should also be non exported, since it's something that should be package internal.
Obviously this is not straightforward here because the server package needs to signal something to the middleware package.
But instead of exporting low level package stuff, should the retry middlewares package maybe export some functions to deal with this and keep the context key type private?
Looks like you already have some kind of function in the server package called: recordNetError.
This should probably be moved to the middlewares package.
Also recordNetError doesn't need to get passed a request, a context would be sufficient.

Copy link
Contributor

@timoreimann timoreimann May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An internal package might be helpful here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint, I refactored it now in this way. Now the logic for mutating the context is solely in the retry middleware and therefore we can use a private type 🎉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timoreimann I am not sure how this should look like to be honest. Also I am hesitant to change the middleware structure in this PR, as imho it would be separate concern. Anyway I think for a long term it could be beneficial to bring more structure into the middleware package.

@m3co-code
Copy link
Contributor Author

@timoreimann and @gottwald thanks for the review, I worked your feedback into the PR.

}

func TestRecordNetError(t *testing.T) {
boolNetErrorOccurred := false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small hint for reviewers, I tried to build this as a table driven test at first, but I was not able to get it working in a straightforward way, as it would require me to specify one type for the initial context value I am passing in. This would have to be an interface{} in my case, but then the type assertion on a *bool in the DefaultRecordNetError does not work anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's have three separate test functions so that we can easily see which test failed, can easily rerun them, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some smallish things left.


// NetErrorRecorder is an interface to record net errors.
type NetErrorRecorder interface {
// RecordNetError can be used to signal the retry middleware, that an network error happened
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comma before that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

}

func TestRecordNetError(t *testing.T) {
boolNetErrorOccurred := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's have three separate test functions so that we can easily see which test failed, can easily rerun them, etc.

@@ -42,10 +43,16 @@ func (retry *Retry) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
}
attempts := 1
for {
netErrorOccurred := false
// We pass in a pointer to netErrorOccurred so that we can set it to true on network errors
// when proxying the http requests to the backends. This happens in the custom RecordingErrorHandler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/http/HTTP/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.


// RecordingErrorHandler is an error handler, implementing the vulcand/oxy
// error handler interface, which is recording network errors by using the netErrorRecorder.
// In addition it sets a proper http status code and body, depending on the type of error occurred.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/http/HTTP/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

@m3co-code
Copy link
Contributor Author

Ok, worked in feedback again.

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. ☺️

statusCode := http.StatusInternalServerError

if e, ok := err.(net.Error); ok {
eh.netErrorRecorder.RecordNetError(req.Context())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: This stutters a bit, maybe call the method Record instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, changed it.

@m3co-code m3co-code force-pushed the 1104-retry-behaviour branch 2 times, most recently from 6b413f7 to 5a0d6ee Compare May 12, 2017 08:00
@m3co-code
Copy link
Contributor Author

@emilevauge or @vdemeester can you have a look?

@ldez
Copy link
Contributor

ldez commented May 15, 2017

Due to SemaphoreCI, I close and reopen the PR.

@ldez ldez closed this May 15, 2017
@ldez ldez reopened this May 15, 2017
Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @marco-jantke, thanks for your contribution :)
I made few comments, but all in all, seems good !

}

// NetErrorRecorderFunc is an util to be able to implement the NetErrorRecorder without the need of a struct.
type NetErrorRecorderFunc func(ctx context.Context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the need of doing this ? Why don't you want to use a struct here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation has no need of any field. I originally had a struct here without fields, but got a hint in a previous review here that a function would suffice. Therefore I added this type in order to able to implement the NetErrorRecorder interface on a function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This said I am fine with both implementations. An empty struct is a bit less code, but in some way unnecessary. As you prefer :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's OK for you, I would prefer an empty struct, which is easier to read, and with the same result :)

Copy link
Contributor Author

@m3co-code m3co-code May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I changed it to a struct now. See 6e291db. Though to add good documentation to the struct though, its a bit like Captain Obvious, but adhering to the linting :)

func DefaultRecordNetError(ctx context.Context) {
val := ctx.Value(defaultNetErrCtxKey)

if netErrorOccurred, ok := val.(*bool); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not easy to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed ok now to isBoolPointer. Not sure how to improve readability in another way.


"github.com/containous/traefik/middlewares"
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will probably get issues with websockets here.
You should add

var (
	_ Stateful = &RecordingErrorHandler{}
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it has none of the capabilities the Stateful interface expects. Maybe the naming is rather not perfect then? It is not an extension of ResponseWriter or whatsoever, its an implementation of the oxy ErrorHandler interface. Its called Recording because it calls the NetErrorRecorder which in our case will manipulate the request context. So I think this is unnecessary here. Any suggestion for naming improvements?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, successfully tested your code with websocket 👍

@m3co-code
Copy link
Contributor Author

@emilevauge please have a look again, incorporated/answered your questions.

} else {
statusCode = http.StatusBadGateway
}
} else if err == io.EOF {
Copy link
Member

@emilevauge emilevauge May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this part again, not sure this is the right way to go. We could get an EOF error not due to network right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About this one I am not 100% sure whether its right to retry here. I am currently trying to investigate use-cases when an io.EOF can happen. For sure it can be due to a network connection problem, but also when the server closes the connection by intend. Maybe the server would do so because of too much load and we should rather try another one? Anyway I think there is no way to vary in between those cases and thats why in general I would rather also retry those request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emilevauge WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timoreimann any thoughts on this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emilevauge @marco-jantke the net package does distinguish between an an ErrUnexpectedEOF and EOF state. Unfortunately, it's not used by net/http.

Even if we considered this to be changeworthy, IMHO we should not do so along this PR which maintains the current oxy error handler behavior except for the response code logic. We can have a separate issue to drive a potential improvement in this focused regard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marco-jantke I agree with @timoreimann on this, we should deal with this issue in another PR. The fact that it is not used in the default go http client leads to discussions :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, so we handled all the questions/feedback in this PR I think :)

@m3co-code
Copy link
Contributor Author

@timoreimann @emilevauge you are ok with me doing a rebase already to resolve the conflicts?

@timoreimann
Copy link
Contributor

@marco-jantke I think it's okay to do that now.

@m3co-code m3co-code force-pushed the 1104-retry-behaviour branch 2 times, most recently from fd565f7 to 70df8ab Compare May 29, 2017 13:34
@m3co-code
Copy link
Contributor Author

Ok, everything should be ready now.

@m3co-code
Copy link
Contributor Author

@emilevauge ping :)

@ldez
Copy link
Contributor

ldez commented Jun 7, 2017

@marco-jantke We browse all PRs regularly, you don't need to ping someone.

@m3co-code m3co-code force-pushed the 1104-retry-behaviour branch 4 times, most recently from 8154f75 to 3b3b046 Compare June 13, 2017 13:14
Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @marco-jantke !
LGTM

Now retries only happen when actual network errors occur and not only
anymore based on the HTTP status code. This is because the backend could
also send this status codes as their normal interface and in that case
we don't want to retry.
@ldez ldez merged commit f79317a into traefik:master Jun 19, 2017
@ldez ldez mentioned this pull request Jun 19, 2017
@m3co-code m3co-code deleted the 1104-retry-behaviour branch June 20, 2017 06:47
@ldez ldez modified the milestone: 1.4 Jun 20, 2017
@ldez ldez changed the title retry only on real network errors Retry only on real network errors Aug 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants