-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add basic retry middleware #283
Conversation
i am planning to add another commit that adds some unit tests, but wanted to get this up for review since I believe the bulk of the work is done. |
3d5ef7c
to
3e10ed2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This commit adds an initial implementation of a retry middleware, along with fleshing out the FixedCountRetryStrategy for compatibility with the new middleware. As of this commit, any Get/Set requests that fail with an Unavailable or Internal error gRPC status code will be retried up to 3 attempts. There is no backoff or jitter yet; we should implement that in the future. The determination of which gRPC status codes and request types are eligible for retries is implemented via a new IRetryEligibilityStrategy interface. This way, individual retry strategies have the ability to override the behavior, but in the 90% case they can just re-use the default eligibility strategy. NOTE: we implement retries as a bespoke middleware rather than using the retry configuration that is built into the .NET gRPC library because the built-in implementation has some strict restrictions that aren't compatible with our goals here (e.g. no response that includes any headers is eligible for retry in the built-in implementation, and we always get headers back from envoy.)
3ef2590
to
efa1c92
Compare
rebased |
{ | ||
_logger.LogDebug($"Determining whether request is eligible for retry; status code: {grpcStatus.StatusCode}, request type: {grpcRequest.GetType()}, attemptNumber: {attemptNumber}, maxAttempts: {MaxAttempts}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere I notice you're charging the users' runtimes string interpolation allocations unconditionally. I think you should rewrite all these LogDebugs like:
_logger.LogDebug($"Determining whether request is eligible for retry; status code: {grpcStatus.StatusCode}, request type: {grpcRequest.GetType()}, attemptNumber: {attemptNumber}, maxAttempts: {MaxAttempts}"); | |
_logger.LogDebug( | |
"Determining whether request is eligible for retry; status code: {StatusCode}, request type: {RequestType}, attemptNumber: {AttemptNumber}, maxAttempts: {MaxAttempts}", | |
grpcStatus.StatusCode, | |
grpcRequest.GetType(), | |
attemptNumber, | |
MaxAttempts | |
); |
see for more examples and detail: https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.logging.loggerextensions.logdebug?view=dotnet-plat-ext-6.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, agreed, it was my intent to write them that way but I always forget. thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kotlin-structured-logging is an addictive library. It's so nice to not have to care about anything w.r.t. logging invocation cost inside of those logging blocks. I catch myself writing logs like this too so I look out for them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i hope we will have a rust-structured-logging (or maybe one already exists in the rust universe)
Func<TRequest, CallOptions, Task<MiddlewareResponseState<TResponse>>> continuation | ||
) where TRequest : class where TResponse : class | ||
{ | ||
var foo = request.GetType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be removed
attemptNumber++; | ||
nextState = await continuation(request, callOptions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I see how this will disambiguate between the likes of ListPushBack() or DictionaryIncrement() (non-idempotent methods) and Get() or Set() (idempotent methods). What's your plan there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the code in DefaultRetryEligibilityStrategy
, which is called as part of the FixedCountRetryStrategy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely missed _retryableRequestTypes
, thanks.
This commit adds an initial implementation of a retry middleware, along with fleshing out the FixedCountRetryStrategy for compatibility with the new middleware. As of this commit, any Get/Set requests that fail with an Unavailable or Internal error gRPC status code will be retried up to 3 attempts.
There is no backoff or jitter yet; we should implement that in the future.
The determination of which gRPC status codes and request types are eligible for retries is implemented via a new IRetryEligibilityStrategy interface. This way, individual retry strategies have the ability to override the behavior, but in the 90% case they can just re-use the default eligibility strategy.
NOTE: we implement retries as a bespoke middleware rather than using the retry configuration that is built into the .NET gRPC library because the built-in implementation has some strict restrictions that aren't compatible with our goals here (e.g. no response that includes any headers is eligible for retry in the built-in implementation, and we always get headers back from envoy.)