Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to retry on reset connection to service-routers #12890

Merged
merged 10 commits into from
Oct 5, 2022

Conversation

aoskotsky-amplify
Copy link
Contributor

@aoskotsky-amplify aoskotsky-amplify commented Apr 29, 2022

Description

This adds a RetryOnReset RetryOn option to service-router configs.

My apps are getting an error intermittently when using consul connect: upstream connect error or disconnect/reset before headers. reset reason: connection termination. Envoy has an option to automatically retry in case of this kind of error. Consul does not support this option at the moment so I'm implementing it here.

Testing & Reproduction steps

Create a service-router with the RetryOn option like below:

{
    "Kind": "service-router",
    "Name": "test",
    "Routes": [
        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/"
                }
            },
            "Destination": {
                "NumRetries": 3,
                "RetryOn": ["reset"],
                "Service": "test"
            }
        }
    ]
}

Links

Issue #10274

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern
  • checklist folder consulted

@hashicorp-cla
Copy link

hashicorp-cla commented Apr 29, 2022

CLA assistant check
All committers have signed the CLA.

@aoskotsky-amplify aoskotsky-amplify requested a review from a team as a code owner April 29, 2022 05:02
@Amier3
Copy link
Contributor

Amier3 commented May 2, 2022

Hey @aoskotsky-amplify

Thanks for the first-time contribution ( hope to see many more 👀 )! Generally we expose envoy-config on a per-case basis, but on first glance this use-case seems pretty solid. Since this wasn't made off an associated github issue, i'll talk this over with the team and let you know tommorow if we'd like to expose this config to everyone.

Then hopefully we'll get this reviewed and merged shortly after 👍

@Amier3
Copy link
Contributor

Amier3 commented May 2, 2022

@aoskotsky-amplify

Blake just let me know that this change came from a conversation on gitter, so you're all good to go 👍 . We'll get this reviewed shortly and thank you again for the contribution!

@aoskotsky-amplify
Copy link
Contributor Author

@aoskotsky-amplify

Blake just let me know that this change came from a conversation on gitter, so you're all good to go 👍 . We'll get this reviewed shortly and thank you again for the contribution!

Awesome. Thanks!

@blake
Copy link
Member

blake commented May 5, 2022

Hi @aoskotsky-amplify,

Thank you for this contribution. I had an opportunity to discuss this PR with the team. Here's the feedback from our review.

Envoy supports a number of retry conditions. At the time of writing, there are ten (10) x-envoy-retry-on conditions and five (5) x-envoy-retry-grpc-on conditions. Instead of exposing condition through their own variables–as is the current approach used in Consul and the one followed by this PR–we think a better solution is to expose retry_on as a top-level array in the service router configuration where users can specify a list of retry conditions.

Would you be willing to modify your PR to accommodate this alternative UX? If so, we can share a few UX options we are considering and get your feedback on them prior to you making any changes to the PR.

@aoskotsky-amplify
Copy link
Contributor Author

Sure, I can modify the PR. What UX options are you considering?

@blake
Copy link
Member

blake commented May 17, 2022

Hi @aoskotsky-amplify, here's the UX we were considering.

It would probably be beneficial to have input validation on the array to limit the retry conditions that can be specified. After looking at Envoy's docs, I believe we'd only want to allow the following conditions to be set:

  • 5xx
  • gateway-error
  • reset
  • connect-failure
  • envoy-ratelimited
  • retriable-4xx
  • refused-stream

gRPC retry conditions

  • cancelled
  • deadline-exceeded
  • internal
  • resource-exhausted
  • unavailable

Special handling for certain conditions

connect-failure

The connect-failure condition would need some special handling since there is currently a dedicated variable for enabling that retry condition, RetryOnConnectFailure.

If RetryOnConnectFailure is set to true, and connect-failure is not already present in retryPolicy.RetryOn, the connect-failure value should be appended to the list of retry conditions.

retriable-status-codes

I don't believe it makes sense to allow retriable-status-codes to be set directly in the the array of retry conditions because it is dependent on retryPolicy.RetriableStatusCodes also being configured. Since Consul does not configure Envoy to allow the x-envoy-retriable-status-codes header to be used to provide this list of codes (this is an alternative method for specifying the retry codes), the only option users have is to specify the list with RetryOnStatusCodes in the service router configuration.

In this case, I think it makes sense for that condition to only be enabled if one or more status codes are specified in RetryOnStatusCodes.

retriable-headers

The retriable-headers is dependent on a list of retriable headers also being configured. Similar to the retriable status codes, the list of headers cannot be specified using the x-envoy-retriable-header-names HTTP header.

Being that Consul does not have a configuration option that allows configuring retryPolicy.RetriableHeaders, I do not think it makes sense to allow this retry condition to be specified in the list.

What are your thoughts on this proposed UX?

@aoskotsky-amplify
Copy link
Contributor Author

Hi @blake, that UX makes sense to me. I'll start working on it.

I have a couple of questions though:

  1. Does Consul support service-router configs with gRPC services? I see the service-router docs say services need to be HTTP. If if it does, do you think the retry conditions should be a separate config option? Would all the options be available under a RetryOn option or would there separately be a GrpcRetryOn?

  2. Do you think in the future there should be a RetryOnHeaders option in a separate PR?

@blake
Copy link
Member

blake commented May 17, 2022

Hi @aoskotsky-amplify,

Service routers do support routing to gRPC services. The beginning of the doc implies under requirements that service routers can only be used with services using the http protocol, but this is not correct. Service routers can also route based on gRPC methods. See https://www.consul.io/docs/connect/config-entries/service-router#grpc-routing for an example.

Do you think in the future there should be a RetryOnHeaders option in a separate PR?

Yes, we would like to support this feature as well. I think it makes sense to add it in a separate PR.

@aoskotsky-amplify
Copy link
Contributor Author

Oh I see. Do you think there should be separate options like RetryOn and GrpcRetryOn?

@github-actions
Copy link

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

@github-actions github-actions bot added the meta/stale Automatically flagged for inactivity by stalebot label Jul 17, 2022
aoskotsky-amplify and others added 4 commits August 14, 2022 20:43
Add a RetryOn option instead of RetryOnReset so that we can support the remaining envoy retry conditions
@aoskotsky-amplify
Copy link
Contributor Author

@blake I updated this PR with your suggestions. Can you give this another review? Thanks

@github-actions github-actions bot removed the meta/stale Automatically flagged for inactivity by stalebot label Aug 17, 2022
@aoskotsky-amplify
Copy link
Contributor Author

aoskotsky-amplify commented Sep 6, 2022

Hey @blake @Amier3 can you please give this PR another review?

@aoskotsky-amplify
Copy link
Contributor Author

@mkeeler could someone from Hashicorp review this PR? Thanks!

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me. In addition to the one request I added a comment for I would also ask that the ServiceRouteDestination type in api/config_entry_discoverychain.go also be updated to reflect the RetryOn field (and a test case added to TestAPI_ConfigEntry_DiscoveryChain in the corresponding test file).

As for your question about whether we should have a RetryOnGRPC separate from the main RetryOn field. My opinion is that since Envoy makes no distinction in the routing policy and can do the right thing, then its best to keep it as 1 field in Consul.

@@ -251,6 +257,26 @@ func isValidHTTPMethod(method string) bool {
}
}

func isValidRetryCondition(retryOn string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a test for this validation function in agent/structs/config_entry_discoverychain_test.go to enumerate all the valid retry conditions. It is obvious that it works today but if we refactor things in the future it would be great to know if we break something.

Additionally it would be great if you could add two more cases to TestServiceRouterConfigEntry in that same file. One to ensure that a router with this field configured with correct entries is accepted and one to test the validation error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Let me know if the tests look fine now

Copy link
Contributor

@trujillo-adam trujillo-adam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a clarification to the docs

{
name: 'RetryOn',
type: 'array<string>',
description: `Allows retrying requests for a list of conditions. One of:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: `Allows retrying requests for a list of conditions. One of:
description: `Allows Consul to retry requests when the requests meet a specific set of conditions. You can specify one of the following lists:

Is that what this means?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. How about this? Allows Consul to retry requests when they meet one of a set of conditions. The available conditions are:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty good. We can probably even trim it down a bit more and improve consistency with the style guide:

Allows Consul to retry requests when they meet one of the following sets of conditions:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@aoskotsky-amplify aoskotsky-amplify requested review from trujillo-adam and mkeeler and removed request for mkeeler and trujillo-adam October 4, 2022 05:17
Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/no-changelog PR does not need a corresponding .changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants