Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gep: add GEP-1731 Configurable Retries #3199

Merged
merged 31 commits into from
Aug 2, 2024

Conversation

mikemorris
Copy link
Contributor

What type of PR is this?
/kind gep

What this PR does / why we need it:
Proposes configuration within HTTPRoute to retry unsuccessful requests to backends before sending a response to a client.

Which issue(s) this PR fixes:
Fixes #1731

Does this PR introduce a user-facing change?:

Adds `retry` stanza within HTTPRouteRule to configure retrying unsuccessful requests to backends before sending a response to a client.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 16, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 16, 2024
@mikemorris mikemorris changed the title gep: add GEP-1731 Configurable Retriest gep: add GEP-1731 Configurable Retries Jul 16, 2024
geps/gep-1731/index.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 16, 2024
mkdocs.yml Outdated Show resolved Hide resolved
geps/gep-1731/index.md Outdated Show resolved Hide resolved
geps/gep-1731/index.md Outdated Show resolved Hide resolved
@mikemorris mikemorris marked this pull request as ready for review July 23, 2024 01:22
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 23, 2024
Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if some of the pending TODOs are finished up. I really, really like the background detail here, that is a great (if disheartening) read.

/approve

but we definitely need some more eyes here for further LGTM.

geps/gep-1731/index.md Outdated Show resolved Hide resolved
geps/gep-1731/index.md Outdated Show resolved Hide resolved
mkdocs.yml Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2024
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done @mikemorris! Very thorough and well written GEP. Seems like the most complex part will be having some portability around what status codes can be retried, left some comments/ideas around that.

geps/gep-1731/index.md Outdated Show resolved Hide resolved
geps/gep-1731/index.md Outdated Show resolved Hide resolved
geps/gep-1731/index.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2024
geps/gep-1731/index.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 1, 2024
geps/gep-1731/index.md Outdated Show resolved Hide resolved
@mikemorris
Copy link
Contributor Author

@kate-osborn Do you have a suggestion for the language you would want around behavior with singleton backends?

geps/gep-1731/index.md Outdated Show resolved Hide resolved
mikemorris and others added 4 commits August 1, 2024 15:26
- change connection error and backend timeout retry from MUST to SHOULD
- add conformance flags for connection errors and backend timeouts
- add UNRESOLVED warning
geps/gep-1731/index.md Outdated Show resolved Hide resolved
@robscott
Copy link
Member

robscott commented Aug 2, 2024

Status Update: I believe we should merge this GEP for v1.2. Despite this being just past the extension for the GEP deadline, this one was largely good to go yesterday with only formatting bits to cleanup. We've had multiple review cycles with a variety of reviewers, and the resulting API change is fairly safe and contained.

I'll defer to @shaneutt or @mlavacca for the final call here. Thanks to @mikemorris for all the work on this one!

/approve

@robscott robscott added this to the v1.2.0 milestone Aug 2, 2024
Copy link
Member

@mlavacca mlavacca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the work on this, @mikemorris!

/lgtm
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mikemorris, mlavacca, robscott, youngnick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [mlavacca,robscott,youngnick]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit d330bc0 into kubernetes-sigs:main Aug 2, 2024
8 checks passed
@mikemorris mikemorris deleted the gep-1731-retry branch August 2, 2024 18:06
xtineskim pushed a commit to xtineskim/gateway-api that referenced this pull request Aug 8, 2024
* gep: add GEP-1731 Configurable Retries

* minor updates to NGINX summary

* add clarification around retry on connection errors

* fixup details formatting for implementation background

* add Linkerd retry summary

* Update geps/gep-1731/index.md

* Update geps/gep-1731/index.md

* website: add GEP-1731 to mkdocs.yml index

* update status code rules for retries

* update conformance details for 5xx

* add note on streams out of scope as non-goal

* add note on mesh consumer vs product routes

* add introduction

* Fixup YAML to nest retry stanza under HTTPRouteRule, not BackendRefs

* add note on HTTPRouteFilter alternative

* remove allowance for [1-9]xx range values other than 5xx

* Update geps/gep-1731/index.md

Co-authored-by: Kate Osborn <[email protected]>

* Update geps/gep-1731/index.md

Co-authored-by: Kate Osborn <[email protected]>

* update GEP-1731 metadata

* rename GEP-1731

* add SupportHTTPRouteRetryBackoff conformance details

* split out Other Considerations section, update retry budget future-proofing question

* fixup nested list formatting in conformance details

* add future excludeRetryOnTimeout consideration

* move gRPC retries to non-goals, remove Envoy gRPC retry details

* clarify streaming support non-goal as needing a separate GEP

* update Linkerd retry background section

Co-authored-by: Flynn <[email protected]>

* update streaming language in non-goals

Co-authored-by: Flynn <[email protected]>

* update connection error and timeout, add warning

- change connection error and backend timeout retry from MUST to SHOULD
- add conformance flags for connection errors and backend timeouts
- add UNRESOLVED warning

* remove 5xx shorthand, switch HTTPRouteStatusCode to int

* Update geps/gep-1731/index.md

---------

Co-authored-by: Mike Morris <[email protected]>
Co-authored-by: Kate Osborn <[email protected]>
Co-authored-by: Flynn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configurable Retries in HTTPRoute
10 participants