Signed-off-by: Alex Leong <[email protected]>
adleong committed Aug 6, 2024
1 parent 024bfbb commit dcdcd82
Showing 6 changed files with 47 additions and 30 deletions.
7 changes: 4 additions & 3 deletions linkerd.io/content/2.16/features/retries-and-timeouts.md
@@ -4,10 +4,11 @@ description = "Linkerd can perform service-specific retries and timeouts."
 weight = 3
 +++

-Automatic retries are one the most powerful and useful mechanisms a service mesh
-has for gracefully handling partial or transient application failures.
+Timeouts and automatic retries are two of the most powerful and useful
+mechanisms a service mesh has for gracefully handling partial or transient
+application failures.

-Timeouts and retries can be configured using [HTTPRoute], GrpcRoute, or Service
+Timeouts and retries can be configured using [HTTPRoute], GRPCRoute, or Service
 resources. Retries and timeouts are always performed on the *outbound* (client)
 side.

27 changes: 17 additions & 10 deletions linkerd.io/content/2.16/reference/retries.md
@@ -10,14 +10,14 @@ failures.

 Retries are a client-side behavior, and are therefore performed by the
 outbound side of the Linkerd proxy.[^1] If retries are configured on an
-HttpRoute or GrpcRoute with multiple backends, each retry of a request can
+HTTPRoute or GRPCRoute with multiple backends, each retry of a request can
 potentially get sent to a different backend. If a request has a body larger than
 64KiB then it will not be retried.

 ## Configuring Retries

 Retries are configured by a set of annotations which can be set on a Kubernetes
-Service resource or on a HttpRoute or GrpcRoute which has a Service as a parent.
+Service resource or on a HTTPRoute or GRPCRoute which has a Service as a parent.
 Client proxies will then retry failed requests to that Service or route. If any
 retry configuration annotations are present on a route resource, they override
 all retry configuration annotations on the parent Service.
@@ -29,15 +29,22 @@ proxies will use the ServiceProfile retry configuration and ignore any retry
 annotations.
 {{< /warning >}}

-+ `retry.linkerd.io/http`: A comma seperated list of HTTP response codes which
-should be retried. Valid values include `5xx` to retry all 5XX response codes,
-`gateway-error` to retry response codes 502-504, or a range in the form
-`xxx-yyy` (for example, `500-504`). This annotation is not valid on GrpcRoute
-resources.
++ `retry.linkerd.io/http`: A comma separated list of HTTP response codes which
+should be retried. Each element of the list may be
++ `xxx` to retry a single response code (for example, `"504"` -- remember,
+annotation values must be strings!);
++ `xxx-yyy` to retry a range of response codes (for example, `500-504`);
++ `gateway-error` to retry response codes 502-504; or
++ `5xx` to retry all 5XX response codes.
+This annotation is not valid on GRPCRoute resources.
 + `retry.linkerd.io/grpc`: A comma seperated list of gRPC status codes which
-should be retried. Valid values include: `cancelled`, `deadline-exceeded`,
-`internal`, `resource-exhausted`, and `unavailable`. This annotation is not
-valid on HttpRoute resources.
+should be retried. Each element of the list may be
++ `cancelled`
++ `deadline-exceeded`
++ `internal`
++ `resource-exhausted`
++ `unavailable`
+This annotation is not valid on HTTPRoute resources.
 + `retry.linkerd.io/limit`: The maximum number of times a request can be
 retried. If unspecified, the default is `1`.
 + `retry.linkerd.io/timeout`: A retry timeout after which a request is cancelled
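
Editorial illustration (not part of the commit): the list forms documented in the hunk above could be combined on a route roughly as follows. This is a minimal sketch, not a tested manifest. The route name `books-create`, the `booksapp` namespace, and the `books` Service on port 7002 are borrowed from the books tutorial changed later in this commit; the `apiVersion`, the path match, and the chosen annotation values are assumptions.

```yaml
# Hypothetical example; check apiVersion against the Gateway API CRDs in your cluster.
apiVersion: gateway.networking.k8s.io/v1beta1  # assumed
kind: HTTPRoute
metadata:
  name: books-create
  namespace: booksapp
  annotations:
    # One single code (500) plus one range (502-504), as a single string value.
    retry.linkerd.io/http: "500,502-504"
    # Allow up to two retries per request instead of the default of 1.
    # Quoted because annotation values must be strings.
    retry.linkerd.io/limit: "2"
spec:
  parentRefs:
    - name: books
      kind: Service
      group: core
      port: 7002
  rules:
    - matches:
        - path:
            type: PathPrefix  # hypothetical match
            value: /
```

Because the annotations sit on the route, they would override any retry annotations on the parent `books` Service, per the reference text above.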
12 changes: 9 additions & 3 deletions linkerd.io/content/2.16/reference/timeouts.md
@@ -7,14 +7,15 @@ Linkerd can be configured with timeouts to limit the maximum amount of time on
 a request before aborting.

 Timeouts are a client-side behavior, and are therefore performed by the
-outbound side of the Linkerd proxy.[^1] Note that if these timeouts are reached,
-the request will not be retried. Retry timeouts can be configured as part of
+outbound side of the Linkerd proxy.[^1] Note that timeouts configured in this
+way are not retryable -- if these timeouts are reached, the request will not be
+retried. Retryable timeouts can be configured as part of
 [retry configuration](../retries/).

 ## Configuring Timeouts

 Timeous are configured by a set of annotations which can be set on a Kubernetes
-Service resource or on a HttpRoute or GrpcRoute which has a Service as a parent.
+Service resource or on a HTTPRoute or GRPCRoute which has a Service as a parent.
 Client proxies will then fail requests to that Service or route once they exceed
 the timeout. If any timeout configuration annotations are present on a route
 resource, they override all timeout configuration annotations on the parent
@@ -34,6 +35,11 @@ may be in-flight.
 + `timeout.linkerd.io/idle`: The maximum amount of time a stream may be idle,
 regardless of its state.

+If the [request timeout](https://gateway-api.sigs.k8s.io/api-types/httproute/#timeouts-optional)
+field is set on an HTTPRoute resource, it will be used as the
+`timeout.linkerd.io/request` timeout. However, if both the field and the
+annotation are specified, the annotation will take priority.
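
Editorial illustration (not part of the commit): the precedence rule added above can be pictured with a route that sets both the Gateway API field and the Linkerd annotation. This is a sketch under assumptions: the resource names are borrowed from the books tutorial, the `apiVersion` is assumed, and whether `timeouts` is accepted depends on the Gateway API CRDs installed.

```yaml
# Hypothetical example; names and durations are illustrative only.
apiVersion: gateway.networking.k8s.io/v1beta1  # assumed
kind: HTTPRoute
metadata:
  name: books-create
  namespace: booksapp
  annotations:
    # Both are set, so this annotation takes priority over the field below.
    timeout.linkerd.io/request: 5s
spec:
  parentRefs:
    - name: books
      kind: Service
      group: core
      port: 7002
  rules:
    - timeouts:
        # Would be used as the request timeout only if the annotation were absent.
        request: 10s
```

With this manifest the described behavior would give an effective request timeout of 5s; removing the annotation would leave the 10s field value in effect.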

 ## Examples

 ```yaml
17 changes: 10 additions & 7 deletions linkerd.io/content/2.16/tasks/books.md
@@ -147,8 +147,9 @@ responses from the `books` Service on port 7002.

 We know that the webapp component is getting 500s from the books component, but
 it would be great to narrow this down further and get per route metrics. To do
-this, we leverage the Gateway API and define a set of HTTPRoute resources, each
-attached to the `books` Service by specifying it as their `parent_ref`.
+this, we take advantage of the Gateway API and define a set of HTTPRoute
+resources, each attached to the `books` Service by specifying it as their
+`parent_ref`.

 ```bash
 kubectl apply -f - <<EOF
@@ -207,7 +208,7 @@ spec:
 EOF
 ```

-We can then check that these HTTPRoute have been accepted by their parent
+We can then check that these HTTPRoutes have been accepted by their parent
 Service by checking their status subresource:

 ```bash
@@ -299,10 +300,12 @@ outbound_http_route_retry_requests_total{...} 469
 outbound_http_route_retry_successes_total{...} 247
 ```

-This tells us that Linkerd make a total of 469 retry requests and 247 of those
-were successful and the other 222 were not and hit the default retry limit of
-`1`. We can improve this further by increasing this limit to allow more than
-1 retry per request:
+This tells us that Linkerd made a total of 469 retry requests, of which 247 were
+successful. The remaining 222 failed and could not be retried again, since we
+didn't raise the retry limit from its default of 1.
+
+We can improve this further by increasing this limit to allow more than 1 retry
+per request:

 ```bash
 kubectl -n booksapp annotate httproutes.gateway.networking.k8s.io/books-create \
10 changes: 5 additions & 5 deletions linkerd.io/content/2.16/tasks/configuring-retries.md
@@ -9,8 +9,8 @@ questions that need to be answered:
 - Which requests should be retried?
 - How many times should the requests be retried?

-Both of these questions can be answered by adding annotations to the Service
-or HttpRoute resource you're sending requests to.
+Both of these questions can be answered by adding annotations to the Service,
+HTTPRoute, or GRPCRoute resource you're sending requests to.

 The reason why these pieces of configuration are required is because retries can
 potentially be dangerous. Automatically retrying a request that changes state
@@ -32,7 +32,7 @@ annotations.

 ## Retries

-For HttpRoutes that are idempotent, you can add the `retry.linkerd.io/http: 5xx`
+For HTTPRoutes that are idempotent, you can add the `retry.linkerd.io/http: 5xx`
 annotation which instructs Linkerd to retry any requests which fail with an HTTP
 response status in the 500s.

@@ -43,9 +43,9 @@ Note that requests will not be retried if the body exceeds 64KiB.
 You can also add the `retry.linkerd.io/limit` annotation to specify the maximum
 number of times a request may be retried. By default, this limit is `1`.

-## Grpc Retries
+## gRPC Retries

 Retries can also be configured for gRPC traffic by adding the
-`retry.linkerd.io/grpc` annotation to a GrpcRoute or Service resource. The value
+`retry.linkerd.io/grpc` annotation to a GRPCRoute or Service resource. The value
 of this annotation is a comma seperated list of gRPC status codes that should
 be retried.
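
Editorial illustration (not part of the commit): a gRPC retry configuration placed directly on a Service might look like the sketch below. The Service name, namespace, selector, and port are hypothetical; the status code names come from the list in the retries reference changed earlier in this commit.

```yaml
# Hypothetical example; all names and the port are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-grpc-service
  namespace: my-namespace
  annotations:
    # Comma-separated gRPC status codes that should be retried.
    retry.linkerd.io/grpc: "unavailable,resource-exhausted"
    # Optional: raise the retry limit from its default of 1.
    retry.linkerd.io/limit: "2"
spec:
  selector:
    app: my-grpc-service
  ports:
    - name: grpc
      port: 8080
```

Putting the annotations on the Service applies them to all routes under it, unless a GRPCRoute with that Service as its parent carries its own retry annotations, which would then override these.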
4 changes: 2 additions & 2 deletions linkerd.io/content/2.16/tasks/getting-per-route-metrics.md
@@ -4,9 +4,9 @@ description = "Configure per-route metrics for your application."
 +++

 To get per-route metrics, you must create [HTTPRoute] resources. If a route has
-a `parent_ref` which points to a Service resource, Linkerd will generate
+a `parent_ref` which points to a **Service** resource, Linkerd will generate
 outbound per-route traffic metrics for all HTTP traffic that it sends to that
-Service. If a route has a `parent_ref` which points to a Server resource,
+Service. If a route has a `parent_ref` which points to a **Server** resource,
 Linkerd will generate inbound per-route traffic metrcs for all HTTP traffic that
 it receives on that Server. Note that an [HTTPRoute] can have multiple
 `parent_ref`s which means that the same [HTTPRoute] resource can be used to
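
Editorial illustration (not part of the commit): the Service versus Server distinction emphasized in this hunk corresponds to the `parentRefs` entries of a route. The sketch below assumes a `books` Service on port 7002 (from the books tutorial) and a hypothetical Server named `books-server`; the route name and `apiVersion` are likewise assumptions.

```yaml
# Hypothetical example; books-server and books-metrics are made-up names.
apiVersion: gateway.networking.k8s.io/v1beta1  # assumed
kind: HTTPRoute
metadata:
  name: books-metrics
  namespace: booksapp
spec:
  parentRefs:
    # Service parent: outbound per-route metrics on clients of books.
    - name: books
      kind: Service
      group: core
      port: 7002
    # Server parent: inbound per-route metrics on the books workload.
    - name: books-server
      kind: Server
      group: policy.linkerd.io
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
```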
