diff --git a/linkerd.io/content/2.14/overview/_index.md b/linkerd.io/content/2.14/overview/_index.md index 37cf659495..08ecdda2ee 100644 --- a/linkerd.io/content/2.14/overview/_index.md +++ b/linkerd.io/content/2.14/overview/_index.md @@ -3,7 +3,7 @@ title = "Overview" aliases = [ "/docs", "/documentation", - "/2.15/", + "/2.16/", "../docs/", "/doc/network-performance/", "/in-depth/network-performance/", diff --git a/linkerd.io/content/2.15/features/server-policy.md b/linkerd.io/content/2.15/features/server-policy.md index 256f0dea97..c0e0d38338 100644 --- a/linkerd.io/content/2.15/features/server-policy.md +++ b/linkerd.io/content/2.15/features/server-policy.md @@ -46,7 +46,7 @@ policy at that point in the hierarchy. Valid default policies include: - `all-unauthenticated`: allow all requests. This is the default. - `all-authenticated`: allow requests from meshed clients only. -- `cluster-authenticated`: allow requests form meshed clients in the same +- `cluster-authenticated`: allow requests from meshed clients in the same cluster. - `deny`: deny all requests. diff --git a/linkerd.io/content/2.16/_index.md b/linkerd.io/content/2.16/_index.md new file mode 100644 index 0000000000..200fc5d135 --- /dev/null +++ b/linkerd.io/content/2.16/_index.md @@ -0,0 +1,6 @@ +--- +title: "Overview" +--- + + + diff --git a/linkerd.io/content/2.16/checks/index.html b/linkerd.io/content/2.16/checks/index.html new file mode 100644 index 0000000000..d13700ee78 --- /dev/null +++ b/linkerd.io/content/2.16/checks/index.html @@ -0,0 +1,18 @@ + + + + + + + Linkerd Check Redirection + + + If you are not redirected automatically, follow this + link. + + diff --git a/linkerd.io/content/2.16/common-errors/_index.md b/linkerd.io/content/2.16/common-errors/_index.md new file mode 100644 index 0000000000..d635b5ef95 --- /dev/null +++ b/linkerd.io/content/2.16/common-errors/_index.md @@ -0,0 +1,21 @@ ++++ +title = "Common Errors" +weight = 10 +[sitemap] + priority = 1.0 ++++ + +Linkerd is generally robust, but things can always go wrong! You'll find +information here about the most common things that cause people trouble. + +## When in Doubt, Start With `linkerd check` + +Whenever you see anything that looks unusual about your mesh, **always** start +with `linkerd check`. It will check a long series of things that have caused +trouble for others and make sure that your configuration is sane, and it will +point you to help for any problems it finds. It's hard to overstate how useful +this command is. + +## Common Errors + +{{% sectiontoc "common-errors" %}} diff --git a/linkerd.io/content/2.16/common-errors/failfast.md b/linkerd.io/content/2.16/common-errors/failfast.md new file mode 100644 index 0000000000..5cd78c354e --- /dev/null +++ b/linkerd.io/content/2.16/common-errors/failfast.md @@ -0,0 +1,18 @@ ++++ +title = "Failfast" +description = "Failfast means that no endpoints are available." ++++ + +If Linkerd reports that a given service is in the _failfast_ state, it +means that the proxy has determined that there are no available endpoints +for that service. In this situation there's no point in the proxy trying +to actually make a connection to the service - it already knows that it +can't talk to it - so it reports that the service is in failfast and +immediately returns an error from the proxy. + +The error will be either a 503 or a 504; see below for more information, +but if you already know that the service is in failfast because you saw +it in the logs, that's the important part. 
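A quick way to confirm that a given service really has no available endpoints is to ask
Kubernetes directly (the namespace and Service name here are hypothetical):

```bash
# List the ready endpoints Kubernetes knows about for this Service;
# an empty ENDPOINTS column means the proxy has nothing to route to.
kubectl -n my-ns get endpoints my-svc
```

If the list is empty, check that the backing pods exist and are Ready, and that
the Service's selector actually matches them.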
+ +To get out of failfast, some endpoints for the service have to +become available. diff --git a/linkerd.io/content/2.16/common-errors/http-502.md b/linkerd.io/content/2.16/common-errors/http-502.md new file mode 100644 index 0000000000..7205d049a1 --- /dev/null +++ b/linkerd.io/content/2.16/common-errors/http-502.md @@ -0,0 +1,11 @@ ++++ +title = "HTTP 502 Errors" +description = "HTTP 502 means connection errors between proxies." ++++ + +The Linkerd proxy will return a 502 error for connection errors between +proxies. Unfortunately it's fairly common to see an uptick in 502s when +first meshing a workload that hasn't previously been used with a mesh, +because the mesh surfaces errors that were previously invisible! + +There's actually a whole page on [debugging 502s](../../tasks/debugging-502s/). diff --git a/linkerd.io/content/2.16/common-errors/http-503-504.md b/linkerd.io/content/2.16/common-errors/http-503-504.md new file mode 100644 index 0000000000..a8777413af --- /dev/null +++ b/linkerd.io/content/2.16/common-errors/http-503-504.md @@ -0,0 +1,27 @@ ++++ +title = "HTTP 503 and 504 Errors" +description = "HTTP 503 and 504 mean overloaded workloads." ++++ + +503s and 504s show up when a Linkerd proxy is trying to make so many +requests to a workload that it gets overwhelmed. + +When the workload next to a proxy makes a request, the proxy adds it +to an internal dispatch queue. When things are going smoothly, the +request is pulled from the queue and dispatched almost immediately. +If the queue gets too long, though (which can generally happen only +if the called service is slow to respond), the proxy will go into +_load-shedding_, where any new request gets an immediate 503. The +proxy can only get _out_ of load-shedding when the queue shrinks. + +Failfast also plays a role here: if the proxy puts a service into +failfast while there are requests in the dispatch queue, all the +requests in the dispatch queue get an immediate 504 before the +proxy goes into load-shedding. + +To get out of failfast, some endpoints for the service have to +become available. + +To get out of load-shedding, the dispatch queue has to start +emptying, which implies that the service has to get more capacity +to process requests or that the incoming request rate has to drop. diff --git a/linkerd.io/content/2.16/common-errors/protocol-detection.md b/linkerd.io/content/2.16/common-errors/protocol-detection.md new file mode 100644 index 0000000000..515b065515 --- /dev/null +++ b/linkerd.io/content/2.16/common-errors/protocol-detection.md @@ -0,0 +1,35 @@ ++++ +title = "Protocol Detection Errors" +description = "Protocol detection errors indicate that Linkerd doesn't understand the protocol in use." ++++ + +Linkerd is capable of proxying all TCP traffic, including TLS connections, +WebSockets, and HTTP tunneling. In most cases where the client speaks first +when a new connection is made, Linkerd can detect the protocol in use, +allowing it to perform per-request routing and metrics. + +If your proxy logs contain messages like `protocol detection timed out after +10s`, or you're experiencing 10-second delays when establishing connections, +you're probably running a situation where Linkerd cannot detect the protocol. +This is most common for protocols where the server speaks first, and the +client is waiting for information from the server. It may also occur with +non-HTTP protocols for which Linkerd doesn't yet understand the wire format of +a request. 
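One way to confirm this is to search the proxy container's logs for the timeout
message (the workload name and namespace here are hypothetical):

```bash
# Look for protocol detection timeouts in the Linkerd proxy sidecar's logs
kubectl -n my-ns logs deploy/my-app -c linkerd-proxy | grep "protocol detection timed out"
```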
+ +You'll need to understand exactly what the situation is to fix this: + +- A server-speaks-first protocol will probably need to be configured as a + `skip` or `opaque` port, as described in the [protocol detection + documentation](../../features/protocol-detection/#configuring-protocol-detection). + +- If you're seeing transient protocol detection timeouts, this is more likely + to indicate a misbehaving workload. + +- If you know the protocol is client-speaks-first but you're getting + consistent protocol detection timeouts, you'll probably need to fall back on + a `skip` or `opaque` port. + +Note that marking ports as `skip` or `opaque` has ramifications beyond +protocol detection timeouts; see the [protocol detection +documentation](../../features/protocol-detection/#configuring-protocol-detection) +for more information. diff --git a/linkerd.io/content/2.16/features/_index.md b/linkerd.io/content/2.16/features/_index.md new file mode 100644 index 0000000000..21a32234d1 --- /dev/null +++ b/linkerd.io/content/2.16/features/_index.md @@ -0,0 +1,14 @@ ++++ +title = "Features" +weight = 3 +[sitemap] + priority = 1.0 ++++ + +Linkerd offers many features, outlined below. For our walkthroughs and guides, +please see the [Linkerd task docs]({{% ref "../tasks" %}}). For a reference, +see the [Linkerd reference docs]({{% ref "../reference" %}}). + +## Linkerd's features + +{{% sectiontoc "features" %}} diff --git a/linkerd.io/content/2.16/features/access-logging.md b/linkerd.io/content/2.16/features/access-logging.md new file mode 100644 index 0000000000..c7567bb984 --- /dev/null +++ b/linkerd.io/content/2.16/features/access-logging.md @@ -0,0 +1,68 @@ ++++ +title = "HTTP Access Logging" +description = "Linkerd proxies can be configured to emit HTTP access logs." +aliases = [ + "../access-logging/", + "../proxy-access-logging/", + "../http-access-logging/", + "../access-log", +] ++++ + +Linkerd proxies can be configured to generate an HTTP access log that records +all HTTP requests that transit the proxy. + +The `config.linkerd.io/access-log` annotation is used to enable proxy HTTP +access logging. Adding this annotation to a namespace or workload configures the +proxy injector to set an environment variable in the proxy container that +configures access logging. + +HTTP access logging is disabled by default because it has a performance impact, +compared to proxies without access logging enabled. Enabling access logging may +increase tail latency and CPU consumption under load. The severity of +this performance cost may vary depending on the traffic being proxied, and may +be acceptable in some environments. + +{{< note >}} +The proxy's HTTP access log is distinct from proxy debug logging, which is +configured separately. See the documentation on [modifying the proxy log +level](../../tasks/modifying-proxy-log-level/) for details on configuring the +proxy's debug logging. +{{< /note >}} + +## Access Log Formats + +The value of the `config.linkerd.io/access-log` annotation determines the format +of HTTP access log entries, and can be either "apache" or "json". + +Setting the `config.linkerd.io/access-log: "apache"` annotation configures the +proxy to emit HTTP access logs in the [Apache Common Log +Format](https://en.wikipedia.org/wiki/Common_Log_Format). 
For example: + +```text +10.42.0.63:51160 traffic.booksapp.serviceaccount.identity.linkerd.cluster.local - [2022-08-23T20:28:20.071809491Z] "GET http://webapp:7000/ HTTP/2.0" 200 +10.42.0.63:51160 traffic.booksapp.serviceaccount.identity.linkerd.cluster.local - [2022-08-23T20:28:20.187706137Z] "POST http://webapp:7000/authors HTTP/2.0" 303 +10.42.0.63:51160 traffic.booksapp.serviceaccount.identity.linkerd.cluster.local - [2022-08-23T20:28:20.301798187Z] "GET http://webapp:7000/authors/104 HTTP/2.0" 200 +10.42.0.63:51160 traffic.booksapp.serviceaccount.identity.linkerd.cluster.local - [2022-08-23T20:28:20.409177224Z] "POST http://webapp:7000/books HTTP/2.0" 303 +10.42.0.1:43682 - - [2022-08-23T20:28:23.049685223Z] "GET /ping HTTP/1.1" 200 +``` + +Setting the `config.linkerd.io/access-log: json` annotation configures the proxy +to emit access logs in a JSON format. For example: + +```json +{"client.addr":"10.42.0.70:32996","client.id":"traffic.booksapp.serviceaccount.identity.linkerd.cluster.local","host":"webapp:7000","method":"GET","processing_ns":"39826","request_bytes":"","response_bytes":"19627","status":200,"timestamp":"2022-08-23T20:33:42.321746212Z","total_ns":"14441135","trace_id":"","uri":"http://webapp:7000/","user_agent":"Go-http-client/1.1","version":"HTTP/2.0"} +{"client.addr":"10.42.0.70:32996","client.id":"traffic.booksapp.serviceaccount.identity.linkerd.cluster.local","host":"webapp:7000","method":"POST","processing_ns":"30036","request_bytes":"33","response_bytes":"0","status":303,"timestamp":"2022-08-23T20:33:42.436964052Z","total_ns":"14122403","trace_id":"","uri":"http://webapp:7000/authors","user_agent":"Go-http-client/1.1","version":"HTTP/2.0"} +{"client.addr":"10.42.0.70:32996","client.id":"traffic.booksapp.serviceaccount.identity.linkerd.cluster.local","host":"webapp:7000","method":"GET","processing_ns":"38664","request_bytes":"","response_bytes":"2350","status":200,"timestamp":"2022-08-23T20:33:42.551768300Z","total_ns":"6998222","trace_id":"","uri":"http://webapp:7000/authors/105","user_agent":"Go-http-client/1.1","version":"HTTP/2.0"} +{"client.addr":"10.42.0.70:32996","client.id":"traffic.booksapp.serviceaccount.identity.linkerd.cluster.local","host":"webapp:7000","method":"POST","processing_ns":"42492","request_bytes":"46","response_bytes":"0","status":303,"timestamp":"2022-08-23T20:33:42.659401621Z","total_ns":"9274163","trace_id":"","uri":"http://webapp:7000/books","user_agent":"Go-http-client/1.1","version":"HTTP/2.0"} +{"client.addr":"10.42.0.1:56300","client.id":"-","host":"10.42.0.69:7000","method":"GET","processing_ns":"35848","request_bytes":"","response_bytes":"4","status":200,"timestamp":"2022-08-23T20:33:49.254262428Z","total_ns":"1416066","trace_id":"","uri":"/ping","user_agent":"kube-probe/1.24","version":"HTTP/1.1"} +``` + +## Consuming Access Logs + +The HTTP access log is written to the proxy container's `stderr` stream, while +the proxy's standard debug logging is written to the proxy container's `stdout` +stream. Currently, the `kubectl logs` command will always output both the +container's `stdout` and `stderr` streams. However, [KEP +3289](https://github.com/kubernetes/enhancements/pull/3289) will add support for +separating a container's `stdout` or `stderr` in the `kubectl logs` command. 
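As a rough sketch of the end-to-end workflow (namespace and workload names here
are hypothetical), you might enable Apache-format access logging for a namespace
and then read the log like this:

```bash
# Enable access logging for workloads injected in this namespace
kubectl annotate namespace my-ns config.linkerd.io/access-log=apache

# Restart the workload so the proxy injector picks up the new annotation
kubectl -n my-ns rollout restart deploy/my-app

# Access log entries appear on the proxy container's stderr
kubectl -n my-ns logs deploy/my-app -c linkerd-proxy
```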
diff --git a/linkerd.io/content/2.16/features/automatic-mtls.md b/linkerd.io/content/2.16/features/automatic-mtls.md new file mode 100644 index 0000000000..142ffda1cc --- /dev/null +++ b/linkerd.io/content/2.16/features/automatic-mtls.md @@ -0,0 +1,155 @@ +--- +title: Automatic mTLS +description: Linkerd automatically enables mutual Transport Layer Security (TLS) for all communication between meshed applications. +weight: 4 +aliases: + - ../automatic-tls +enableFAQSchema: true +faqs: + - question: What traffic can Linkerd automatically mTLS? + answer: Linkerd transparently applies mTLS to all TCP communication between + meshed pods. However, there are still ways in which you may still have + non-mTLS traffic in your system, including traffic to or from non-meshed + pods (e.g. Kubernetes healthchecks), and traffic on ports that were + marked as skip ports, which bypass the proxy entirely. + + - question: How does Linkerd's mTLS implementation work? + answer: The Linkerd control plane contains a certificate authority (CA) + called "identity". This CA issues TLS certificates to each Linkerd data + plane proxy. Each certificate is bound to the Kubernetes ServiceAccount + of the containing pod. These TLS certificates expire after 24 hours and + are automatically rotated. The proxies use these certificates to encrypt + and authenticate TCP traffic to other proxies. + + - question: What is mTLS? + answer: mTLS, or mutual TLS, is simply "regular TLS" with the extra + stipulation that the client is also authenticated. TLS guarantees + authenticity, but by default this only happens in one direction--the + client authenticates the server but the server doesn’t authenticate the + client. mTLS makes the authenticity symmetric. +--- + +By default, Linkerd automatically enables mutually-authenticated Transport +Layer Security (mTLS) for all TCP traffic between meshed pods. This means that +Linkerd adds authenticated, encrypted communication to your application with +no extra work on your part. (And because the Linkerd control plane also runs +on the data plane, this means that communication between Linkerd's control +plane components are also automatically secured via mTLS.) + +See [Caveats and future work](#caveats-and-future-work) below for some details. + +## What is mTLS? + +mTLS, or mutual TLS, is simply "regular TLS" with the extra stipulation that +the client is also authenticated. TLS guarantees authenticity, but by default +this only happens in one direction--the client authenticates the server but the +server doesn’t authenticate the client. mTLS makes the authenticity symmetric. + +mTLS is a large topic. For a broad overview of what mTLS is and how it works in +Kuberentes clusters, we suggest reading through [A Kubernetes engineer's guide +to mTLS](https://buoyant.io/mtls-guide/). + +## Which traffic can Linkerd automatically mTLS? + +Linkerd transparently applies mTLS to all TCP communication between meshed +pods. However, there are still ways in which you may still have non-mTLS +traffic in your system, including: + +* Traffic to or from non-meshed pods (e.g. Kubernetes healthchecks) +* Traffic on ports that were marked as [skip ports](../protocol-detection/), + which bypass the proxy entirely. + +You can [verify which traffic is mTLS'd](../../tasks/validating-your-traffic/) +in a variety of ways. External systems such as [Buoyant +Cloud](https://buoyant.io/cloud) can also automatically generate reports of TLS +traffic patterns on your cluster. 
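For example, if the viz extension is installed, a quick spot-check (using the
demo `emojivoto` namespace as a stand-in for your own) is:

```bash
# Show deployment-to-deployment traffic edges; the SECURED column
# indicates whether each edge is mTLS'd by Linkerd
linkerd viz -n emojivoto edges deployment
```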
+ +## Operational concerns + +Linkerd's mTLS requires some preparation for production use, especially for +long-lived clusters or clusters that expect to have cross-cluster traffic. + +The trust anchor generated by the default `linkerd install` CLI command expires +after 365 days. After that, it must be [manually +rotated](../../tasks/manually-rotating-control-plane-tls-credentials/)—a +non-trivial task. Alternatively, you can [provide the trust anchor +yourself](../../tasks/generate-certificates/) and control the expiration date, +e.g. setting it to 10 years rather than one year. + +Kubernetes clusters that make use of Linkerd's [multi-cluster +communication](../multicluster/) must share a trust anchor. Thus, the default +`linkerd install` setup will not work for this situation and you must provide +an explicit trust anchor. + +Similarly, the default cluster issuer certificate and key expire after a year. +These must be [rotated before they +expire](../../tasks/manually-rotating-control-plane-tls-credentials/). +Alternatively, you can [set up automatic rotation with +`cert-manager`](../../tasks/automatically-rotating-control-plane-tls-credentials/). + +External systems such as [Buoyant Cloud](https://buoyant.io/cloud) can be used +to monitor cluster credentials and to send reminders if they are close to +expiration. + +## How does Linkerd's mTLS implementation work? + +The [Linkerd control plane](../../reference/architecture/) contains a certificate +authority (CA) called `identity`. This CA issues TLS certificates to each +Linkerd data plane proxy. Each certificate is bound to the [Kubernetes +ServiceAccount](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/) +identity of the containing pod. These TLS certificates expire after 24 hours +and are automatically rotated. The proxies use these certificates to encrypt +and authenticate TCP traffic to other proxies. + +On the control plane side, Linkerd maintains a set of credentials in the +cluster: a trust anchor, and an issuer certificate and private key. These +credentials can be generated by Linkerd during install time, or optionally +provided by an external source, e.g. [Vault](https://vaultproject.io) or +[cert-manager](https://github.com/jetstack/cert-manager). The issuer +certificate and private key are stored in a [Kubernetes +Secret](https://kubernetes.io/docs/concepts/configuration/secret/); this Secret +is placed in the `linkerd` namespace and can only be read by the service +account used by the [Linkerd control plane](../../reference/architecture/)'s +`identity` component. + +On the data plane side, each proxy is passed the trust anchor in an environment +variable. At startup, the proxy generates a private key, stored in a [tmpfs +emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) which +stays in memory and never leaves the pod. The proxy connects to the control +plane's `identity` component, validating the connection to `identity` with the +trust anchor, and issues a [certificate signing request +(CSR)](https://en.wikipedia.org/wiki/Certificate_signing_request). The CSR +contains an initial certificate with identity set to the pod's [Kubernetes +ServiceAccount](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/), +and the actual service account token, so that `identity` can validate that the +CSR is valid. After validation, the signed trust bundle is returned to the +proxy, which can use it as both a client and server certificate. 
These +certificates are scoped to 24 hours and dynamically refreshed using the same +mechanism. + +Finally, when a proxy receives an outbound connection from the application +container within its pod, it looks up that destination with the Linkerd control +plane. If it's in the Kubernetes cluster, the control plane provides the proxy +with the destination's endpoint addresses, along with metadata including an +identity name. When the proxy connects to the destination, it initiates a TLS +handshake and verifies that that the destination proxy's certificate is signed +by the trust anchor and contains the expected identity. + +## TLS protocol parameters + +Linkerd currently uses the following TLS protocol parameters for mTLS +connections, although they may change in future versions: + +* TLS version 1.3 +* Cipher suite `TLS_CHACHA20_POLY1305_SHA256` as specified in [RFC + 8446](https://www.rfc-editor.org/rfc/rfc8446#section-9.1). + +## Caveats and future work + +* Linkerd does not *require* mTLS unless [authorization policies](../server-policy/) + are configured. + +* Ideally, the ServiceAccount token that Linkerd uses would not be shared with + other potential uses of that token. In future Kubernetes releases, Kubernetes + will support audience/time-bound ServiceAccount tokens, and Linkerd will use + those instead. diff --git a/linkerd.io/content/2.16/features/cni.md b/linkerd.io/content/2.16/features/cni.md new file mode 100644 index 0000000000..0d314e1de9 --- /dev/null +++ b/linkerd.io/content/2.16/features/cni.md @@ -0,0 +1,136 @@ ++++ +title = "CNI Plugin" +description = "Linkerd can optionally use a CNI plugin instead of an init-container to avoid NET_ADMIN capabilities." ++++ + +Linkerd's data plane works by transparently routing all TCP traffic to and from +every meshed pod to its proxy. (See the +[Architecture](../../reference/architecture/) doc.) This allows Linkerd to act +without the application being aware. + +By default, this rewiring is done with an [Init +Container](../../reference/architecture/#linkerd-init-container) that uses +iptables to install routing rules for the pod, at pod startup time. However, +this requires the `CAP_NET_ADMIN` capability; and in some clusters, this +capability is not granted to pods. + +To handle this, Linkerd can optionally run these iptables rules in a [CNI +plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/) +rather than in an Init Container. This avoids the need for a `CAP_NET_ADMIN` +capability. + +{{< note >}} +Linkerd's CNI plugin is designed to run in conjunction with your existing CNI +plugin, using _CNI chaining_. It handles only the Linkerd-specific +configuration and does not replace the need for a CNI plugin. +{{< /note >}} + +{{< note >}} +If you're installing Linkerd's CNI plugin on top of Cilium, make sure to install +the latter with the option `cni.exclusive=false`, so Cilium doesn't take +ownership over the CNI configurations directory, and allows other plugins to +deploy their configurations there. +{{< /note >}} + +## Installation + +Usage of the Linkerd CNI plugin requires that the `linkerd-cni` DaemonSet be +successfully installed on your cluster _first_, before installing the Linkerd +control plane. + +### Using the CLI + +To install the `linkerd-cni` DaemonSet, run: + +```bash +linkerd install-cni | kubectl apply -f - +``` + +Once the DaemonSet is up and running, meshed pods should no longer use the +`linkerd-init` Init Container. 
To accomplish this, use the +`--linkerd-cni-enabled` flag when installing the control plane: + +```bash +linkerd install --linkerd-cni-enabled | kubectl apply -f - +``` + +Using this option will set a `cniEnabled` flag in the `linkerd-config` +ConfigMap. Proxy injections will read this field and omit the `linkerd-init` +Init Container. + +### Using Helm + +First ensure that your Helm local cache is updated: + +```bash +helm repo update +helm search repo linkerd2-cni +``` + +Install the CNI DaemonSet: + +```bash +# install the CNI plugin first +helm install linkerd-cni -n linkerd-cni --create-namespace linkerd/linkerd2-cni + +# ensure the plugin is installed and ready +linkerd check --pre --linkerd-cni-enabled +``` + +At that point you are ready to install Linkerd with CNI enabled. Follow the +[Installing Linkerd with Helm](../../tasks/install-helm/) instructions. + +## Additional configuration + +The `linkerd install-cni` command includes additional flags that you can use to +customize the installation. See `linkerd install-cni --help` for more +information. Note that many of the flags are similar to the flags that can be +used to configure the proxy when running `linkerd inject`. If you change a +default when running `linkerd install-cni`, you will want to ensure that you +make a corresponding change when running `linkerd inject`. + +The most important flags are: + +1. `--dest-cni-net-dir`: This is the directory on the node where the CNI + Configuration resides. It defaults to: `/etc/cni/net.d`. +2. `--dest-cni-bin-dir`: This is the directory on the node where the CNI Plugin + binaries reside. It defaults to: `/opt/cni/bin`. +3. `--cni-log-level`: Setting this to `debug` will allow more verbose logging. + In order to view the CNI Plugin logs, you must be able to see the `kubelet` + logs. One way to do this is to log onto the node and use + `journalctl -t kubelet`. The string `linkerd-cni:` can be used as a search to + find the plugin log output. + +### Allowing initContainer networking + +When using the Linkerd CNI plugin the required `iptables` rules are in effect +before the pod is scheduled. Also, the `linkerd-proxy` is not started until +after all `initContainers` have completed. This means no `initContainer` will +have network access because its packets will be caught by `iptables` and the +`linkerd-proxy` will not yet be available. + +It is possible to bypass these `iptables` rules by running the `initContainer` +as the UID of the proxy (by default `2102`). Processes run as this UID are +skipped by `iptables` and allow direct network connectivity. These network +connections are not meshed. + +The following is a snippet for an `initContainer` configured to allow unmeshed +networking while using the CNI plugin: + +```yaml +initContainers: +- name: example + image: example + securityContext: + runAsUser: 2102 # Allows skipping iptables rules +``` + +## Upgrading the CNI plugin + +Since the CNI plugin is basically stateless, there is no need for a separate +`upgrade` command. 
If you are using the CLI to upgrade the CNI plugin you can +just do: + +```bash +linkerd install-cni | kubectl apply --prune -l linkerd.io/cni-resource=true -f - +``` diff --git a/linkerd.io/content/2.16/features/dashboard.md b/linkerd.io/content/2.16/features/dashboard.md new file mode 100644 index 0000000000..5413064e77 --- /dev/null +++ b/linkerd.io/content/2.16/features/dashboard.md @@ -0,0 +1,112 @@ ++++ +title = "Dashboard and on-cluster metrics stack" +description = "Linkerd provides a full on-cluster metrics stack, including CLI tools and dashboards." ++++ + +Linkerd provides a full on-cluster metrics stack, including CLI tools and a web +dashboard. + +To access this functionality, install the viz extension: + +```bash +linkerd viz install | kubectl apply -f - +``` + +This extension installs the following components into your `linkerd-viz` +namespace: + +* A [Prometheus](https://prometheus.io/) instance +* metrics-api, tap, tap-injector, and web components + +These components work together to provide an on-cluster metrics stack. + +{{< note >}} +To limit excessive resource usage on the cluster, the metrics stored by this +extension are _transient_. Only the past 6 hours are stored, and metrics do not +persist in the event of pod restart or node outages. This may not be suitable +for production use. +{{< /note >}} + +{{< note >}} +This metrics stack may require significant cluster resources. Prometheus, in +particular, will consume resources as a function of traffic volume within the +cluster. +{{< /note >}} + +## Linkerd dashboard + +The Linkerd dashboard provides a high level view of what is happening with your +services in real time. It can be used to view "golden metrics" (success rate, +requests/second and latency), visualize service dependencies and understand the +health of specific service routes. + +One way to pull it up is by running `linkerd viz dashboard` from the command +line. + +{{< fig src="/images/architecture/stat.png" title="Top Line Metrics">}} + +## Grafana + +In earlier versions of Linkerd, the viz extension also pre-installed a Grafana +dashboard. As of Linkerd 2.12, due to licensing changes in Grafana, this is no +longer the case. However, you can still install Grafana on your own—see the +[Grafana docs](../../tasks/grafana/) for instructions on how to create the +Grafana dashboards. + +## Examples + +In these examples, we assume you've installed the emojivoto example application. +Please refer to the [Getting Started Guide](../../getting-started/) for how to +do this. + +You can use your dashboard extension and see all the services in the demo app. 
Since the demo app comes with a load generator, we can see live traffic metrics
by running:

```bash
linkerd -n emojivoto viz stat deploy
```

This will show the "golden" metrics for each deployment:

* Success rates
* Request rates
* Latency distribution percentiles

To dig in a little further, it is possible to use `top` to get a real-time
view of which paths are being called:

```bash
linkerd -n emojivoto viz top deploy
```

To go even deeper, we can use `tap` to view the stream of requests across a
single pod, deployment, or even everything in the emojivoto namespace:

```bash
linkerd -n emojivoto viz tap deploy/web
```

All of this functionality is also available in the dashboard, if you would like
to use your browser instead:

{{< gallery >}}

{{< gallery-item src="/images/getting-started/stat.png"
    title="Top Line Metrics">}}

{{< gallery-item src="/images/getting-started/inbound-outbound.png"
    title="Deployment Detail">}}

{{< gallery-item src="/images/getting-started/top.png"
    title="Top" >}}

{{< gallery-item src="/images/getting-started/tap.png"
    title="Tap" >}}

{{< /gallery >}}

## Further reading

See [Exporting metrics](../../tasks/exporting-metrics/) for alternative ways
to consume Linkerd's metrics.

diff --git a/linkerd.io/content/2.16/features/distributed-tracing.md b/linkerd.io/content/2.16/features/distributed-tracing.md
new file mode 100644
index 0000000000..7bf2ef5be8
--- /dev/null
+++ b/linkerd.io/content/2.16/features/distributed-tracing.md
@@ -0,0 +1,59 @@
+++
title = "Distributed Tracing"
description = "You can enable distributed tracing support in Linkerd."
+++

Tracing can be an invaluable tool in debugging distributed systems performance,
especially for identifying bottlenecks and understanding the latency cost of
each component in your system. Linkerd can be configured to emit trace spans
from the proxies, allowing you to see exactly how much time requests and
responses spend inside.

Unlike most of the features of Linkerd, distributed tracing requires both code
changes and configuration. (You can read up on [Distributed tracing in the
service mesh: four myths](/2019/08/09/service-mesh-distributed-tracing-myths/)
for why this is.)

Furthermore, Linkerd provides many of the features that are often associated
with distributed tracing, *without* requiring configuration or application
changes, including:

* Live service topology and dependency graphs
* Aggregated service health, latencies, and request volumes
* Aggregated path / route health, latencies, and request volumes

For example, Linkerd can display a live topology of all incoming and outgoing
dependencies for a service, without requiring distributed tracing or any other
such application modification:

{{< fig src="/images/books/webapp-detail.png"
    title="The Linkerd dashboard showing an automatically generated topology graph"
>}}

Likewise, Linkerd can provide golden metrics per service and per *route*, again
without requiring distributed tracing or any other such application
modification:

{{< fig src="/images/books/webapp-routes.png"
    title="Linkerd dashboard showing automatically generated route metrics"
>}}

## Using distributed tracing

That said, distributed tracing certainly has its uses, and Linkerd makes this
as easy as it can.
Linkerd's role in distributed tracing is actually quite +simple: when a Linkerd data plane proxy sees a tracing header in a proxied HTTP +request, Linkerd will emit a trace span for that request. This span will +include information about the exact amount of time spent in the Linkerd proxy. +When paired with software to collect, store, and analyze this information, this +can provide significant insight into the behavior of the mesh. + +To use this feature, you'll also need to introduce several additional +components in your system., including an ingress layer that kicks off the trace +on particular requests, a client library for your application (or a mechanism +to propagate trace headers), a trace collector to collect span data and turn +them into traces, and a trace backend to store the trace data and allow the +user to view/query it. + +For details, please see our [guide to adding distributed tracing to your +application with Linkerd](../../tasks/distributed-tracing/). diff --git a/linkerd.io/content/2.16/features/fault-injection.md b/linkerd.io/content/2.16/features/fault-injection.md new file mode 100644 index 0000000000..540b977d83 --- /dev/null +++ b/linkerd.io/content/2.16/features/fault-injection.md @@ -0,0 +1,12 @@ ++++ +title = "Fault Injection" +description = "Linkerd provides mechanisms to programmatically inject failures into services." ++++ + +Fault injection is a form of chaos engineering where the error rate of a service +is artificially increased to see what impact there is on the system as a whole. +Traditionally, this would require modifying the service's code to add a fault +injection library that would be doing the actual work. Linkerd can do this +without any service code changes, only requiring a little configuration. + +To inject faults into your own services, follow the [tutorial](../../tasks/fault-injection/). diff --git a/linkerd.io/content/2.16/features/ha.md b/linkerd.io/content/2.16/features/ha.md new file mode 100644 index 0000000000..5cb9dd4116 --- /dev/null +++ b/linkerd.io/content/2.16/features/ha.md @@ -0,0 +1,142 @@ ++++ +title = "High Availability" +description = "The Linkerd control plane can run in high availability (HA) mode." +aliases = [ + "../ha/" +] ++++ + +For production workloads, Linkerd's control plane can run in high availability +(HA) mode. This mode: + +* Runs three replicas of critical control plane components. +* Sets production-ready CPU and memory resource requests on control plane + components. +* Sets production-ready CPU and memory resource requests on data plane proxies +* *Requires* that the [proxy auto-injector](../proxy-injection/) be + functional for any pods to be scheduled. +* Sets [anti-affinity + policies](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) + on critical control plane components to ensure, if possible, that they are + scheduled on separate nodes and in separate zones by default. + +## Enabling HA + +You can enable HA mode at control plane installation time with the `--ha` flag: + +```bash +linkerd install --ha | kubectl apply -f - +``` + +Also note the Viz extension also supports an `--ha` flag with similar +characteristics: + +```bash +linkerd viz install --ha | kubectl apply -f - +``` + +You can override certain aspects of the HA behavior at installation time by +passing other flags to the `install` command. 
For example, you can override the +number of replicas for critical components with the `--controller-replicas` +flag: + +```bash +linkerd install --ha --controller-replicas=2 | kubectl apply -f - +``` + +See the full [`install` CLI documentation](../../reference/cli/install/) for +reference. + +The `linkerd upgrade` command can be used to enable HA mode on an existing +control plane: + +```bash +linkerd upgrade --ha | kubectl apply -f - +``` + +## Proxy injector failure policy + +The HA proxy injector is deployed with a stricter failure policy to enforce +[automatic proxy injection](../proxy-injection/). This setup ensures +that no annotated workloads are accidentally scheduled to run on your cluster, +without the Linkerd proxy. (This can happen when the proxy injector is down.) + +If proxy injection process failed due to unrecognized or timeout errors during +the admission phase, the workload admission will be rejected by the Kubernetes +API server, and the deployment will fail. + +Hence, it is very important that there is always at least one healthy replica +of the proxy injector running on your cluster. + +If you cannot guarantee the number of healthy proxy injector on your cluster, +you can loosen the webhook failure policy by setting its value to `Ignore`, as +seen in the +[Linkerd Helm chart](https://github.com/linkerd/linkerd2/blob/803511d77b33bd9250b4a7fecd36752fcbd715ac/charts/linkerd2/templates/proxy-injector-rbac.yaml#L98). + +{{< note >}} +See the Kubernetes +[documentation](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy) +for more information on the admission webhook failure policy. +{{< /note >}} + +## Pod anti-affinity rules + +All critical control plane components are deployed with pod anti-affinity rules +to ensure redundancy. + +Linkerd uses a `requiredDuringSchedulingIgnoredDuringExecution` pod +anti-affinity rule to ensure that the Kubernetes scheduler does not colocate +replicas of critical component on the same node. A +`preferredDuringSchedulingIgnoredDuringExecution` pod anti-affinity rule is also +added to try to schedule replicas in different zones, where possible. + +In order to satisfy these anti-affinity rules, HA mode assumes that there +are always at least three nodes in the Kubernetes cluster. If this assumption is +violated (e.g. the cluster is scaled down to two or fewer nodes), then the +system may be left in a non-functional state. + +Note that these anti-affinity rules don't apply to add-on components like +Prometheus. + +## Scaling Prometheus + +The Linkerd Viz extension provides a pre-configured Prometheus pod, but for +production workloads we recommend setting up your own Prometheus instance. To +scrape the data plane metrics, follow the instructions +[here](../../tasks/external-prometheus/). This will provide you +with more control over resource requirement, backup strategy and data retention. + +When planning for memory capacity to store Linkerd timeseries data, the usual +guidance is 5MB per meshed pod. + +If your Prometheus is experiencing regular `OOMKilled` events due to the amount +of data coming from the data plane, the two key parameters that can be adjusted +are: + +* `storage.tsdb.retention.time` defines how long to retain samples in storage. + A higher value implies that more memory is required to keep the data around + for a longer period of time. 
Lowering this value will reduce the number of + `OOMKilled` events as data is retained for a shorter period of time +* `storage.tsdb.retention.size` defines the maximum number of bytes that can be + stored for blocks. A lower value will also help to reduce the number of + `OOMKilled` events + +For more information and other supported storage options, see the Prometheus +documentation +[here](https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects). + +## Working with Cluster AutoScaler + +The Linkerd proxy stores its mTLS private key in a +[tmpfs emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) +volume to ensure that this information never leaves the pod. This causes the +default setup of Cluster AutoScaler to not be able to scale down nodes with +injected workload replicas. + +The workaround is to annotate the injected workload with the +`cluster-autoscaler.kubernetes.io/safe-to-evict: "true"` annotation. If you +have full control over the Cluster AutoScaler configuration, you can start the +Cluster AutoScaler with the `--skip-nodes-with-local-storage=false` option. + +For more information on this, see the Cluster AutoScaler documentation +[here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node). diff --git a/linkerd.io/content/2.16/features/http-grpc.md b/linkerd.io/content/2.16/features/http-grpc.md new file mode 100644 index 0000000000..bb32e81916 --- /dev/null +++ b/linkerd.io/content/2.16/features/http-grpc.md @@ -0,0 +1,21 @@ ++++ +title = "HTTP, HTTP/2, and gRPC Proxying" +description = "Linkerd will automatically enable advanced features (including metrics, load balancing, retries, and more) for HTTP, HTTP/2, and gRPC connections." +weight = 1 ++++ + +Linkerd can proxy all TCP connections. For HTTP connections (including HTTP/1.0, +HTTP/1.1, HTTP/2, and gRPC connections), it will automatically enable advanced +L7 features including [request-level metrics](../telemetry/), [latency-aware +load balancing](../load-balancing/), [retries](../retries-and-timeouts/), and +more. + +(See [TCP Proxying and Protocol Detection](../protocol-detection/) for details of +how this detection happens automatically, and how it can sometimes fail.) + +Note that while Linkerd does [zero-config mutual TLS](../automatic-mtls/), it +cannot decrypt TLS connections initiated by the outside world. For example, if +you have a TLS connection from outside the cluster, or if your application does +HTTP/2 plus TLS, Linkerd will treat these connections as raw TCP streams. To +take advantage of Linkerd's full array of L7 features, communication between +meshed pods must be TLS'd by Linkerd, not by the application itself. diff --git a/linkerd.io/content/2.16/features/httproute.md b/linkerd.io/content/2.16/features/httproute.md new file mode 100644 index 0000000000..6dd00a4b40 --- /dev/null +++ b/linkerd.io/content/2.16/features/httproute.md @@ -0,0 +1,84 @@ ++++ +title = "HTTPRoutes" +description = "Linkerd can use the HTTPRoute resource to configure per-route policies." +aliases = [ + "../httproutes/" +] ++++ + +To configure routing behavior and policy for HTTP traffic, Linkerd supports the +[HTTPRoute resource], defined by the Kubernetes [Gateway API]. 
{{< note >}}
Two versions of the HTTPRoute resource may be used with Linkerd:

- The upstream version provided by the Gateway API, with the
  `gateway.networking.k8s.io` API group
- A Linkerd-specific CRD provided by Linkerd, with the `policy.linkerd.io` API
  group

The two HTTPRoute resource definitions are similar, but the Linkerd version
implements experimental features not yet available with the upstream Gateway API
resource definition. See [the HTTPRoute reference
documentation](../../reference/httproute/#linkerd-and-gateway-api-httproutes)
for details.
{{< /note >}}

If the Gateway API CRDs already exist in your cluster, then Linkerd must be
installed with the `--set enableHttpRoutes=false` flag during the
`linkerd install --crds` step or with the `enableHttpRoutes=false` Helm value
when installing the `linkerd-crds` Helm chart. This avoids conflicts by
instructing Linkerd to not install the Gateway API CRDs and instead rely on the
Gateway CRDs which already exist.

An HTTPRoute is a Kubernetes resource which attaches to a parent resource, such
as a [Service]. The HTTPRoute defines a set of rules which match HTTP requests
to that resource, based on parameters such as the request's path, method, and
headers, and can configure how requests matching that rule are routed by the
Linkerd service mesh.

## Inbound and Outbound HTTPRoutes

Two types of HTTPRoute are used for configuring the behavior of Linkerd's
proxies:

- HTTPRoutes with a [Service] as their parent resource configure policies for
  _outbound_ proxies in pods which are clients of that [Service]. Outbound
  policy includes [dynamic request routing][dyn-routing], adding request
  headers, modifying a request's path, and reliability features such as
  [timeouts].
- HTTPRoutes with a [Server] as their parent resource configure policy for
  _inbound_ proxies in pods which receive traffic to that [Server]. Inbound
  HTTPRoutes are used to configure fine-grained [per-route authorization and
  authentication policies][auth-policy].

{{< warning >}}
**Outbound HTTPRoutes and [ServiceProfile](../service-profiles/)s provide
overlapping configuration.** For backwards-compatibility reasons, a
ServiceProfile will take precedence over HTTPRoutes which configure the same
Service. If a ServiceProfile is defined for the parent Service of an HTTPRoute,
proxies will use the ServiceProfile configuration, rather than the HTTPRoute
configuration, as long as the ServiceProfile exists.
{{< /warning >}}

## Learn More

To get started with HTTPRoutes, you can:

- [Configure fault injection](../../tasks/fault-injection/) using an outbound
  HTTPRoute.
- [Configure timeouts][timeouts] using an outbound HTTPRoute.
- [Configure dynamic request routing][dyn-routing] using an outbound HTTPRoute.
- [Configure per-route authorization policy][auth-policy] using an inbound
  HTTPRoute.
- See the [reference documentation](../../reference/httproute/) for a complete
  description of the HTTPRoute resource.
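For orientation, here is a minimal sketch of an outbound HTTPRoute (the Service
name, namespace, port, and path are hypothetical; this example uses the
`policy.linkerd.io` API group, though the upstream Gateway API group can be used
as well):

```yaml
apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: my-route
  namespace: my-ns
spec:
  parentRefs:
    # Attaching to a Service makes this an outbound route, applied by
    # the proxies of clients calling that Service.
    - name: my-svc
      kind: Service
      group: core
      port: 8080
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: my-svc
          port: 8080
```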
+ +[HTTPRoute resource]: https://gateway-api.sigs.k8s.io/api-types/httproute/ +[Gateway API]: https://gateway-api.sigs.k8s.io/ +[Service]: https://kubernetes.io/docs/concepts/services-networking/service/ +[Server]: ../../reference/authorization-policy/#server +[auth-policy]: ../../tasks/configuring-per-route-policy/ +[dyn-routing]:../../tasks/configuring-dynamic-request-routing/ +[timeouts]: ../../tasks/configuring-timeouts/#using-httproutes diff --git a/linkerd.io/content/2.16/features/ingress.md b/linkerd.io/content/2.16/features/ingress.md new file mode 100644 index 0000000000..6a3d9308f8 --- /dev/null +++ b/linkerd.io/content/2.16/features/ingress.md @@ -0,0 +1,14 @@ ++++ +title = "Ingress" +description = "Linkerd can work alongside your ingress controller of choice." +weight = 7 +aliases = [ + "../ingress/" +] ++++ + +For reasons of simplicity, Linkerd does not provide its own ingress controller. +Instead, Linkerd is designed to work alongside your ingress controller of choice. + +See the [Using Ingress with Linkerd Guide](../../tasks/using-ingress/) for examples +of how to get it all working together. diff --git a/linkerd.io/content/2.16/features/ipv6.md b/linkerd.io/content/2.16/features/ipv6.md new file mode 100644 index 0000000000..ed70051ff3 --- /dev/null +++ b/linkerd.io/content/2.16/features/ipv6.md @@ -0,0 +1,14 @@ ++++ +title = "IPv6 Support" +description = "Linkerd is compatible with both IPv6-only and dual-stack clusters." ++++ + +As of version 2.16 (and edge-24.8.2) Linkerd fully supports Kubernetes clusters +configured for IPv6-only or dual-stack networking. + +This is disabled by default; to enable just set `proxy.disableIPv6=false` when +installing the control plane and, if you use it, the linkerd-cni plugin. + +Enabling IPv6 support does not generally change how Linkerd operates, except in +one way: when enabled on a dual-stack cluster, Linkerd will only use the IPv6 +endpoints of destinations and will not use the IPv4 endpoints. diff --git a/linkerd.io/content/2.16/features/load-balancing.md b/linkerd.io/content/2.16/features/load-balancing.md new file mode 100644 index 0000000000..5ec51ffac6 --- /dev/null +++ b/linkerd.io/content/2.16/features/load-balancing.md @@ -0,0 +1,37 @@ ++++ +title = "Load Balancing" +description = "Linkerd automatically load balances requests across all destination endpoints on HTTP, HTTP/2, and gRPC connections." +weight = 9 ++++ + +For HTTP, HTTP/2, and gRPC connections, Linkerd automatically load balances +requests across all destination endpoints without any configuration required. +(For TCP connections, Linkerd will balance connections.) + +Linkerd uses an algorithm called EWMA, or *exponentially weighted moving average*, +to automatically send requests to the fastest endpoints. This load balancing can +improve end-to-end latencies. + +## Service discovery + +For destinations that are not in Kubernetes, Linkerd will balance across +endpoints provided by DNS. + +For destinations that are in Kubernetes, Linkerd will look up the IP address in +the Kubernetes API. If the IP address corresponds to a Service, Linkerd will +load balance across the endpoints of that Service and apply any policy from that +Service's [Service Profile](../service-profiles/). On the other hand, +if the IP address corresponds to a Pod, Linkerd will not perform any load +balancing or apply any [Service Profiles](../service-profiles/). + +{{< note >}} +If working with headless services, endpoints of the service cannot be retrieved. 
+Therefore, Linkerd will not perform load balancing and instead route only to the +target IP address. +{{< /note >}} + +## Load balancing gRPC + +Linkerd's load balancing is particularly useful for gRPC (or HTTP/2) services +in Kubernetes, for which [Kubernetes's default load balancing is not +effective](https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/). diff --git a/linkerd.io/content/2.16/features/multicluster.md b/linkerd.io/content/2.16/features/multicluster.md new file mode 100644 index 0000000000..2714d8983a --- /dev/null +++ b/linkerd.io/content/2.16/features/multicluster.md @@ -0,0 +1,130 @@ ++++ +title = "Multi-cluster communication" +description = "Linkerd can transparently and securely connect services that are running in different clusters." +aliases = [ "multicluster_support" ] ++++ + +Linkerd can connect Kubernetes services across cluster boundaries in a way that +is secure, fully transparent to the application, and independent of network +topology. This multi-cluster capability is designed to provide: + +1. **A unified trust domain.** The identity of source and destination workloads + are validated at every step, both in and across cluster boundaries. +2. **Separate failure domains.** Failure of a cluster allows the remaining + clusters to function. +3. **Support for any type of network.** Linkerd does not require any specific + network topology between clusters, and can function both with hierarchical + networks as well as when clusters [share the same flat + network](#flat-networks). +4. **A unified model alongside in-cluster communication.** The same + observability, reliability, and security features that Linkerd provides for + in-cluster communication extend to cross-cluster communication. + +Just as with in-cluster connections, Linkerd’s cross-cluster connections are +transparent to the application code. Regardless of whether that communication +happens within a cluster, across clusters within a datacenter or VPC, or across +the public Internet, Linkerd will establish a connection between clusters +that's reliable, encrypted, and authenticated on both sides with mTLS. + +## How it works + +Linkerd's multi-cluster support works by "mirroring" service information between +clusters, using a *service mirror* component that watches a target cluster for +updates to services and applies those updates locally on the source cluster. + +These mirrored services are suffixed with the name of the remote cluster, e.g. +the *Foo* service on the *west* cluster would be mirrored as *Foo-west* on the +local cluster. This approach is typically combined with [traffic +splitting](../traffic-split/) or [dynamic request routing](../request-routing/) +to allow local services to access the *Foo* service as if it were on the local +cluster. + +Linkerd supports two basic forms of multi-cluster communication: hierarchical +and flat. + +{{< fig + alt="Architectural diagram comparing hierarchical and flat network modes" + src="/uploads/2023/07/flat_network@2x.png">}} + +### Hierarchical networks + +In hierarchical mode, Linkerd deploys a *gateway* component on the target +cluster that allows it to receive requests from source clusters. This approach +works on almost any network topology, as it only requires that the gateway IP of +the destination cluster be reachable by pods on the source cluster. 
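As a rough sketch of how two clusters are connected in this mode (the
`east`/`west` cluster names and kubectl contexts are hypothetical; see the
guides linked below for the full workflow):

```bash
# Install the multicluster extension on both clusters
linkerd --context=west multicluster install | kubectl --context=west apply -f -
linkerd --context=east multicluster install | kubectl --context=east apply -f -

# Generate a Link (credentials plus gateway address) from the target cluster
# and apply it on the source cluster
linkerd --context=west multicluster link --cluster-name west | \
  kubectl --context=east apply -f -
```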
+ +### Flat networks + +As of Linkerd 2.14, Linkerd supports pod-to-pod communication for clusters that +share a flat network, where pods can establish TCP connections and send traffic +directly to each other across cluster boundaries. In these environments, Linkerd +does not use a gateway intermediary for data plane traffic, which provides +several advantages: + +* Improved latency by avoiding an additional network hop +* Reduced operational costs in cloud environments that require a + `LoadBalancer`-type service for the gateway +* Better multi-cluster authorization policies, as workload identity + is preserved across cluster boundaries. + +Hierarchical (gateway-based) and flat (direct pod-to-pod) modes can be combined, +and pod-to-pod mode can be enabled for specific services by using the +`remote-discovery` value for the label selector used to export services to other +clusters. See the [pod-to-pod multicluster +communication](../../tasks/pod-to-pod-multicluster/) guide and the +[multi-cluster reference](../../reference/multicluster/) for more. + +## Headless services + +[headless-svc]: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services +[stateful-set]: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ + +By default, Linkerd will mirror all exported services as Kubernetes `clusterIP` +services. This also extends to [headless services][headless-svc]; an exported +headless service will be mirrored as `clusterIP` and have an IP address +assigned to it. In general, headless services _should not have an IP address_, +they are used when a workloads needs a stable network identifier or to +facilitate service discovery without being tied to Kubernetes' native +implementation. This allows clients to either implement their own load +balancing or to address a pod directly through its DNS name. In certain +situations, it is desirable to preserve some of this functionality, especially +when working with Kubernetes objects that require it, such as +[StatefulSet][stateful-set]. + +Linkerd's multi-cluster extension can be configured with support for headless +services when linking two clusters together. When the feature is turned on, the +*service mirror* component will export headless services without assigning them +an IP. This allows clients to talk to specific pods (or hosts) across clusters. +To support direct communication, underneath the hood, the service mirror +component will create an *endpoint mirror* for each host that backs a headless +service. To exemplify, if in a target cluster there is a StatefulSet deployed +with two replicas, and the StatefulSet is backed by a headless service, when +the service will be exported, the source cluster will create a headless mirror +along with two "endpoint mirrors" representing the hosts in the StatefulSet. + +This approach allows Linkerd to preserve DNS record creation and support direct +communication to pods across clusters. Clients may also implement their own +load balancing based on the DNS records created by the headless service. +Hostnames are also preserved across clusters, meaning that the only difference +in the DNS name (or FQDN) is the headless service's mirror name. In order to be +exported as a headless service, the hosts backing the service need to be named +(e.g a StatefulSet is supported since all pods have a hostname, but a +Deployment would not be supported, since they do not allow for arbitrary +hostnames in the pod spec). + +Ready to get started? 
See the [getting started with multi-cluster +guide](../../tasks/multicluster/) for a walkthrough. + +## Further reading + +* [Multi-cluster installation instructions](../../tasks/installing-multicluster/). +* [Pod-to-pod multicluster communication](../../tasks/pod-to-pod-multicluster/) +* [Multi-cluster communication with StatefulSets](../../tasks/multicluster-using-statefulsets/). +* [Architecting for multi-cluster + Kubernetes](/2020/02/17/architecting-for-multicluster-kubernetes/), a blog + post explaining some of the design rationale behind Linkerd's multi-cluster + implementation. +* [Multi-cluster Kubernetes with service + mirroring](/2020/02/25/multicluster-kubernetes-with-service-mirroring/), a + deep dive of some of the architectural decisions behind Linkerd's + multi-cluster implementation. diff --git a/linkerd.io/content/2.16/features/nft.md b/linkerd.io/content/2.16/features/nft.md new file mode 100644 index 0000000000..beb0a29504 --- /dev/null +++ b/linkerd.io/content/2.16/features/nft.md @@ -0,0 +1,61 @@ ++++ +title = "Iptables-nft Support" +description = "Linkerd's init container can use iptables-nft on systems that require it." ++++ + +To transparently route TCP traffic through the proxy, without any awareness +from the application, Linkerd will configure a set of [firewall +rules](../../reference/iptables/) in each injected pod. Configuration can be +done either through an [init +container](../../reference/architecture/#linkerd-init-container) or through a +[CNI plugin](../cni/). + +Linkerd's init container can be run in two separate modes: `legacy` or `nft`. +The difference between the two modes is what variant of `iptables` they will use +to configure firewall rules. + +## Details + +Modes for the init container can be changed either at upgrade time, or during +installation. Once configured, all injected workloads (including the control +plane) will use the same mode in the init container. Both modes will use the +`iptables` utility to configure firewall rules; the main difference between the +two, is which binary they will call into: + +1. `legacy` mode will call into [`iptables-legacy`] for firewall configuration. + This is the default mode that `linkerd-init` runs in, and is supported by + most operating systems and distributions. +2. `nft` mode will call into `iptables-nft`, which uses the newer `nf_tables` + kernel API. The `nftables` utilities are used by newer operating systems to + configure firewalls by default. + +[`iptables-legacy`]: https://manpages.debian.org/bullseye/iptables/iptables-legacy.8.en.html +Conceptually, `iptables-nft` is a bridge between the legacy and the newer +`nftables` utilities. Under the hood, it uses a different backend, where rules +additions and deletions are atomic. The nft version of iptables uses the same +packet matching syntax (xtables) as its legacy counterpart. + +Because both utilities use the same syntax, it is recommended to run in +whatever mode your Kubernetes hosts support best. Certain operating systems +(e.g Google Container Optimized OS) do not offer support out-of-the-box for +`nftables` modules. Others (e.g RHEL family of operating systems) do not +support the legacy version of iptables. Linkerd's init container should be run +in `nft` mode only if the nodes support it and contain the relevant nftables +modules. + +{{< note >}} +Linkerd supports a `-w` (wait) option for its init container. 
Because operations are atomic, and rulesets are not reloaded when modified
(only appended), this option is a no-op when running `linkerd-init` in nft mode.
{{< /note >}}

## Installation

The mode for `linkerd-init` can be overridden through the configuration option
`proxyInit.iptablesMode=iptables|nft`. The configuration option can be used for
both Helm and CLI installations (or upgrades). For example, the following line
will install Linkerd and set the init container mode to `nft`:

```bash
linkerd install --set "proxyInit.iptablesMode=nft" | kubectl apply -f -
```

diff --git a/linkerd.io/content/2.16/features/non-kubernetes-workloads.md b/linkerd.io/content/2.16/features/non-kubernetes-workloads.md new file mode 100644 index 0000000000..315edb880b --- /dev/null +++ b/linkerd.io/content/2.16/features/non-kubernetes-workloads.md @@ -0,0 +1,16 @@
---
title: Non-Kubernetes workloads (mesh expansion)
---

Linkerd features *mesh expansion*, or the ability to add non-Kubernetes
workloads to your service mesh by deploying the Linkerd proxy to the remote
machine and connecting it back to the Linkerd control plane within the mesh.
This allows you to use Linkerd to establish communication to and from the
workload that is secure, reliable, and observable, just like communication to
and from your Kubernetes workloads.

Related content:

* [Guide: Adding non-Kubernetes workloads to your mesh]({{< relref
  "../tasks/adding-non-kubernetes-workloads" >}})
* [ExternalWorkload Reference]({{< relref "../reference/external-workload" >}})

diff --git a/linkerd.io/content/2.16/features/protocol-detection.md b/linkerd.io/content/2.16/features/protocol-detection.md new file mode 100644 index 0000000000..079f7d229d --- /dev/null +++ b/linkerd.io/content/2.16/features/protocol-detection.md @@ -0,0 +1,148 @@
+++
title = "TCP Proxying and Protocol Detection"
description = "Linkerd is capable of proxying all TCP traffic, including TLS'd connections, WebSockets, and HTTP tunneling."
weight = 2
aliases = [
  "/2.11/supported-protocols/",
  "../tasks/upgrading-2.10-ports-and-protocols/",
]
+++

Linkerd is capable of proxying all TCP traffic, including TLS connections,
WebSockets, and HTTP tunneling.

In most cases, Linkerd can do this without configuration. To accomplish this,
Linkerd performs *protocol detection* to determine whether traffic is HTTP or
HTTP/2 (including gRPC). If Linkerd detects that a connection is HTTP or
HTTP/2, Linkerd automatically provides HTTP-level metrics and routing.

If Linkerd *cannot* determine that a connection is using HTTP or HTTP/2,
Linkerd will proxy the connection as a plain TCP connection, applying
[mTLS](../automatic-mtls/) and providing byte-level metrics as usual.

(Note that HTTPS calls to or from meshed pods are treated as TCP, not as HTTP.
Because the client initiates the TLS connection, Linkerd is not able to
decrypt the connection to observe the HTTP transactions.)

## Configuring protocol detection

{{< note >}}
If your proxy logs contain messages like `protocol detection timed out
after 10s`, or you are experiencing 10-second delays when establishing
connections, you are likely running into a protocol detection timeout.
This section will help you understand how to fix this.
{{< /note >}}

In some cases, Linkerd's protocol detection will time out because it doesn't see
any bytes from the client.
This situation is commonly encountered when using +protocols where the server sends data before the client does (such as SMTP) or +protocols that proactively establish connections without sending data (such as +Memcache). In this case, the connection will proceed as a TCP connection after a +10-second protocol detection delay. + +To avoid this delay, you will need to provide some configuration for Linkerd. +There are two basic mechanisms for configuring protocol detection: _opaque +ports_ and _skip ports_: + +* Opaque ports instruct Linkerd to skip protocol detection and proxy the + connection as a TCP stream +* Skip ports bypass the proxy entirely. + +Opaque ports are generally preferred as they allow Linkerd to provide mTLS, +TCP-level metrics, policy, etc. Skip ports circumvent Linkerd's ability to +provide security features. + +Linkerd maintains a default list of opaque ports that corresponds to the +standard ports used by protocols that interact poorly with protocol detection. +As of the 2.12 release, that list is: **25** (SMTP), **587** (SMTP), **3306** +(MySQL), **4444** (Galera), **5432** (Postgres), **6379** (Redis), **9300** +(ElasticSearch), and **11211** (Memcache). + +## Protocols that may require configuration + +The following table contains common protocols that may require additional +configuration. + +| Protocol | Standard port(s) | In default list? | Notes | +|-----------------|------------------|------------------|-------| +| SMTP | 25, 587 | Yes | | +| MySQL | 3306 | Yes | | +| MySQL with Galera | 3306, 4444, 4567, 4568 | Partially | Ports 4567 and 4568 are not in Linkerd's default set of opaque ports | +| PostgreSQL | 5432 | Yes | | +| Redis | 6379 | Yes | | +| ElasticSearch | 9300 | Yes | | +| Memcache | 11211 | Yes | | + +If you are using one of those protocols, follow this decision tree to determine +which configuration you need to apply. + +![Decision tree](/images/protocol-detection-decision-tree.png) + +## Marking ports as opaque + +You can use the `config.linkerd.io/opaque-ports` annotation to mark a port as +opaque. Note that this annotation should be set on the _destination_, not on the +source, of the traffic. + +This annotation can be set in a variety of ways: + +1. On the workload itself, e.g. on the Deployment's Pod spec receiving the traffic. +1. On the Service receiving the traffic. +1. On a namespace (in which it will apply to all workloads in the namespace). +1. In an [authorization policy](../server-policy/) `Server` object's + `proxyProtocol` field, in which case it will apply to all pods targeted by that + `Server`. + +When set, Linkerd will skip protocol detection both on the client side and on +the server side. Note that since this annotation informs the behavior of meshed +_clients_, it can be applied to unmeshed workloads as well as meshed ones. + +{{< note >}} +Multiple ports can be provided as a comma-delimited string. The values you +provide will _replace_, not augment, the default list of opaque ports. +{{< /note >}} + +## Marking ports as skip ports + +Sometimes it is necessary to bypass the proxy altogether. In this case, you can +use the `config.linkerd.io/skip-outbound-ports` annotation to bypass the proxy +entirely when sending to those ports. (Note that there is a related annotation, +`skip-inbound-ports`, to bypass the proxy for incoming connections. This is +typically only needed for debugging purposes.) + +As with opaque ports, multiple skip-ports can be provided as a comma-delimited +string. 
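For illustration, here is a rough sketch of what that might look like on a
client workload's pod template (the workload name, namespace, image, and port
numbers are made up for this example; only the annotation itself comes from
this page):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders        # hypothetical client workload
  namespace: shop     # hypothetical namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
      annotations:
        # Outbound connections to these ports bypass the Linkerd proxy entirely.
        config.linkerd.io/skip-outbound-ports: "3306,4444"
    spec:
      containers:
        - name: app
          image: example.com/orders:latest   # placeholder image
```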
This annotation should be set on the source of the traffic.

## Setting the enable-external-profiles annotation

The `config.linkerd.io/enable-external-profiles` annotation configures Linkerd
to look for [`ServiceProfiles`](../service-profiles/) for off-cluster
connections. It *also* instructs Linkerd to respect the default set of opaque
ports for this connection.

This annotation should be set on the source of the traffic.

Note that the default set of opaque ports can be configured at install
time, e.g. by using `--set proxy.opaquePorts`. This may be helpful in
conjunction with `enable-external-profiles`.

{{< note >}}
There was a bug in Linkerd 2.11.0 and 2.11.1 that prevented the opaque ports
behavior of `enable-external-profiles` from working. This was fixed in Linkerd
2.11.2.
{{< /note >}}

## Using `NetworkPolicy` resources with opaque ports

When a service has a port marked as opaque, any `NetworkPolicy` resources that
apply to the respective port and restrict ingress access will have to be
changed to target the proxy's inbound port instead (by default, `4143`). If the
service has a mix of opaque and non-opaque ports, then the `NetworkPolicy`
should target both the non-opaque ports and the proxy's inbound port.

A connection that targets an opaque endpoint (i.e. a pod with a port marked as
opaque) will have its original target port replaced with the proxy's inbound
port. Once the inbound proxy receives the traffic, it will transparently
forward it to the main application container over a TCP connection.

diff --git a/linkerd.io/content/2.16/features/proxy-injection.md b/linkerd.io/content/2.16/features/proxy-injection.md new file mode 100644 index 0000000000..954f2104d2 --- /dev/null +++ b/linkerd.io/content/2.16/features/proxy-injection.md @@ -0,0 +1,74 @@
+++
title = "Automatic Proxy Injection"
description = "Linkerd will automatically inject the data plane proxy into your pods based on annotations."
aliases = [
  "../proxy-injection/"
]
+++

Linkerd automatically adds the data plane proxy to pods when the
`linkerd.io/inject: enabled` annotation is present on a namespace or on any
workload, such as a deployment or pod. This is known as "proxy injection".

See [Adding Your Service](../../tasks/adding-your-service/) for a walkthrough of
how to use this feature in practice.

{{< note >}}
Proxy injection is also where proxy *configuration* happens. While it's rarely
necessary, you can configure proxy settings by setting additional Kubernetes
annotations at the resource level prior to injection. See the [full list of
proxy configuration options](../../reference/proxy-configuration/).
{{< /note >}}

## Details

Proxy injection is implemented as a [Kubernetes admission
webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks).
This means that the proxies are added to pods within the Kubernetes cluster
itself, regardless of whether the pods are created by `kubectl`, a CI/CD
system, or any other system.

For each pod, two containers are injected:

1. `linkerd-init`, a Kubernetes [Init
   Container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
   that configures `iptables` to automatically forward all incoming and
   outgoing TCP traffic through the proxy. (Note that this container is not
   injected if the [Linkerd CNI Plugin](../cni/) has been enabled.)
1. `linkerd-proxy`, the Linkerd data plane proxy itself.
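As a quick sketch, enabling injection for every workload in a namespace looks
like the following (the namespace name is illustrative; the same annotation can
also be placed on a workload's pod template):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: emojivoto          # hypothetical namespace
  annotations:
    # Pods created in this namespace will have the proxy injected.
    linkerd.io/inject: enabled
```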
+ +Note that simply adding the annotation to a resource with pre-existing pods +will not automatically inject those pods. You will need to update the pods +(e.g. with `kubectl rollout restart` etc.) for them to be injected. This is +because Kubernetes does not call the webhook until it needs to update the +underlying resources. + +## Exclusions + +At install time, Kubernetes is configured to avoid calling Linkerd's proxy +injector for resources in the `kube-system` and `cert-manager` namespaces. This +is to prevent injection on components that are themselves required for Linkerd +to function. + +The injector will not run on components in these namespaces, regardless of any +`linkerd.io/inject` annotations. + +## Overriding injection + +Automatic injection can be disabled for a pod or deployment for which it would +otherwise be enabled, by adding the `linkerd.io/inject: disabled` annotation. + +## Manual injection + +The [`linkerd inject`](../../reference/cli/inject/) CLI command is a text +transform that, by default, simply adds the inject annotation to a given +Kubernetes manifest. + +Alternatively, this command can also perform the full injection purely on the +client side with the `--manual` flag. This was the default behavior prior to +Linkerd 2.4; however, having injection to the cluster side makes it easier to +ensure that the data plane is always present and configured correctly, +regardless of how pods are deployed. + +See the [`linkerd inject` reference](../../reference/cli/inject/) for more +information. diff --git a/linkerd.io/content/2.16/features/request-routing.md b/linkerd.io/content/2.16/features/request-routing.md new file mode 100644 index 0000000000..6daef28b0f --- /dev/null +++ b/linkerd.io/content/2.16/features/request-routing.md @@ -0,0 +1,24 @@ ++++ +title = "Dynamic Request Routing" +description = "Linkerd can route individual HTTP requests based on their properties." ++++ + +Linkerd's dynamic request routing allows you to control routing of HTTP and gRPC +traffic based on properties of the request, including verb, method, query +parameters, and headers. For example, you can route all requests that match +a specific URL pattern to a given backend; or you can route traffic with a +particular header to a different service. + +This is an example of _client-side policy_, i.e. ways to dynamically configure +Linkerd's behavior when it is sending requests from a meshed pod. + +Dynamic request routing is built on Kubernetes's Gateway API types, especially +[HTTPRoute](https://gateway-api.sigs.k8s.io/api-types/httproute/). + +This feature extends Linkerd's traffic routing capabilities beyond those of +[traffic splits](../traffic-split/), which only provide percentage-based +splits. + +## Learning more + +- [Guide to configuring routing policy](../../tasks/configuring-dynamic-request-routing/) diff --git a/linkerd.io/content/2.16/features/retries-and-timeouts.md b/linkerd.io/content/2.16/features/retries-and-timeouts.md new file mode 100644 index 0000000000..5786e60883 --- /dev/null +++ b/linkerd.io/content/2.16/features/retries-and-timeouts.md @@ -0,0 +1,27 @@ ++++ +title = "Retries and Timeouts" +description = "Linkerd can perform service-specific retries and timeouts." +weight = 3 ++++ + +Timeouts and automatic retries are two of the most powerful and useful +mechanisms a service mesh has for gracefully handling partial or transient +application failures. + +Timeouts and retries can be configured using [HTTPRoute], GRPCRoute, or Service +resources. 
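As a rough sketch, and assuming the `retry.linkerd.io/*` and
`timeout.linkerd.io/*` annotations described in the guides linked below are
available in your version, configuring retries and a request timeout on a
Service might look like this (the service name, namespace, and port are
hypothetical; treat the guides as authoritative for the exact annotation names
and values):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: books              # hypothetical service
  namespace: booksapp      # hypothetical namespace
  annotations:
    # Retry requests that fail with a 5xx response, at most twice each.
    retry.linkerd.io/http: 5xx
    retry.linkerd.io/limit: "2"
    # Fail requests that take longer than 5 seconds overall.
    timeout.linkerd.io/request: 5s
spec:
  selector:
    app: books
  ports:
    - port: 7002
      targetPort: 7002
```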
Retries and timeouts are always performed on the *outbound* (client) side.

{{< note >}}
If working with headless services, outbound policy cannot be retrieved. Linkerd
reads service discovery information based on the target IP address, and if that
happens to be a pod IP address then it cannot tell which service the pod belongs
to.
{{< /note >}}

These can be set up by following the guides:

- [Configuring Retries](../../tasks/configuring-retries/)
- [Configuring Timeouts](../../tasks/configuring-timeouts/)

[HTTPRoute]: ../httproute/

diff --git a/linkerd.io/content/2.16/features/server-policy.md b/linkerd.io/content/2.16/features/server-policy.md new file mode 100644 index 0000000000..2843f8940f --- /dev/null +++ b/linkerd.io/content/2.16/features/server-policy.md @@ -0,0 +1,169 @@
+++
title = "Authorization Policy"
description = "Linkerd can restrict which types of traffic are allowed between meshed services."
+++

Linkerd's authorization policy allows you to control which types of
traffic are allowed to meshed pods. For example, you can restrict communication
to a particular service (or HTTP route on a service) to only come from certain
other services; you can enforce that mTLS must be used on a certain port; and so
on.

{{< note >}}
Linkerd can only enforce policy on meshed pods, i.e. pods where the Linkerd
proxy has been injected. If policy is a strict requirement, you should pair the
usage of these features with [HA mode](../ha/), which enforces that the proxy
*must* be present when pods start up.
{{< /note >}}

## Policy overview

By default, Linkerd allows all traffic to transit the mesh, and uses a variety
of mechanisms, including [retries](../retries-and-timeouts/) and [load
balancing](../load-balancing/), to ensure that requests are delivered
successfully.

Sometimes, however, we want to restrict which types of traffic are allowed.
Linkerd's policy features allow you to *deny* access to resources unless certain
conditions are met, including the TLS identity of the client.

Linkerd's policy is configured using two mechanisms:

1. A set of _default policies_, which can be set at the cluster,
   namespace, workload, and pod level through Kubernetes annotations.
2. A set of CRDs that specify fine-grained policy for specific ports, routes,
   workloads, etc.

These mechanisms work in conjunction. For example, a default cluster-wide policy
of `deny` would prohibit any traffic to any meshed pod; traffic would then need
to be explicitly allowed through the use of CRDs.

## Default policies

The `config.linkerd.io/default-inbound-policy` annotation can be set at the
namespace, workload, or pod level, and will determine the default traffic
policy at that point in the hierarchy. Valid default policies include:

- `all-unauthenticated`: allow all requests. This is the default.
- `all-authenticated`: allow requests from meshed clients only.
- `cluster-authenticated`: allow requests from meshed clients in the same
  cluster.
- `deny`: deny all requests.
- `audit`: same as `all-unauthenticated`, but requests get flagged in logs and
  metrics.

Several other default policies are also available; see the [Policy
reference](../../reference/authorization-policy/) for more.

Every cluster has a cluster-wide default policy (by default,
`all-unauthenticated`), set at install time. Annotations that are present at the
workload or namespace level *at pod creation time* can override that value to
determine the default policy for that pod.
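For example, a minimal sketch of setting a namespace-wide default (the
namespace name is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: booksapp           # hypothetical namespace
  annotations:
    # Pods created in this namespace only accept traffic from meshed clients.
    config.linkerd.io/default-inbound-policy: all-authenticated
```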
(Note that the default policy is fixed +at proxy initialization time, and thus, after a pod is created, changing the +annotation will not change the default policy for that pod.) + +## Fine-grained policies + +For finer-grained policy that applies to specific ports, routes, or more, +Linkerd uses a set of CRDs. In contrast to default policy annotations, these +policy CRDs can be changed dynamically and policy behavior will be updated on +the fly. + +Two policy CRDs represent "targets" for policy: subsets of traffic over which +policy can be applied. + +- [`Server`]: all traffic to a port, for a set of pods in a namespace +- [`HTTPRoute`]: a subset of HTTP requests for a [`Server`] + +Two policy CRDs represent authentication rules that must be satisfied as part of +a policy rule: + +- `MeshTLSAuthentication`: authentication based on [secure workload + identities](../automatic-mtls/) +- `NetworkAuthentication`: authentication based on IP address + +And finally, two policy CRDs represent policy itself: the mapping of +authentication rules to targets. + +- `AuthorizationPolicy`: a policy that restricts access to one or more targets + unless an authentication rule is met + +- `ServerAuthorization`: an earlier form of policy that restricts access to + [`Server`]s only (i.e. not [`HTTPRoute`]s) + +The general pattern for Linkerd's dynamic, fine-grained policy is to define the +traffic target that must be protected (via a combination of `Server` and +[`HTTPRoute`] CRs); define the types of authentication that are required before +access to that traffic is permitted (via `MeshTLSAuthentication` and +`NetworkAuthentication`); and then define the policy that maps authentication to +target (via an `AuthorizationPolicy`). + +See the [Policy reference](../../reference/authorization-policy/) for more +details on how these resources work. + +## ServerAuthorization vs AuthorizationPolicy + +Linkerd 2.12 introduced `AuthorizationPolicy` as a more flexible alternative to +`ServerAuthorization` that can target [`HTTPRoute`]s as well as `Server`s. Use of +`AuthorizationPolicy` is preferred, and `ServerAuthorization` will be deprecated +in future releases. + +## Default authorizations + +A blanket denial of all to a pod would also deny health and readiness probes +from Kubernetes, meaning that the pod would not be able to start. Thus, any +default-deny setup must, in practice, still authorize these probes. + +In order to simplify default-deny setups, Linkerd automatically authorizes +probes to pods. These default authorizations apply only when no [`Server`] is +configured for a port, or when a [`Server`] is configured but no [`HTTPRoute`]s are +configured for that [`Server`]. If any [`HTTPRoute`] matches the `Server`, these +automatic authorizations are not created and you must explicitly create them for +health and readiness probes. + +## Policy rejections + +Any traffic that is known to be HTTP (including HTTP/2 and gRPC) that is denied +by policy will result in the proxy returning an HTTP 403. All other traffic will +be denied at the TCP level, i.e. by refusing the connection. + +Note that dynamically changing the policy to deny existing connections may +result in an abrupt termination of those connections. + +## Audit mode + +A [`Server`]'s default policy is defined in its `accessPolicy` field, which +defaults to `deny`. That means that, by default, traffic that doesn't conform to +the rules associated to that Server is denied (the same applies to `Servers` +that don't have associated rules yet). 
This can inadvertently prevent traffic if +you apply rules that don't account for all the possible sources/routes for your +services. + +This is why we recommend that when first setting authorization policies, you +explicitly set `accessPolicy:audit` for complex-enough services. In this mode, +if a request doesn't abide to the policy rules, it won't get blocked, but it +will generate a log entry in the proxy at the INFO level with the tag +`authz.name=audit` along with other useful information. Likewise, the proxy will +add entries to metrics like `request_total` with the label `authz_name=audit`. +So when you're in the process of fine-tuning a new authorization policy, you can +filter by those tags/labels in your observability stack to keep an eye on +requests which weren't caught by the policy. + +### Audit mode for default policies + +Audit mode is also supported at cluster, namespace, or workload level. To set +the whole cluster to audit mode, set `proxy.defaultInboundPolicy=audit` when +installing Linkerd; for a namespace or a workload, use the annotation +`config.linkerd.io/default-inbound-policy:audit`. For example, if you had +`config.linkerd.io/default-inbound-policy:all_authenticated` for a namespace and +no `Servers` declared, all unmeshed traffic would be denied. By using +`config.linkerd.io/default-inbound-policy:audit` instead, unmeshed traffic would +be allowed but it would be logged and surfaced in metrics as detailed above. + +## Learning more + +- [Authorization policy reference](../../reference/authorization-policy/) +- [Guide to configuring per-route policy](../../tasks/configuring-per-route-policy/) + +[`HTTPRoute`]: ../httproute/ +[`Server`]: ../../reference/authorization-policy/#server diff --git a/linkerd.io/content/2.16/features/service-profiles.md b/linkerd.io/content/2.16/features/service-profiles.md new file mode 100644 index 0000000000..e27a540d74 --- /dev/null +++ b/linkerd.io/content/2.16/features/service-profiles.md @@ -0,0 +1,35 @@ ++++ +title = "Service Profiles" +description = "Linkerd's service profiles enable per-route metrics as well as retries and timeouts." +aliases = [ + "../service-profiles/" +] ++++ + +{{< note >}} +[HTTPRoutes](../httproute/) are the recommended method for getting per-route +metrics, specifying timeouts, and specifying retries. Service profiles continue +to be supported for backwards compatibility. +{{< /note >}} + +A service profile is a custom Kubernetes resource ([CRD][crd]) that can provide +Linkerd additional information about a service. In particular, it allows you to +define a list of routes for the service. Each route uses a regular expression +to define which paths should match that route. Defining a service profile +enables Linkerd to report per-route metrics and also allows you to enable +per-route features such as retries and timeouts. + +{{< note >}} +If working with headless services, service profiles cannot be retrieved. Linkerd +reads service discovery information based off the target IP address, and if that +happens to be a pod IP address then it cannot tell which service the pod belongs +to. +{{< /note >}} + +To get started with service profiles you can: + +- Look into [setting up service profiles](../../tasks/setting-up-service-profiles/) + for your own services. +- Glance at the [reference](../../reference/service-profiles/) documentation. 
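For orientation, a minimal sketch of a profile with a single route is shown
below (the service and namespace names are hypothetical; see the reference
above for the full schema):

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  # The name must be the fully-qualified DNS name of the service.
  name: webapp.booksapp.svc.cluster.local
  namespace: booksapp
spec:
  routes:
    - name: GET /books/{id}
      condition:
        method: GET
        pathRegex: /books/[^/]*
```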
+ +[crd]: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/ diff --git a/linkerd.io/content/2.16/features/telemetry.md b/linkerd.io/content/2.16/features/telemetry.md new file mode 100644 index 0000000000..0cf9059287 --- /dev/null +++ b/linkerd.io/content/2.16/features/telemetry.md @@ -0,0 +1,79 @@ ++++ +title = "Telemetry and Monitoring" +description = "Linkerd automatically collects metrics from all services that send traffic through it." +weight = 8 +aliases = [ + "../observability/" +] ++++ + +One of Linkerd's most powerful features is its extensive set of tooling around +*observability*—the measuring and reporting of observed behavior in +meshed applications. While Linkerd doesn't have insight directly into the +*internals* of service code, it has tremendous insight into the external +behavior of service code. + +To gain access to Linkerd's observability features you only need to install the +Viz extension: + +```bash +linkerd viz install | kubectl apply -f - +``` + +Linkerd's telemetry and monitoring features function automatically, without +requiring any work on the part of the developer. These features include: + +* Recording of top-line ("golden") metrics (request volume, success rate, and + latency distributions) for HTTP, HTTP/2, and gRPC traffic. +* Recording of TCP-level metrics (bytes in/out, etc) for other TCP traffic. +* Reporting metrics per service, per caller/callee pair, or per route/path + (with [Service Profiles](../service-profiles/)). +* Generating topology graphs that display the runtime relationship between + services. +* Live, on-demand request sampling. + +This data can be consumed in several ways: + +* Through the [Linkerd CLI](../../reference/cli/), e.g. with `linkerd viz stat` and + `linkerd viz routes`. +* Through the [Linkerd dashboard](../dashboard/), and + [pre-built Grafana dashboards](../../tasks/grafana/). +* Directly from Linkerd's built-in Prometheus instance + +## Golden metrics + +### Success Rate + +This is the percentage of successful requests during a time window (1 minute by +default). + +In the output of the command `linkerd viz routes -o wide`, this metric is split +into EFFECTIVE_SUCCESS and ACTUAL_SUCCESS. For routes configured with retries, +the former calculates the percentage of success after retries (as perceived by +the client-side), and the latter before retries (which can expose potential +problems with the service). + +### Traffic (Requests Per Second) + +This gives an overview of how much demand is placed on the service/route. As +with success rates, `linkerd viz routes --o wide` splits this metric into +EFFECTIVE_RPS and ACTUAL_RPS, corresponding to rates after and before retries +respectively. + +### Latencies + +Times taken to service requests per service/route are split into 50th, 95th and +99th percentiles. Lower percentiles give you an overview of the average +performance of the system, while tail percentiles help catch outlier behavior. + +## Lifespan of Linkerd metrics + +Linkerd is not designed as a long-term historical metrics store. While +Linkerd's Viz extension does include a Prometheus instance, this instance +expires metrics at a short, fixed interval (currently 6 hours). + +Rather, Linkerd is designed to *supplement* your existing metrics store. If +Linkerd's metrics are valuable, you should export them into your existing +historical metrics store. + +See [Exporting Metrics](../../tasks/exporting-metrics/) for more. 
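As a rough sketch of one common approach (assuming the viz extension's
Prometheus is running in the `linkerd-viz` namespace and that your external
Prometheus supports federation; the job names here are assumptions, so treat
the Exporting Metrics task as authoritative), an external Prometheus can
federate Linkerd's metrics with a scrape job like this:

```yaml
scrape_configs:
  - job_name: linkerd-federate
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        # Pull the proxy-level metrics collected by the viz Prometheus.
        - '{job="linkerd-proxy"}'
    static_configs:
      - targets:
          - prometheus.linkerd-viz.svc.cluster.local:9090
```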
diff --git a/linkerd.io/content/2.16/features/traffic-split.md b/linkerd.io/content/2.16/features/traffic-split.md new file mode 100644 index 0000000000..725bbce8e3 --- /dev/null +++ b/linkerd.io/content/2.16/features/traffic-split.md @@ -0,0 +1,44 @@ ++++ +title = "Traffic Split (canaries, blue/green deploys)" +description = "Linkerd can dynamically send a portion of traffic to different services." ++++ + +Linkerd's traffic split functionality allows you to dynamically shift arbitrary +portions of traffic destined for a Kubernetes service to a different destination +service. This feature can be used to implement sophisticated rollout strategies +such as [canary deployments](https://martinfowler.com/bliki/CanaryRelease.html) +and +[blue/green deployments](https://martinfowler.com/bliki/BlueGreenDeployment.html), +for example, by slowly easing traffic off of an older version of a service and +onto a newer version. + +{{< note >}} +This feature will eventually be supplanted by the newer [dynamic request +routing](../request-routing/) capabilities, which does not require the SMI +extension. +{{< /note >}} + +{{< note >}} +TrafficSplits cannot be used with headless services. Linkerd reads +service discovery information based off the target IP address, and if that +happens to be a pod IP address then it cannot tell which service the pod belongs +to. +{{< /note >}} + +Linkerd exposes this functionality via the +[Service Mesh Interface](https://smi-spec.io/) (SMI) +[TrafficSplit API](https://github.com/servicemeshinterface/smi-spec/tree/master/apis/traffic-split). +To use this feature, you create a Kubernetes resource as described in the +TrafficSplit spec, and Linkerd takes care of the rest. You can see step by step +documentation on our +[Getting started with Linkerd SMI extension](../../tasks/linkerd-smi/) page. + +By combining traffic splitting with Linkerd's metrics, it is possible to +accomplish even more powerful deployment techniques that automatically take into +account the success rate and latency of old and new versions. See the +[Flagger](https://flagger.app/) project for one example of this. + +Check out some examples of what you can do with traffic splitting: + +- [Canary Releases](../../tasks/canary-release/) +- [Fault Injection](../../tasks/fault-injection/) diff --git a/linkerd.io/content/2.16/getting-started/_index.md b/linkerd.io/content/2.16/getting-started/_index.md new file mode 100644 index 0000000000..3adedd5f40 --- /dev/null +++ b/linkerd.io/content/2.16/getting-started/_index.md @@ -0,0 +1,279 @@ ++++ +title = "Getting Started" +aliases = [ + "/getting-started/istio/", + "/choose-your-platform/", + "/../katacoda/", + "/doc/getting-started", + "/getting-started" +] +weight = 2 +[sitemap] + priority = 1.0 ++++ + +Welcome to Linkerd! 🎈 + +In this guide, we'll walk you through how to install Linkerd into your +Kubernetes cluster. Then we'll deploy a sample application to show off what +Linkerd can do. + +This guide is designed to walk you through the basics of Linkerd. First, you'll +install the *CLI* (command-line interface) onto your local machine. Using this +CLI, you'll then install the *control plane* onto your Kubernetes cluster. +Finally, you'll "mesh" an application by adding Linkerd's *data plane* to it. + +{{< releases >}} + +## Step 0: Setup + +Before anything else, we need to ensure you have access to modern Kubernetes +cluster and a functioning `kubectl` command on your local machine. 
(If you +don't already have a Kubernetes cluster, one easy option is to run one on your +local machine. There are many ways to do this, including +[kind](https://kind.sigs.k8s.io/), [k3d](https://k3d.io/), [Docker for +Desktop](https://www.docker.com/products/docker-desktop), [and +more](https://kubernetes.io/docs/setup/).) + +Validate your Kubernetes setup by running: + +```bash +kubectl version +``` + +You should see output with both a `Client Version` and `Server Version` +component. + +Now that we have our cluster, we'll install the Linkerd CLI and use it validate +that your cluster is capable of hosting Linkerd. + +{{< note >}} +If you're using a GKE "private cluster", or if you're using Cilium as a CNI, +there may be some [cluster-specific +configuration](../reference/cluster-configuration/) before you can proceed to +the next step. +{{< /note >}} + +## Step 1: Install the CLI + +If this is your first time running Linkerd, you will need to download the +`linkerd` CLI onto your local machine. The CLI will allow you to interact with +your Linkerd deployment. + +To install the CLI manually, run: + +```bash +curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install-edge | sh +``` + +Be sure to follow the instructions to add it to your path: + +```bash +export PATH=$HOME/.linkerd2/bin:$PATH +``` + +This will install the CLI for the latest _edge release_ of Linkerd. (For more +information about what edge releases are, see our [Releases and +Versions](../../releases/) page.) + +Once installed, verify the CLI is running correctly with: + +```bash +linkerd version +``` + +You should see the CLI version, and also `Server version: unavailable`. This is +because you haven't installed the control plane on your cluster. Don't +worry—we'll fix that soon enough. + +Make sure that your Linkerd version and Kubernetes version are compatible by +checking Linkerd's [supported Kubernetes +versions](../reference/k8s-versions/). + +## Step 2: Validate your Kubernetes cluster + +Kubernetes clusters can be configured in many different ways. Before we can +install the Linkerd control plane, we need to check and validate that +everything is configured correctly. To check that your cluster is ready to +install Linkerd, run: + +```bash +linkerd check --pre +``` + +If there are any checks that do not pass, make sure to follow the provided links +and fix those issues before proceeding. + +## Step 3: Install Linkerd onto your cluster + +Now that you have the CLI running locally and a cluster that is ready to go, +it's time to install Linkerd on your Kubernetes cluster. To do this, run: + +```bash +linkerd install --crds | kubectl apply -f - +``` + +followed by: + +```bash +linkerd install | kubectl apply -f - +``` + +These commands generate Kubernetes manifests with all the core resources required +for Linkerd (feel free to inspect this output if you're curious). Piping these +manifests into `kubectl apply` then instructs Kubernetes to add those resources +to your cluster. The `install --crds` command installs Linkerd's Custom Resource +Definitions (CRDs), which must be installed first, while the `install` command +installs the Linkerd control plane. + +{{< note >}} +The CLI-based install presented here is quick and easy, but there are a variety +of other ways to install Linkerd, including by [using Helm +charts](../tasks/install-helm/) or by using a marketplace install from your +Kubernetes provider. 
+{{< /note >}} + +Depending on the speed of your cluster's Internet connection, it may take a +minute or two for the control plane to finish installing. Wait for the control +plane to be ready (and verify your installation) by running: + +```bash +linkerd check +``` + +## Step 4: Install the demo app + +Congratulations, Linkerd is installed! However, it's not doing anything just +yet. To see Linkerd in action, we're going to need an application. + +Let's install a demo application called *Emojivoto*. Emojivoto is a simple +standalone Kubernetes application that uses a mix of gRPC and HTTP calls to +allow the user to vote on their favorite emojis. + +Install Emojivoto into the `emojivoto` namespace by running: + +```bash +curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/emojivoto.yml \ + | kubectl apply -f - +``` + +This command installs Emojivoto onto your cluster, but Linkerd hasn't been +activated on it yet—we'll need to "mesh" the application before Linkerd can +work its magic. + +Before we mesh it, let's take a look at Emojivoto in its natural state. +We'll do this by forwarding traffic to its `web-svc` service so that we can +point our browser to it. Forward `web-svc` locally to port 8080 by running: + +```bash +kubectl -n emojivoto port-forward svc/web-svc 8080:80 +``` + +Now visit [http://localhost:8080](http://localhost:8080). Voila! You should see +Emojivoto in all its glory. + +If you click around Emojivoto, you might notice that it's a little broken! For +example, if you try to vote for the **donut** emoji, you'll get a 404 page. +Don't worry, these errors are intentional. (In a later guide, we'll show you +how to [use Linkerd to identify the problem](../debugging-an-app/).) + +With Emoji installed and running, we're ready to *mesh* it—that is, to add +Linkerd's data plane proxies to it. We can do this on a live application +without downtime, thanks to Kubernetes's rolling deploys. Mesh your Emojivoto +application by running: + +```bash +kubectl get -n emojivoto deploy -o yaml \ + | linkerd inject - \ + | kubectl apply -f - +``` + +This command retrieves all of the deployments running in the `emojivoto` +namespace, runs their manifests through `linkerd inject`, and then reapplies it +to the cluster. (The `linkerd inject` command simply adds annotations to the +pod spec that instruct Linkerd to inject the proxy into the pods when they +are created.) + +As with `install`, `inject` is a pure text operation, meaning that you can +inspect the input and output before you use it. Once piped into `kubectl +apply`, Kubernetes will execute a rolling deploy and update each pod with the +data plane's proxies. + +Congratulations! You've now added Linkerd to an application! Just as with the +control plane, it's possible to verify that everything is working the way it +should on the data plane side. Check your data plane with: + +```bash +linkerd -n emojivoto check --proxy +``` + +And, of course, you can visit [http://localhost:8080](http://localhost:8080) +and once again see Emojivoto in all its meshed glory. + +## Step 5: Explore Linkerd! + +Perhaps that last step was a little unsatisfying. We've added Linkerd to +Emojivoto, but there are no visible changes to the application! That is part +of Linkerd's design—it does its best not to interfere with a functioning +application. + +Let's take a closer look at what Linkerd is actually doing. To do this, +we'll need to install an *extension*. 
Linkerd's core control plane is extremely +minimal, so Linkerd ships with extensions that add non-critical but often +useful functionality to Linkerd, including a variety of dashboards. + +Let's install the **viz** extension, which will install an on-cluster metric +stack and dashboard. + +To install the viz extension, run: + +```bash +linkerd viz install | kubectl apply -f - # install the on-cluster metrics stack +``` + +Once you've installed the extension, let's validate everything one last time: + +```bash +linkerd check +``` + +With the control plane and extensions installed and running, we're now ready +to explore Linkerd! Access the dashboard with: + +```bash +linkerd viz dashboard & +``` + +You should see a screen like this: + +{{< fig src="/images/getting-started/viz-empty-dashboard.png" + title="The Linkerd dashboard in action" >}} + +Click around, explore, and have fun! For extra credit, see if you can find the +live metrics for each Emojivoto component, and determine which one has a partial +failure. (See the debugging tutorial below for much more on this.) + +## That's it! 👏 + +Congratulations, you have joined the exalted ranks of Linkerd users! +Give yourself a pat on the back. + +What's next? Here are some steps we recommend: + +* Learn how to use Linkerd to [debug the errors in + Emojivoto](../debugging-an-app/). +* Learn how to [add your own services](../adding-your-service/) to + Linkerd without downtime. +* Learn how to install other [Linkerd extensions](../tasks/extensions/) such as + Jaeger and the multicluster extension. +* Learn more about [Linkerd's architecture](../reference/architecture/) +* Learn how to set up [automatic control plane mTLS credential + rotation](../tasks/automatically-rotating-control-plane-tls-credentials/) for + long-lived clusters. +* Learn how to [restrict access to services using authorization + policy](../tasks/restricting-access/). +* Hop into the `#linkerd` channel on [the Linkerd + Slack](https://slack.linkerd.io) + and say hi! + +Above all else: welcome to the Linkerd community! diff --git a/linkerd.io/content/2.16/overview/_index.md b/linkerd.io/content/2.16/overview/_index.md new file mode 100644 index 0000000000..b8d3b6e243 --- /dev/null +++ b/linkerd.io/content/2.16/overview/_index.md @@ -0,0 +1,64 @@ ++++ +title = "Overview" +aliases = [ + "/docs", + "/documentation", + "/2-edge/", + "../docs/", + "/doc/network-performance/", + "/in-depth/network-performance/", + "/in-depth/debugging-guide/", + "/in-depth/concepts/" +] +weight = 1 ++++ + +Linkerd is a _service mesh_ for Kubernetes. It makes running services easier +and safer by giving you runtime debugging, observability, reliability, and +security—all without requiring any changes to your code. + +For a brief introduction to the service mesh model, we recommend reading [The +Service Mesh: What Every Software Engineer Needs to Know about the World's Most +Over-Hyped Technology](https://servicemesh.io/). + +Linkerd is fully open source, licensed under [Apache +v2](https://github.com/linkerd/linkerd2/blob/main/LICENSE), and is a [Cloud +Native Computing Foundation](https://cncf.io) graduated project. Linkerd is +developed in the open in the [Linkerd GitHub organization](https://github.com/linkerd). + +Linkerd has two basic components: a *control plane* and a *data plane*. Once +Linkerd's control plane has been installed on your Kubernetes cluster, you add +the data plane to your workloads (called "meshing" or "injecting" your +workloads) and voila! Service mesh magic happens. 
You can [get started with Linkerd](../getting-started/) in minutes!

## How it works

Linkerd works by installing a set of ultralight, transparent "micro-proxies"
next to each service instance. These proxies automatically handle all traffic to
and from the service. Because they're transparent, these proxies act as highly
instrumented out-of-process network stacks, sending telemetry to, and receiving
control signals from, the control plane. This design allows Linkerd to measure
and manipulate traffic to and from your service without introducing excessive
latency.

In order to be as small, lightweight, and safe as possible, Linkerd's
micro-proxies are written in [Rust](https://www.rust-lang.org/) and specialized
for Linkerd. You can learn more about these micro-proxies in our blog post,
[Under the hood of Linkerd's state-of-the-art Rust proxy,
Linkerd2-proxy](/2020/07/23/under-the-hood-of-linkerds-state-of-the-art-rust-proxy-linkerd2-proxy/).
(If you're wondering why Linkerd doesn't use Envoy, see our blog post, [Why
Linkerd doesn't use Envoy](/2020/12/03/why-linkerd-doesnt-use-envoy/).)

## Getting Linkerd

Linkerd is available in a variety of packages and channels. See the [Linkerd
Releases](/releases/) page for details.

## Next steps

[Get started with Linkerd](../getting-started/) in minutes, or check out the
[architecture](../reference/architecture/) for more details on Linkerd's
components and how they all fit together.

diff --git a/linkerd.io/content/2.16/reference/_index.md b/linkerd.io/content/2.16/reference/_index.md new file mode 100644 index 0000000000..192c211e5f --- /dev/null +++ b/linkerd.io/content/2.16/reference/_index.md @@ -0,0 +1,6 @@
+++
title = "Reference"
weight = 5
+++

{{% sectiontoc "reference" %}}

diff --git a/linkerd.io/content/2.16/reference/architecture.md b/linkerd.io/content/2.16/reference/architecture.md new file mode 100644 index 0000000000..2bb53414bd --- /dev/null +++ b/linkerd.io/content/2.16/reference/architecture.md @@ -0,0 +1,119 @@
+++
title = "Architecture"
description = "Deep dive into the architecture of Linkerd."
aliases = [
  "../architecture/"
]
+++

At a high level, Linkerd consists of a **control plane** and a **data plane**.

The **control plane** is a set of services that provide control over
Linkerd as a whole.

The **data plane** consists of transparent _micro-proxies_ that run "next" to
each service instance, as sidecar containers in the pods. These proxies
automatically handle all TCP traffic to and from the service, and communicate
with the control plane for configuration.

Linkerd also provides a **CLI** that can be used to interact with the control
and data planes.

{{< fig src="/images/architecture/control-plane.png"
title="Linkerd's architecture" >}}

## CLI

The Linkerd CLI is typically run outside of the cluster (e.g. on your local
machine) and is used to interact with Linkerd.

## Control plane

The Linkerd control plane is a set of services that run in a dedicated
Kubernetes namespace (`linkerd` by default). The control plane has several
components, enumerated below.

### The destination service

The destination service is used by the data plane proxies to determine various
aspects of their behavior. It is used to fetch service discovery information
(i.e.
where to send a particular request and the TLS identity expected on the +other end); to fetch policy information about which types of requests are +allowed; to fetch service profile information used to inform per-route metrics, +retries, and timeouts; and more. + +### The identity service + +The identity service acts as a [TLS Certificate +Authority](https://en.wikipedia.org/wiki/Certificate_authority) that accepts +[CSRs](https://en.wikipedia.org/wiki/Certificate_signing_request) from proxies +and returns signed certificates. These certificates are issued at proxy +initialization time and are used for proxy-to-proxy connections to implement +[mTLS](../../features/automatic-mtls/). + +### The proxy injector + +The proxy injector is a Kubernetes [admission +controller](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/) +that receives a webhook request every time a pod is created. This injector +inspects resources for a Linkerd-specific annotation (`linkerd.io/inject: +enabled`). When that annotation exists, the injector mutates the pod's +specification and adds the `proxy-init` and `linkerd-proxy` containers to the +pod, along with the relevant start-time configuration. + +## Data plane + +The Linkerd data plane comprises ultralight _micro-proxies_ which are deployed +as sidecar containers inside application pods. These proxies transparently +intercept TCP connections to and from each pod, thanks to iptables rules put in +place by the [linkerd-init](#linkerd-init-container) (or, alternatively, by +Linkerd's [CNI plugin](../../features/cni/)). + +### Proxy + +The Linkerd2-proxy is an ultralight, transparent _micro-proxy_ written in +[Rust](https://www.rust-lang.org/). Linkerd2-proxy is designed specifically for +the service mesh use case and is not designed as a general-purpose proxy. + +The proxy's features include: + +* Transparent, zero-config proxying for HTTP, HTTP/2, and arbitrary TCP + protocols. +* Automatic Prometheus metrics export for HTTP and TCP traffic. +* Transparent, zero-config WebSocket proxying. +* Automatic, latency-aware, layer-7 load balancing. +* Automatic layer-4 load balancing for non-HTTP traffic. +* Automatic TLS. +* An on-demand diagnostic tap API. +* And lots more. + +The proxy supports service discovery via DNS and the +[destination gRPC API](https://github.com/linkerd/linkerd2-proxy-api). + +You can read more about these micro-proxies here: + +* [Why Linkerd doesn't use Envoy](/2020/12/03/why-linkerd-doesnt-use-envoy/) +* [Under the hood of Linkerd's state-of-the-art Rust proxy, + Linkerd2-proxy](/2020/07/23/under-the-hood-of-linkerds-state-of-the-art-rust-proxy-linkerd2-proxy/) + +### Meshed Conncections + +When one pod establishes a TCP connection to another pod and both of those pods +are injected with the Linkerd proxy, we say that the connection is *meshed*. +The proxy in the pod that initiated the connection is called the *outbound* +proxy and the proxy in the pod that accepted the connection is called the +*inbound* proxy. + +The *outbound* proxy is responsible for service discovery, load balancing, +circuit breakers, retries, and timeouts. The *inbound* proxy is responsible for +enforcing authorization policy. Both *inbound* and *outbound* proxies report +traffic metrics about the traffic they send and receive. 
### Linkerd init container

The `linkerd-init` container is added to each meshed pod as a Kubernetes [init
container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
that runs before any other containers are started. It [uses
iptables](../iptables/) to route all TCP traffic to and from the pod through
the proxy. Linkerd's init container can be run in [different
modes](../../features/nft/) which determine what iptables variant is used.

diff --git a/linkerd.io/content/2.16/reference/authorization-policy.md b/linkerd.io/content/2.16/reference/authorization-policy.md new file mode 100644 index 0000000000..46ba134c93 --- /dev/null +++ b/linkerd.io/content/2.16/reference/authorization-policy.md @@ -0,0 +1,562 @@
+++
title = "Authorization Policy"
description = "Reference guide to Linkerd's policy resources."
+++

Linkerd's authorization policy allows you to control which types of traffic are
allowed to meshed pods. See the [Authorization Policy feature
description](../../features/server-policy/) for more information on what this
means.

Linkerd's policy is configured using two mechanisms:

1. A set of _default policies_, which can be set at the cluster,
   namespace, and workload level through Kubernetes annotations.
2. A set of CRDs that specify fine-grained policy for specific ports, routes,
   workloads, etc.

## Default policies

During a Linkerd install, the `proxy.defaultInboundPolicy` field is used to
specify the cluster-wide default policy. This field can be one of the following:

- `all-unauthenticated`: allow all traffic. This is the default.
- `all-authenticated`: allow traffic from meshed clients, whether in the same
  cluster or in a different cluster (with multi-cluster).
- `cluster-authenticated`: allow traffic from meshed clients in the same cluster.
- `cluster-unauthenticated`: allow traffic from both meshed and non-meshed clients
  in the same cluster.
- `deny`: deny all traffic.
- `audit`: same as `all-unauthenticated`, but requests get flagged in logs and
  metrics.

This cluster-wide default can be overridden for specific resources by setting
the annotation `config.linkerd.io/default-inbound-policy` on either a pod spec
or its namespace.

## Dynamic policy resources

For dynamic control of policy, and for finer-grained policy than what the
default policies allow, Linkerd provides a set of CRDs which control traffic
policy in the cluster: [Server], [HTTPRoute], [ServerAuthorization],
[AuthorizationPolicy], [MeshTLSAuthentication], and [NetworkAuthentication].

The general pattern for authorization is:

- A `Server` describes a set of pods and a single port on those pods.
- Optionally, an `HTTPRoute` references that `Server` and describes a
  subset of HTTP traffic to it.
- A `MeshTLSAuthentication` or `NetworkAuthentication` describes who
  is allowed access.
- An `AuthorizationPolicy` references the `HTTPRoute` or `Server`
  (the thing to be authorized) and the `MeshTLSAuthentication` or
  `NetworkAuthentication` (the clients that have authorization).

## Server

A `Server` selects a port on a set of pods in the same namespace as the server.
It typically selects a single port on a pod, though it may select multiple ports
when referring to the port by name (e.g. `admin-http`). While the `Server`
resource is similar to a Kubernetes `Service`, it has the added restriction that
multiple `Server` instances must not overlap: they must not select the same
pod/port pairs.
Linkerd ships with an admission controller that prevents +overlapping `Server`s from being created. + +{{< note >}} +When a Server resource is present, all traffic to the port on its pods will be +denied unless explicitly authorized or audit mode is enabled (with +`accessPolicy:audit`). Thus, Servers are typically paired with e.g. an +AuthorizationPolicy that references the Server, or that reference an HTTPRoute +that in turn references the Server. +{{< /note >}} + +### Server Spec + +A `Server` spec may contain the following top level fields: + +{{< table >}} +| field| value | +|------|-------| +| `accessPolicy`| [accessPolicy](#accessPolicy) declares the policy applied to traffic not matching any associated authorization policies (defaults to `deny`). | +| `podSelector`| A [podSelector](#podselector) selects pods in the same namespace. | +| `port`| A port name or number. Only ports in a pod spec's `ports` are considered. | +| `proxyProtocol`| Configures protocol discovery for inbound connections. Supersedes the `config.linkerd.io/opaque-ports` annotation. Must be one of `unknown`,`HTTP/1`,`HTTP/2`,`gRPC`,`opaque`,`TLS`. Defaults to `unknown` if not set. | +{{< /table >}} + +#### accessPolicy + +Traffic that doesn't conform to the authorization policies associated to the +Server are denied by default. You can alter that behavior by overriding the +`accessPolicy` field, which accepts the same values as the [default +policies](#default-policies). Of particular interest is the `audit` value, which +enables [audit mode](../../features/server-policy/#audit-mode), that you can use +to test policies before enforcing them. + +#### podSelector + +This is the [same labelSelector field in Kubernetes](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/#LabelSelector). +All the pods that are part of this selector will be part of the [Server] group. +A podSelector object must contain _exactly one_ of the following fields: + +{{< table >}} +| field | value | +|-------|-------| +| `matchExpressions` | matchExpressions is a list of label selector requirements. The requirements are ANDed. | +| `matchLabels` | matchLabels is a map of {key,value} pairs. | +{{< /table >}} + +See [the Kubernetes LabelSelector reference](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/#LabelSelector) +for more details. + +### Server Examples + +A [Server] that selects over pods with a specific label, with `gRPC` as +the `proxyProtocol`. + +```yaml +apiVersion: policy.linkerd.io/v1beta1 +kind: Server +metadata: + namespace: emojivoto + name: emoji-grpc +spec: + podSelector: + matchLabels: + app: emoji-svc + port: grpc + proxyProtocol: gRPC +``` + +A [Server] that selects over pods with `matchExpressions`, with `HTTP/2` +as the `proxyProtocol`, on port `8080`. + +```yaml +apiVersion: policy.linkerd.io/v1beta1 +kind: Server +metadata: + namespace: emojivoto + name: backend-services +spec: + podSelector: + matchExpressions: + - {key: app, operator: In, values: [voting-svc, emoji-svc]} + - {key: environment, operator: NotIn, values: [dev]} + port: 8080 + proxyProtocol: "HTTP/2" +``` + +## HTTPRoute + +When attached to a [Server], an `HTTPRoute` resource represents a subset of the +traffic handled by the ports on pods referred in that Server, by declaring a set +of rules which determine which requests match. Matches can be based on path, +headers, query params, and/or verb. 
[AuthorizationPolicies] may target +`HTTPRoute` resources, thereby authorizing traffic to that `HTTPRoute` only +rather than to the entire [Server]. `HTTPRoutes` may also define filters which +add processing steps that must be completed during the request or response +lifecycle. + +{{< note >}} +A given HTTP request can only match one HTTPRoute. If multiple HTTPRoutes +are present that match a request, one will be picked according to the [Gateway +API rules of +precendence](https://gateway-api.sigs.k8s.io/references/spec/#gateway.networking.k8s.io/v1beta1.HTTPRouteSpec). +{{< /note >}} + +Please refer to HTTPRoute's full [spec](../httproute/). + +{{< note >}} +Two versions of the HTTPRoute resource may be used with Linkerd: + +- The upstream version provided by the Gateway API, with the + `gateway.networking.k8s.io` API group +- A Linkerd-specific CRD provided by Linkerd, with the `policy.linkerd.io` API + group + +The two HTTPRoute resource definitions are similar, but the Linkerd version +implements experimental features not yet available with the upstream Gateway API +resource definition. See [the HTTPRoute reference +documentation](../httproute/#linkerd-and-gateway-api-httproutes) +for details. +{{< /note >}} + +## AuthorizationPolicy + +An AuthorizationPolicy provides a way to authorize traffic to a [Server] or an +[HTTPRoute]. AuthorizationPolicies are a replacement for [ServerAuthorizations] +which are more flexible because they can target [HTTPRoutes] instead of only +being able to target [Servers]. + +### AuthorizationPolicy Spec + +An `AuthorizationPolicy` spec may contain the following top level fields: + +{{< table >}} +| field| value | +|------|-------| +| `targetRef`| A [TargetRef](#targetref) which references a resource to which the authorization policy applies.| +| `requiredAuthenticationRefs`| A list of [TargetRefs](#targetref) representing the required authentications. In the case of multiple entries, _all_ authentications must match.| +{{< /table >}} + +#### targetRef + +A `TargetRef` identifies an API object to which this AuthorizationPolicy +applies. The API objects supported are: + +- A [Server], indicating that the AuthorizationPolicy applies to all traffic to + the Server. +- An [HTTPRoute], indicating that the AuthorizationPolicy applies to all traffic + matching the HTTPRoute. +- A namespace (`kind: Namespace`), indicating that the AuthorizationPolicy + applies to all traffic to all [Servers] and [HTTPRoutes] defined in the + namespace. + +{{< table >}} +| field| value | +|------|-------| +| `group`| Group is the group of the target resource. For namespace kinds, this should be omitted.| +| `kind`| Kind is kind of the target resource.| +| `namespace`| The namespace of the target resource. When unspecified (or empty string), this refers to the local namespace of the policy.| +| `name`| Name is the name of the target resource.| +{{< /table >}} + +### AuthorizationPolicy Examples + +An `AuthorizationPolicy` which authorizes clients that satisfy the +`authors-get-authn` authentication to send to the `authors-get-route` +[HTTPRoute]. 
+ +```yaml +apiVersion: policy.linkerd.io/v1alpha1 +kind: AuthorizationPolicy +metadata: + name: authors-get-policy + namespace: booksapp +spec: + targetRef: + group: policy.linkerd.io + kind: HTTPRoute + name: authors-get-route + requiredAuthenticationRefs: + - name: authors-get-authn + kind: MeshTLSAuthentication + group: policy.linkerd.io +``` + +An `AuthorizationPolicy` which authorizes the `webapp` `ServiceAccount` to send +to the `authors` [Server]. + +```yaml +apiVersion: policy.linkerd.io/v1alpha1 +kind: AuthorizationPolicy +metadata: + name: authors-policy + namespace: booksapp +spec: + targetRef: + group: policy.linkerd.io + kind: Server + name: authors + requiredAuthenticationRefs: + - name: webapp + kind: ServiceAccount +``` + +An `AuthorizationPolicy` which authorizes the `webapp` `ServiceAccount` to send +to all policy "targets" within the `booksapp` namespace. + +```yaml +apiVersion: policy.linkerd.io/v1alpha1 +kind: AuthorizationPolicy +metadata: + name: authors-policy + namespace: booksapp +spec: + targetRef: + kind: Namespace + name: booksapp + requiredAuthenticationRefs: + - name: webapp + kind: ServiceAccount +``` + +## MeshTLSAuthentication + +A `MeshTLSAuthentication` represents a set of mesh identities. When an +[AuthorizationPolicy] has a `MeshTLSAuthentication` as one of its +`requiredAuthenticationRefs`, this means that clients must be in the mesh and +must have one of the specified identities in order to be authorized to send +to the target. + +### MeshTLSAuthentication Spec + +A `MeshTLSAuthentication` spec may contain the following top level fields: + +{{< table >}} +| field| value | +|------|-------| +| `identities`| A list of mTLS identities to authenticate. The `*` prefix can be used to match all identities in a domain. An identity string of `*` indicates that all meshed clients are authorized.| +| `identityRefs`| A list of [targetRefs](#targetref) to `ServiceAccounts` to authenticate.| +{{< /table >}} + +### MeshTLSAuthentication Examples + +A `MeshTLSAuthentication` which authenticates the `books` and `webapp` mesh +identities. + +```yaml +apiVersion: policy.linkerd.io/v1alpha1 +kind: MeshTLSAuthentication +metadata: + name: authors-get-authn + namespace: booksapp +spec: + identities: + - "books.booksapp.serviceaccount.identity.linkerd.cluster.local" + - "webapp.booksapp.serviceaccount.identity.linkerd.cluster.local" +``` + +A `MeshTLSAuthentication` which authenticates the `books` and `webapp` mesh +identities. This is an alternative way to specify the same thing as the above +example. + +```yaml +apiVersion: policy.linkerd.io/v1alpha1 +kind: MeshTLSAuthentication +metadata: + name: authors-get-authn + namespace: booksapp +spec: + identityRefs: + - kind: ServiceAccount + name: books + - kind: ServiceAccount + name: webapp +``` + +A `MeshTLSAuthentication` which authenticates all meshed identities. + +```yaml +apiVersion: policy.linkerd.io/v1alpha1 +kind: MeshTLSAuthentication +metadata: + name: authors-get-authn + namespace: booksapp +spec: + identities: ["*"] +``` + +## NetworkAuthentication + +A `NetworkAuthentication` represents a set of IP subnets. When an +[AuthorizationPolicy] has a `NetworkAuthentication` as one of its +`requiredAuthenticationRefs`, this means that clients must be in one of the +specified networks in order to be authorized to send to the target. 
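+
+For example, a `NetworkAuthentication` is typically referenced from an
+[AuthorizationPolicy]'s `requiredAuthenticationRefs`. The following is a
+minimal sketch that reuses the `authors` [Server] and the `cluster-network`
+`NetworkAuthentication` shown in the examples in this document; the policy
+name itself is hypothetical, for illustration only:
+
+```yaml
+apiVersion: policy.linkerd.io/v1alpha1
+kind: AuthorizationPolicy
+metadata:
+  # Hypothetical name, for illustration only.
+  name: authors-cluster-network-policy
+  namespace: booksapp
+spec:
+  # Authorize traffic to the authors Server...
+  targetRef:
+    group: policy.linkerd.io
+    kind: Server
+    name: authors
+  # ...for clients whose source IP falls within the subnets listed in the
+  # cluster-network NetworkAuthentication (see the example below).
+  requiredAuthenticationRefs:
+    - name: cluster-network
+      kind: NetworkAuthentication
+      group: policy.linkerd.io
+```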
+ +### NetworkAuthentication Spec + +A `NetworkAuthentication` spec may contain the following top level fields: + +{{< table >}} +| field| value | +|------|-------| +| `networks`| A list of [networks](#network) to authenticate.| +{{< /table >}} + +#### network + +A `network` defines an authenticated IP subnet. + +{{< table >}} +| field| value | +|------|-------| +| `cidr`| A subnet in CIDR notation to authenticate.| +| `except`| A list of subnets in CIDR notation to exclude from the authentication.| +{{< /table >}} + +### NetworkAuthentication Examples + +A `NetworkAuthentication` that authenticates clients which belong to any of +the specified CIDRs. + +```yaml +apiVersion: policy.linkerd.io/v1alpha1 +kind: NetworkAuthentication +metadata: + name: cluster-network + namespace: booksapp +spec: + networks: + - cidr: 10.0.0.0/8 + - cidr: 100.64.0.0/10 + - cidr: 172.16.0.0/12 + - cidr: 192.168.0.0/16 +``` + +## ServerAuthorization + +A [ServerAuthorization] provides a way to authorize traffic to one or more +[Server]s. + +{{< note >}} +[AuthorizationPolicy](#authorizationpolicy) is a more flexible alternative to +`ServerAuthorization` that can target [HTTPRoutes](#httproute) as well as +[Servers](#server). Use of [AuthorizationPolicy](#authorizationpolicy) is +preferred, and `ServerAuthorization` will be deprecated in future releases. +{{< /note >}} + +### ServerAuthorization Spec + +A ServerAuthorization spec must contain the following top level fields: + +{{< table >}} +| field| value | +|------|-------| +| `client`| A [client](#client) describes clients authorized to access a server. | +| `server`| A [serverRef](#serverref) identifies `Servers` in the same namespace for which this authorization applies. | +{{< /table >}} + +#### serverRef + +A `serverRef` object must contain _exactly one_ of the following fields: + +{{< table >}} +| field| value | +|------|-------| +| `name`| References a `Server` instance by name. | +| `selector`| A [selector](#selector) selects servers on which this authorization applies in the same namespace. | +{{< /table >}} + +#### selector + +This is the [same labelSelector field in Kubernetes](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/#LabelSelector). +All the servers that are part of this selector will have this authorization applied. +A selector object must contain _exactly one_ of the following fields: + +{{< table >}} +| field | value | +|-------|-------| +| `matchExpressions` | A list of label selector requirements. The requirements are ANDed. | +| `matchLabels` | A map of {key,value} pairs. | +{{< /table >}} + +See [the Kubernetes LabelSelector reference](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/label-selector/#LabelSelector) +for more details. + +#### client + +A `client` object must contain _exactly one_ of the following fields: + +{{< table >}} +| field| value | +|------|-------| +| `meshTLS`| A [meshTLS](#meshtls) is used to authorize meshed clients to access a server. | +| `unauthenticated`| A boolean value that authorizes unauthenticated clients to access a server. | +{{< /table >}} + +Optionally, it can also contain the `networks` field: + +{{< table >}} +| field| value | +|------|-------| +| `networks`| Limits the client IP addresses to which this authorization applies. If unset, the server chooses a default (typically, all IPs or the cluster's pod network). 
| +{{< /table >}} + +#### meshTLS + +A `meshTLS` object must contain _exactly one_ of the following fields: + +{{< table >}} +| field| value | +|------|-------| +| `unauthenticatedTLS`| A boolean to indicate that no client identity is required for communication. This is mostly important for the identity controller, which must terminate TLS connections from clients that do not yet have a certificate. | +| `identities`| A list of proxy identity strings (as provided via mTLS) that are authorized. The `*` prefix can be used to match all identities in a domain. An identity string of `*` indicates that all authentication clients are authorized. | +| `serviceAccounts`| A list of authorized client [serviceAccount](#serviceAccount)s (as provided via mTLS). | +{{< /table >}} + +#### serviceAccount + +A serviceAccount field contains the following top level fields: + +{{< table >}} +| field| value | +|------|-------| +| `name`| The ServiceAccount's name. | +| `namespace`| The ServiceAccount's namespace. If unset, the authorization's namespace is used. | +{{< /table >}} + +### ServerAuthorization Examples + +A [ServerAuthorization] that allows meshed clients with +`*.emojivoto.serviceaccount.identity.linkerd.cluster.local` proxy identity i.e. all +service accounts in the `emojivoto` namespace. + +```yaml +apiVersion: policy.linkerd.io/v1beta1 +kind: ServerAuthorization +metadata: + namespace: emojivoto + name: emoji-grpc +spec: + # Allow all authenticated clients to access the (read-only) emoji service. + server: + selector: + matchLabels: + app: emoji-svc + client: + meshTLS: + identities: + - "*.emojivoto.serviceaccount.identity.linkerd.cluster.local" +``` + +A [ServerAuthorization] that allows any unauthenticated +clients. + +```yaml +apiVersion: policy.linkerd.io/v1beta1 +kind: ServerAuthorization +metadata: + namespace: emojivoto + name: web-public +spec: + server: + name: web-http + # Allow all clients to access the web HTTP port without regard for + # authentication. If unauthenticated connections are permitted, there is no + # need to describe authenticated clients. + client: + unauthenticated: true + networks: + - cidr: 0.0.0.0/0 + - cidr: ::/0 +``` + +A [ServerAuthorization] that allows meshed clients with a +specific service account. + +```yaml +apiVersion: policy.linkerd.io/v1beta1 +kind: ServerAuthorization +metadata: + namespace: emojivoto + name: prom-prometheus +spec: + server: + name: prom + client: + meshTLS: + serviceAccounts: + - namespace: linkerd-viz + name: prometheus +``` + +[Server]: #server +[Servers]: #server +[HTTPRoute]: #httproute +[HTTPRoutes]: #httproute +[ServerAuthorization]: #serverauthorization +[ServerAuthorizations]: #serverauthorization +[AuthorizationPolicy]: #authorizationpolicy +[AuthorizationPolicies]: #authorizationpolicy +[MeshTLSAuthentication]: #meshtlsauthentication +[NetworkAuthentication]: #networkauthentication diff --git a/linkerd.io/content/2.16/reference/circuit-breaking.md b/linkerd.io/content/2.16/reference/circuit-breaking.md new file mode 100644 index 0000000000..062f9b01d8 --- /dev/null +++ b/linkerd.io/content/2.16/reference/circuit-breaking.md @@ -0,0 +1,158 @@ ++++ +title = "Circuit Breaking" +description = "How Linkerd implements circuit breaking." +aliases = [ + "../failure-accrual/", +] ++++ + +[_Circuit breaking_][circuit-breaker] is a pattern for improving the reliability +of distributed applications. 
In circuit breaking, an application which makes +network calls to remote backends monitors whether those calls succeed or fail, +in an attempt to determine whether that backend is in a failed state. If a +given backend is believed to be in a failed state, its circuit breaker is +"tripped", and no subsequent requests are sent to that backend until it is +determined to have returned to normal. + +The Linkerd proxy is capable of performing endpoint-level circuit breaking on +HTTP requests using a configurable failure accrual strategy. This means that the +Linkerd proxy performs circuit breaking at the level of individual endpoints +in a [load balancer](../../features/load-balancing/) (i.e., each Pod in a given +Service), and failures are tracked at the level of HTTP response status codes. + +Circuit breaking is a client-side behavior, and is therefore performed by the +[outbound] side of the Linkerd proxy.[^1] Outbound proxies implement circuit +breaking in the load balancer, by marking failing endpoints as _unavailable_. +When an endpoint is unavailable, the load balancer will not select it when +determining where to send a given request. This means that if only some +endpoints have tripped their circuit breakers, the proxy will simply not select +those endpoints while they are in a failed state. When all endpoints in a load +balancer are unavailable, requests may be failed with [503 Service Unavailable] +errors, or, if the Service is one of multiple [`backendRef`s in an +HTTPRoute](../httproute/#httpbackendref), the entire backend Service will be +considered unavailable and a different backend may be selected. + +The [`outbound_http_balancer_endpoints` gauge metric][metric] reports the number +of "ready" and "pending" endpoints in a load balancer, with the "pending" number +including endpoints made unavailable by failure accrual. + +## Failure Accrual Policies + +A _failure accrual policy_ determines how failures are tracked for endpoints, +and what criteria result in an endpoint becoming unavailable ("tripping the +circuit breaker"). Currently, the Linkerd proxy implements one failure accrual +policy, _consecutive failures_. Additional failure accrual policies may be +added in the future. + +{{< note >}} +HTTP responses are classified as _failures_ if their status code is a [5xx +server error]. Future Linkerd releases may add support for configuring what +status codes are classified as failures. +{{< /note >}} + +### Consecutive Failures + +In this failure accrual policy, an endpoint is marked as failing after a +configurable number of failures occur _consecutively_ (i.e., without any +successes). For example, if the maximum number of failures is 7, the endpoint is +made unavailable once 7 failures occur in a row with no successes. + +## Probation and Backoffs + +Once a failure accrual policy makes an endpoint unavailable, the circuit breaker +will attempt to determine whether the endpoint is still in a failing state, and +transition it back to available if it has recovered. This process is called +_probation_. When an endpoint enters probation, it is temporarily made available +to the load balancer again, and permitted to handle a single request, called a +_probe request_. If this request succeeds, the endpoint is no longer considered +failing, and is once again made available. If the probe request fails, the +endpoint remains unavailable, and another probe request will be issued after a +backoff. 
+ +{{< note >}} +In the context of HTTP failure accrual, a probe request is an actual application +request, and should not be confused with HTTP readiness and liveness probes. +This means that a circuit breaker will not allow an endpoint to exit probation +just because it responds successfully to health checks — actual +application traffic must succeed for the endpoint to become available again. +{{< /note >}} + +When an endpoint's failure accrual policy trips the circuit breaker, it will +remain unavailable for at least a _minimum penalty_ duration. After this duration +has elapsed, the endpoint will enter probation. When a probe request fails, the +endpoint will not be placed in probation again until a backoff duration has +elapsed. Every time a probe request fails, [the backoff increases +exponentially][exp-backoff], up to an upper bound set by the _maximum penalty_ +duration. + +An amount of random noise, called _jitter_, is added to each backoff +duration. Jitter is controlled by a parameter called the _jitter ratio_, a +floating-point number from 0.0 to 100.0, which represents the maximum percentage +of the original backoff duration which may be added as jitter. + +## Configuring Failure Accrual + +HTTP failure accrual is configured by a set of annotations. When these +annotations are added to a Kubernetes Service, client proxies will perform +HTTP failure accrual when communicating with endpoints of that Service. If no +failure accrual annotations are present on a Service, proxies will not perform +failure accrual. + +{{< warning >}} +Circuit breaking is **incompatible with ServiceProfiles**. If a +[ServiceProfile](../../features/service-profiles/) is defined for the annotated +Service, proxies will not perform circuit breaking as long as the ServiceProfile +exists. +{{< /warning >}} + +{{< note >}} +Some failure accrual annotations have values which represent a duration. +Durations are specified as a positive integer, followed by a unit, which may be +one of: `ms` for milliseconds, `s` for seconds, `m` for minutes, `h` for hours, +or `d` for days. +{{< /note >}} + +Set this annotation on a Service to enable meshed clients to use circuit +breaking when sending traffic to that Service: + ++ `balancer.linkerd.io/failure-accrual`: Selects the [failure accrual + policy](#failure-accrual-policies) used + when communicating with this Service. If this is not present, no failure + accrual is performed. Currently, the only supported value for this annotation + is `"consecutive"`, to perform [consecutive failures failure + accrual](#consecutive-failures). + +When the failure accrual mode is `"consecutive"`, the following annotations +configure parameters for the consecutive-failures failure accrual policy: + ++ `balancer.linkerd.io/failure-accrual-consecutive-max-failures`: Sets the + number of consecutive failures which must occur before an endpoint is made + unavailable. Must be an integer. If this annotation is not present, the + default value is 7. ++ `balancer.linkerd.io/failure-accrual-consecutive-min-penalty`: Sets the + minimum penalty duration for which an endpoint will be marked as unavailable + after `max-failures` consecutive failures occur. After this period of time + elapses, the endpoint will be [probed](#probation-and-backoffs). This duration + must be non-zero, and may not be greater than the max-penalty duration. If this + annotation is not present, the default value is one second (`1s`). 
++ `balancer.linkerd.io/failure-accrual-consecutive-max-penalty`: Sets the + maximum penalty duration for which an endpoint will be marked as unavailable + after `max-failures` consecutive failures occur. This is an upper bound on the + duration between [probe requests](#probation-and-backoffs). This duration + must be non-zero, and must be greater than the min-penalty duration. If this + annotation is not present, the default value is one minute (`1m`). ++ `balancer.linkerd.io/failure-accrual-consecutive-jitter-ratio`: Sets the + jitter ratio used for [probation backoffs](#probation-and-backoffs). This is a + floating-point number, and must be between 0.0 and 100.0. If this annotation + is not present, the default value is 0.5. + +[^1]: The part of the proxy which handles connections from within the pod to the + rest of the cluster. + +[circuit-breaker]: https://www.martinfowler.com/bliki/CircuitBreaker.html +[503 Service Unavailable]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503 +[metric]: ../proxy-metrics/#outbound-xroute-metrics +[5xx server error]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#server_error_responses +[exp-backoff]: + https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/ +[outbound]: ../architecture/#meshed-conncections diff --git a/linkerd.io/content/2.16/reference/cli/_index.md b/linkerd.io/content/2.16/reference/cli/_index.md new file mode 100644 index 0000000000..055867683c --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/_index.md @@ -0,0 +1,23 @@ ++++ +title = "CLI" +description = "Reference documentation for all the CLI commands." +aliases = [ + "../cli/", + "cli/get/", + "cli/repair/", +] ++++ + +The Linkerd CLI is the primary way to interact with Linkerd. It can install the +control plane to your cluster, add the proxy to your service and provide +detailed metrics for how your service is performing. + +As reference, check out the commands below: + +{{< cli >}} + +## Global flags + +The following flags are available for *all* linkerd CLI commands: + +{{< global-flags >}} diff --git a/linkerd.io/content/2.16/reference/cli/authz.md b/linkerd.io/content/2.16/reference/cli/authz.md new file mode 100644 index 0000000000..73bc5d7a8f --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/authz.md @@ -0,0 +1,11 @@ ++++ +title = "authz" ++++ + +{{< cli/description "authz" >}} + +Check out the [Authorization Policy](../../../reference/authorization-policy/) +and [Restricting Access to Services](../../../tasks/restricting-access/) +documentation for all the details about authorization policy in Linkerd. + +{{< cli/flags "authz" >}} diff --git a/linkerd.io/content/2.16/reference/cli/check.md b/linkerd.io/content/2.16/reference/cli/check.md new file mode 100644 index 0000000000..1f6c3a25d0 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/check.md @@ -0,0 +1,59 @@ ++++ +title = "check" +aliases = [ + "../check-reference/" +] ++++ + +{{< cli/description "check" >}} + +Take a look at the [troubleshooting](../../../tasks/troubleshooting/) documentation +for a full list of all the possible checks, what they do and how to fix them. 
+ +{{< cli/examples "check" >}} + +## Example output + +```bash +$ linkerd check +kubernetes-api +-------------- +√ can initialize the client +√ can query the Kubernetes API + +kubernetes-version +------------------ +√ is running the minimum Kubernetes API version + +linkerd-existence +----------------- +√ control plane namespace exists +√ controller pod is running +√ can initialize the client +√ can query the control plane API + +linkerd-api +----------- +√ control plane pods are ready +√ control plane self-check +√ [kubernetes] control plane can talk to Kubernetes +√ [prometheus] control plane can talk to Prometheus + +linkerd-service-profile +----------------------- +√ no invalid service profiles + +linkerd-version +--------------- +√ can determine the latest version +√ cli is up-to-date + +control-plane-version +--------------------- +√ control plane is up-to-date +√ control plane and cli versions match + +Status check results are √ +``` + +{{< cli/flags "check" >}} diff --git a/linkerd.io/content/2.16/reference/cli/completion.md b/linkerd.io/content/2.16/reference/cli/completion.md new file mode 100644 index 0000000000..04c1a47807 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/completion.md @@ -0,0 +1,9 @@ ++++ +title = "completion" ++++ + +{{< cli/description "completion" >}} + +{{< cli/examples "completion" >}} + +{{< cli/flags "completion" >}} diff --git a/linkerd.io/content/2.16/reference/cli/diagnostics.md b/linkerd.io/content/2.16/reference/cli/diagnostics.md new file mode 100644 index 0000000000..23b2ca6f8f --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/diagnostics.md @@ -0,0 +1,48 @@ ++++ +title = "diagnostics" +aliases = [ + "endpoints", + "install-sp", + "metrics" +] ++++ + +{{< cli/description "diagnostics" >}} + +{{< cli/examples "diagnostics" >}} + +{{< cli/flags "diagnostics" >}} + +## Subcommands + +### controller-metrics + +{{< cli/description "diagnostics controller-metrics" >}} + +{{< cli/examples "diagnostics controller-metrics" >}} + +{{< cli/flags "diagnostics controller-metrics" >}} + +### endpoints + +{{< cli/description "diagnostics endpoints" >}} + +{{< cli/examples "diagnostics endpoints" >}} + +{{< cli/flags "diagnostics endpoints" >}} + +### install-sp + +{{< cli/description "diagnostics install-sp" >}} + +{{< cli/examples "diagnostics install-sp" >}} + +{{< cli/flags "diagnostics install-sp" >}} + +### proxy-metrics + +{{< cli/description "diagnostics proxy-metrics" >}} + +{{< cli/examples "diagnostics proxy-metrics" >}} + +{{< cli/flags "diagnostics proxy-metrics" >}} diff --git a/linkerd.io/content/2.16/reference/cli/identity.md b/linkerd.io/content/2.16/reference/cli/identity.md new file mode 100644 index 0000000000..d289a24e71 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/identity.md @@ -0,0 +1,9 @@ ++++ +title = "identity" ++++ + +{{< cli/description "identity" >}} + +{{< cli/examples "identity" >}} + +{{< cli/flags "identity" >}} diff --git a/linkerd.io/content/2.16/reference/cli/inject.md b/linkerd.io/content/2.16/reference/cli/inject.md new file mode 100644 index 0000000000..6aab449ba1 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/inject.md @@ -0,0 +1,27 @@ ++++ +title = "inject" +aliases = [ + "../inject-reference/" +] ++++ + +The `inject` command is a text transform that modifies Kubernetes manifests +passed to it either as a file or as a stream (`-`) to add a +`linkerd.io/inject: enabled` annotation to eligible resources in the manifest. 
+When the resulting annotated manifest is applied to the Kubernetes cluster, +Linkerd's [proxy autoinjector](../../../features/proxy-injection/) automatically +adds the Linkerd data plane proxies to the corresponding pods. + +Note that there is no *a priori* reason to use this command. In production, +these annotations may be instead set by a CI/CD system, or any other +deploy-time mechanism. + +## Manual injection + +Alternatively, this command can also perform the full injection purely on the +client side, by enabling with the `--manual` flag. (Prior to Linkerd 2.4, this +was the default behavior.) + +{{< cli/examples "inject" >}} + +{{< cli/flags "inject" >}} diff --git a/linkerd.io/content/2.16/reference/cli/install-cni.md b/linkerd.io/content/2.16/reference/cli/install-cni.md new file mode 100644 index 0000000000..e2da1a351c --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/install-cni.md @@ -0,0 +1,9 @@ ++++ +title = "install-cni" ++++ + +{{< cli/description "install-cni" >}} + +{{< cli/examples "install-cni" >}} + +{{< cli/flags "install-cni" >}} diff --git a/linkerd.io/content/2.16/reference/cli/install.md b/linkerd.io/content/2.16/reference/cli/install.md new file mode 100644 index 0000000000..731f5fe4fc --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/install.md @@ -0,0 +1,12 @@ ++++ +title = "install" ++++ + +{{< cli/description "install" >}} + +For further details on how to install Linkerd onto your own cluster, check out +the [install documentation](../../../tasks/install/). + +{{< cli/examples "install" >}} + +{{< cli/flags "install" >}} diff --git a/linkerd.io/content/2.16/reference/cli/jaeger.md b/linkerd.io/content/2.16/reference/cli/jaeger.md new file mode 100644 index 0000000000..10e6970f21 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/jaeger.md @@ -0,0 +1,51 @@ ++++ +title = "jaeger" ++++ + +{{< cli/description "jaeger" >}} + +{{< cli/examples "jaeger" >}} + +{{< cli/flags "jaeger" >}} + +## Subcommands + +### check + +{{< cli/description "jaeger check" >}} + +{{< cli/examples "jaeger check" >}} + +{{< cli/flags "jaeger check" >}} + +### dashboard + +{{< cli/description "jaeger dashboard" >}} + +{{< cli/examples "jaeger dashboard" >}} + +{{< cli/flags "jaeger dashboard" >}} + +### install + +{{< cli/description "jaeger install" >}} + +{{< cli/examples "jaeger install" >}} + +{{< cli/flags "jaeger install" >}} + +### list + +{{< cli/description "jaeger list" >}} + +{{< cli/examples "jaeger list" >}} + +{{< cli/flags "jaeger list" >}} + +### uninstall + +{{< cli/description "jaeger uninstall" >}} + +{{< cli/examples "jaeger uninstall" >}} + +{{< cli/flags "jaeger uninstall" >}} diff --git a/linkerd.io/content/2.16/reference/cli/multicluster.md b/linkerd.io/content/2.16/reference/cli/multicluster.md new file mode 100644 index 0000000000..a7a26e4ce9 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/multicluster.md @@ -0,0 +1,67 @@ ++++ +title = "multicluster" ++++ + +{{< cli/description "multicluster" >}} + +{{< cli/examples "multicluster" >}} + +{{< cli/flags "multicluster" >}} + +## Subcommands + +### allow + +{{< cli/description "multicluster allow" >}} + +{{< cli/examples "multicluster allow" >}} + +{{< cli/flags "multicluster allow" >}} + +### check + +{{< cli/description "multicluster check" >}} + +{{< cli/examples "multicluster check" >}} + +{{< cli/flags "multicluster check" >}} + +### gateways + +{{< cli/description "multicluster gateways" >}} + +{{< cli/examples "multicluster gateways" >}} + +{{< cli/flags "multicluster 
gateways" >}} + +### install + +{{< cli/description "multicluster install" >}} + +{{< cli/examples "multicluster install" >}} + +{{< cli/flags "multicluster install" >}} + +### link + +{{< cli/description "multicluster link" >}} + +{{< cli/examples "multicluster link" >}} + +{{< cli/flags "multicluster link" >}} + +### uninstall + +{{< cli/description "multicluster uninstall" >}} + +{{< cli/examples "multicluster uninstall" >}} + +{{< cli/flags "multicluster uninstall" >}} + +### unlink + +{{< cli/description "multicluster unlink" >}} + +{{< cli/examples "multicluster unlink" >}} + +{{< cli/flags "multicluster unlink" >}} diff --git a/linkerd.io/content/2.16/reference/cli/profile.md b/linkerd.io/content/2.16/reference/cli/profile.md new file mode 100644 index 0000000000..052e5e5e72 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/profile.md @@ -0,0 +1,13 @@ ++++ +title = "profile" ++++ + +{{< cli/description "profile" >}} + +Check out the [service profile](../../../features/service-profiles/) +documentation for more details on what this command does and what you can do +with service profiles. + +{{< cli/examples "profile" >}} + +{{< cli/flags "profile" >}} diff --git a/linkerd.io/content/2.16/reference/cli/prune.md b/linkerd.io/content/2.16/reference/cli/prune.md new file mode 100644 index 0000000000..38d8c7054d --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/prune.md @@ -0,0 +1,9 @@ ++++ +title = "prune" ++++ + +{{< cli/description "prune" >}} + +{{< cli/examples "prune" >}} + +{{< cli/flags "prune" >}} diff --git a/linkerd.io/content/2.16/reference/cli/uninject.md b/linkerd.io/content/2.16/reference/cli/uninject.md new file mode 100644 index 0000000000..6ac72e9a54 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/uninject.md @@ -0,0 +1,9 @@ ++++ +title = "uninject" ++++ + +{{< cli/description "uninject" >}} + +{{< cli/examples "uninject" >}} + +{{< cli/flags "uninject" >}} diff --git a/linkerd.io/content/2.16/reference/cli/uninstall.md b/linkerd.io/content/2.16/reference/cli/uninstall.md new file mode 100644 index 0000000000..750126ab57 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/uninstall.md @@ -0,0 +1,9 @@ ++++ +title = "uninstall" ++++ + +{{< cli/description "uninstall" >}} + +{{< cli/examples "uninstall" >}} + +{{< cli/flags "uninstall" >}} diff --git a/linkerd.io/content/2.16/reference/cli/upgrade.md b/linkerd.io/content/2.16/reference/cli/upgrade.md new file mode 100644 index 0000000000..cba9f5a81c --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/upgrade.md @@ -0,0 +1,9 @@ ++++ +title = "upgrade" ++++ + +{{< cli/description "upgrade" >}} + +{{< cli/examples "upgrade" >}} + +{{< cli/flags "upgrade" >}} diff --git a/linkerd.io/content/2.16/reference/cli/version.md b/linkerd.io/content/2.16/reference/cli/version.md new file mode 100644 index 0000000000..3f7949a6fa --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/version.md @@ -0,0 +1,9 @@ ++++ +title = "version" ++++ + +{{< cli/description "version" >}} + +{{< cli/examples "version" >}} + +{{< cli/flags "version" >}} diff --git a/linkerd.io/content/2.16/reference/cli/viz.md b/linkerd.io/content/2.16/reference/cli/viz.md new file mode 100644 index 0000000000..96f71b2e54 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cli/viz.md @@ -0,0 +1,183 @@ ++++ +title = "viz" +aliases = [ + "dashboard", + "edges", + "routes", + "stat", + "tap", + "top" +] ++++ + +{{< cli/description "viz" >}} + +{{< cli/examples "viz" >}} + +{{< cli/flags "viz" >}} + +## Subcommands + +## 
allow-scrapes + +{{< cli/description "viz allow-scrapes" >}} + +{{< cli/examples "viz allow-scrapes" >}} + +{{< cli/flags "viz allow-scrapes" >}} + +## authz + +{{< cli/description "viz authz" >}} + +{{< cli/examples "viz authz" >}} + +{{< cli/flags "viz authz" >}} + +### check + +{{< cli/description "viz check" >}} + +{{< cli/examples "viz check" >}} + +{{< cli/flags "viz check" >}} + +### dashboard + +{{< cli/description "viz dashboard" >}} + +Check out the [architecture](../../architecture/#dashboard) docs for a +more thorough explanation of what this command does. + +{{< cli/examples "viz dashboard" >}} + +{{< cli/flags "viz dashboard" >}} + +(*) You'll need to tweak the dashboard's `enforced-host` parameter with this +value, as explained in [the DNS-rebinding protection +docs](../../../tasks/exposing-dashboard/#tweaking-host-requirement) + +### edges + +{{< cli/description "viz edges" >}} + +{{< cli/examples "viz edges" >}} + +{{< cli/flags "viz edges" >}} + +### install + +{{< cli/description "viz install" >}} + +{{< cli/examples "viz install" >}} + +{{< cli/flags "viz install" >}} + +### list + +{{< cli/description "viz list" >}} + +{{< cli/examples "viz list" >}} + +{{< cli/flags "viz list" >}} + +### profile + +{{< cli/description "viz profile" >}} + +{{< cli/examples "viz profile" >}} + +{{< cli/flags "viz profile" >}} + +### routes + +The `routes` command displays per-route service metrics. In order for +this information to be available, a service profile must be defined for the +service that is receiving the requests. For more information about how to +create a service profile, see [service profiles](../../../features/service-profiles/). +and the [profile](../../cli/profile/) command reference. + +## Inbound Metrics + +By default, `routes` displays *inbound* metrics for a target. In other +words, it shows information about requests which are sent to the target and +responses which are returned by the target. For example, the command: + +```bash +linkerd viz routes deploy/webapp +``` + +Displays the request volume, success rate, and latency of requests to the +`webapp` deployment. These metrics are from the `webapp` deployment's +perspective, which means that, for example, these latencies do not include the +network latency between a client and the `webapp` deployment. + +## Outbound Metrics + +If you specify the `--to` flag then `linkerd viz routes` displays *outbound* metrics +from the target resource to the resource in the `--to` flag. In contrast to +the inbound metrics, these metrics are from the perspective of the sender. This +means that these latencies do include the network latency between the client +and the server. For example, the command: + +```bash +linkerd viz routes deploy/traffic --to deploy/webapp +``` + +Displays the request volume, success rate, and latency of requests from +`traffic` to `webapp` from the perspective of the `traffic` deployment. + +## Effective and Actual Metrics + +If you are looking at *outbound* metrics (by specifying the `--to` flag) you +can also supply the `-o wide` flag to differentiate between *effective* and +*actual* metrics. + +Effective requests are requests which are sent by some client to the Linkerd +proxy. Actual requests are requests which the Linkerd proxy sends to some +server. If the Linkerd proxy is performing retries, one effective request can +translate into more than one actual request. If the Linkerd proxy is not +performing retries, effective requests and actual requests will always be equal. 
+When enabling retries, you should expect to see the actual request rate +increase and the effective success rate increase. See the +[retries and timeouts section](../../../features/retries-and-timeouts/) for more +information. + +Because retries are only performed on the *outbound* (client) side, the +`-o wide` flag can only be used when the `--to` flag is specified. + +{{< cli/examples "viz routes" >}} + +{{< cli/flags "viz routes" >}} + +### stat + +{{< cli/description "viz stat" >}} + +{{< cli/examples "viz stat" >}} + +{{< cli/flags "viz stat" >}} + +### tap + +{{< cli/description "viz tap" >}} + +{{< cli/examples "viz tap" >}} + +{{< cli/flags "viz tap" >}} + +### top + +{{< cli/description "viz top" >}} + +{{< cli/examples "viz top" >}} + +{{< cli/flags "viz top" >}} + +### uninstall + +{{< cli/description "viz uninstall" >}} + +{{< cli/examples "viz uninstall" >}} + +{{< cli/flags "viz uninstall" >}} diff --git a/linkerd.io/content/2.16/reference/cluster-configuration.md b/linkerd.io/content/2.16/reference/cluster-configuration.md new file mode 100644 index 0000000000..99172cd797 --- /dev/null +++ b/linkerd.io/content/2.16/reference/cluster-configuration.md @@ -0,0 +1,134 @@ ++++ +title = "Cluster Configuration" +description = "Configuration settings unique to providers and install methods." ++++ + +## GKE + +### Private Clusters + +If you are using a **private GKE cluster**, you are required to create a +firewall rule that allows the GKE operated api-server to communicate with the +Linkerd control plane. This makes it possible for features such as automatic +proxy injection to receive requests directly from the api-server. + +In this example, we will use [gcloud](https://cloud.google.com/sdk/install) to +simplify the creation of the said firewall rule. + +Setup: + +```bash +CLUSTER_NAME=your-cluster-name +gcloud config set compute/zone your-zone-or-region +``` + +Get the cluster `MASTER_IPV4_CIDR`: + +```bash +MASTER_IPV4_CIDR=$(gcloud container clusters describe $CLUSTER_NAME \ + | grep "masterIpv4CidrBlock: " \ + | awk '{print $2}') +``` + +Get the cluster `NETWORK`: + +```bash +NETWORK=$(gcloud container clusters describe $CLUSTER_NAME \ + | grep "^network: " \ + | awk '{print $2}') +``` + +Get the cluster auto-generated `NETWORK_TARGET_TAG`: + +```bash +NETWORK_TARGET_TAG=$(gcloud compute firewall-rules list \ + --filter network=$NETWORK --format json \ + | jq ".[] | select(.name | contains(\"$CLUSTER_NAME\"))" \ + | jq -r '.targetTags[0]' | head -1) +``` + +The format of the network tag should be something like `gke-cluster-name-xxxx-node`. + +Verify the values: + +```bash +echo $MASTER_IPV4_CIDR $NETWORK $NETWORK_TARGET_TAG + +# example output +10.0.0.0/28 foo-network gke-foo-cluster-c1ecba83-node +``` + +Create the firewall rules for `proxy-injector`, `policy-validator` and `tap`: + +```bash +gcloud compute firewall-rules create gke-to-linkerd-control-plane \ + --network "$NETWORK" \ + --allow "tcp:8443,tcp:8089,tcp:9443" \ + --source-ranges "$MASTER_IPV4_CIDR" \ + --target-tags "$NETWORK_TARGET_TAG" \ + --priority 1000 \ + --description "Allow traffic on ports 8443, 8089, 9443 for linkerd control-plane components" +``` + +Finally, verify that the firewall is created: + +```bash +gcloud compute firewall-rules describe gke-to-linkerd-control-plane +``` + +## Cilium + +### Turn Off Socket-Level Load Balancing + +Cilium can be configured to replace kube-proxy functionality through eBPF. 
When +running in kube-proxy replacement mode, connections to a `ClusterIP` service +will be established directly to the service's backend at the socket level (i.e. +during TCP connection establishment). Linkerd relies on `ClusterIPs` being +present on packets in order to do service discovery. + +When packets do not contain a `ClusterIP` address, Linkerd will instead forward +directly to the pod endpoint that was selected by Cilium. Consequentially, +while mTLS and telemetry will still function correctly, features such as peak +EWMA load balancing, and [dynamic request +routing](../../tasks/configuring-dynamic-request-routing/) may not work as +expected. + +This behavior can be turned off in Cilium by [turning off socket-level load +balancing for +pods](https://docs.cilium.io/en/v1.13/network/istio/#setup-cilium) through the +CLI option `--config bpf-lb-sock-hostns-only=true`, or through the Helm value +`socketLB.hostNamespaceOnly=true`. + +### Disable Exclusive Mode + +If you're using Cilium as your CNI and then want to install +[linkerd-cni](../../features/cni/) on top of it, make sure you install Cilium +with the option `cni.exclusive=false`. This avoids Cilium taking ownership over +the CNI configurations directory. Other CNI plugins like linkerd-cni install +themselves and operate in chain mode with the other deployed plugins by +deploying their configuration into this directory. + +## Lifecycle Hook Timeout + +Linkerd uses a `postStart` lifecycle hook for all control plane components, and +all injected workloads by default. The hook will poll proxy readiness through +[linkerd-await](https://github.com/linkerd/linkerd-await) and block the main +container from starting until the proxy is ready to handle traffic. By default, +the hook will time-out in 2 minutes. + +CNI plugins that are responsible for setting up and enforcing `NetworkPolicy` +resources can interfere with the lifecycle hook's execution. While lifecycle +hooks are running, the container will not reach a `Running` state. Some CNI +plugin implementations acquire the Pod's IP address only after all containers +have reached a running state, and the kubelet has updated the Pod's status +through the API Server. Without access to the Pod's IP, the CNI plugins will +not operate correctly. This in turn will block the proxy from being set-up, +since it does not have the necessary network connectivity. + +As a workaround, users can manually remove the `postStart` lifecycle hook from +control plane components. For injected workloads, users may opt out of the +lifecycle hook through the root-level `await: false` option, or alternatively, +behavior can be overridden at a workload or namespace level through the +annotation `config.linkerd.io/proxy-await: disabled`. Removing the hook will +allow containers to start asynchronously, unblocking network connectivity once +the CNI plugin receives the pod's IP. diff --git a/linkerd.io/content/2.16/reference/extension-list.md b/linkerd.io/content/2.16/reference/extension-list.md new file mode 100644 index 0000000000..2069506a52 --- /dev/null +++ b/linkerd.io/content/2.16/reference/extension-list.md @@ -0,0 +1,14 @@ ++++ +title = "Extensions List" +description = "List of Linkerd extensions that can be added to the installation for additional functionality" ++++ + +Linkerd provides a mix of built-in and third-party +[extensions](../../tasks/extensions/) to add additional functionality to the +base installation. 
The following is the list of known extensions: + +{{< extensions-2-10 >}} + +If you have an extension for Linkerd and it is not on the list, [please edit +this +page!](https://github.com/linkerd/website/edit/main/linkerd.io/data/extension-list.yaml) diff --git a/linkerd.io/content/2.16/reference/external-workload.md b/linkerd.io/content/2.16/reference/external-workload.md new file mode 100644 index 0000000000..21f70bf986 --- /dev/null +++ b/linkerd.io/content/2.16/reference/external-workload.md @@ -0,0 +1,105 @@ +--- +title: ExternalWorkload +--- + +Linkerd's [mesh expansion]({{< relref "../features/non-kubernetes-workloads" +>}}) functionality allows you to join workloads outside of Kubernetes into the +mesh. + +At its core, this behavior is controlled by an `ExternalWorkload` resource, +which is used by Linkerd to describe a workload that lives outside of Kubernetes +for discovery and policy. This resource contains information such as the +workload's identity, the concrete IP address as well as ports that this workload +accepts connections on. + +## ExternalWorkloads + +An ExternalWorkload is a namespaced resource that defines a set of ports and an +IP address that is reachable from within the mesh. Linkerd uses that information +and translates it into `EndpointSlice`s that are then attached to `Service` objects. + +### Spec + +- `meshTLS` (required) - specifies the identity information that Linkerd + requires to establish encrypted connections to this workload +- `workloadIPs` (required, at most 1) - an IP address that this workload is + reachable on +- `ports` - a list of port definitions that the workload exposes + +### MeshTLS + +- `identity` (required) - the TLS identity of the workload; proxies require this + value to establish TLS connections with the workload +- `serverName` (required) - this value is what the workload's proxy expects to + see in the `ClientHello` SNI TLS extension when other peers attempt to + initiate a TLS connection + +### Port + +- `name` - must be unique within the ports set. Each named port can be referred + to by services. +- `port` (required) - a port number that the workload is listening on +- `protocol` - protocol exposed by the port + +### Status + +- `conditions` - a list of condition objects + +### Condition + +- `lastProbeTime` - the last time the healthcheck endpoint was probed +- `lastTransitionTime` - the last time the condition transitioned from one + status to another +- `status` - status of the condition (one of True, False, Unknown) +- `type` - type of the condition (Ready is used for indicating discoverability) +- `reason` - contains a programmatic identifier indicating the reason for the + condition's last transition +- `message` - message is a human-readable message indicating details about the transition. + +## Example + +Below is an example of an `ExternalWorkload` resource that specifies a number of +ports and is selected by a service. 
+ +```yaml +apiVersion: workload.linkerd.io/v1beta1 +kind: ExternalWorkload +metadata: + name: external-workload + namespace: mixed-env + labels: + location: vm + workload_name: external-workload +spec: + meshTLS: + identity: "spiffe://root.linkerd.cluster.local/external-workload" + serverName: "external-workload.cluster.local" + workloadIPs: + - ip: 193.1.4.11 + ports: + - port: 80 + name: http + - port: 9980 + name: admin +status: + conditions: + - type: Ready + status: "True" +--- +apiVersion: v1 +kind: Service +metadata: + name: external-workload + namespace: mixed-env +spec: + type: ClusterIP + selector: + workload_name: external-workload + ports: + - port: 80 + protocol: TCP + name: http + - port: 9980 + protocol: TCP + name: admin +``` diff --git a/linkerd.io/content/2.16/reference/helm-chart-version-matrix.md b/linkerd.io/content/2.16/reference/helm-chart-version-matrix.md new file mode 100644 index 0000000000..5785c30ab3 --- /dev/null +++ b/linkerd.io/content/2.16/reference/helm-chart-version-matrix.md @@ -0,0 +1,13 @@ ++++ +title = "Helm Chart Version Matrix" ++++ + +The following version matrices include only the latest versions of the stable +releases along with corresponding app and Helm versions for Linkerd and +extensions. Use these to guide you to the right Helm chart version or to +automate workflows you might have. + +* [YAML matrix](/releases/release_matrix.yaml) +* [JSON matrix](/releases/release_matrix.json) + +{{< release-data-table />}} diff --git a/linkerd.io/content/2.16/reference/httproute.md b/linkerd.io/content/2.16/reference/httproute.md new file mode 100644 index 0000000000..4f699cbd89 --- /dev/null +++ b/linkerd.io/content/2.16/reference/httproute.md @@ -0,0 +1,317 @@ ++++ +title = "HTTPRoute" +description = "Reference guide to HTTPRoute resources." ++++ + +## Linkerd and Gateway API HTTPRoutes + +The HTTPRoute resource was originally specified by the Kubernetes [Gateway API] +project. Linkerd currently supports two versions of the HTTPRoute resource: the +upstream version from the Gateway API, with the +`gateway.networking.k8s.io` API group, and a Linkerd-specific version, +with the `policy.linkerd.io` API group. While these two resource definitions are +largely the same, the `policy.linkerd.io` HTTPRoute resource is an experimental +version that contains features not yet stabilized in the upstream +`gateway.networking.k8s.io` HTTPRoute resource, such as +[timeouts](#httproutetimeouts). Both the Linkerd and Gateway API resource +definitions coexist within the same cluster, and both can be used to configure +policies for use with Linkerd. + +If the Gateway API CRDs already exist in your cluster, then Linkerd must be +installed with the `--set enableHttpRoutes=false` flag during the +`linkerd install --crds` step or with the `enableHttpRoutes=false` Helm value +when installing the `linkerd-crds` Helm chart. This avoids conflicts by +instructing Linkerd not to install the Gateway API CRDs and instead rely on the +Gateway API CRDs which already exist. + +This documentation describes the `policy.linkerd.io` HTTPRoute resource. For a +similar description of the upstream Gateway API HTTPRoute resource, refer to the +Gateway API's [HTTPRoute +specification](https://gateway-api.sigs.k8s.io/references/spec/#gateway.networking.k8s.io/v1beta1.HTTPRoute). 
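+
+As a rough sketch of the `enableHttpRoutes=false` Helm value mentioned above,
+a values file for the `linkerd-crds` chart might look like the following. The
+top-level key simply mirrors the `--set` flag; double-check it against the
+chart version you are installing:
+
+```yaml
+# values.yaml for the linkerd-crds chart (sketch): skip installing the
+# Gateway API HTTPRoute CRDs because they already exist in the cluster.
+enableHttpRoutes: false
+```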
+ +## HTTPRoute Spec + +An HTTPRoute spec may contain the following top level fields: + +{{< table >}} +| field| value | +|------|-------| +| `parentRefs`| A set of [ParentReference](#parentreference)s which indicate which [Servers](#server) or Services this HTTPRoute attaches to.| +| `hostnames`| A set of hostnames that should match against the HTTP Host header.| +| `rules`| An array of [HTTPRouteRules](#httprouterule).| +{{< /table >}} + +### parentReference + +A reference to the parent resource this HTTPRoute is a part of. + +HTTPRoutes can be attached to a [Server](../authorization-policy/#server) to +allow defining an [authorization +policy](../authorization-policy/#authorizationpolicy) for specific routes served +on that Server. + +HTTPRoutes can also be attached to a Service, in order to route requests +depending on path, headers, query params, and/or verb. Requests can then be +rerouted to different backend services. This can be used to perform [dynamic +request routing](../../tasks/configuring-dynamic-request-routing/). + +{{< warning >}} +**Outbound HTTPRoutes and [ServiceProfile](../../features/service-profiles/)s +provide overlapping configuration.** For backwards-compatibility reasons, a +ServiceProfile will take precedence over HTTPRoutes which configure the same +Service. If a ServiceProfile is defined for the parent Service of an HTTPRoute, +proxies will use the ServiceProfile configuration, rather than the HTTPRoute +configuration, as long as the ServiceProfile exists. +{{< /warning >}} + +ParentReferences are namespaced, and may reference either a parent in the same +namespace as the HTTPRoute, or one in a different namespace. As described in +[GEP-1426][ns-boundaries], a HTTPRoute with a `parentRef` that references a +Service in the same namespace as the HTTPRoute is referred to as a _producer +route_, while an HTTPRoute with a `parentRef` referencing a Service in a +different namespace is referred to as a _consumer route_. A producer route will +apply to requests originating from clients in any namespace. On the other hand, +a consumer route is scoped to apply only to traffic originating in the +HTTPRoute's namespace. See the ["Namespace boundaries" section in +GEP-1426][ns-boundaries] for details on producer and consumer routes. + +{{< table >}} +| field| value | +|------|-------| +| `group`| The group of the referent. This must either be "policy.linkerd.io" (for Server) or "core" (for Service).| +| `kind`| The kind of the referent. This must be either "Server" or "Service".| +| `port`| The targeted port number, when attaching to Services.| +| `namespace`| The namespace of the referent. When unspecified (or empty string), this refers to the local namespace of the Route.| +| `name`| The name of the referent.| +{{< /table >}} + +### httpRouteRule + +HTTPRouteRule defines semantics for matching an HTTP request based on conditions +(matches) and processing it (filters). + +{{< table >}} +| field| value | +|------|-------| +| `matches`| A list of [httpRouteMatches](#httproutematch). Each match is independent, i.e. 
this rule will be matched if **any** one of the matches is satisfied.| +| `filters`| A list of [httpRouteFilters](#httproutefilter) which will be applied to each request which matches this rule.| +| `backendRefs`| An array of [HTTPBackendRefs](#httpbackendref) to declare where the traffic should be routed to (only allowed with Service [parentRefs](#parentreference)).| +| `timeouts` | An optional [httpRouteTimeouts](#httproutetimeouts) object which configures timeouts for requests matching this rule. | +{{< /table >}} + +### httpRouteMatch + +HTTPRouteMatch defines the predicate used to match requests to a given +action. Multiple match types are ANDed together, i.e. the match will +evaluate to true only if all conditions are satisfied. + +{{< table >}} +| field| value | +|------|-------| +| `path`| An [httpPathMatch](#httppathmatch). If this field is not specified, a default prefix match on the "/" path is provided.| +| `headers`| A list of [httpHeaderMatches](#httpheadermatch). Multiple match values are ANDed together.| +| `queryParams`| A list of [httpQueryParamMatches](#httpqueryparammatch). Multiple match values are ANDed together.| +| `method`| When specified, this route will be matched only if the request has the specified method.| +{{< /table >}} + +### httpPathMatch + +`HTTPPathMatch` describes how to select a HTTP route by matching the HTTP +request path. + +{{< table >}} +| field| value | +|------|-------| +| `type`| How to match against the path Value. One of: Exact, PathPrefix, RegularExpression. If this field is not specified, a default of "PathPrefix" is provided.| +| `value`| The HTTP path to match against.| +{{< /table >}} + +### httpHeaderMatch + +`HTTPHeaderMatch` describes how to select a HTTP route by matching HTTP request +headers. + +{{< table >}} +| field| value | +|------|-------| +| `type`| How to match against the value of the header. One of: Exact, RegularExpression. If this field is not specified, a default of "Exact" is provided.| +| `name`| The HTTP Header to be matched against. Name matching MUST be case insensitive.| +| `value`| Value of HTTP Header to be matched.| +{{< /table >}} + +### httpQueryParamMatch + +`HTTPQueryParamMatch` describes how to select a HTTP route by matching HTTP +query parameters. + +{{< table >}} +| field| value | +|------|-------| +| `type`| How to match against the value of the query parameter. One of: Exact, RegularExpression. If this field is not specified, a default of "Exact" is provided.| +| `name`| The HTTP query param to be matched. This must be an exact string match.| +| `value`| Value of HTTP query param to be matched.| +{{< /table >}} + +### httpRouteFilter + +HTTPRouteFilter defines processing steps that must be completed during the +request or response lifecycle. + +{{< table >}} +| field| value | +|------|-------| +| `type`| One of: RequestHeaderModifier, ResponseHeaderModifier, or RequestRedirect.| +| `requestHeaderModifier`| An [httpHeaderFilter](#httpheaderfilter) which modifies request headers.| +| `responseHeaderModifier` | An [httpHeaderFilter](#httpheaderfilter) which modifies response headers.| +| `requestRedirect`| An [httpRequestRedirectFilter](#httprequestredirectfilter).| +{{< /table >}} + +### httpHeaderFilter + +A filter which modifies HTTP request or response headers. 
+ +{{< table >}} +| field| value | +|------|-------| +| `set`| A list of [httpHeaders](#httpheader) to overwrite on the request or response.| +| `add`| A list of [httpHeaders](#httpheader) to add on to the request or response, appending to any existing value.| +| `remove`| A list of header names to remove from the request or response.| +{{< /table >}} + +### httpHeader + +`HTTPHeader` represents an HTTP Header name and value as defined by RFC 7230. + +{{< table >}} +| field| value | +|------|-------| +| `name`| Name of the HTTP Header to be matched. Name matching MUST be case insensitive.| +| `value`| Value of HTTP Header to be matched.| +{{< /table >}} + +### httpRequestRedirectFilter + +`HTTPRequestRedirect` defines a filter that redirects a request. + +{{< table >}} +| field| value | +|------|-------| +| `scheme`| The scheme to be used in the value of the `Location` header in the response. When empty, the scheme of the request is used.| +| `hostname`| The hostname to be used in the value of the `Location` header in the response. When empty, the hostname of the request is used.| +| `path`| An [httpPathModifier](#httppathmodifier) which modifies the path of the incoming request and uses the modified path in the `Location` header.| +| `port`| The port to be used in the value of the `Location` header in the response. When empty, the port (if specified) of the request is used.| +| `statusCode`| The HTTP status code to be used in response.| +{{< /table >}} + +### httpPathModifier + +`HTTPPathModifier` defines configuration for path modifiers. + +{{< table >}} +| field| value | +|------|-------| +| `type`| One of: ReplaceFullPath, ReplacePrefixMatch.| +| `replaceFullPath`| The value with which to replace the full path of a request during a rewrite or redirect.| +| `replacePrefixMatch`| The value with which to replace the prefix match of a request during a rewrite or redirect.| +{{< /table >}} + +### httpBackendRef + +`HTTPBackendRef` defines the list of objects where matching requests should be +sent. Only allowed when a route has Service [parentRefs](#parentreference). + +{{< table >}} +| field| value | +|------|-------| +| `name`| Name of service for this backend.| +| `port`| Destination port number for this backend.| +| `namespace`| Namespace of service for this backend.| +| `weight`| Proportion of requests sent to this backend.| +{{< /table >}} + +### httpRouteTimeouts + +`HTTPRouteTimeouts` defines the timeouts that can be configured for an HTTP +request. + +Linkerd implements HTTPRoute timeouts as described in [GEP-1742]. Timeout +durations are specified as strings using the [Gateway API duration format] +specified by [GEP-2257](https://gateway-api.sigs.k8s.io/geps/gep-2257/) (e.g. +1h/1m/1s/1ms), and MUST be at least 1ms. A timeout field with duration 0 +disables that timeout. + +{{< table >}} +| field| value | +|------|-------| +| `request` | Specifies the duration for processing an HTTP client request after which the proxy will time out if unable to send a response. When this field is unspecified or 0, the proxy will not enforce request timeouts. | +| `backendRequest` | Specifies a timeout for an individual request from the proxy to a backend service. This covers the time from when the request first starts being sent from the proxy to when the response has been received from the backend. When this field is unspecified or 0, the proxy will not enforce a backend request timeout, but may still enforce the `request` timeout, if one is configured. 
| +{{< /table >}} + +If retries are enabled, a request received by the proxy may be retried by +sending it to a different backend. In this case, a new `backendRequest` timeout +will be started for each retry request, but each retry request will count +against the overall `request` timeout. + +[GEP-1742]: https://gateway-api.sigs.k8s.io/geps/gep-1742/ +[Gateway API duration format]: https://gateway-api.sigs.k8s.io/geps/gep-2257/#gateway-api-duration-format + +## HTTPRoute Examples + +An HTTPRoute attached to a Server resource which matches GETs to +`/authors.json` or `/authors/*`: + +```yaml +apiVersion: policy.linkerd.io/v1beta2 +kind: HTTPRoute +metadata: + name: authors-get-route + namespace: booksapp +spec: + parentRefs: + - name: authors-server + kind: Server + group: policy.linkerd.io + rules: + - matches: + - path: + value: "/authors.json" + method: GET + - path: + value: "/authors/" + type: "PathPrefix" + method: GET +``` + +An HTTPRoute attached to a Service to perform header-based routing. If there's +a `x-faces-user: testuser` header in the request, the request is routed to the +`smiley2` backend Service. Otherwise, the request is routed to the `smiley` +backend Service. + +```yaml +apiVersion: policy.linkerd.io/v1beta2 +kind: HTTPRoute +metadata: + name: smiley-a-b + namespace: faces +spec: + parentRefs: + - name: smiley + kind: Service + group: core + port: 80 + rules: + - matches: + - headers: + - name: "x-faces-user" + value: "testuser" + backendRefs: + - name: smiley2 + port: 80 + - backendRefs: + - name: smiley + port: 80 +``` + +[ServiceProfile]: ../../features/service-profiles/ +[Gateway API]: https://gateway-api.sigs.k8s.io/ +[GEP-1426]: https://gateway-api.sigs.k8s.io/geps/gep-1426/#namespace-boundaries diff --git a/linkerd.io/content/2.16/reference/iptables.md b/linkerd.io/content/2.16/reference/iptables.md new file mode 100644 index 0000000000..4c539474d6 --- /dev/null +++ b/linkerd.io/content/2.16/reference/iptables.md @@ -0,0 +1,198 @@ ++++ +title = "IPTables Reference" +description = "A table with all of the chains and associated rules" ++++ + +In order to route TCP traffic in a pod to and from the proxy, an [`init +container`](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) +is used to set up `iptables` rules at the start of an injected pod's +lifecycle. + +At first, `linkerd-init` will create two chains in the `nat` table: +`PROXY_INIT_REDIRECT`, and `PROXY_INIT_OUTPUT`. These chains are used to route +inbound and outbound packets through the proxy. Each chain has a set of rules +attached to it, these rules are traversed by a packet in order. + +## Inbound connections + +When a packet arrives in a pod, it will typically be processed by the +`PREROUTING` chain, a default chain attached to the `nat` table. The sidecar +container will create a new chain to process inbound packets, called +`PROXY_INIT_REDIRECT`. The sidecar container creates a rule +(`install-proxy-init-prerouting`) to send packets from the `PREROUTING` chain +to our redirect chain. This is the first rule traversed by an inbound packet. + +The redirect chain will be configured with two more rules: + +1. `ignore-port`: will ignore processing packets whose destination ports are + included in the `skip-inbound-ports` install option. +2. `proxy-init-redirect-all`: will redirect all incoming TCP packets through + the proxy, on port `4143`. + +Based on these two rules, there are two possible paths that an inbound packet +can take, both of which are outlined below. 
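+
+As a quick illustration, listing the chain from inside a meshed pod's network
+namespace shows these two rules in order (a sketch only -- the exact output
+varies, and the skipped ports shown here assume the default
+`skip-inbound-ports` value of `4190,4191`, the proxy's own admin ports):
+
+```bash
+# Illustrative listing of the inbound redirect chain
+iptables -t nat -L PROXY_INIT_REDIRECT -n
+# Chain PROXY_INIT_REDIRECT (1 references)
+# target    prot opt source     destination
+# RETURN    tcp  --  0.0.0.0/0  0.0.0.0/0   multiport dports 4190,4191  /* ignore-port */
+# REDIRECT  tcp  --  0.0.0.0/0  0.0.0.0/0   redir ports 4143            /* proxy-init-redirect-all */
+```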
+ +{{}} + +The packet will arrive on the `PREROUTING` chain and will be immediately routed +to the redirect chain. If its destination port matches any of the inbound ports +to skip, then it will be forwarded directly to the application process, +_bypassing the proxy_. The list of destination ports to check against can be +[configured when installing Linkerd](../cli/install/#). If the +packet does not match any of the ports in the list, it will be redirected +through the proxy. Redirection is done by changing the incoming packet's +destination header, the target port will be replaced with `4143`, which is the +proxy's inbound port. The proxy will process the packet and produce a new one +that will be forwarded to the service; it will be able to get the original +target (IP:PORT) of the inbound packet by using a special socket option +[`SO_ORIGINAL_DST`](https://linux.die.net/man/3/getsockopt). The new packet +will be routed through the `OUTPUT` chain, from there it will be sent to the +application. The `OUTPUT` chain rules are covered in more detail below. + +## Outbound connections + +When a packet leaves a pod, it will first traverse the `OUTPUT` chain, the +first default chain an outgoing packet traverses in the `nat` table. To +redirect outgoing packets through the outbound side of the proxy, the sidecar +container will again create a new chain. The first outgoing rule is similar to +the inbound counterpart: any packet that traverses the `OUTPUT` chain should be +forwarded to our `PROXY_INIT_OUTPUT` chain to be processed. + +The output redirect chain is slightly harder to understand but follows the same +logical flow as the inbound redirect chain, in total there are 4 rules +configured: + +1. `ignore-proxy-uid`: any packets owned by the proxy (whose user id is + `2102`), will skip processing and return to the previous (`OUTPUT`) chain. + From there, it will be sent on the outbound network interface (either to + the application, in the case of an inbound packet, or outside of the pod, + for an outbound packet). +2. `ignore-loopback`: if the packet is sent over the loopback interface + (`lo`), it will skip processing and return to the previous chain. From + here, the packet will be sent to the destination, much like the first rule + in the chain. +3. `ignore-port`: will ignore processing packets whose destination ports are + included in the `skip-outbound-ports` install option. +4. `redirect-all-outgoing`: the last rule in the chain, it will redirect all + outgoing TCP packets to port `4140`, the proxy's outbound port. If a + packet has made it this far, it is guaranteed its destination is not local + (i.e `lo`) and it has not been produced by the proxy. This means the + packet has been produced by the service, so it should be forwarded to its + destination by the proxy. + +{{< fig src="/images/iptables/iptables-fig2-2.png" +title="Outbound iptables chain traversal" >}} + +A packet produced by the service will first hit the `OUTPUT` chain; from here, +it will be sent to our own output chain for processing. The first rule it +encounters in `PROXY_INIT_OUTPUT` will be `ignore-proxy-uid`. Since the packet +was generated by the service, this rule will be skipped. If the packet's +destination is not a port bound on localhost (e.g `127.0.0.1:80`), then it will +skip the second rule as well. The third rule, `ignore-port` will be matched if +the packet's destination port is in the outbound ports to skip list, in this +case, it will be sent out on the network interface, bypassing the proxy. 
If the +rule is not matched, then the packet will reach the final rule in the chain +`redirect-all-outgoing`-- as the name implies, it will be sent to the proxy to +be processed, on its outbound port `4140`. Much like in the inbound case, the +routing happens at the `nat` level, the packet's header will be re-written to +target the outbound port. The proxy will process the packet and then forward it +to its destination. The new packet will take the same path through the `OUTPUT` +chain, however, it will stop at the first rule, since it was produced by the +proxy. + +The substantiated explanation applies to a packet whose destination is another +service, outside of the pod. In practice, an application can also send traffic +locally. As such, there are two other possible scenarios that we will explore: +_when a service talks to itself_ (by sending traffic over localhost or by using +its own endpoint address), and when _a service talks to itself through a +`clusterIP` target_. Both scenarios are somehow related, but the path a packet +takes differs. + +**A service may send requests to itself**. It can also target another container +in the pod. This scenario would typically apply when: + +* The destination is the pod (or endpoint) IP address. +* The destination is a port bound on localhost (regardless of which container +it belongs to). + +{{< fig src="/images/iptables/iptables-fig2-3.png" +title="Outbound iptables chain traversal" >}} + +When the application targets itself through its pod's IP (or loopback address), +the packets will traverse the two output chains. The first rule will be +skipped, since the owner is the application, and not the proxy. Once the second +rule is matched, the packets will return to the first output chain, from here, +they'll be sent directly to the service. + +{{< note >}} +Usually, packets traverse another chain on the outbound side called +`POSTROUTING`. This chain is traversed after the `OUTPUT` chain, but to keep +the explanation simple, it has not been mentioned. Likewise, outbound packets that +are sent over the loopback interface become inbound packets, since they need to +be processed again. The kernel takes shortcuts in this case and bypasses the +`PREROUTING` chain that inbound packets from the outside world traverse when +they first arrive. For this reason, we do not need any special rules on the +inbound side to account for outbound packets that are sent locally. +{{< /note >}} + +**A service may send requests to itself using its clusterIP**. In such cases, +it is not guaranteed that the destination will be local. The packet follows an +unusual path, as depicted in the diagram below. + +{{< fig src="/images/iptables/iptables-fig2-4.png" +title="Outbound iptables chain traversal" >}} + +When the packet first traverses the output chains, it will follow the same path +an outbound packet would normally take. In such a scenario, the packet's +destination will be an address that is not considered to be local by the +kernel-- it is, after all, a virtual IP. The proxy will process the packet, at +a connection level, connections to a `clusterIP` will be load balanced between +endpoints. Chances are that the endpoint selected will be the pod itself, +packets will therefore never leave the pod; the destination will be resolved to +the podIP. The packets produced by the proxy will traverse the output chain and +stop at the first rule, then they will be forwarded to the service. 
This +constitutes an edge case because at this point, the packet has been processed +by the proxy, unlike the scenario previously discussed where it skips it +altogether. For this reason, at a connection level, the proxy will _not_ mTLS +or opportunistically upgrade the connection to HTTP/2 when the endpoint is +local to the pod. In practice, this is treated as if the destination was +loopback, with the exception that the packet is forwarded through the proxy, +instead of being forwarded from the service directly to itself. + +## Rules table + +For reference, you can find the actual commands used to create the rules below. +Alternatively, if you want to inspect the iptables rules created for a pod, you +can retrieve them through the following command: + +```bash +$ kubectl -n logs linkerd-init +# where is the name of the pod +# you want to see the iptables rules for +``` + +### Inbound + +{{< table >}} +| # | name | iptables rule | description| +|---|------|---------------|------------| +| 1 | redirect-common-chain | `iptables -t nat -N PROXY_INIT_REDIRECT`| creates a new `iptables` chain to add inbound redirect rules to; the chain is attached to the `nat` table | +| 2 | ignore-port | `iptables -t nat -A PROXY_INIT_REDIRECT -p tcp --match multiport --dports -j RETURN` | configures `iptables` to ignore the redirect chain for packets whose dst ports are included in the `--skip-inbound-ports` config option | +| 3 | proxy-init-redirect-all | `iptables -t nat -A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143` | configures `iptables` to redirect all incoming TCP packets to port `4143`, the proxy's inbound port | +| 4 | install-proxy-init-prerouting | `iptables -t nat -A PREROUTING -j PROXY_INIT_REDIRECT` | the last inbound rule configures the `PREROUTING` chain (first chain a packet traverses inbound) to send packets to the redirect chain for processing | +{{< /table >}} + +### Outbound + +{{< table >}} +| # | name | iptables rule | description | +|---|------|---------------|-------------| +| 1 | redirect-common-chain | `iptables -t nat -N PROXY_INIT_OUTPUT`| creates a new `iptables` chain to add outbound redirect rules to, also attached to the `nat` table | +| 2 | ignore-proxy-uid | `iptables -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -j RETURN` | when a packet is owned by the proxy (`--uid-owner 2102`), skip processing and return to the previous (`OUTPUT`) chain | +| 3 | ignore-loopback | `iptables -t nat -A PROXY_INIT_OUTPUT -o lo -j RETURN` | when a packet is sent over the loopback interface (`lo`), skip processing and return to the previous chain | +| 4 | ignore-port | `iptables -t nat -A PROXY_INIT_OUTPUT -p tcp --match multiport --dports -j RETURN` | configures `iptables` to ignore the redirect output chain for packets whose dst ports are included in the `--skip-outbound-ports` config option | +| 5 | redirect-all-outgoing | `iptables -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140`| configures `iptables` to redirect all outgoing TCP packets to port `4140`, the proxy's outbound port | +| 6 | install-proxy-init-output | `iptables -t nat -A OUTPUT -j PROXY_INIT_OUTPUT` | the last outbound rule configures the `OUTPUT` chain (second before last chain a packet traverses outbound) to send packets to the redirect output chain for processing | +{{< /table >}} + diff --git a/linkerd.io/content/2.16/reference/k8s-versions.md b/linkerd.io/content/2.16/reference/k8s-versions.md new file mode 100644 index 0000000000..6e6bd52262 --- /dev/null +++ 
b/linkerd.io/content/2.16/reference/k8s-versions.md @@ -0,0 +1,40 @@ ++++ +title = "Supported Kubernetes Versions" +description = "Reference documentation for which Linkerd version supports which Kubernetes version" ++++ + +Linkerd supports all versions of Kubernetes that were supported at the time +that a given Linkerd version ships. For example, at the time that Linkerd 2.14 +shipped, Kubernetes versions 1.26, 1.27, and 1.28 were supported, so Linkerd +2.14 supports all of those Kubernetes versions. (In many cases, as you'll see +below, Linkerd versions will also support older Kubernetes versions.) + +Obviously, Linkerd 2.14 has no knowledge of what changes will come _after_ +Kubernetes 1.28. In some cases, later versions of Kubernetes end up making +changes that cause older versions of Linkerd to not work: we will update the +chart below as these situations arise. + +{{< table >}} +| Linkerd Version | Minimum Kubernetes Version | Maximum Kubernetes Version | +|-----------------|----------------------------|----------------------------| +| `2.10` | `1.16` | `1.23` | +| `2.11` | `1.17` | `1.23` | +| `2.12` | `1.21` | `1.24` | +| `2.13` | `1.21` | `1.28` | +| `2.14` | `1.21` | `1.28` | +| `2.15` | `1.22` | `1.29` | +{{< /table >}} + +Note that Linkerd will almost never change the supported Kubernetes version in +a minor release, which is why the table above only lists major versions. One +known exception: Linkerd 2.11.0 supported Kubernetes 1.16, but 2.11.1 and +later required Kubernetes 1.17 as shown in the table above. + +## Edge Releases + +{{< table >}} +| Linkerd Version | Minimum Kubernetes Version | Maximum Kubernetes Version | +|-----------------|----------------------------|----------------------------| +| `edge-22.10.1` - `edge-23.12.1` | `1.21` | `1.29` | +| `edge-23.12.2` and newer | `1.22` | `1.29` | +{{< /table >}} diff --git a/linkerd.io/content/2.16/reference/multicluster.md b/linkerd.io/content/2.16/reference/multicluster.md new file mode 100644 index 0000000000..375a30766b --- /dev/null +++ b/linkerd.io/content/2.16/reference/multicluster.md @@ -0,0 +1,70 @@ ++++ +title = "Multi-cluster communication" +description = "Multi-cluster communication" ++++ + +Linkerd's [multi-cluster functionality](../../features/multicluster/) allows +pods to connect to Kubernetes services across cluster boundaries in a way that +is secure and fully transparent to the application. As of Linkerd 2.14, this +feature supports two modes: hierarchical (using an gateway) and flat (without a +gateway): + +* **Flat mode** requires that all pods on the source cluster be able to directly + connect to pods on the destination cluster. +* **Hierarchical mode** only requires that the gateway IP of the destination + cluster be reachable by pods on the source cluster. + +These modes can be mixed and matched. + +{{< fig + alt="Architectural diagram comparing hierarchical and flat network modes" + src="/uploads/2023/07/flat_network@2x.png">}} + +Hierarchical mode places a bare minimum of requirements on the underlying +network, as it only requires that the gateway IP be reachable. However, flat +mode has a few advantages over the gateway approach used in hierarchical mode, +including reducing latency and preserving client identity. + +## Service mirroring + +Linkerd's multi-cluster functionality uses a *service mirror* component that +watches a target cluster for updates to services and mirrors those service +updates locally to a source cluster. 
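+
+In practice, exporting a service is just a matter of labeling it. For example
+(a sketch -- the `east` context and `emojivoto` namespace are placeholders),
+the following exports `web-svc` from a target cluster in the default,
+gateway-based mode described below:
+
+```bash
+kubectl --context=east -n emojivoto label svc/web-svc \
+  mirror.linkerd.io/exported=true
+```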
+ +Multi-cluster support is underpinned by a concept known as service mirroring. +Mirroring refers to importing a service definition from another cluster, and it +allows applications to address and consume multi-cluster services. The *service +mirror* component runs on the source cluster; it watches a target cluster for +updates to services and mirrors those updates locally in the source cluster. +Only Kubernetes service objects that match a label selector are exported. + +The label selector also controls the mode a service is exported in. For example, +by default, services labeled with `mirror.linkerd.io/exported=true` will be +exported in hierarchical (gateway) mode, whereas services labeled with +`mirror.linkerd.io/exported=remote-discovery` will be exported in flat +(pod-to-pod) mode. Since the configuration is service-centric, switching from +gateway to pod-to-pod mode is trivial and does not require the extension to be +re-installed. + +{{< note >}} +In flat mode, the namespace of the Linkerd control plane should be the same +across all clusters. We recommend leaving this at the default value of +`linkerd`. +{{< /note >}} + +The term "remote-discovery" refers to how the imported services should be +interpreted by Linkerd's control plane. Service discovery is performed by the +[*destination service*](../../reference/architecture/#the-destination-service). +Whenever traffic is sent to a target imported in "remote-discovery" mode, the +destination service knows to look for all relevant information in the cluster +the service has been exported from, not locally. In contrast, service discovery +for a hierarchical (gateway mode) import will be performed locally; instead of +routing directly to a pod, traffic will be sent to the gateway address on the +target cluster. + +Linkerd's *destination service* performs remote discovery by connecting directly +to multiple Kubernetes API servers. Whenever two clusters are connected +together, a Kubernetes `Secret` is created in the control plane's namespace with +a kubeconfig file that allows an API client to be configured. The kubeconfig +file uses RBAC to provide the "principle of least privilege", ensuring the +*destination service* may only access only the resources it needs. diff --git a/linkerd.io/content/2.16/reference/proxy-configuration.md b/linkerd.io/content/2.16/reference/proxy-configuration.md new file mode 100644 index 0000000000..a3cd3b15de --- /dev/null +++ b/linkerd.io/content/2.16/reference/proxy-configuration.md @@ -0,0 +1,61 @@ ++++ +title = "Proxy Configuration" +description = "Linkerd provides a set of annotations that can be used to override the data plane proxy's configuration." ++++ + +Linkerd provides a set of annotations that can be used to **override** the data +plane proxy's configuration. This is useful for **overriding** the default +configurations of [auto-injected proxies](../../features/proxy-injection/). 
+ +The following is the list of supported annotations: + +{{< cli/annotations "inject" >}} + +For example, to update an auto-injected proxy's CPU and memory resources, we +insert the appropriate annotations into the `spec.template.metadata.annotations` +of the owner's pod spec, using `kubectl edit` like this: + +```yaml +spec: + template: + metadata: + annotations: + config.linkerd.io/proxy-cpu-limit: "1" + config.linkerd.io/proxy-cpu-request: "0.2" + config.linkerd.io/proxy-memory-limit: 2Gi + config.linkerd.io/proxy-memory-request: 128Mi +``` + +See [here](../../tasks/configuring-proxy-concurrency/) for details on tuning the +proxy's resource usage. + +For proxies injected using the `linkerd inject` command, configuration can be +overridden using the [command-line flags](../cli/inject/). + +## Ingress Mode + +{{< warning >}} +When an ingress is meshed in `ingress` mode by using `linkerd.io/inject: +ingress`, the ingress _must_ be configured to remove the `l5d-dst-override` +header to avoid creating an open relay to cluster-local and external endpoints. +{{< /warning >}} + +Proxy ingress mode is a mode of operation designed to help Linkerd integrate +with certain ingress controllers. Ingress mode is necessary if the ingress +itself cannot be otherwise configured to use the Service port/ip as the +destination. + +When an individual Linkerd proxy is set to `ingress` mode, it will route +requests based on their `:authority`, `Host`, or `l5d-dst-override` headers +instead of their original destination. This will inform Linkerd to override the +endpoint selection of the ingress container and to perform its own endpoint +selection, enabling features such as per-route metrics and traffic splitting. + +The proxy can be made to run in `ingress` mode by using the `linkerd.io/inject: +ingress` annotation rather than the default `linkerd.io/inject: enabled` +annotation. This can also be done with the `--ingress` flag in the `inject` CLI +command: + +```bash +kubectl get deployment -n -o yaml | linkerd inject --ingress - | kubectl apply -f - +``` diff --git a/linkerd.io/content/2.16/reference/proxy-log-level.md b/linkerd.io/content/2.16/reference/proxy-log-level.md new file mode 100644 index 0000000000..facb9eb161 --- /dev/null +++ b/linkerd.io/content/2.16/reference/proxy-log-level.md @@ -0,0 +1,39 @@ ++++ +title = "Proxy Log Level" +description = "Syntax of the proxy log level." ++++ + +The Linkerd proxy's log level can be configured via the: + +* `LINKERD_PROXY_LOG` environment variable +* `--proxy-log-level` CLI flag of the `install`, `inject` and `upgrade` commands +* `config.linkerd.io/proxy-log-level` annotation + (see [Proxy Configuration](../proxy-configuration/)) + which sets `LINKERD_PROXY_LOG` environment-variable on the injected sidecar +* an [endpoint on the admin port](../../tasks/modifying-proxy-log-level/) + of a running proxy. + +The log level is a comma-separated list of log directives, which is +based on the logging syntax of the [`env_logger` crate](https://docs.rs/env_logger/0.6.1/env_logger/#enabling-logging). + +A log directive consists of either: + +* A level (e.g. `info`), which sets the global log level, or +* A module path (e.g. `foo` or `foo::bar::baz`), or +* A module path followed by an equals sign and a level (e.g. `foo=warn` +or `foo::bar::baz=debug`), which sets the log level for that module + +A level is one of: + +* `trace` +* `debug` +* `info` +* `warn` +* `error` + +A module path represents the path to a Rust module. 
It consists of one or more +module names, separated by `::`. + +A module name starts with a letter, and consists of alphanumeric characters and `_`. + +The proxy's default log level is set to `warn,linkerd2_proxy=info`. diff --git a/linkerd.io/content/2.16/reference/proxy-metrics.md b/linkerd.io/content/2.16/reference/proxy-metrics.md new file mode 100644 index 0000000000..7f19c3ad99 --- /dev/null +++ b/linkerd.io/content/2.16/reference/proxy-metrics.md @@ -0,0 +1,283 @@ ++++ +title = "Proxy Metrics" +description = "The Linkerd proxy natively exports Prometheus metrics for all incoming and outgoing traffic." +aliases = [ + "/proxy-metrics/", + "../proxy-metrics/", + "../observability/proxy-metrics/" +] ++++ + +The Linkerd proxy exposes metrics that describe the traffic flowing through the +proxy. The following metrics are available at `/metrics` on the proxy's metrics +port (default: `:4191`) in the [Prometheus format][prom-format]. + +## Protocol-Level Metrics + +* `request_total`: A counter of the number of requests the proxy has received. + This is incremented when the request stream begins. + +* `response_total`: A counter of the number of responses the proxy has received. + This is incremented when the response stream ends. + +* `response_latency_ms`: A histogram of response latencies. This measurement + reflects the [time-to-first-byte][ttfb] (TTFB) by recording the elapsed time + between the proxy processing a request's headers and the first data frame of the + response. If a response does not include any data, the end-of-stream event is + used. The TTFB measurement is used so that Linkerd accurately reflects + application behavior when a server provides response headers immediately but is + slow to begin serving the response body. + +* `route_request_total`, `route_response_latency_ms`, and `route_response_total`: + These metrics are analogous to `request_total`, `response_latency_ms`, and + `response_total` except that they are collected at the route level. This + means that they do not have `authority`, `tls`, `grpc_status_code` or any + outbound labels but instead they have: + * `dst`: The authority of this request. + * `rt_route`: The name of the route for this request. + +* `control_request_total`, `control_response_latency_ms`, and `control_response_total`: + These metrics are analogous to `request_total`, `response_latency_ms`, and + `response_total` but for requests that the proxy makes to the Linkerd control + plane. Instead of `authority`, `direction`, or any outbound labels, instead + they have: + * `addr`: The address used to connect to the control plane. + +* `inbound_http_authz_allow_total`: A counter of the total number of inbound + HTTP requests that were authorized. + * `authz_name`: The name of the authorization policy used to allow the request. + +* `inbound_http_authz_deny_total`: A counter of the total number of inbound + HTTP requests that could not be processed due to being denied by the + authorization policy. + +* `inbound_http_route_not_found_total`: A counter of the total number of + inbound HTTP requests that could not be associated with a route. + +Note that latency measurements are not exported to Prometheus until the stream +_completes_. This is necessary so that latencies can be labeled with the appropriate +[response classification](#response-labels). + +### Labels + +Each of these metrics has the following labels: + +* `authority`: The value of the `:authority` (HTTP/2) or `Host` (HTTP/1.1) + header of the request. 
+* `direction`: `inbound` if the request originated from outside of the pod, + `outbound` if the request originated from inside of the pod. +* `tls`: `true` if the request's connection was secured with TLS. + +#### Response Labels + +The following labels are only applicable on `response_*` metrics. + +* `status_code`: The HTTP status code of the response. + +#### Response Total Labels + +In addition to the labels applied to all `response_*` metrics, the +`response_total`, `route_response_total`, and `control_response_total` metrics +also have the following labels: + +* `classification`: `success` if the response was successful, or `failure` if + a server error occurred. This classification is based on + the gRPC status code if one is present, and on the HTTP + status code otherwise. +* `grpc_status_code`: The value of the `grpc-status` trailer. Only applicable + for gRPC responses. + +{{< note >}} +Because response classification may be determined based on the `grpc-status` +trailer (if one is present), a response may not be classified until its body +stream completes. Response latency, however, is determined based on +[time-to-first-byte][ttfb], so the `response_latency_ms` metric is recorded as +soon as data is received, rather than when the response body ends. Therefore, +the values of the `classification` and `grpc_status_code` labels are not yet +known when the `response_latency_ms` metric is recorded. +{{< /note >}} + +#### Outbound labels + +The following labels are only applicable if `direction=outbound`. + +* `dst_deployment`: The deployment to which this request is being sent. +* `dst_k8s_job`: The job to which this request is being sent. +* `dst_replicaset`: The replica set to which this request is being sent. +* `dst_daemonset`: The daemon set to which this request is being sent. +* `dst_statefulset`: The stateful set to which this request is being sent. +* `dst_replicationcontroller`: The replication controller to which this request + is being sent. +* `dst_namespace`: The namespace to which this request is being sent. +* `dst_service`: The service to which this request is being sent. +* `dst_pod_template_hash`: The [pod-template-hash][pod-template-hash] of the pod + to which this request is being sent. This label + selector roughly approximates a pod's `ReplicaSet` or + `ReplicationController`. + +#### Prometheus Collector labels + +The following labels are added by the Prometheus collector. + +* `instance`: ip:port of the pod. +* `job`: The Prometheus job responsible for the collection, typically + `linkerd-proxy`. + +##### Kubernetes labels added at collection time + +Kubernetes namespace, pod name, and all labels are mapped to corresponding +Prometheus labels. + +* `namespace`: Kubernetes namespace that the pod belongs to. +* `pod`: Kubernetes pod name. +* `pod_template_hash`: Corresponds to the [pod-template-hash][pod-template-hash] + Kubernetes label. This value changes during redeploys and + rolling restarts. This label selector roughly + approximates a pod's `ReplicaSet` or + `ReplicationController`. + +##### Linkerd labels added at collection time + +Kubernetes labels prefixed with `linkerd.io/` are added to your application at +`linkerd inject` time. More specifically, Kubernetes labels prefixed with +`linkerd.io/proxy-*` will correspond to these Prometheus labels: + +* `daemonset`: The daemon set that the pod belongs to (if applicable). +* `deployment`: The deployment that the pod belongs to (if applicable). 
+* `k8s_job`: The job that the pod belongs to (if applicable). +* `replicaset`: The replica set that the pod belongs to (if applicable). +* `replicationcontroller`: The replication controller that the pod belongs to + (if applicable). +* `statefulset`: The stateful set that the pod belongs to (if applicable). + +### Example + +Here's a concrete example, given the following pod snippet: + +```yaml +name: vote-bot-5b7f5657f6-xbjjw +namespace: emojivoto +labels: + app: vote-bot + linkerd.io/control-plane-ns: linkerd + linkerd.io/proxy-deployment: vote-bot + pod-template-hash: "3957278789" + test: vote-bot-test +``` + +The resulting Prometheus labels will look like this: + +```bash +request_total{ + pod="vote-bot-5b7f5657f6-xbjjw", + namespace="emojivoto", + app="vote-bot", + control_plane_ns="linkerd", + deployment="vote-bot", + pod_template_hash="3957278789", + test="vote-bot-test", + instance="10.1.3.93:4191", + job="linkerd-proxy" +} +``` + +## Transport-Level Metrics + +The following metrics are collected at the level of the underlying transport +layer. + +* `tcp_open_total`: A counter of the total number of opened transport + connections. +* `tcp_close_total`: A counter of the total number of transport connections + which have closed. +* `tcp_open_connections`: A gauge of the number of transport connections + currently open. +* `tcp_write_bytes_total`: A counter of the total number of sent bytes. This is + updated when the connection closes. +* `tcp_read_bytes_total`: A counter of the total number of received bytes. This + is updated when the connection closes. +* `tcp_connection_duration_ms`: A histogram of the duration of the lifetime of a + connection, in milliseconds. This is updated when the connection closes. +* `inbound_tcp_errors_total`: A counter of the total number of inbound TCP + connections that could not be processed due to a proxy error. +* `outbound_tcp_errors_total`: A counter of the total number of outbound TCP + connections that could not be processed due to a proxy error. +* `inbound_tcp_authz_allow_total`: A counter of the total number of TCP + connections that were authorized. +* `inbound_tcp_authz_deny_total`: A counter of the total number of TCP + connections that were denied + +### Labels + +Each of these metrics has the following labels: + +* `direction`: `inbound` if the connection was established either from outside the + pod to the proxy, or from the proxy to the application, + `outbound` if the connection was established either from the + application to the proxy, or from the proxy to outside the pod. +* `peer`: `src` if the connection was accepted by the proxy from the source, + `dst` if the connection was opened by the proxy to the destination. + +Note that the labels described above under the heading "Prometheus Collector labels" +are also added to transport-level metrics, when applicable. + +#### Connection Close Labels + +The following labels are added only to metrics which are updated when a +connection closes (`tcp_close_total` and `tcp_connection_duration_ms`): + +* `classification`: `success` if the connection terminated cleanly, `failure` if + the connection closed due to a connection failure. + +## Identity Metrics + +* `identity_cert_expiration_timestamp_seconds`: A gauge of the time when the + proxy's current mTLS identity certificate will expire (in seconds since the UNIX + epoch). +* `identity_cert_refresh_count`: A counter of the total number of times the + proxy's mTLS identity certificate has been refreshed by the Identity service. 
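+
+To inspect these metrics for a running proxy, you can port-forward to the
+proxy's metrics port and query the endpoint directly (a sketch -- the
+`emojivoto` namespace and `web` deployment are just example targets):
+
+```bash
+# Forward the proxy's metrics port (4191 by default) from a meshed pod
+kubectl -n emojivoto port-forward deploy/web 4191:4191 &
+
+# Fetch the Prometheus-formatted metrics and filter for request totals
+curl -s http://localhost:4191/metrics | grep '^request_total'
+```
+
+The `linkerd diagnostics proxy-metrics` command offers a similar shortcut
+without a manual port-forward.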
+
+## Outbound `xRoute` Metrics
+
+When performing policy-based routing, proxies may dispatch requests through
+per-route backend configurations. In order to record how routing rules
+apply and how backend distributions are applied, the outbound proxy records the
+following metrics:
+
+* `outbound_http_route_backend_requests_total`: A counter of the total number of
+  outbound HTTP requests dispatched to a route-backend.
+* `outbound_grpc_route_backend_requests_total`: A counter of the total number of
+  outbound gRPC requests dispatched to a route-backend.
+* `outbound_http_balancer_endpoints`: A gauge of the number of endpoints in an
+  outbound load balancer.
+
+### Labels
+
+Each of these metrics has the following common labels, which describe the
+Kubernetes resources to which traffic is routed by the proxy:
+
+* `parent_group`, `parent_kind`, `parent_name`, and `parent_namespace` reference
+  the parent resource through which the proxy discovered the route binding.
+  The parent resource of an [HTTPRoute] is generally a Service.
+* `route_group`, `route_kind`, `route_name`, and `route_namespace` reference the
+  route resource through which the proxy discovered the route binding. This will
+  either reference an [HTTPRoute] resource or a default (synthesized) route.
+* `backend_group`, `backend_kind`, `backend_name`, and `backend_namespace`
+  reference the backend resource to which the proxy routed the request.
+  This will always be a Service.
+
+In addition, the `outbound_http_balancer_endpoints` gauge metric adds the
+following labels:
+
+* `endpoint_state`: Either "ready" if the endpoint is available to have requests
+  routed to it by the load balancer, or "pending" if the endpoint is currently
+  unavailable.
+
+  Endpoints may be "pending" when a connection is being established (or
+  reestablished), or when the endpoint has been [made unavailable by failure
+  accrual](../circuit-breaking/).
+
+[prom-format]: https://prometheus.io/docs/instrumenting/exposition_formats/#format-version-0.0.4
+[pod-template-hash]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#pod-template-hash-label
+[ttfb]: https://en.wikipedia.org/wiki/Time_to_first_byte
+[HTTPRoute]: ../../features/httproute/
diff --git a/linkerd.io/content/2.16/reference/retries.md b/linkerd.io/content/2.16/reference/retries.md
new file mode 100644
index 0000000000..9cc3d3cc6a
--- /dev/null
+++ b/linkerd.io/content/2.16/reference/retries.md
@@ -0,0 +1,105 @@
++++
+title = "Retries"
+description = "How Linkerd implements retries."
++++
+
+Linkerd can be configured to automatically retry requests when it receives a
+failed response instead of immediately returning that failure to the client.
+This is a valuable tool for improving success rate in the face of transient
+failures.
+
+Retries are a client-side behavior, and are therefore performed by the
+outbound side of the Linkerd proxy.[^1] If retries are configured on an
+HTTPRoute or GRPCRoute with multiple backends, each retry of a request can
+potentially get sent to a different backend. If a request has a body larger than
+64KiB then it will not be retried.
+
+## Configuring Retries
+
+Retries are configured by a set of annotations which can be set on a Kubernetes
+Service resource or on an HTTPRoute or GRPCRoute which has a Service as a parent.
+Client proxies will then retry failed requests to that Service or route.
If any +retry configuration annotations are present on a route resource, they override +all retry configuration annotations on the parent Service. + +{{< warning >}} +Retries configured in this way are **incompatible with ServiceProfiles**. If a +[ServiceProfile](../../features/service-profiles/) is defined for a Service, +proxies will use the ServiceProfile retry configuration and ignore any retry +annotations. +{{< /warning >}} + ++ `retry.linkerd.io/http`: A comma separated list of HTTP response codes which +should be retried. Each element of the list may be + + `xxx` to retry a single response code (for example, `"504"` -- remember, + annotation values must be strings!); + + `xxx-yyy` to retry a range of response codes (for example, `500-504`); + + `gateway-error` to retry response codes 502-504; or + + `5xx` to retry all 5XX response codes. +This annotation is not valid on GRPCRoute resources. ++ `retry.linkerd.io/grpc`: A comma seperated list of gRPC status codes which +should be retried. Each element of the list may be + + `cancelled` + + `deadline-exceeded` + + `internal` + + `resource-exhausted` + + `unavailable` +This annotation is not valid on HTTPRoute resources. ++ `retry.linkerd.io/limit`: The maximum number of times a request can be +retried. If unspecified, the default is `1`. ++ `retry.linkerd.io/timeout`: A retry timeout after which a request is cancelled +and retried (if the retry limit has not yet been reached). If unspecified, no +retry timeout is applied. Units must be specified in this value e.g. `5s` or +`200ms`. + +## Examples + +```yaml +kind: HTTPRoute +apiVersion: gateway.networking.k8s.io/v1beta1 +metadata: + name: schlep-default + namespace: schlep + annotations: + retry.linkerd.io/http: 5xx + retry.linkerd.io/limit: "2" + retry.linkerd.io/timeout: 300ms +spec: + parentRefs: + - name: schlep + kind: Service + group: core + port: 80 + rules: + - matches: + - path: + type: PathPrefix + value: "/" +``` + +```yaml +kind: GRPCRoute +apiVersion: gateway.networking.k8s.io/v1alpha2 +metadata: + name: schlep-default + namespace: schlep + annotations: + retry.linkerd.io/grpc: internal + retry.linkerd.io/limit: "2" + retry.linkerd.io/timeout: 400ms +spec: + parentRefs: + - name: schlep + kind: Service + group: core + port: 8080 + rules: + - matches: + - method: + type: Exact + service: schlep.Schlep + method: Get +``` + +[^1]: The part of the proxy which handles connections from within the pod to the + rest of the cluster. diff --git a/linkerd.io/content/2.16/reference/service-profiles.md b/linkerd.io/content/2.16/reference/service-profiles.md new file mode 100644 index 0000000000..4cf32098df --- /dev/null +++ b/linkerd.io/content/2.16/reference/service-profiles.md @@ -0,0 +1,135 @@ ++++ +title = "Service Profiles" +description = "Details on the specification and what is possible with service profiles." ++++ + +[Service profiles](../../features/service-profiles/) provide Linkerd additional +information about a service. This is a reference for everything that can be done +with service profiles. 
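+
+As a point of reference, here is a minimal sketch that uses the fields
+described below (`webapp.test.svc.cluster.local` is a placeholder -- the
+resource must be named after the fully-qualified name of the service it
+describes):
+
+```yaml
+apiVersion: linkerd.io/v1alpha2
+kind: ServiceProfile
+metadata:
+  name: webapp.test.svc.cluster.local
+  namespace: test
+spec:
+  routes:
+  - name: GET /books
+    condition:
+      method: GET
+      pathRegex: /books
+    isRetryable: true
+  retryBudget:
+    retryRatio: 0.2
+    minRetriesPerSecond: 10
+    ttl: 10s
+```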
+ +## Spec + +A service profile spec must contain the following top level fields: + +{{< table >}} +| field| value | +|------|-------| +| `routes`| a list of [route](#route) objects | +| `retryBudget`| a [retry budget](#retry-budget) object that defines the maximum retry rate to this service | +{{< /table >}} + +## Route + +A route object must contain the following fields: + +{{< table >}} +| field | value | +|-------|-------| +| `name` | the name of this route as it will appear in the route label | +| `condition` | a [request match](#request-match) object that defines if a request matches this route | +| `responseClasses` | (optional) a list of [response class](#response-class) objects | +| `isRetryable` | indicates that requests to this route are always safe to retry and will cause the proxy to retry failed requests on this route whenever possible | +| `timeout` | the maximum amount of time to wait for a response (including retries) to complete after the request is sent | +{{< /table >}} + +## Request Match + +A request match object must contain _exactly one_ of the following fields: + +{{< table >}} +| field | value | +|-------|-------| +| `pathRegex` | a regular expression to match the request path against | +| `method` | one of GET, POST, PUT, DELETE, OPTION, HEAD, TRACE | +| `all` | a list of [request match](#request-match) objects which must _all_ match | +| `any` | a list of [request match](#request-match) objects, at least one of which must match | +| `not` | a [request match](#request-match) object which must _not_ match | +{{< /table >}} + +### Request Match Usage Examples + +The simplest condition is a path regular expression: + +```yaml +pathRegex: '/authors/\d+' +``` + +This is a condition that checks the request method: + +```yaml +method: POST +``` + +If more than one condition field is set, all of them must be satisfied. This is +equivalent to using the 'all' condition: + +```yaml +all: +- pathRegex: '/authors/\d+' +- method: POST +``` + +Conditions can be combined using 'all', 'any', and 'not': + +```yaml +any: +- all: + - method: POST + - pathRegex: '/authors/\d+' +- all: + - not: + method: DELETE + - pathRegex: /info.txt +``` + +## Response Class + +A response class object must contain the following fields: + +{{< table >}} +| field | value | +|-------|-------| +| `condition` | a [response match](#response-match) object that defines if a response matches this response class | +| `isFailure` | a boolean that defines if these responses should be classified as failed | +{{< /table >}} + +## Response Match + +A response match object must contain _exactly one_ of the following fields: + +{{< table >}} +| field | value | +|-------|-------| +| `status` | a [status range](#status-range) object to match the response status code against | +| `all` | a list of [response match](#response-match) objects which must _all_ match | +| `any` | a list of [response match](#response-match) objects, at least one of which must match | +| `not` | a [response match](#response-match) object which must _not_ match | +{{< /table >}} + +Response Match conditions can be combined in a similar way as shown above for +[Request Match Usage Examples](#request-match-usage-examples) + +## Status Range + +A status range object must contain _at least one_ of the following fields. +Specifying only one of min or max matches just that one status code. 
+
+{{< table >}}
+| field | value |
+|-------|-------|
+| `min` | the status code must be greater than or equal to this value |
+| `max` | the status code must be less than or equal to this value |
+{{< /table >}}
+
+## Retry Budget
+
+A retry budget specifies the maximum total number of retries that should be sent
+to this service as a ratio of the original request volume.
+
+{{< table >}}
+| field | value |
+|-------|-------|
+| `retryRatio` | the maximum ratio of retry requests to original requests |
+| `minRetriesPerSecond` | allowance of retries per second in addition to those allowed by the retryRatio |
+| `ttl` | indicates for how long requests should be considered for the purposes of calculating the retryRatio |
+{{< /table >}}
diff --git a/linkerd.io/content/2.16/reference/timeouts.md b/linkerd.io/content/2.16/reference/timeouts.md
new file mode 100644
index 0000000000..d651b64ef9
--- /dev/null
+++ b/linkerd.io/content/2.16/reference/timeouts.md
@@ -0,0 +1,68 @@
++++
+title = "Timeouts"
+description = "How Linkerd implements timeouts."
++++
+
+Linkerd can be configured with timeouts to limit the maximum amount of time
+spent on a request before it is aborted.
+
+Timeouts are a client-side behavior, and are therefore performed by the
+outbound side of the Linkerd proxy.[^1] Note that timeouts configured in this
+way are not retryable -- if these timeouts are reached, the request will not be
+retried. Retryable timeouts can be configured as part of
+[retry configuration](../retries/).
+
+## Configuring Timeouts
+
+Timeouts are configured by a set of annotations which can be set on a Kubernetes
+Service resource or on an HTTPRoute or GRPCRoute which has a Service as a parent.
+Client proxies will then fail requests to that Service or route once they exceed
+the timeout. If any timeout configuration annotations are present on a route
+resource, they override all timeout configuration annotations on the parent
+Service.
+
+{{< warning >}}
+Timeouts configured in this way are **incompatible with ServiceProfiles**. If a
+[ServiceProfile](../../features/service-profiles/) is defined for a Service,
+proxies will use the ServiceProfile timeout configuration and ignore any timeout
+annotations.
+{{< /warning >}}
+
++ `timeout.linkerd.io/request`: The maximum amount of time a full
+request-response stream is in flight.
++ `timeout.linkerd.io/response`: The maximum amount of time a backend response
+may be in flight.
++ `timeout.linkerd.io/idle`: The maximum amount of time a stream may be idle,
+regardless of its state.
+
+If the [request timeout](https://gateway-api.sigs.k8s.io/api-types/httproute/#timeouts-optional)
+field is set on an HTTPRoute resource, it will be used as the
+`timeout.linkerd.io/request` timeout. However, if both the field and the
+annotation are specified, the annotation will take priority.
+
+## Examples
+
+```yaml
+kind: HTTPRoute
+apiVersion: gateway.networking.k8s.io/v1beta1
+metadata:
+  name: schlep-default
+  namespace: schlep
+  annotations:
+    timeout.linkerd.io/request: 2s
+    timeout.linkerd.io/response: 1s
+spec:
+  parentRefs:
+    - name: schlep
+      kind: Service
+      group: core
+      port: 80
+  rules:
+    - matches:
+      - path:
+          type: PathPrefix
+          value: "/"
+```
+
+[^1]: The part of the proxy which handles connections from within the pod to the
+    rest of the cluster.
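+
+The same annotations can also be applied directly to a Service rather than to
+a route. For example, to give every client of the `schlep` Service the
+timeouts shown above (a sketch reusing the names from the example):
+
+```bash
+kubectl -n schlep annotate svc/schlep \
+  timeout.linkerd.io/request=2s \
+  timeout.linkerd.io/response=1s
+```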
diff --git a/linkerd.io/content/2.16/tasks/_index.md b/linkerd.io/content/2.16/tasks/_index.md new file mode 100644 index 0000000000..884f9ce20c --- /dev/null +++ b/linkerd.io/content/2.16/tasks/_index.md @@ -0,0 +1,17 @@ ++++ +title = "Tasks" +weight = 4 +aliases = [ + "./next-steps/", + "./tasks/enabling-addons/", + "./tasks/upgrade-multicluster/", +] ++++ + +As a complement to the [Linkerd feature docs]({{% ref "../features" %}}) and +the [Linkerd reference docs]({{% ref "../reference" %}}), we've provided guides +and examples of common tasks that you may need to perform when using Linkerd. + +## Common Linkerd tasks + +{{% sectiontoc "tasks" %}} diff --git a/linkerd.io/content/2.16/tasks/adding-non-kubernetes-workloads.md b/linkerd.io/content/2.16/tasks/adding-non-kubernetes-workloads.md new file mode 100644 index 0000000000..e70e751893 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/adding-non-kubernetes-workloads.md @@ -0,0 +1,540 @@ +--- +title: Adding non-Kubernetes workloads to your mesh +--- + +In this guide, we'll walk you through an example of [mesh expansion]({{< relref +"../features/non-kubernetes-workloads" >}}): setting up and configuring an +example non-Kubernetes workload and adding it to your Linkerd mesh. + +## Overall flow + +In this guide, we'll take you through how to: + +1. Install the Linkerd proxy onto a virtual or physical machine outside the + Kubernetes cluster. +1. Configure network rules so traffic is routed through the proxy. +1. Register the external workload in the mesh. +1. Exercise traffic patterns and apply authorization policies that affect the + external workload. + +We'll be using [SPIRE](https://github.com/spiffe/spire) as our identity +mechanism to generate a workload identity. + +## Prerequisites + +You will need: + +- A functioning Linkerd installation and its trust anchor. +- A cluster that you have elevated privileges to. For local development, you can + use [kind](https://kind.sigs.k8s.io/) or [k3d](https://k3d.io/). +- A physical or virtual machine. +- `NET_CAP` privileges on the machine, so iptables rules can be modified. +- IP connectivity from the machine to every pod in the mesh. +- A working DNS setup such that the machine is able to resolve DNS names for + in-cluster Kubernetes workloads. + +## Getting the current trust anchor and key + +To be able to use mutual TLS across cluster boundaries, the off-cluster machine +and the cluster need to have a shared trust anchor. For the purposes of this +tutorial, we will assume that you have access to the trust anchor certificate +and secret key for your Linkerd deployment and placed it in files called +`ca.key` and `ca.crt`. + +## Install SPIRE on your machine + +Linkerd's proxies normally obtain TLS certificates from the identity component +of Linkerd's control plane. In order to attest their identity, they use the +Kubernetes Service Account token that is provided to each Pod. + +Since our external workload lives outside of Kubernetes, the concept of Service +Account tokens does not exist. Instead, we turn to the [SPIFFE +framework](https://spiffee.io) and its SPIRE implementation to create identities +for off-cluster resources. Thus, for mesh expansion, we configure the Linkerd +proxy to obtain its certificates directly from SPIRE instead of the Linkerd's +identity service. The magic of SPIFFE is that these certificates are compatible +with those generated by Linkerd on the cluster. 
+ +In production, you may already have your own identity infrastructure built on +top of SPIFFE that can be used by the proxies on external machines. For this +tutorial however, we can take you through installing and setting up a minimal +SPIRE environment on your machine. To begin with you need to install SPIRE by +downloading it from the [SPIRE GitHub releases +page](https://github.com/spiffe/spire/releases). For example: + +```bash +wget https://github.com/spiffe/SPIRE/releases/download/v1.8.2/SPIRE-1.8.2-linux-amd64-musl.tar.gz +tar zvxf SPIRE-1.8.2-linux-amd64-musl.tar.gz +cp -r SPIRE-1.8.2/. /opt/SPIRE/ +``` + +Then you need to configure the SPIRE server on your machine: + +```bash +cat >/opt/SPIRE/server.cfg </opt/SPIRE/agent.cfg < +kubectl --context=west apply -f - < + while true; do + sleep 3600; + done + serviceAccountName: client +EOF +``` + +You can also create a service that selects over both the machine as well as an +in-cluster workload: + +```yaml +kubectl apply -f - <}} +Adding the annotation to existing pods does not automatically mesh them. For +existing pods, after adding the annotation you will also need to recreate or +update the resource (e.g. by using `kubectl rollout restart` to perform a +[rolling +update](https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/)) +to trigger proxy injection. +{{< /note >}} + +## Examples + +To add Linkerd's data plane proxies to a service defined in a Kubernetes +manifest, you can use `linkerd inject` to add the annotations before applying +the manifest to Kubernetes. + +You can transform an existing `deployment.yml` file to add annotations +in the correct places and apply it to the cluster: + +```bash +cat deployment.yml | linkerd inject - | kubectl apply -f - +``` + +You can mesh every deployment in a namespace by combining this +with `kubectl get`: + +```bash +kubectl get -n NAMESPACE deploy -o yaml | linkerd inject - | kubectl apply -f - +``` + +## Verifying the data plane pods have been injected + +To verify that your services have been added to the mesh, you can query +Kubernetes for the list of containers in the pods and ensure that the proxy is +listed: + +```bash +kubectl -n NAMESPACE get po -o jsonpath='{.items[0].spec.containers[*].name}' +``` + +If everything was successful, you'll see `linkerd-proxy` in the output, e.g.: + +```bash +linkerd-proxy CONTAINER +``` + +## Handling MySQL, SMTP, and other non-HTTP protocols + +Linkerd's [protocol detection](../../features/protocol-detection/) works by +looking at the first few bytes of client data to determine the protocol of the +connection. Some protocols, such as MySQL and SMTP, don't send these bytes. If +your application uses these protocols without TLSing them, you may require +additional configuration to avoid a 10-second delay when establishing +connections. + +See [Configuring protocol +detection](../../features/protocol-detection/#configuring-protocol-detection) +for details. + +## More reading + +For more information on how the inject command works and all of the parameters +that can be set, see the [`linkerd inject` reference +page](../../reference/cli/inject/). + +For details on how autoinjection works, see the [proxy injection +page](../../features/proxy-injection/). 
diff --git a/linkerd.io/content/2.16/tasks/automatic-failover.md b/linkerd.io/content/2.16/tasks/automatic-failover.md new file mode 100644 index 0000000000..bd33727b08 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/automatic-failover.md @@ -0,0 +1,176 @@ ++++ +title = "Automatic Multicluster Failover" +description = "Use the Linkerd Failover extension to failover between clusters." ++++ + +The Linkerd Failover extension is a controller which automatically shifts +traffic from a primary service to one or more fallback services whenever the +primary becomes unavailable. This can help add resiliency when you have a +service which is replicated in multiple clusters. If the local service is +unavailable, the failover controller can shift that traffic to the backup +cluster. + +Let's see a simple example of how to use this extension by installing the +Emojivoto application on two Kubernetes clusters and simulating a failure in +one cluster. We will see the failover controller shift traffic to the other +cluster to ensure the service remains available. + +{{< trylpt >}} + +## Prerequisites + +You will need two clusters with Linkerd installed and for the clusters to be +linked together with the multicluster extension. Follow the steps in the +[multicluster guide](../multicluster/) to generate a shared trust root, install +Linkerd, Linkerd Viz, and Linkerd Multicluster, and to link the clusters +together. For the remainder of this guide, we will assume the cluster context +names are "east" and "west" respectively. Please substitute your cluster +context names where appropriate. + +## Installing the Failover Extension + +Failovers are described using SMI +[TrafficSplit](https://github.com/servicemeshinterface/smi-spec/blob/main/apis/traffic-split/v1alpha1/traffic-split.md) +resources. We install the Linkerd SMI extension and the Linkerd Failover +extension. These can be installed in both clusters, but since we'll only be +initiating failover from the "west" cluster in this example, we'll only install +them in that cluster: + +```bash +# Install linkerd-smi in west cluster +> helm --kube-context=west repo add linkerd-smi https://linkerd.github.io/linkerd-smi +> helm --kube-context=west repo up +> helm --kube-context=west install linkerd-smi -n linkerd-smi --create-namespace linkerd-smi/linkerd-smi + +# Install linkerd-failover in west cluster +> helm --kube-context=west repo add linkerd-edge https://helm.linkerd.io/edge +> helm --kube-context=west repo up +> helm --kube-context=west install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd-edge/linkerd-failover +``` + +## Installing and Exporting Emojivoto + +We'll now install the Emojivoto example application into both clusters: + +```bash +> linkerd --context=west inject https://run.linkerd.io/emojivoto.yml | kubectl --context=west apply -f - +> linkerd --context=east inject https://run.linkerd.io/emojivoto.yml | kubectl --context=east apply -f - +``` + +Next we'll "export" the `web-svc` in the east cluster by setting the +`mirror.linkerd.io/exported=true` label. 
This will instruct the +multicluster extension to create a mirror service called `web-svc-east` in the +west cluster, making the east Emojivoto application available in the west +cluster: + +```bash +> kubectl --context=east -n emojivoto label svc/web-svc mirror.linkerd.io/exported=true +> kubectl --context=west -n emojivoto get svc +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +emoji-svc ClusterIP 10.96.41.137 8080/TCP,8801/TCP 13m +voting-svc ClusterIP 10.96.247.68 8080/TCP,8801/TCP 13m +web-svc ClusterIP 10.96.222.169 80/TCP 13m +web-svc-east ClusterIP 10.96.244.245 80/TCP 92s +``` + +## Creating the Failover TrafficSplit + +To tell the failover controller how to failover traffic, we need to create a +TrafficSplit resource in the west cluster with the +`failover.linkerd.io/controlled-by: linkerd-failover` label. The +`failover.linkerd.io/primary-service` annotation indicates that the `web-svc` +backend is the primary and all other backends will be treated as the fallbacks: + +```bash +kubectl --context=west apply -f - < linkerd --context=west viz stat -n emojivoto svc --from deploy/vote-bot +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +web-svc - 96.67% 2.0rps 2ms 3ms 5ms 1 +web-svc-east - - - - - - - +``` + +Now we'll simulate the local service becoming unavailable by scaling it down: + +```bash +> kubectl --context=west -n emojivoto scale deploy/web --replicas=0 +``` + +We can immediately see that the TrafficSplit has been adjusted to send traffic +to the backup. Notice that the `web-svc` backend now has weight 0 and the +`web-svc-east` backend now has weight 1. + +```bash +> kubectl --context=west -n emojivoto get ts/web-svc-failover -o yaml +apiVersion: split.smi-spec.io/v1alpha2 +kind: TrafficSplit +metadata: + annotations: + failover.linkerd.io/primary-service: web-svc + creationTimestamp: "2022-03-22T23:47:11Z" + generation: 4 + labels: + failover.linkerd.io/controlled-by: linkerd-failover + name: web-svc-failover + namespace: emojivoto + resourceVersion: "10817806" + uid: 77039fb3-5e39-48ad-b7f7-638d187d7a28 +spec: + backends: + - service: web-svc + weight: 0 + - service: web-svc-east + weight: 1 + service: web-svc +``` + +We can also confirm that this traffic is going to the fallback using the +`viz stat` command: + +```bash +> linkerd --context=west viz stat -n emojivoto svc --from deploy/vote-bot +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +web-svc - - - - - - - +web-svc-east - 93.04% 1.9rps 25ms 30ms 30ms 1 +``` + +Finally, we can restore the primary by scaling its deployment back up and +observe the traffic shift back to it: + +```bash +> kubectl --context=west -n emojivoto scale deploy/web --replicas=1 +deployment.apps/web scaled +> linkerd --context=west viz stat -n emojivoto svc --from deploy/vote-bot +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +web-svc - 89.29% 1.9rps 2ms 4ms 5ms 1 +web-svc-east - - - - - - - +``` diff --git a/linkerd.io/content/2.16/tasks/automatically-rotating-control-plane-tls-credentials.md b/linkerd.io/content/2.16/tasks/automatically-rotating-control-plane-tls-credentials.md new file mode 100644 index 0000000000..3c332585e5 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/automatically-rotating-control-plane-tls-credentials.md @@ -0,0 +1,203 @@ ++++ +title = "Automatically Rotating Control Plane TLS Credentials" +description = "Use cert-manager to automatically rotate control plane TLS credentials." 
+aliases = [ "use_external_certs" ] ++++ + +Linkerd's [automatic mTLS](../../features/automatic-mtls/) feature generates TLS +certificates for proxies and automatically rotates them without user +intervention. These certificates are derived from a *trust anchor*, which is +shared across clusters, and an *issuer certificate*, which is specific to the +cluster. + +While Linkerd automatically rotates the per-proxy TLS certificates, it does not +rotate the issuer certificate. In this doc, we'll describe how to set up +automatic rotation of the issuer certificate and its corresponding private key +using the cert-manager project. + +{{< trylpt >}} + +## Cert manager + +[Cert-manager](https://github.com/jetstack/cert-manager) is a popular project +for making TLS credentials from external sources available to Kubernetes +clusters. + +Cert-manager is very flexible. You can configure it to pull certificates from +secrets managemenet solutions such as [Vault](https://www.vaultproject.io). In +this guide, we'll focus on a self-sufficient setup: we will configure +cert-manager to act as an on-cluster +[CA](https://en.wikipedia.org/wiki/Certificate_authority) and have it re-issue +Linkerd's issuer certificate and private key on a periodic basis, derived from +the trust anchor. + +### Cert manager as an on-cluster CA + +As a first step, [install cert-manager on your +cluster](https://cert-manager.io/docs/installation/). + +Next, create the namespace that cert-manager will use to store its +Linkerd-related resources. For simplicity, we suggest reusing the default +Linkerd control plane namespace: + +```bash +kubectl create namespace linkerd +``` + +#### Save the signing key pair as a Secret + +Next, using the [`step`](https://smallstep.com/cli/) tool, create a signing key +pair and store it in a Kubernetes Secret in the namespace created above: + +```bash +step certificate create root.linkerd.cluster.local ca.crt ca.key \ + --profile root-ca --no-password --insecure && + kubectl create secret tls \ + linkerd-trust-anchor \ + --cert=ca.crt \ + --key=ca.key \ + --namespace=linkerd +``` + +For a longer-lived trust anchor certificate, pass the `--not-after` argument +to the step command with the desired value (e.g. `--not-after=87600h`). + +#### Create an Issuer referencing the secret + +With the Secret in place, we can create a cert-manager "Issuer" resource that +references it: + +```bash +kubectl apply -f - <}} + +## Install Cert manager + +As a first step, [install cert-manager on your +cluster](https://cert-manager.io/docs/installation/) +and create the namespaces that cert-manager will use to store its +webhook-related resources. 
For simplicity, we suggest using the default +namespace linkerd uses: + +```bash +# control plane core +kubectl create namespace linkerd +kubectl label namespace linkerd \ + linkerd.io/is-control-plane=true \ + config.linkerd.io/admission-webhooks=disabled \ + linkerd.io/control-plane-ns=linkerd +kubectl annotate namespace linkerd linkerd.io/inject=disabled + +# viz (ignore if not using the viz extension) +kubectl create namespace linkerd-viz +kubectl label namespace linkerd-viz linkerd.io/extension=viz + +# jaeger (ignore if not using the jaeger extension) +kubectl create namespace linkerd-jaeger +kubectl label namespace linkerd-jaeger linkerd.io/extension=jaeger +``` + +## Save the signing key pair as a Secret + +Next, we will use the [`step`](https://smallstep.com/cli/) tool, to create a +signing key pair which will be used to sign each of the webhook certificates: + +```bash +step certificate create webhook.linkerd.cluster.local ca.crt ca.key \ + --profile root-ca --no-password --insecure --san webhook.linkerd.cluster.local + +kubectl create secret tls webhook-issuer-tls --cert=ca.crt --key=ca.key --namespace=linkerd + +# ignore if not using the viz extension +kubectl create secret tls webhook-issuer-tls --cert=ca.crt --key=ca.key --namespace=linkerd-viz + +# ignore if not using the jaeger extension +kubectl create secret tls webhook-issuer-tls --cert=ca.crt --key=ca.key --namespace=linkerd-jaeger +``` + +## Create Issuers referencing the secrets + +With the Secrets in place, we can create cert-manager "Issuer" resources that +reference them: + +```bash +kubectl apply -f - <}} +When installing the `linkerd-control-plane` chart, you _must_ provide the +issuer trust root and issuer credentials as described in [Installing Linkerd +with Helm](../install-helm/). +{{< /note >}} + +See [Automatically Rotating Control Plane TLS +Credentials](../automatically-rotating-control-plane-tls-credentials/) +for details on how to do something similar for control plane credentials. diff --git a/linkerd.io/content/2.16/tasks/books.md b/linkerd.io/content/2.16/tasks/books.md new file mode 100644 index 0000000000..81d0e52444 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/books.md @@ -0,0 +1,365 @@ ++++ +title = "Debugging HTTP applications with per-route metrics" +description = "Follow a long-form example of debugging a failing HTTP application using per-route metrics." ++++ + +This demo is of a Ruby application that helps you manage your bookshelf. It +consists of multiple microservices and uses JSON over HTTP to communicate with +the other services. There are three services: + +- [webapp](https://github.com/BuoyantIO/booksapp/blob/master/webapp.rb): the + frontend + +- [authors](https://github.com/BuoyantIO/booksapp/blob/master/authors.rb): an + API to manage the authors in the system + +- [books](https://github.com/BuoyantIO/booksapp/blob/master/books.rb): an API + to manage the books in the system + +For demo purposes, the app comes with a simple traffic generator. The overall +topology looks like this: + +{{< fig src="/images/books/topology.png" title="Topology" >}} + +## Prerequisites + +To use this guide, you'll need to have Linkerd installed on your cluster. +Follow the [Installing Linkerd Guide](../install/) if you haven't already done +this. + +## Install the app + +To get started, let's install the books app onto your cluster. 
In your local +terminal, run: + +```bash +kubectl create ns booksapp && \ + curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/booksapp.yml \ + | kubectl -n booksapp apply -f - +``` + +This command creates a namespace for the demo, downloads its Kubernetes +resource manifest and uses `kubectl` to apply it to your cluster. The app +comprises the Kubernetes deployments and services that run in the `booksapp` +namespace. + +Downloading a bunch of containers for the first time takes a little while. +Kubernetes can tell you when all the services are running and ready for +traffic. Wait for that to happen by running: + +```bash +kubectl -n booksapp rollout status deploy webapp +``` + +You can also take a quick look at all the components that were added to your +cluster by running: + +```bash +kubectl -n booksapp get all +``` + +Once the rollout has completed successfully, you can access the app itself by +port-forwarding `webapp` locally: + +```bash +kubectl -n booksapp port-forward svc/webapp 7000 & +``` + +Open [http://localhost:7000/](http://localhost:7000/) in your browser to see the +frontend. + +{{< fig src="/images/books/frontend.png" title="Frontend" >}} + +Unfortunately, there is an error in the app: if you click *Add Book*, it will +fail 50% of the time. This is a classic case of non-obvious, intermittent +failure---the type that drives service owners mad because it is so difficult to +debug. Kubernetes itself cannot detect or surface this error. From Kubernetes's +perspective, it looks like everything's fine, but you know the application is +returning errors. + +{{< fig src="/images/books/failure.png" title="Failure" >}} + +## Add Linkerd to the service + +Now we need to add the Linkerd data plane proxies to the service. The easiest +option is to do something like this: + +```bash +kubectl get -n booksapp deploy -o yaml \ + | linkerd inject - \ + | kubectl apply -f - +``` + +This command retrieves the manifest of all deployments in the `booksapp` +namespace, runs them through `linkerd inject`, and then re-applies with +`kubectl apply`. The `linkerd inject` command annotates each resource to +specify that they should have the Linkerd data plane proxies added, and +Kubernetes does this when the manifest is reapplied to the cluster. Best of +all, since Kubernetes does a rolling deploy, the application stays running the +entire time. (See [Automatic Proxy Injection](../../features/proxy-injection/) for +more details on how this works.) + +## Debugging + +Let's use Linkerd to discover the root cause of this app's failures. Linkerd's +proxy exposes rich metrics about the traffic that it processes, including HTTP +response codes. The metric that we're interested is `outbound_http_route_backend_response_statuses_total` +and will help us identify where HTTP errors are occuring. We can use the +`linkerd diagnostics proxy-metrics` command to get proxy metrics. 
Pick one of +your webapp pods and run the following command to get the metrics for HTTP 500 +responses: + +```bash +linkerd diagnostics proxy-metrics -n booksapp po/webapp-pod-here \ +| grep outbound_http_route_backend_response_statuses_total \ +| grep http_status=\"500\" +``` + +This should return a metric that looks something like: + +```text +outbound_http_route_backend_response_statuses_total{ + parent_group="core", + parent_kind="Service", + parent_namespace="booksapp", + parent_name="books", + parent_port="7002", + parent_section_name="", + route_group="", + route_kind="default", + route_namespace="", + route_name="http", + backend_group="core", + backend_kind="Service", + backend_namespace="booksapp", + backend_name="books", + backend_port="7002", + backend_section_name="", + http_status="500", + error="" +} 207 +``` + +This counter tells us that the webapp pod received a total of 207 HTTP 500 +responses from the `books` Service on port 7002. + +## HTTPRoute + +We know that the webapp component is getting 500s from the books component, but +it would be great to narrow this down further and get per route metrics. To do +this, we take advantage of the Gateway API and define a set of HTTPRoute +resources, each attached to the `books` Service by specifying it as their +`parent_ref`. + +```bash +kubectl apply -f - <}} + +## Prerequisites + +To use this guide, you'll need a Kubernetes cluster running: + +- Linkerd and Linkerd-Viz. If you haven't installed these yet, follow the + [Installing Linkerd Guide](../install/). + +## Set up the demo + +Remember those puzzles where one guard always tells the truth and one guard +always lies? This demo involves one pod (named `good`) which always returns an +HTTP 200 and one pod (named `bad`) which always returns an HTTP 500. We'll also +create a load generator to send traffic to a Service which includes these two +pods. + +For load generation we'll use +[Slow-Cooker](https://github.com/BuoyantIO/slow_cooker) +and for the backend pods we'll use [BB](https://github.com/BuoyantIO/bb). + +To add these components to your cluster and include them in the Linkerd +[data plane](../../reference/architecture/#data-plane), run: + +```bash +cat < linkerd viz -n circuit-breaking-demo stat deploy +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +bad 1/1 6.43% 4.7rps 1ms 1ms 4ms 2 +good 1/1 100.00% 5.9rps 1ms 1ms 1ms 3 +slow-cooker 1/1 100.00% 0.3rps 1ms 1ms 1ms 1 +``` + +Here we can see that `good` and `bad` deployments are each receiving similar +amounts of traffic, but `good` has a success rate of 100% while the success +rate of `bad` is very low (only healthcheck probes are succeeding). We can also +see how this looks from the perspective of the traffic generator: + +```console +> linkerd viz -n circuit-breaking-demo stat deploy/slow-cooker --to svc/bb +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +slow-cooker 1/1 51.00% 10.0rps 1ms 1ms 2ms 2 +``` + +From `slow-cooker`'s perspective, roughly 50% of requests that it sends to the +Service are failing. We can use circuit breaking to improve this by cutting off +traffic to the `bad` pod. + +## Breaking the circuit + +Linkerd supports a type of circuit breaking called [_consecutive failure +accrual_](../../reference/circuit-breaking/#consecutive-failures). +This works by tracking consecutive failures from each endpoint in Linkerd's +internal load balancer. 
If there are ever too many failures in a row, that +endpoint is temporarily ignored and Linkerd will only load balance among the +remaining endpoints. After a [backoff +period](../../reference/circuit-breaking/#probation-and-backoffs), the endpoint +is re-introduced so that we can determine if it has become healthy. + +Let's enable consecutive failure accrual on the `bb` Service by adding an +annotation: + +```bash +kubectl annotate -n circuit-breaking-demo svc/bb balancer.linkerd.io/failure-accrual=consecutive +``` + +{{< warning >}} +Circuit breaking is **incompatible with ServiceProfiles**. If a +[ServiceProfile](../../features/service-profiles/) is defined for the annotated +Service, proxies will not perform circuit breaking as long as the ServiceProfile +exists. +{{< /warning >}} + +We can check that failure accrual was configured correctly by using a Linkerd +diagnostics command. The `linkerd diagnostics policy` command prints the policy +that Linkerd will use when sending traffic to a Service. We'll use the +[jq](https://stedolan.github.io/jq/) utility to filter the output to focus on +failure accrual: + +```console +> linkerd diagnostics policy -n circuit-breaking-demo svc/bb 8080 -o json | jq '.protocol.Kind.Detect.http1.failure_accrual' +{ + "Kind": { + "ConsecutiveFailures": { + "max_failures": 7, + "backoff": { + "min_backoff": { + "seconds": 1 + }, + "max_backoff": { + "seconds": 60 + }, + "jitter_ratio": 0.5 + } + } + } +} +``` + +This tells us that Linkerd will use `ConsecutiveFailures` failure accrual +when talking to the `bb` Service. It also tells us that the `max_failures` is +7, meaning that it will trip the circuit breaker once it observes 7 consective +failures. We'll talk more about each of the parameters here at the end of this +article. + +Let's look at how much traffic each pod is getting now that the circuit breaker +is in place: + +```console +> linkerd viz -n circuit-breaking-demo stat deploy +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +bad 1/1 94.74% 0.3rps 1ms 1ms 1ms 3 +good 1/1 100.00% 10.3rps 1ms 1ms 4ms 4 +slow-cooker 1/1 100.00% 0.3rps 1ms 1ms 1ms 1 +``` + +Notice that the `bad` pod's RPS is significantly lower now. The circuit breaker +has stopped nearly all of the traffic from `slow-cooker` to `bad`. + +We can also see how this has affected `slow-cooker`: + +```console +> linkerd viz -n circuit-breaking-demo stat deploy/slow-cooker --to svc/bb +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +slow-cooker 1/1 99.83% 10.0rps 1ms 1ms 1ms 4 +``` + +Nearly all of `slow-cooker`'s requests are now getting routed to the `good` pod +and succeeding! + +## Tuning circuit breaking + +As we saw when we ran the `linkerd diagnostics policy` command, consecutive +failure accrual is controlled by a number of parameters. Each of these +parameters has a default, but can be manually configured using annotations: + +- `balancer.linkerd.io/failure-accrual-consecutive-max-failures` + - The number of consecutive failures that Linkerd must observe before + tripping the circuit breaker (default: 7). Consider setting a lower value + if you want circuit breaks to trip more easily which can lead to better + success rate at the expense of less evenly distributed traffic. Consider + setting a higher value if you find circuit breakers are tripping too easily, + causing traffic to be cut off from healthy endpoints. 
+- `balancer.linkerd.io/failure-accrual-consecutive-max-penalty` + - The maximum amount of time a circuit breaker will remain tripped + before the endpoint is restored (default: 60s). Consider setting a longer + duration if you want to reduce the amount of traffic to endpoints which have + tripped the circuit breaker. Consider setting a shorter duration if you'd + like tripped circuit breakers to recover faster after an endpoint becomes + healthy again. +- `balancer.linkerd.io/failure-accrual-consecutive-min-penalty` + - The minimum amount of time a circuit breaker will remain tripped + before the endpoints is restored (default: 1s). Consider tuning this in a + similar way to `failure-accrual-consecutive-max-penalty`. +- `balancer.linkerd.io/failure-accrual-consecutive-jitter-ratio` + - The amount of jitter to introduce to circuit breaker backoffs (default: 0.5). + You are unlikely to need to tune this but might consider increasing it if + you notice many clients are sending requests to a circuit broken endpoint + at the same time, leading to spiky traffic patterns. + +See the [reference +documentation](../../reference/circuit-breaking/#configuring-failure-accrual) +for details on failure accrual configuration. diff --git a/linkerd.io/content/2.16/tasks/configuring-dynamic-request-routing.md b/linkerd.io/content/2.16/tasks/configuring-dynamic-request-routing.md new file mode 100644 index 0000000000..92a1607fdd --- /dev/null +++ b/linkerd.io/content/2.16/tasks/configuring-dynamic-request-routing.md @@ -0,0 +1,191 @@ ++++ +title = "Configuring Dynamic Request Routing" +description = "Configuring HTTPRoute resources to perform dynamic request routing." ++++ + +## Prerequisites + +To use this guide, you'll need to have Linkerd installed on your cluster. Follow +the [Installing Linkerd Guide](../install/) if you haven't already done this +(make sure you have at least linkerd stable-2.13.0 or edge-23.3.2). + +You also need to have the [Helm](https://helm.sh/docs/intro/quickstart/) CLI +installed. + +## HTTPRoute for Dynamic Request Routing + +With dynamic request routing, you can route HTTP traffic based on the contents +of request headers. This can be useful for performing things like A/B testing +and many other strategies for traffic management. + +In this tutorial, we'll make use of the +[podinfo](https://github.com/stefanprodan/podinfo) project to showcase dynamic +request routing, by deploying in the cluster two backend and one frontend +podinfo pods. Traffic will flow to just one backend, and then we'll switch +traffic to the other one just by adding a header to the frontend requests. + +## Setup + +First we create the `test` namespace, annotated by linkerd so all pods that get +created there get injected with the linkerd proxy: + +``` bash +kubectl create ns test --dry-run -o yaml \ + | linkerd inject - \ + | kubectl apply -f - +``` + +Then we add podinfo's Helm repo, and install two instances of it. The first one +will respond with the message "`A backend`", the second one with "`B backend`". 
+ +```bash +helm repo add podinfo https://stefanprodan.github.io/podinfo +helm install backend-a -n test \ + --set ui.message='A backend' podinfo/podinfo +helm install backend-b -n test \ + --set ui.message='B backend' podinfo/podinfo +``` + +We add another podinfo instance which will forward requests only to the first +backend instance `backend-a`: + +```bash +helm install frontend -n test \ + --set backend=http://backend-a-podinfo:9898/env podinfo/podinfo +``` + +Once those three pods are up and running, we can port-forward requests from our +local machine to the frontend: + +```bash +kubectl -n test port-forward svc/frontend-podinfo 9898 & +``` + +## Sending Requests + +Requests to `/echo` on port 9898 to the frontend pod will get forwarded the pod +pointed by the Service `backend-a-podinfo`: + +```bash +$ curl -sX POST localhost:9898/echo \ + | grep -o 'PODINFO_UI_MESSAGE=. backend' + +PODINFO_UI_MESSAGE=A backend +``` + +## Introducing HTTPRoute + +Let's apply the following [`HTTPRoute`] resource to enable header-based routing: + +```yaml +cat <}} +Two versions of the HTTPRoute resource may be used with Linkerd: + +- The upstream version provided by the Gateway API, with the + `gateway.networking.k8s.io` API group +- A Linkerd-specific CRD provided by Linkerd, with the `policy.linkerd.io` API + group + +The two HTTPRoute resource definitions are similar, but the Linkerd version +implements experimental features not yet available with the upstream Gateway API +resource definition. See [the HTTPRoute reference +documentation](../../reference/httproute/#linkerd-and-gateway-api-httproutes) +for details. +{{< /note >}} + +In `parentRefs` we specify the resources we want this [`HTTPRoute`] instance to +act on. So here we point to the `backend-a-podinfo` Service on the [`HTTPRoute`]'s +namespace (`test`), and also specify the Service port number (not the Service's +target port). + +{{< warning >}} +**Outbound [`HTTPRoute`](../../features/httproute/)s and +[`ServiceProfile`](../../features/service-profiles/)s provide overlapping +configuration.** For backwards-compatibility reasons, a `ServiceProfile` will +take precedence over `HTTPRoute`s which configure the same Service. If a +`ServiceProfile` is defined for the parent Service of an `HTTPRoute`, +proxies will use the `ServiceProfile` configuration, rather than the +`HTTPRoute` configuration, as long as the `ServiceProfile` exists. +{{< /warning >}} + +Next, we give a list of rules that will act on the traffic hitting that Service. + +The first rule contains two entries: `matches` and `backendRefs`. + +In `matches` we list the conditions that this particular rule has to match. One +matches suffices to trigger the rule (conditions are OR'ed). Inside, we use +`headers` to specify a match for a particular header key and value. If multiple +headers are specified, they all need to match (matchers are AND'ed). Note we can +also specify a regex match on the value by adding a `type: RegularExpression` +field. By not specifying the type like we did here, we're performing a match of +type `Exact`. + +In `backendRefs` we specify the final destination for requests matching the +current rule, via the Service's `name` and `port`. + +Here we're specifying we'd like to route to `backend-b-podinfo` all the requests +having the `x-request-id: alterrnative` header. If the header is not present, +the engine fall backs to the last rule which has no `matches` entries and points +to the `backend-a-podinfo` Service. 
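+
+Putting those pieces together, an HTTPRoute along the lines of the following
+sketch implements this behavior (the resource name is just an example, and the
+`policy.linkerd.io` API version may differ across Linkerd versions):
+
+```yaml
+apiVersion: policy.linkerd.io/v1beta3
+kind: HTTPRoute
+metadata:
+  name: backend-router
+  namespace: test
+spec:
+  parentRefs:
+    - name: backend-a-podinfo
+      kind: Service
+      group: core
+      port: 9898
+  rules:
+    # Requests carrying the header go to backend-b-podinfo
+    - matches:
+        - headers:
+            - name: "x-request-id"
+              value: "alternative"
+      backendRefs:
+        - name: backend-b-podinfo
+          port: 9898
+    # No matches: everything else keeps going to backend-a-podinfo
+    - backendRefs:
+        - name: backend-a-podinfo
+          port: 9898
+```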
+ +The previous requests should still reach `backend-a-podinfo` only: + +```bash +$ curl -sX POST localhost:9898/echo \ + | grep -o 'PODINFO_UI_MESSAGE=. backend' + +PODINFO_UI_MESSAGE=A backend +``` + +But if we add the "`x-request-id: alternative`" header they get routed to +`backend-b-podinfo`: + +```bash +$ curl -sX POST \ + -H 'x-request-id: alternative' \ + localhost:9898/echo \ + | grep -o 'PODINFO_UI_MESSAGE=. backend' + +PODINFO_UI_MESSAGE=B backend +``` + +### To Keep in Mind + +Note that you can use any header you like, but for this to work the frontend has +to forward it. "`x-request-id`" is a common header used in microservices, that is +explicitly forwarded by podinfo, and that's why we chose it. + +Also, keep in mind the linkerd proxy handles this on the client side of the +request (the frontend pod in this case) and so that pod needs to be injected, +whereas the destination pods don't require to be injected. But of course the +more workloads you have injected the better, to benefit from things like easy +mTLS setup and all the other advantages that linkerd brings to the table! + +[`HTTPRoute`]: ../../features/httproute/ +[`ServiceProfile`]: ../../features/ServiceProfile/ diff --git a/linkerd.io/content/2.16/tasks/configuring-per-route-policy.md b/linkerd.io/content/2.16/tasks/configuring-per-route-policy.md new file mode 100644 index 0000000000..1b3651c9eb --- /dev/null +++ b/linkerd.io/content/2.16/tasks/configuring-per-route-policy.md @@ -0,0 +1,465 @@ ++++ +title = "Configuring Per-Route Authorization Policy" +description = "Fine-grained authorization policies can be configured for individual HTTP routes." +aliases = [] ++++ + + + +In addition to [enforcing authorization at the service +level](../restricting-access/), finer-grained authorization policies can also be +configured for individual HTTP routes. In this example, we'll use the Books demo +app to demonstrate how to control which clients can access particular routes on +a service. + +This is an advanced example that demonstrates more complex policy configuration. +For a basic introduction to Linkerd authorization policy, start with the +[Restricting Access to Services](../restricting-access/) example. For more +comprehensive documentation of the policy resources, see the +[Authorization policy reference](../../reference/authorization-policy/). + +## Prerequisites + +To use this guide, you'll need to have Linkerd installed on your cluster, along +with its Viz extension. Follow the [Installing Linkerd Guide](../install/) +if you haven't already done this. + +## Install the Books demo application + +Inject and install the Books demo application: + +```bash +$ kubectl create ns booksapp && \ + curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/booksapp.yml \ + | linkerd inject - \ + | kubectl -n booksapp apply -f - +``` + +This command creates a namespace for the demo, downloads its Kubernetes +resource manifest, injects Linkerd into the application, and uses `kubectl` to +apply it to your cluster. The app comprises the Kubernetes deployments and +services that run in the `booksapp` namespace. 
+ +Confirm that the Linkerd data plane was injected successfully: + +```bash +$ linkerd check -n booksapp --proxy -o short +``` + +You can take a quick look at all the components that were added to your +cluster by running: + +```bash +$ kubectl -n booksapp get all +``` + +Once the rollout has completed successfully, you can access the app itself by +port-forwarding `webapp` locally: + +```bash +$ kubectl -n booksapp port-forward svc/webapp 7000 & +``` + +Open [http://localhost:7000/](http://localhost:7000/) in your browser to see the +frontend. + +{{< fig src="/images/books/frontend.png" title="Frontend" >}} + +## Creating a Server resource + +Both the `books` service and the `webapp` service in the demo application are +clients of the `authors` service. + +However, these services send different requests to the `authors` service. The +`books` service should only send `GET` +requests to the `/authors/:id.json` route, to get the author associated with a +particular book. Meanwhile, the `webapp` service may also send `DELETE` and +`PUT` requests to `/authors`, and `POST` requests to `/authors.json`, as it +allows the user to create and delete authors. + +Since the `books` service should never need to create or delete authors, we will +create separate authorization policies for the `webapp` and `books` services, +restricting which services can access individual routes of the `authors` +service. + +First, let's run the `linkerd viz authz` command to list the authorization +resources that currently exist for the `authors` deployment: + +```bash +$ linkerd viz authz -n booksapp deploy/authors +ROUTE SERVER AUTHORIZATION UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 +default default:all-unauthenticated default/all-unauthenticated 0.0rps 70.31% 8.1rps 1ms 43ms 49ms +probe default:all-unauthenticated default/probe 0.0rps 100.00% 0.3rps 1ms 1ms 1ms +``` + +By default, the `authors` deployment uses the cluster's default authorization +policy, "all-unauthenticated". In addition, a separate authorization is +generated to allow liveness and readiness probes from the kubelet. + +First, we'll create a [`Server`] resource for the `authors` deployment's service +port. For details on [`Server`] resources, see +[here](../restricting-access/#creating-a-server-resource). + +```bash +kubectl apply -f - <}} +Routes configured in service profiles are different from [`HTTPRoute`] resources. +Service profile routes allow you to collect per-route metrics and configure +client-side behavior such as retries and timeouts. [`HTTPRoute`] resources, on the +other hand, can be the target of [`AuthorizationPolicies`] and allow you to specify +per-route authorization. + +[`HTTPRoute`]: ../../reference/authorization-policy/#httproute +[`AuthorizationPolicies`]: + ../../reference/authorization-policy/#authorizationpolicy +{{< /note >}} + +First, let's create an [`HTTPRoute`] that matches `GET` requests to the `authors` +service's API: + +```bash +kubectl apply -f - <}} +Two versions of the HTTPRoute resource may be used with Linkerd: + +- The upstream version provided by the Gateway API, with the + `gateway.networking.k8s.io` API group +- A Linkerd-specific CRD provided by Linkerd, with the `policy.linkerd.io` API + group + +The two HTTPRoute resource definitions are similar, but the Linkerd version +implements experimental features not yet available with the upstream Gateway API +resource definition. 
See [the HTTPRoute reference +documentation](../../reference/httproute/#linkerd-and-gateway-api-httproutes) +for details. +{{< /note >}} + +This will create an [`HTTPRoute`] targeting the `authors-server` [`Server`] resource +we defined previously. The `rules` section defines a list of matches, which +determine which requests match the [`HTTPRoute`]. Here, we 've defined a match +rule that matches `GET` requests to the path `/authors.json`, and a second match +rule that matches `GET` requests to paths starting with the path segment +`/authors`. + +Now that we've created a route, we can associate policy with that route. We'll +create an [`AuthorizationPolicy`] resource that defines policy for our +[`HTTPRoute`]: + +```bash +kubectl apply -f - <}} + +and similarly, adding a new author takes us to an error page. + +This is because creating or deleting an author will send a `PUT` or `DELETE` +request, respectively, from `webapp` to `authors`. The route we created to +authorize `GET` requests does not match `PUT` or `DELETE` requests, so the +`authors` proxy rejects those requests with a 404 error. + +To resolve this, we'll create an additional [`HTTPRoute`] resource that matches +`PUT`, `POST`, and `DELETE` requests: + +```bash +kubectl apply -f - <}} + +This is because we have created a _route_ matching `DELETE`, `PUT`, and `POST` +requests, but we haven't _authorized_ requests to that route. Running the +`linkerd viz authz` command again confirms this — note the unauthorized +requests to `authors-modify-route`: + +```bash +$ linkerd viz authz -n booksapp deploy/authors +ROUTE SERVER AUTHORIZATION UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 +authors-get-route authors-server authorizationpolicy/authors-get-policy - - - - - - +authors-modify-route authors-server 9.7rps 0.00% 0.0rps 0ms 0ms 0ms +authors-probe-route authors-server authorizationpolicy/authors-probe-policy 0.0rps 100.00% 0.1rps 1ms 1ms 1ms +default default:all-unauthenticated default/all-unauthenticated 0.0rps 100.00% 0.1rps 1ms 1ms 1ms +probe default:all-unauthenticated default/probe 0.0rps 100.00% 0.2rps 1ms 1ms 1ms +``` + +Now, let's create authorization and authentication policy resources to authorize +this route: + +```bash +kubectl apply -f - <}} + +Similarly, we can now create a new author successfully, as well: + +{{< fig src="/images/books/create-ok.png" title="Author created" >}} + +Running the `linkerd viz authz` command one last time, we now see that all +traffic is authorized: + +```bash +$ linkerd viz authz -n booksapp deploy/authors +ROUTE SERVER AUTHORIZATION UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 +authors-get-route authors-server authorizationpolicy/authors-get-policy 0.0rps 100.00% 0.1rps 0ms 0ms 0ms +authors-modify-route authors-server authorizationpolicy/authors-modify-policy 0.0rps 100.00% 0.0rps 0ms 0ms 0ms +authors-probe-route authors-server authorizationpolicy/authors-probe-policy 0.0rps 100.00% 0.1rps 1ms 1ms 1ms +default default:all-unauthenticated default/all-unauthenticated 0.0rps 100.00% 0.1rps 1ms 1ms 1ms +probe default:all-unauthenticated default/probe 0.0rps 100.00% 0.2rps 1ms 1ms 1ms +``` + +## Next Steps + +We've now covered the basics of configuring per-route authorization policies +with Linkerd. For more practice, try creating additional policies to restrict +access to the `books` service as well. 
Or, to learn more about Linkerd +authorization policy in general, and the various configurations that are +available, see the [Policy reference +docs](../../reference/authorization-policy/). + +[`Server`]: ../../reference/authorization-policy/#server +[`HTTPRoute`]: ../../reference/authorization-policy/#httproute +[`AuthorizationPolicy`]: + ../../reference/authorization-policy/#authorizationpolicy +[`MeshTLSAuthentication`]: + ../../reference/authorization-policy/#meshtlsauthentication +[`NetworkAuthentication`]: + ../../reference/authorization-policy/#networkauthentication diff --git a/linkerd.io/content/2.16/tasks/configuring-proxy-concurrency.md b/linkerd.io/content/2.16/tasks/configuring-proxy-concurrency.md new file mode 100644 index 0000000000..c7d449340d --- /dev/null +++ b/linkerd.io/content/2.16/tasks/configuring-proxy-concurrency.md @@ -0,0 +1,131 @@ ++++ +title = "Configuring Proxy Concurrency" +description = "Limit the Linkerd proxy's CPU usage." ++++ + +The Linkerd data plane's proxies are multithreaded, and are capable of running a +variable number of worker threads so that their resource usage matches the +application workload. + +In a vacuum, of course, proxies will exhibit the best throughput and lowest +latency when allowed to use as many CPU cores as possible. However, in practice, +there are other considerations to take into account. + +A real world deployment is _not_ a load test where clients and servers perform +no other work beyond saturating the proxy with requests. Instead, the service +mesh model has proxy instances deployed as sidecars to application containers. +Each proxy only handles traffic to and from the pod it is injected into. This +means that throughput and latency are limited by the application workload. If an +application container instance can only handle so many requests per second, it +may not actually matter that the proxy could handle more. In fact, giving the +proxy more CPU cores than it requires to keep up with the application may _harm_ +overall performance, as the application may have to compete with the proxy for +finite system resources. + +Therefore, it is more important for individual proxies to handle their traffic +efficiently than to configure all proxies to handle the maximum possible load. +The primary method of tuning proxy resource usage is limiting the number of +worker threads used by the proxy to forward traffic. There are multiple methods +for doing this. + +## Using the `proxy-cpu-limit` Annotation + +The simplest way to configure the proxy's thread pool is using the +`config.linkerd.io/proxy-cpu-limit` annotation. This annotation configures the +proxy injector to set an environment variable that controls the number of CPU +cores the proxy will use. + +When installing Linkerd using the [`linkerd install` CLI +command](../install/), the `--proxy-cpu-limit` argument sets this +annotation globally for all proxies injected by the Linkerd installation. For +example, + +```bash +# first, install the Linkerd CRDs +linkerd install --crds | kubectl apply -f - + +# install Linkerd, with a proxy CPU limit configured. +linkerd install --proxy-cpu-limit 2 | kubectl apply -f - +``` + +For more fine-grained configuration, the annotation may be added to any +[injectable Kubernetes resource](../../proxy-injection/), such as a namespace, pod, +or deployment. 
+ +For example, the following will configure any proxies in the `my-deployment` +deployment to use two CPU cores: + +```yaml +kind: Deployment +apiVersion: apps/v1 +metadata: + name: my-deployment + # ... +spec: + template: + metadata: + annotations: + config.linkerd.io/proxy-cpu-limit: '1' + # ... +``` + +{{< note >}} Unlike Kubernetes CPU limits and requests, which can be expressed +in milliCPUs, the `proxy-cpu-limit` annotation should be expressed in whole +numbers of CPU cores. Fractional values will be rounded up to the nearest whole +number. {{< /note >}} + +## Using Kubernetes CPU Limits and Requests + +Kubernetes provides +[CPU limits and CPU requests](https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#specify-a-cpu-request-and-a-cpu-limit) +to configure the resources assigned to any pod or container. These may also be +used to configure the Linkerd proxy's CPU usage. However, depending on how the +kubelet is configured, using Kubernetes resource limits rather than the +`proxy-cpu-limit` annotation may not be ideal. + +{{< warning >}} +When the environment variable configured by the `proxy-cpu-limit` annotation is +unset, the proxy will run only a single worker thread. Therefore, a +`proxy-cpu-limit` annotation should always be added to set an upper bound on the +number of CPU cores used by the proxy, even when Kubernetes CPU limits are also +in use. +{{< /warning >}} + +The kubelet uses one of two mechanisms for enforcing pod CPU limits. This is +determined by the +[`--cpu-manager-policy` kubelet option](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#configuration). +With the default CPU manager policy, +[`none`](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#none-policy), +the kubelet uses +[CFS quotas](https://en.wikipedia.org/wiki/Completely_Fair_Scheduler) to enforce +CPU limits. This means that the Linux kernel is configured to limit the amount +of time threads belonging to a given process are scheduled. Alternatively, the +CPU manager policy may be set to +[`static`](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy). +In this case, the kubelet will use Linux `cgroup`s to enforce CPU limits for +containers which meet certain criteria. + +On the other hand, using +[cgroup cpusets](https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt) +will limit the number of CPU cores available to the process. In essence, it will +appear to the proxy that the system has fewer CPU cores than it actually does. +If this value is lower than the value of the `proxy-cpu-limit` annotation, the +proxy will use the number of CPU cores determined by the cgroup limit. + +However, it's worth noting that in order for this mechanism to be used, certain +criteria must be met: + +- The kubelet must be configured with the `static` CPU manager policy +- The pod must be in the + [Guaranteed QoS class](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed). + This means that all containers in the pod must have both a limit and a request + for memory and CPU, and the limit for each must have the same value as the + request. +- The CPU limit and CPU request must be an integer greater than or equal to 1. 
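+
+For illustration, a workload that satisfies the last two criteria might set
+equal, whole-number CPU requests and limits on its application container and
+use Linkerd's proxy resource annotations to do the same for the injected
+proxy. This is only a sketch; the names and values are illustrative:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-deployment
+spec:
+  selector:
+    matchLabels:
+      app: my-app
+  template:
+    metadata:
+      labels:
+        app: my-app
+      annotations:
+        # Equal, integer CPU request and limit for the proxy container
+        config.linkerd.io/proxy-cpu-request: "1"
+        config.linkerd.io/proxy-cpu-limit: "1"
+        config.linkerd.io/proxy-memory-request: "250Mi"
+        config.linkerd.io/proxy-memory-limit: "250Mi"
+    spec:
+      containers:
+        - name: app
+          image: example/app:latest
+          resources:
+            # The application container must also have limits equal to
+            # requests for the pod to land in the Guaranteed QoS class
+            requests:
+              cpu: "1"
+              memory: "512Mi"
+            limits:
+              cpu: "1"
+              memory: "512Mi"
+```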
+
+## Using Helm
+
+When using [Helm](../install-helm/), users must take care to set the
+`proxy.cores` Helm variable in addition to `proxy.cpu.limit`, if
+the criteria for cgroup-based CPU limits
+[described above](#using-kubernetes-cpu-limits-and-requests) are not met.
diff --git a/linkerd.io/content/2.16/tasks/configuring-proxy-discovery-cache.md b/linkerd.io/content/2.16/tasks/configuring-proxy-discovery-cache.md
new file mode 100644
index 0000000000..7e72e25a89
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/configuring-proxy-discovery-cache.md
@@ -0,0 +1,82 @@
++++
+title = "Configuring Proxy Discovery Cache"
+description = "Changing proxy discovery cache timeouts when using slow clients."
++++
+
+The Linkerd proxy maintains in-memory state, such as discovery results, requests
+and connections. This state is cached so that the proxy can process traffic more
+efficiently. Cached discovery results also improve resiliency in the face of
+control plane outages.
+
+To keep the CPU and memory footprint low, cached entries are dropped if they go
+unused for some amount of time. If an entry is not referenced within the
+timeout, it is evicted; if it is referenced, the timer resets.
+
+These timeouts are handled via two config values:
+
+- `proxy.outboundDiscoveryCacheUnusedTimeout`: defines the eviction timeout for
+  cached service discovery results, connections and clients. Defaults to `5s`.
+- `proxy.inboundDiscoveryCacheUnusedTimeout`: defines the eviction timeout for
+  cached policy discovery results. Defaults to `90s`.
+
+These values can be configured globally (affecting the entire data plane) via
+Helm or the CLI at install/upgrade time, or with annotations at the namespace
+or workload level to affect only the workloads in a given namespace or specific
+workloads.
+
+## Configuring via Helm
+
+When installing or upgrading Linkerd via [Helm](../install-helm/), you can use
+the `proxy.outboundDiscoveryCacheUnusedTimeout` and
+`proxy.inboundDiscoveryCacheUnusedTimeout` values. For example:
+
+```bash
+helm upgrade linkerd-control-plane \
+  --set proxy.outboundDiscoveryCacheUnusedTimeout=60s \
+  --set proxy.inboundDiscoveryCacheUnusedTimeout=120s \
+  linkerd/linkerd-control-plane
+```
+
+## Configuring via the Linkerd CLI
+
+As with any Helm value, these are also available via the `--set` flag:
+
+```bash
+linkerd upgrade \
+  --set proxy.outboundDiscoveryCacheUnusedTimeout=60s \
+  --set proxy.inboundDiscoveryCacheUnusedTimeout=120s \
+  | kubectl apply -f -
+```
+
+## Configuring via Annotations
+
+You can also use the
+`config.linkerd.io/proxy-outbound-discovery-cache-unused-timeout` and
+`config.linkerd.io/proxy-inbound-discovery-cache-unused-timeout` annotations at
+the namespace or pod template level:
+
+```yaml
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: my-deployment
+  # ...
+spec:
+  template:
+    metadata:
+      annotations:
+        config.linkerd.io/proxy-outbound-discovery-cache-unused-timeout: '60s'
+        config.linkerd.io/proxy-inbound-discovery-cache-unused-timeout: '120s'
+        # ...
+```
+
+Note that these annotations need to be in place before workloads are injected.
+To apply them to existing workloads, you'll need to roll those workloads out.
+
+## When to Change Timeouts
+
+In the vast majority of cases the default values will just work. Consider
+experimenting with larger values when you have slow clients (5 RPS or less
+across two or more replicas) that experience unexpected connection closure
+errors as soon as the control plane goes down.
+A higher cache idle timeout for discovery results can help mitigate these
+problems.
diff --git a/linkerd.io/content/2.16/tasks/configuring-retries.md b/linkerd.io/content/2.16/tasks/configuring-retries.md
new file mode 100644
index 0000000000..394757f6ad
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/configuring-retries.md
@@ -0,0 +1,51 @@
++++
+title = "Configuring Retries"
+description = "Configure Linkerd to automatically retry failing requests."
++++
+
+In order for Linkerd to automatically retry failed requests, two questions need
+to be answered:
+
+- Which requests should be retried?
+- How many times should the requests be retried?
+
+Both of these questions can be answered by adding annotations to the Service,
+HTTPRoute, or GRPCRoute resource you're sending requests to.
+
+This configuration is required because retries can potentially be dangerous.
+Automatically retrying a request that changes state (e.g. a request that
+submits a financial transaction) could negatively impact your users'
+experience. In addition, retries increase the load on your system. A set of
+services that have requests being constantly retried could potentially be
+taken down by the retries instead of being allowed time to recover.
+
+Check out the [retries section](../books/#retries) of the books demo
+for a tutorial of how to configure retries.
+
+{{< warning >}}
+Retries configured in this way are **incompatible with ServiceProfiles**. If a
+[ServiceProfile](../../features/service-profiles/) is defined for a Service,
+proxies will use the ServiceProfile retry configuration and ignore any retry
+annotations.
+{{< /warning >}}
+
+## Retries
+
+For HTTPRoutes that are idempotent, you can add the `retry.linkerd.io/http: 5xx`
+annotation, which instructs Linkerd to retry any requests that fail with an
+HTTP response status in the 500s.
+
+Note that requests will not be retried if the body exceeds 64KiB.
+
+## Retry Limits
+
+You can also add the `retry.linkerd.io/limit` annotation to specify the maximum
+number of times a request may be retried. By default, this limit is `1`.
+
+## gRPC Retries
+
+Retries can also be configured for gRPC traffic by adding the
+`retry.linkerd.io/grpc` annotation to a GRPCRoute or Service resource. The
+value of this annotation is a comma-separated list of gRPC status codes that
+should be retried.
diff --git a/linkerd.io/content/2.16/tasks/configuring-timeouts.md b/linkerd.io/content/2.16/tasks/configuring-timeouts.md
new file mode 100644
index 0000000000..45005a5d15
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/configuring-timeouts.md
@@ -0,0 +1,25 @@
++++
+title = "Configuring Timeouts"
+description = "Configure Linkerd to automatically fail requests that take too long."
++++
+
+To limit how long Linkerd will wait before failing an outgoing request to
+another service, you can configure timeouts. A timeout specifies the maximum
+amount of time to wait for a response from a remote service after the request
+is sent. If the timeout elapses without a response being received, Linkerd
+will cancel the request and return a [504 Gateway Timeout] response.
+
+Timeouts can be specified by adding annotations to HTTPRoute, GRPCRoute, or
+Service resources.
+
+{{< warning >}}
+Timeouts configured in this way are **incompatible with ServiceProfiles**.
If a +[ServiceProfile](../../features/service-profiles/) is defined for a Service, +proxies will use the ServiceProfile timeout configuration and ignore any timeout +annotations. +{{< /warning >}} + +## Timeouts + +Check out the [timeouts section](../books/#timeouts) of the books demo +for a tutorial of how to configure timeouts. diff --git a/linkerd.io/content/2.16/tasks/customize-install.md b/linkerd.io/content/2.16/tasks/customize-install.md new file mode 100644 index 0000000000..9d962b0189 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/customize-install.md @@ -0,0 +1,104 @@ ++++ +title = "Customizing Linkerd's Configuration with Kustomize" +description = "Use Kustomize to modify Linkerd's configuration in a programmatic way." ++++ + +Instead of forking the Linkerd install and upgrade process, +[Kustomize](https://kustomize.io/) can be used to patch the output of `linkerd +install` in a consistent way. This allows customization of the install to add +functionality specific to installations. + +{{< trylpt >}} + +To get started, save the output of `linkerd install` to a YAML file. This will +be the base resource that Kustomize uses to patch and generate what is added +to your cluster. + +```bash +linkerd install > linkerd.yaml +``` + +{{< note >}} +When upgrading, make sure you populate this file with the content from `linkerd +upgrade`. Using the latest `kustomize` releases, it would be possible to +automate this with an [exec +plugin](https://github.com/kubernetes-sigs/kustomize/tree/master/docs/plugins#exec-plugins). +{{< /note >}} + +Next, create a `kustomization.yaml` file. This file will contain the +instructions for Kustomize listing the base resources and the transformations to +do on those resources. Right now, this looks pretty empty: + +```yaml +resources: +- linkerd.yaml +``` + +Now, let's look at how to do some example customizations. + +{{< note >}} +Kustomize allows as many patches, transforms and generators as you'd like. These +examples show modifications one at a time but it is possible to do as many as +required in a single `kustomization.yaml` file. +{{< /note >}} + +## Add PriorityClass + +There are a couple components in the control plane that can benefit from being +associated with a critical `PriorityClass`. While this configuration isn't +currently supported as a flag to `linkerd install`, it is not hard to add by +using Kustomize. + +First, create a file named `priority-class.yaml` that will create define a +`PriorityClass` resource. + +```yaml +apiVersion: scheduling.k8s.io/v1 +description: Used for critical linkerd pods that must run in the cluster, but + can be moved to another node if necessary. +kind: PriorityClass +metadata: + name: linkerd-critical +value: 1000000000 +``` + +{{< note >}} +`1000000000` is the max. allowed user-defined priority, adjust +accordingly. +{{< /note >}} + +Next, create a file named `patch-priority-class.yaml` that will contain the +overlay. This overlay will explain what needs to be modified. + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: linkerd-identity + namespace: linkerd +spec: + template: + spec: + priorityClassName: linkerd-critical +``` + +Then, add this as a strategic merge option to `kustomization.yaml`: + +```yaml +resources: +- priority-class.yaml +- linkerd.yaml +patchesStrategicMerge: +- patch-priority-class.yaml +``` + +Applying this to your cluster requires taking the output of `kustomize` +and piping it to `kubectl apply`. 
For example, you can run: + +```bash +# install the Linkerd CRDs +linkerd install --crds | kubectl apply -f - + +# install the Linkerd control plane manifests using Kustomize +kubectl kustomize . | kubectl apply -f - +``` diff --git a/linkerd.io/content/2.16/tasks/debugging-502s.md b/linkerd.io/content/2.16/tasks/debugging-502s.md new file mode 100644 index 0000000000..c92ae14906 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/debugging-502s.md @@ -0,0 +1,75 @@ ++++ +title = "Debugging 502s" +description = "Determine why Linkerd is returning 502 responses." ++++ + +When the Linkerd proxy encounters connection errors while processing a +request, it will typically return an HTTP 502 (Bad Gateway) response. It can be +very difficult to figure out why these errors are happening because of the lack +of information available. + +## Why do these errors only occur when Linkerd is injected? + +Linkerd turns connection errors into HTTP 502 responses. This can make issues +which were previously undetected suddenly visible. This is a good thing. +Linkerd also changes the way that connections to your application are managed: +it re-uses persistent connections and establishes an additional layer of +connection tracking. Managing connections in this way can sometimes expose +underlying application or infrastructure issues such as misconfigured connection +timeouts which can manifest as connection errors. + +## Why can't Linkerd give a more informative error message? + +From the Linkerd proxy's perspective, it just sees its connections to the +application refused or closed without explanation. This makes it nearly +impossible for Linkerd to report any error message in the 502 response. However, +if these errors coincide with the introduction of Linkerd, it does suggest that +the problem is related to connection re-use or connection tracking. Here are +some common reasons why the application may be refusing or terminating +connections. + +## Common Causes of Connection Errors + +### Connection Idle Timeouts + +Some servers are configured with a connection idle timeout (for example, [this +timeout in the Go HTTP +server](https://golang.org/src/net/http/server.go#L2535]). This means that the +server will close any connections which do not receive any traffic in the +specified time period. If any requests are already in transit when the +connection shutdown is initiated, those requests will fail. This scenario is +likely to occur if you have traffic with a regular period (such as liveness +checks, for example) and an idle timeout equal to that period. + +To remedy this, ensure that your server's idle timeouts are sufficiently long so +that they do not close connections which are actively in use. + +### Half-closed Connection Timeouts + +During the shutdown of a TCP connection, each side of the connection must be +closed independently. When one side is closed but the other is not, the +connection is said to be "half-closed". It is valid for the connection to be in +this state, however, the operating system's connection tracker can lose track of +connections which remain half-closed for long periods of time. This can lead to +responses not being delivered and to port conflicts when establishing new +connections which manifest as 502 responses. + +You can use a [script to detect half-closed +connections](https://gist.github.com/adleong/0203b0864af2c29ddb821dd48f339f49) +on your Kubernetes cluster. If you detect a large number of half-closed +connections, you have a couple of ways to remedy the situation. 
+ +One solution would be to update your application to not leave connections +half-closed for long periods of time or to stop using software that does this. +Unfortunately, this is not always an option. + +Another option is to increase the connection tracker's timeout for half-closed +connections. The default value of this timeout is platform dependent but is +typically 1 minute or 1 hour. You can view the current value by looking at the +file `/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait` in any +injected container. To increase this value, you can use the +`--close-wait-timeout` flag with `linkerd inject`. Note, however, that setting +this flag will also set the `privileged` field of the proxy init container to +true. Setting this timeout to 1 hour is usually sufficient and matches the +[value used by +kube-proxy](https://github.com/kubernetes/kubernetes/issues/32551). diff --git a/linkerd.io/content/2.16/tasks/debugging-your-service.md b/linkerd.io/content/2.16/tasks/debugging-your-service.md new file mode 100644 index 0000000000..19a4297f7c --- /dev/null +++ b/linkerd.io/content/2.16/tasks/debugging-your-service.md @@ -0,0 +1,64 @@ ++++ +title = "Debugging gRPC applications with request tracing" +description = "Follow a long-form example of debugging a failing gRPC application using live request tracing." +aliases = [ + "/debugging-an-app/", + "../debugging-an-app/" +] ++++ + +The demo application emojivoto has some issues. Let's use that and Linkerd to +diagnose an application that fails in ways which are a little more subtle than +the entire service crashing. This guide assumes that you've followed the steps +in the [Getting Started](../../getting-started/) guide and have Linkerd and the +demo application running in a Kubernetes cluster. If you've not done that yet, +go get started and come back when you're done! + +If you glance at the Linkerd dashboard (by running the `linkerd viz dashboard` +command), you should see all the resources in the `emojivoto` namespace, +including the deployments. Each deployment running Linkerd shows success rate, +requests per second and latency percentiles. + +{{< fig src="/images/debugging/stat.png" title="Top Level Metrics" >}} + +That's pretty neat, but the first thing you might notice is that the success +rate is well below 100%! Click on `web` and let's dig in. + +{{< fig src="/images/debugging/octopus.png" title="Deployment Detail" >}} + +You should now be looking at the Deployment page for the web deployment. The first +thing you'll see here is that the web deployment is taking traffic from `vote-bot` +(a deployment included with emojivoto to continually generate a low level of +live traffic). The web deployment also has two outgoing dependencies, `emoji` +and `voting`. + +While the emoji deployment is handling every request from web successfully, it +looks like the voting deployment is failing some requests! A failure in a dependent +deployment may be exactly what is causing the errors that web is returning. + +Let's scroll a little further down the page, we'll see a live list of all +traffic that is incoming to *and* outgoing from `web`. This is interesting: + +{{< fig src="/images/debugging/web-top.png" title="Top" >}} + +There are two calls that are not at 100%: the first is vote-bot's call to the +`/api/vote` endpoint. The second is the `VoteDoughnut` call from the web +deployment to its dependent deployment, `voting`. Very interesting! 
Since +`/api/vote` is an incoming call, and `VoteDoughnut` is an outgoing call, this is +a good clue that this endpoint is what's causing the problem! + +Finally, to dig a little deeper, we can click on the `tap` icon in the far right +column. This will take us to the live list of requests that match only this +endpoint. You'll see `Unknown` under the `GRPC status` column. This is because +the requests are failing with a +[gRPC status code 2](https://godoc.org/google.golang.org/grpc/codes#Code), +which is a common error response as you can see from +[the code][code]. Linkerd is aware of gRPC's response classification without any +other configuration! + +{{< fig src="/images/debugging/web-tap.png" title="Tap" >}} + +At this point, we have everything required to get the endpoint fixed and restore +the overall health of our applications. + +[code]: https://github.com/BuoyantIO/emojivoto/blob/67faa83af33db647927946a672fc63ab7ce869aa/emojivoto-voting-svc/api/api.go#L21 diff --git a/linkerd.io/content/2.16/tasks/distributed-tracing.md b/linkerd.io/content/2.16/tasks/distributed-tracing.md new file mode 100644 index 0000000000..4fb66ff3e7 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/distributed-tracing.md @@ -0,0 +1,310 @@ ++++ +title = "Distributed tracing with Linkerd" +description = "Use Linkerd to help instrument your application with distributed tracing." ++++ + +Using distributed tracing in practice can be complex, for a high level +explanation of what you get and how it is done, we've assembled a [list of +myths](https://linkerd.io/2019/08/09/service-mesh-distributed-tracing-myths/). + +This guide will walk you through configuring and enabling tracing for +[emojivoto](../../getting-started/#step-5-install-the-demo-app). Jump to the end +for some recommendations on the best way to make use of distributed tracing with +Linkerd. + +To use distributed tracing, you'll need to: + +- Install the Linkerd-Jaeger extension. +- Modify your application to emit spans. + +In the case of emojivoto, once all these steps are complete there will be a +topology that looks like: + +{{< fig src="/images/tracing/tracing-topology.svg" + title="Topology" >}} + +## Prerequisites + +- To use this guide, you'll need to have Linkerd installed on your cluster. + Follow the [Installing Linkerd Guide](../install/) if you haven't + already done this. + +## Install the Linkerd-Jaeger extension + +The first step of getting distributed tracing setup is installing the +Linkerd-Jaeger extension onto your cluster. This extension consists of a +collector, a Jaeger backend, and a Jaeger-injector. The collector consumes spans +emitted from the mesh and your applications and sends them to the Jaeger backend +which stores them and serves a dashboard to view them. The Jaeger-injector is +responsible for configuring the Linkerd proxies to emit spans. 
+ +To install the Linkerd-Jaeger extension, run the command: + +```bash +linkerd jaeger install | kubectl apply -f - +``` + +You can verify that the Linkerd-Jaeger extension was installed correctly by +running: + +```bash +linkerd jaeger check +``` + +## Install Emojivoto + + Add emojivoto to your cluster and inject it with the Linkerd proxy: + + ```bash + linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f - + ``` + +Before moving onto the next step, make sure everything is up and running with +`kubectl`: + +```bash +kubectl -n emojivoto rollout status deploy/web +``` + +## Modify the application + +Unlike most features of a service mesh, distributed tracing requires modifying +the source of your application. Tracing needs some way to tie incoming requests +to your application together with outgoing requests to dependent services. To do +this, some headers are added to each request that contain a unique ID for the +trace. Linkerd uses the [b3 +propagation](https://github.com/openzipkin/b3-propagation) format to tie these +things together. + +We've already modified emojivoto to instrument its requests with this +information, this +[commit](https://github.com/BuoyantIO/emojivoto/commit/47a026c2e4085f4e536c2735f3ff3788b0870072) +shows how this was done. For most programming languages, it simply requires the +addition of a client library to take care of this. Emojivoto uses the OpenCensus +client, but others can be used. + +To enable tracing in emojivoto, run: + +```bash +kubectl -n emojivoto set env --all deploy OC_AGENT_HOST=collector.linkerd-jaeger:55678 +``` + +This command will add an environment variable that enables the applications to +propagate context and emit spans. + +## Explore Jaeger + +With `vote-bot` starting traces for every request, spans should now be showing +up in Jaeger. To get to the UI, run: + +```bash +linkerd jaeger dashboard +``` + +{{< fig src="/images/tracing/jaeger-empty.png" + title="Jaeger" >}} + +You can search for any service in the dropdown and click Find Traces. `vote-bot` +is a great way to get started. + +{{< fig src="/images/tracing/jaeger-search.png" + title="Search" >}} + +Clicking on a specific trace will provide all the details, you'll be able to see +the spans for every proxy! + +{{< fig src="/images/tracing/example-trace.png" + title="Search" >}} + +There sure are a lot of `linkerd-proxy` spans in that output. Internally, the +proxy has a server and client side. When a request goes through the proxy, it is +received by the server and then issued by the client. For a single request that +goes between two meshed pods, there will be a total of 4 spans. Two will be on +the source side as the request traverses that proxy and two will be on the +destination side as the request is received by the remote proxy. + +## Integration with the Dashboard + +After having set up the Linkerd-Jaeger extension, as the proxy adds application +meta-data as trace attributes, users can directly jump into related resources +traces directly from the linkerd-web dashboard by clicking the Jaeger icon in +the Metrics Table, as shown below: + +{{< fig src="/images/tracing/linkerd-jaeger-ui.png" + title="Linkerd-Jaeger" >}} + +To obtain that functionality you need to install (or upgrade) the Linkerd-Viz +extension specifying the service exposing the Jaeger UI. 
By default, this would +be something like this: + +```bash +linkerd viz install --set jaegerUrl=jaeger.linkerd-jaeger:16686 \ + | kubectl apply -f - +``` + +## Cleanup + +To cleanup, uninstall the Linkerd-Jaeger extension along with emojivoto by running: + +```bash +linkerd jaeger uninstall | kubectl delete -f - +kubectl delete ns emojivoto +``` + +## Bring your own Jaeger + +If you have an existing Jaeger installation, you can configure the OpenCensus +collector to send traces to it instead of the Jaeger instance built into the +Linkerd-Jaeger extension. + +Create the following YAML file which disables the built in Jaeger instance +and specifies the OpenCensus collector's config. + +```bash +cat < jaeger-linkerd.yaml +jaeger: + enabled: false + +collector: + config: | + receivers: + otlp: + protocols: + grpc: + http: + opencensus: + zipkin: + jaeger: + protocols: + grpc: + thrift_http: + thrift_compact: + thrift_binary: + processors: + batch: + extensions: + health_check: + exporters: + jaeger: + endpoint: my-jaeger-collector.my-jaeger-ns:14250 + tls: + insecure: true + service: + extensions: [health_check] + pipelines: + traces: + receivers: [otlp,opencensus,zipkin,jaeger] + processors: [batch] + exporters: [jaeger] +EOF +linkerd jaeger install --values ./jaeger-linkerd.yaml | kubectl apply -f - +``` + +You'll want to ensure that the `exporters.jaeger.endpoint` which is +`my-jaeger-collector.my-jaeger-ns:14250` in this example is set to a value +appropriate for your environment. This should point to a Jaeger Collector +on port 14250. + +The YAML file is merged with the [Helm values.yaml][helm-values] which shows +other possible values that can be configured. + + +[helm-values]: https://github.com/linkerd/linkerd2/blob/main/jaeger/charts/linkerd-jaeger/values.yaml + +It is also possible to manually edit the OpenCensus configuration to have it +export to any backend which it supports. See the +[OpenCensus documentation](https://opencensus.io/service/exporters/) for a full +list. + +## Troubleshooting + +### I don't see any spans for the proxies + +The Linkerd proxy uses the [b3 +propagation](https://github.com/openzipkin/b3-propagation) format. Some client +libraries, such as Jaeger, use different formats by default. You'll want to +configure your client library to use the b3 format to have the proxies +participate in traces. + +## Recommendations + +### Ingress + +The ingress is an especially important component for distributed tracing because +it typically creates the root span of each trace and is responsible for deciding +if that trace should be sampled or not. Having the ingress make all sampling +decisions ensures that either an entire trace is sampled or none of it is, and +avoids creating "partial traces". + +Distributed tracing systems all rely on services to propagate metadata about the +current trace from requests that they receive to requests that they send. This +metadata, called the trace context, is usually encoded in one or more request +headers. There are many different trace context header formats and while we hope +that the ecosystem will eventually converge on open standards like [W3C +tracecontext](https://www.w3.org/TR/trace-context/), we only use the [b3 +format](https://github.com/openzipkin/b3-propagation) today. Being one of the +earliest widely used formats, it has the widest support, especially among +ingresses like Nginx. + +This reference architecture uses a traffic generator called `vote-bot` instead +of an ingress to create the root span of each trace. 
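+
+For reference, b3 trace context is carried in a small set of request headers.
+The sketch below is illustrative only: the header names are the standard b3
+set, but the target URL and the ID values are placeholders rather than part of
+this reference architecture:
+
+```bash
+# Illustrative b3 propagation headers on an outgoing request.
+# The URL and the ID values are placeholders.
+curl \
+  -H 'X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7' \
+  -H 'X-B3-SpanId: e457b5a2e4d86bd1' \
+  -H 'X-B3-Sampled: 1' \
+  http://web-svc.emojivoto/api/list
+```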
+ +### Client Library + +While it is possible for services to manually propagate trace propagation +headers, it's usually much easier to use a library which does three things: + +- Propagates the trace context from incoming request headers to outgoing request + headers +- Modifies the trace context (i.e. starts a new span) +- Transmits this data to a trace collector + +We recommend using OpenCensus in your service and configuring it with: + +- [b3 propagation](https://github.com/openzipkin/b3-propagation) (this is the + default) +- [the OpenCensus agent + exporter](https://opencensus.io/exporters/supported-exporters/go/ocagent/) + +The OpenCensus agent exporter will export trace data to the OpenCensus collector +over a gRPC API. The details of how to configure OpenCensus will vary language +by language, but there are [guides for many popular +languages](https://opencensus.io/quickstart/). You can also see an end-to-end +example of this in Go with our example application, +[Emojivoto](https://github.com/adleong/emojivoto). + +You may notice that the OpenCensus project is in maintenance mode and will +become part of [OpenTelemetry](https://opentelemetry.io/). Unfortunately, +OpenTelemetry is not yet production ready and so OpenCensus remains our +recommendation for the moment. + +It is possible to use many other tracing client libraries as well. Just make +sure the b3 propagation format is being used and the client library can export +its spans in a format the collector has been configured to receive. + +## Collector: OpenCensus + +The OpenCensus collector receives trace data from the OpenCensus agent exporter +and potentially does translation and filtering before sending that data to +Jaeger. Having the OpenCensus exporter send to the OpenCensus collector gives us +a lot of flexibility: we can switch to any backend that OpenCensus supports +without needing to interrupt the application. + +## Backend: Jaeger + +Jaeger is one of the most widely used tracing backends and for good reason: it +is easy to use and does a great job of visualizing traces. However, [any backend +supported by OpenCensus](https://opencensus.io/service/exporters/) can be used +instead. + +## Linkerd + +If your application is injected with Linkerd, the Linkerd proxy will participate +in the traces and will also emit trace data to the OpenCensus collector. This +enriches the trace data and allows you to see exactly how much time requests are +spending in the proxy and on the wire. + +While Linkerd can only actively participate in traces that use the b3 +propagation format, Linkerd will always forward unknown request headers +transparently, which means it will never interfere with traces that use other +propagation formats. diff --git a/linkerd.io/content/2.16/tasks/exporting-metrics.md b/linkerd.io/content/2.16/tasks/exporting-metrics.md new file mode 100644 index 0000000000..bb96a559e5 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/exporting-metrics.md @@ -0,0 +1,177 @@ ++++ +title = "Exporting Metrics" +description = "Integrate Linkerd's metrics with your existing metrics infrastructure." +aliases = [ + "../prometheus/", + "../observability/prometheus/", + "../observability/exporting-metrics/" +] ++++ + +Linkerd provides an extensive set of metrics for all traffic that passes through +its data plane. These metrics are collected at the proxy level and reported on +the proxy's metrics endpoint. 
+ +Typically, consuming these metrics is not done from the proxies directly, as +each proxy only provides a portion of the full picture. Instead, a separate tool +is used to collect metrics from all proxies and aggregate them together for +consumption. + +{{< trylpt >}} + +One easy option is the [linkerd-viz](../../features/dashboard/) extension, which +will create an on-cluster Prometheus instance as well as dashboards and CLI +commands that make use of it. However, this extension only keeps metrics data +for a brief window of time (6 hours) and does not persist data across restarts. +Depending on your use case, you may want to export these metrics into an +external metrics store. + +There are several options for how to export these metrics to a destination +outside of the cluster: + +- [Federate data from linkerd-viz to your own Prometheus cluster](#federation) +- [Use a Prometheus integration with linkerd-viz](#integration) +- [Extract data from linkerd-viz via Prometheus's APIs](#api) +- [Gather data from the proxies directly without linkerd-viz](#proxy) + +## Using the Prometheus federation API {#federation} + +If you are already using Prometheus as your own metrics store, we recommend +taking advantage of Prometheus's *federation* API, which is designed exactly for +the use case of copying data from one Prometheus to another. + +Simply add the following item to your `scrape_configs` in your Prometheus config +file (replace `{{.Namespace}}` with the namespace where the Linkerd Viz +extension is running): + +```yaml +- job_name: 'linkerd' + kubernetes_sd_configs: + - role: pod + namespaces: + names: ['{{.Namespace}}'] + + relabel_configs: + - source_labels: + - __meta_kubernetes_pod_container_name + action: keep + regex: ^prometheus$ + + honor_labels: true + metrics_path: '/federate' + + params: + 'match[]': + - '{job="linkerd-proxy"}' + - '{job="linkerd-controller"}' +``` + +Alternatively, if you prefer to use Prometheus' ServiceMonitors to configure +your Prometheus, you can use this ServiceMonitor YAML (replace `{{.Namespace}}` +with the namespace where Linkerd Viz extension is running): + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + labels: + k8s-app: linkerd-prometheus + release: monitoring + name: linkerd-federate + namespace: {{.Namespace}} +spec: + endpoints: + - interval: 30s + scrapeTimeout: 30s + params: + match[]: + - '{job="linkerd-proxy"}' + - '{job="linkerd-controller"}' + path: /federate + port: admin-http + honorLabels: true + relabelings: + - action: keep + regex: '^prometheus$' + sourceLabels: + - '__meta_kubernetes_pod_container_name' + jobLabel: app + namespaceSelector: + matchNames: + - {{.Namespace}} + selector: + matchLabels: + component: prometheus +``` + +That's it! Your Prometheus cluster is now configured to federate Linkerd's +metrics from Linkerd's internal Prometheus instance. + +Once the metrics are in your Prometheus, Linkerd's proxy metrics will have the +label `job="linkerd-proxy"` and Linkerd's control plane metrics will have the +label `job="linkerd-controller"`. For more information on specific metric and +label definitions, have a look at [Proxy Metrics](../../reference/proxy-metrics/). + +For more information on Prometheus' `/federate` endpoint, have a look at the +[Prometheus federation docs](https://prometheus.io/docs/prometheus/latest/federation/). 
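+
+To confirm that federation is working, you can query your own Prometheus for
+one of Linkerd's proxy metrics. This is only a sketch: the Prometheus address
+below is a placeholder for wherever your instance's API is reachable:
+
+```bash
+# Count the federated Linkerd proxy series; a non-zero result means the
+# metrics are being copied over. The Prometheus address is a placeholder.
+curl -sG 'http://prometheus.example.com:9090/api/v1/query' \
+  --data-urlencode 'query=count(request_total{job="linkerd-proxy"})'
+```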
+ +## Using a Prometheus integration {#integration} + +If you are not using Prometheus as your own long-term data store, you may be +able to leverage one of Prometheus's [many +integrations](https://prometheus.io/docs/operating/integrations/) to +automatically extract data from Linkerd's Prometheus instance into the data +store of your choice. Please refer to the Prometheus documentation for details. + +## Extracting data via Prometheus's APIs {#api} + +If neither Prometheus federation nor Prometheus integrations are options for +you, it is possible to call Prometheus's APIs to extract data from Linkerd. + +For example, you can call the federation API directly via a command like: + +```bash +curl -G \ + --data-urlencode 'match[]={job="linkerd-proxy"}' \ + --data-urlencode 'match[]={job="linkerd-controller"}' \ + http://prometheus.linkerd-viz.svc.cluster.local:9090/federate +``` + +{{< note >}} +If your data store is outside the Kubernetes cluster, it is likely that +you'll want to set up +[ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) +at a domain name of your choice with authentication. +{{< /note >}} + +Similar to the `/federate` API, Prometheus provides a JSON query API to +retrieve all metrics: + +```bash +curl http://prometheus.linkerd-viz.svc.cluster.local:9090/api/v1/query?query=request_total +``` + +## Gathering data from the Linkerd proxies directly {#proxy} + +Finally, if you want to avoid Linkerd's Prometheus entirely, you can query the +Linkerd proxies directly on their `/metrics` endpoint. + +For example, to view `/metrics` from a single Linkerd proxy, running in the +`linkerd` namespace: + +```bash +kubectl -n linkerd port-forward \ + $(kubectl -n linkerd get pods \ + -l linkerd.io/control-plane-ns=linkerd \ + -o jsonpath='{.items[0].metadata.name}') \ + 4191:4191 +``` + +and then: + +```bash +curl localhost:4191/metrics +``` + +Alternatively, `linkerd diagnostics proxy-metrics` can be used to retrieve +proxy metrics for a given workload. diff --git a/linkerd.io/content/2.16/tasks/exposing-dashboard.md b/linkerd.io/content/2.16/tasks/exposing-dashboard.md new file mode 100644 index 0000000000..bb11cc8e03 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/exposing-dashboard.md @@ -0,0 +1,249 @@ ++++ +title = "Exposing the Dashboard" +description = "Make it easy for others to access Linkerd and Grafana dashboards without the CLI." +aliases = [ + "../dns-rebinding/", +] ++++ + +Instead of using `linkerd viz dashboard` every time you'd like to see what's +going on, you can expose the dashboard via an ingress. This will also expose +Grafana, if you have it linked against Linkerd viz through the `grafana.url` +setting. 
+ +{{< pagetoc >}} + +## Nginx + +### Nginx with basic auth + +A sample ingress definition is: + +```yaml +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: web-ingress-auth + namespace: linkerd-viz +data: + auth: YWRtaW46JGFwcjEkbjdDdTZnSGwkRTQ3b2dmN0NPOE5SWWpFakJPa1dNLgoK +--- +# apiVersion: networking.k8s.io/v1beta1 # for k8s < v1.19 +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: web-ingress + namespace: linkerd-viz + annotations: + nginx.ingress.kubernetes.io/upstream-vhost: $service_name.$namespace.svc.cluster.local:8084 + nginx.ingress.kubernetes.io/configuration-snippet: | + proxy_set_header Origin ""; + proxy_hide_header l5d-remote-ip; + proxy_hide_header l5d-server-id; + nginx.ingress.kubernetes.io/auth-type: basic + nginx.ingress.kubernetes.io/auth-secret: web-ingress-auth + nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required' +spec: + ingressClassName: nginx + rules: + - host: dashboard.example.com + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: web + port: + number: 8084 +``` + +This exposes the dashboard at `dashboard.example.com` and protects it with basic +auth using admin/admin. Take a look at the [ingress-nginx][nginx-auth] +documentation for details on how to change the username and password. + +### Nginx with oauth2-proxy + +A more secure alternative to basic auth is using an authentication proxy, such +as [oauth2-proxy](https://oauth2-proxy.github.io/oauth2-proxy/). + +For reference on how to deploy and configure oauth2-proxy in kubernetes, see +this [blog post by Don +Bowman](https://blog.donbowman.ca/2019/02/14/using-single-sign-on-oauth2-across-many-sites-in-kubernetes/). + +tl;dr: If you deploy oauth2-proxy via the [helm +chart](https://github.com/helm/charts/tree/master/stable/oauth2-proxy), the +following values are required: + +```yaml +config: + existingSecret: oauth2-proxy + configFile: |- + email_domains = [ "example.com" ] + upstreams = [ "file:///dev/null" ] + +ingress: + enabled: true + annotations: + kubernetes.io/ingress.class: nginx + path: /oauth2 +ingress: + hosts: + - linkerd.example.com +``` + +Where the `oauth2-proxy` secret would contain the required [oauth2 +config](https://oauth2-proxy.github.io/oauth2-proxy/docs/configuration/oauth_provider) +such as, `client-id` `client-secret` and `cookie-secret`. 
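+
+As a rough sketch (the namespace and literal values below are placeholders,
+not required names), that secret could be created with:
+
+```bash
+# Placeholder values: substitute your own OAuth client credentials and the
+# namespace where oauth2-proxy runs. The key names match the ones listed above.
+kubectl -n oauth2-proxy create secret generic oauth2-proxy \
+  --from-literal=client-id=<your-client-id> \
+  --from-literal=client-secret=<your-client-secret> \
+  --from-literal=cookie-secret=$(openssl rand -base64 32 | head -c 32)
+```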
+ +Once setup, a sample ingress would be: + +```yaml +# apiVersion: networking.k8s.io/v1beta1 # for k8s < v1.19 +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: web + namespace: linkerd-viz + annotations: + nginx.ingress.kubernetes.io/upstream-vhost: $service_name.$namespace.svc.cluster.local:8084 + nginx.ingress.kubernetes.io/configuration-snippet: | + proxy_set_header Origin ""; + proxy_hide_header l5d-remote-ip; + proxy_hide_header l5d-server-id; + nginx.ingress.kubernetes.io/auth-signin: https://$host/oauth2/start?rd=$escaped_request_uri + nginx.ingress.kubernetes.io/auth-url: https://$host/oauth2/auth +spec: + ingressClassName: nginx + rules: + - host: linkerd.example.com + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: web + port: + number: 8084 +``` + +## Traefik + +A sample ingress definition is: + +```yaml +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: web-ingress-auth + namespace: linkerd-viz +data: + auth: YWRtaW46JGFwcjEkbjdDdTZnSGwkRTQ3b2dmN0NPOE5SWWpFakJPa1dNLgoK +--- +# apiVersion: networking.k8s.io/v1beta1 # for k8s < v1.19 +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: web-ingress + namespace: linkerd-viz + annotations: + ingress.kubernetes.io/custom-request-headers: l5d-dst-override:web.linkerd-viz.svc.cluster.local:8084 + traefik.ingress.kubernetes.io/auth-type: basic + traefik.ingress.kubernetes.io/auth-secret: web-ingress-auth +spec: + ingressClassName: traefik + rules: + - host: dashboard.example.com + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: web + port: + number: 8084 +``` + +This exposes the dashboard at `dashboard.example.com` and protects it with basic +auth using admin/admin. Take a look at the [Traefik][traefik-auth] documentation +for details on how to change the username and password. + +## Ambassador + +Ambassador works by defining a [mapping +](https://www.getambassador.io/docs/latest/topics/using/intro-mappings/) as an +annotation on a service. + +The below annotation exposes the dashboard at `dashboard.example.com`. + +```yaml + annotations: + getambassador.io/config: |- + --- + apiVersion: getambassador.io/v2 + kind: Mapping + name: web-mapping + host: dashboard.example.com + prefix: / + host_rewrite: web.linkerd-viz.svc.cluster.local:8084 + service: web.linkerd-viz.svc.cluster.local:8084 +``` + +## DNS Rebinding Protection + +To prevent [DNS-rebinding](https://en.wikipedia.org/wiki/DNS_rebinding) attacks, +the dashboard rejects any request whose `Host` header is not `localhost`, +`127.0.0.1` or the service name `web.linkerd-viz.svc`. + +Note that this protection also covers the [Grafana +dashboard](../../reference/architecture/#grafana). + +The ingress-nginx config above uses the +`nginx.ingress.kubernetes.io/upstream-vhost` annotation to properly set the +upstream `Host` header. Traefik on the other hand doesn't offer that option, so +you'll have to manually set the required `Host` as explained below. + +### Tweaking Host Requirement + +If your HTTP client (Ingress or otherwise) doesn't allow to rewrite the `Host` +header, you can change the validation regexp that the dashboard server uses, +which is fed into the `web` deployment via the `enforced-host` container +argument. + +If you're managing Linkerd with Helm, then you can set the host using the +`enforcedHostRegexp` value. 
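+
+For example, a Helm values override might look like the following sketch; the
+exact key path can vary between chart versions, so confirm it against the
+linkerd-viz chart's `values.yaml`:
+
+```yaml
+# Sketch of a linkerd-viz values override. The key path is an assumption;
+# check the values.yaml of your chart version.
+dashboard:
+  enforcedHostRegexp: '^dashboard\.example\.com$'
+```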
+ +Another way of doing that is through Kustomize, as explained in [Customizing +Installation](../customize-install/), using an overlay like this one: + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: web +spec: + template: + spec: + containers: + - name: web + args: + - -linkerd-controller-api-addr=linkerd-controller-api.linkerd.svc.cluster.local:8085 + - -linkerd-metrics-api-addr=metrics-api.linkerd-viz.svc.cluster.local:8085 + - -cluster-domain=cluster.local + - -grafana-addr=grafana.linkerd-viz.svc.cluster.local:3000 + - -controller-namespace=linkerd + - -viz-namespace=linkerd-viz + - -log-level=info + - -enforced-host=^dashboard\.example\.com$ +``` + +If you want to completely disable the `Host` header check, simply use a +catch-all regexp `.*` for `-enforced-host`. + +[nginx-auth]: +https://github.com/kubernetes/ingress-nginx/blob/master/docs/examples/auth/basic/README.md +[traefik-auth]: https://docs.traefik.io/middlewares/basicauth/ diff --git a/linkerd.io/content/2.16/tasks/extensions.md b/linkerd.io/content/2.16/tasks/extensions.md new file mode 100644 index 0000000000..162862c384 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/extensions.md @@ -0,0 +1,65 @@ ++++ +title = "Using extensions" +description = "Add functionality to Linkerd with optional extensions." ++++ + +Linkerd extensions are components which can be added to a Linkerd installation +to enable additional functionality. By default, the following extensions are +available: + +* [viz](../../features/dashboard/): Metrics and visibility features +* [jaeger](../distributed-tracing/): Distributed tracing +* [multicluster](../multicluster/): Cross-cluster routing + +But other extensions are also possible. Read on for more! + +## Installing extensions + +Before installing any extensions, make sure that you have already [installed +Linkerd](../install/) and validated your cluster with `linkerd check`. + +Then, you can install the extension with the extension's `install` command. For +example, to install the `viz` extension, you can use: + +```bash +linkerd viz install | kubectl apply -f - +``` + +For built-in extensions, such as `viz`, `jaeger`, and `multicluster`, that's +all you need to do. Of course, these extensions can also be installed by with +Helm by installing that extension's Helm chart. + +Once an extension has been installed, it will be included as part of the +standard `linkerd check` command. + +## Installing third-party extensions + +Third-party extensions require one additional step: you must download the +extension's CLI and put it in your path. This will allow you to invoke the +extension CLI through the Linkerd CLI. (E.g. any call to `linkerd foo` will +automatically call the `linkerd-foo` binary, if it is found on your path.) + +## Listing extensions + +Every extension creates a Kubernetes namespace with the `linkerd.io/extension` +label. Thus, you can list all extensions installed on your cluster by running: + +```bash +kubectl get ns -l linkerd.io/extension +``` + +## Upgrading extensions + +Unless otherwise stated, extensions do not persist any configuration in the +cluster. To upgrade an extension, run the install again with a newer version +of the extension CLI or with a different set of configuration flags. + +## Uninstalling extensions + +All extensions have an `uninstall` command that should be used to gracefully +clean up all resources owned by an extension. 
For example, to uninstall the +foo extension, run: + +```bash +linkerd foo uninstall | kubectl delete -f - +``` diff --git a/linkerd.io/content/2.16/tasks/external-prometheus.md b/linkerd.io/content/2.16/tasks/external-prometheus.md new file mode 100644 index 0000000000..0703bc166b --- /dev/null +++ b/linkerd.io/content/2.16/tasks/external-prometheus.md @@ -0,0 +1,174 @@ ++++ +title = "Bringing your own Prometheus" +description = "Use an existing Prometheus instance with Linkerd." ++++ + +Even though [the linkerd-viz extension](../../features/dashboard/) comes with +its own Prometheus instance, there can be cases where using an external +instance makes more sense for various reasons. + +This tutorial shows how to configure an external Prometheus instance to scrape both +the control plane as well as the proxy's metrics in a format that is consumable +both by a user as well as Linkerd control plane components like web, etc. + +{{< trylpt >}} + +There are two important points to tackle here. + +- Configuring external Prometheus instance to get the Linkerd metrics. +- Configuring the linkerd-viz extension to use that Prometheus. + +## Prometheus Scrape Configuration + +The following scrape configuration has to be applied to the external +Prometheus instance. + +{{< note >}} +The below scrape configuration is a [subset of the full `linkerd-prometheus` +scrape +configuration](https://github.com/linkerd/linkerd2/blob/main/viz/charts/linkerd-viz/templates/prometheus.yaml#L47-L151). +{{< /note >}} + +Before applying, it is important to replace templated values (present in `{{}}`) +with direct values for the below configuration to work. + +```yaml + - job_name: 'linkerd-controller' + kubernetes_sd_configs: + - role: pod + namespaces: + names: + - '{{.Values.linkerdNamespace}}' + - '{{.Values.namespace}}' + relabel_configs: + - source_labels: + - __meta_kubernetes_pod_container_port_name + action: keep + regex: admin-http + - source_labels: [__meta_kubernetes_pod_container_name] + action: replace + target_label: component + + - job_name: 'linkerd-service-mirror' + kubernetes_sd_configs: + - role: pod + relabel_configs: + - source_labels: + - __meta_kubernetes_pod_label_linkerd_io_control_plane_component + - __meta_kubernetes_pod_container_port_name + action: keep + regex: linkerd-service-mirror;admin-http$ + - source_labels: [__meta_kubernetes_pod_container_name] + action: replace + target_label: component + + - job_name: 'linkerd-proxy' + kubernetes_sd_configs: + - role: pod + relabel_configs: + - source_labels: + - __meta_kubernetes_pod_container_name + - __meta_kubernetes_pod_container_port_name + - __meta_kubernetes_pod_label_linkerd_io_control_plane_ns + action: keep + regex: ^{{default .Values.proxyContainerName "linkerd-proxy" .Values.proxyContainerName}};linkerd-admin;{{.Values.linkerdNamespace}}$ + - source_labels: [__meta_kubernetes_namespace] + action: replace + target_label: namespace + - source_labels: [__meta_kubernetes_pod_name] + action: replace + target_label: pod + # special case k8s' "job" label, to not interfere with prometheus' "job" + # label + # __meta_kubernetes_pod_label_linkerd_io_proxy_job=foo => + # k8s_job=foo + - source_labels: [__meta_kubernetes_pod_label_linkerd_io_proxy_job] + action: replace + target_label: k8s_job + # drop __meta_kubernetes_pod_label_linkerd_io_proxy_job + - action: labeldrop + regex: __meta_kubernetes_pod_label_linkerd_io_proxy_job + # __meta_kubernetes_pod_label_linkerd_io_proxy_deployment=foo => + # deployment=foo + - action: labelmap + regex: 
__meta_kubernetes_pod_label_linkerd_io_proxy_(.+) + # drop all labels that we just made copies of in the previous labelmap + - action: labeldrop + regex: __meta_kubernetes_pod_label_linkerd_io_proxy_(.+) + # __meta_kubernetes_pod_label_linkerd_io_foo=bar => + # foo=bar + - action: labelmap + regex: __meta_kubernetes_pod_label_linkerd_io_(.+) + # Copy all pod labels to tmp labels + - action: labelmap + regex: __meta_kubernetes_pod_label_(.+) + replacement: __tmp_pod_label_$1 + # Take `linkerd_io_` prefixed labels and copy them without the prefix + - action: labelmap + regex: __tmp_pod_label_linkerd_io_(.+) + replacement: __tmp_pod_label_$1 + # Drop the `linkerd_io_` originals + - action: labeldrop + regex: __tmp_pod_label_linkerd_io_(.+) + # Copy tmp labels into real labels + - action: labelmap + regex: __tmp_pod_label_(.+) +``` + +You will also need to ensure that your Prometheus scrape interval is shorter +than the time duration range of any Prometheus queries. In order to ensure the +web dashboard and Linkerd Grafana work correctly, we recommend a 10 second +scrape interval: + +```yaml + global: + scrape_interval: 10s + scrape_timeout: 10s + evaluation_interval: 10s +``` + +The running configuration of the builtin prometheus can be used as a reference. + +```bash +kubectl -n linkerd-viz get configmap prometheus-config -o yaml +``` + +## Linkerd-Viz Extension Configuration + +Linkerd's viz extension components like `metrics-api`, etc depend +on the Prometheus instance to power the dashboard and CLI. + +The `prometheusUrl` field gives you a single place through +which all these components can be configured to an external Prometheus URL. +This is allowed both through the CLI and Helm. + +### CLI + +This can be done by passing a file with the above field to the `values` flag, +which is available through `linkerd viz install` command. + +```yaml +prometheusUrl: existing-prometheus.xyz:9090 +``` + +Once applied, this configuration is not persistent across installs. +The same has to be passed again by the user during re-installs, upgrades, etc. + +When using an external Prometheus and configuring the `prometheusUrl` +field, Linkerd's Prometheus will still be included in installation. +If you wish to disable it, be sure to include the +following configuration as well: + +```yaml +prometheus: + enabled: false +``` + +### Helm + +The same configuration can be applied through `values.yaml` when using Helm. +Once applied, Helm makes sure that the configuration is +persistent across upgrades. + +More information on installation through Helm can be found +[here](../install-helm/) diff --git a/linkerd.io/content/2.16/tasks/fault-injection.md b/linkerd.io/content/2.16/tasks/fault-injection.md new file mode 100644 index 0000000000..2defa3d8a7 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/fault-injection.md @@ -0,0 +1,211 @@ ++++ +title = "Injecting Faults" +description = "Practice chaos engineering by injecting faults into services with Linkerd." ++++ + +It is easy to inject failures into applications by using the +[HTTPRoute](../../reference/httproute/) resource to redirect a percentage of +traffic to a specific backend. This backend is completely flexible and can +return whatever responses you want - 500s, timeouts or even crazy payloads. + +The [books demo](../books/) is a great way to show off this behavior. The +overall topology looks like: + +{{< fig src="/images/books/topology.png" title="Topology" >}} + +In this guide, you will split some of the requests from `webapp` to `books`. 
+Most requests will end up at the correct `books` destination, however some of +them will be redirected to a faulty backend. This backend will return 500s for +every request and inject faults into the `webapp` service. No code changes are +required and as this method is configuration driven, it is a process that can be +added to integration tests and CI pipelines. If you are really living the chaos +engineering lifestyle, fault injection could even be used in production. + +## Prerequisites + +To use this guide, you'll need a Kubernetes cluster running: + +- Linkerd and Linkerd-Viz. If you haven't installed these yet, follow the + [Installing Linkerd Guide](../install/). + +## Setup the service + +First, add the [books](../books/) sample application to your cluster: + +```bash +kubectl create ns booksapp && \ + linkerd inject https://run.linkerd.io/booksapp.yml | \ + kubectl -n booksapp apply -f - +``` + +As this manifest is used as a demo elsewhere, it has been configured with an +error rate. To show how fault injection works, the error rate needs to be +removed so that there is a reliable baseline. To increase success rate for +booksapp to 100%, run: + +```bash +kubectl -n booksapp patch deploy authors \ + --type='json' \ + -p='[{"op":"remove", "path":"/spec/template/spec/containers/0/env/2"}]' +``` + +After a little while, the stats will show 100% success rate. You can verify this +by running: + +```bash +linkerd viz -n booksapp stat deploy +``` + +The output will end up looking at little like: + +```bash +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +authors 1/1 100.00% 7.1rps 4ms 26ms 33ms 6 +books 1/1 100.00% 8.6rps 6ms 73ms 95ms 6 +traffic 1/1 - - - - - - +webapp 3/3 100.00% 7.9rps 20ms 76ms 95ms 9 +``` + +## Create the faulty backend + +Injecting faults into booksapp requires a service that is configured to return +errors. To do this, you can start NGINX and configure it to return 500s by +running: + +```bash +cat <}} +Two versions of the HTTPRoute resource may be used with Linkerd: + +- The upstream version provided by the Gateway API, with the + `gateway.networking.k8s.io` API group +- A Linkerd-specific CRD provided by Linkerd, with the `policy.linkerd.io` API + group + +The two HTTPRoute resource definitions are similar, but the Linkerd version +implements experimental features not yet available with the upstream Gateway API +resource definition. See [the HTTPRoute reference +documentation](../../reference/httproute/#linkerd-and-gateway-api-httproutes) +for details. +{{< /note >}} + +When Linkerd sees traffic going to the `books` service, it will send 9/10 +requests to the original service and 1/10 to the error injector. You can see +what this looks like by running `stat` and filtering explicitly to just the +requests from `webapp`: + +```bash +linkerd viz stat -n booksapp deploy --from deploy/webapp +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +authors 1/1 98.15% 4.5rps 3ms 36ms 39ms 3 +books 1/1 100.00% 6.7rps 5ms 27ms 67ms 6 +error-injector 1/1 0.00% 0.7rps 1ms 1ms 1ms 3 +``` + +We can also look at the success rate of the `webapp` overall to see the effects +of the error injector. 
The success rate should be approximately 90%: + +```bash +linkerd viz stat -n booksapp deploy/webapp +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +webapp 3/3 88.42% 9.5rps 14ms 37ms 75ms 10 +``` + +## Cleanup + +To remove everything in this guide from your cluster, run: + +```bash +kubectl delete ns booksapp +``` diff --git a/linkerd.io/content/2.16/tasks/flagger.md b/linkerd.io/content/2.16/tasks/flagger.md new file mode 100644 index 0000000000..3e4562d762 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/flagger.md @@ -0,0 +1,550 @@ ++++ +title = "Progressive Delivery" +description = "Reduce deployment risk by automating canary releases based on service metrics." +aliases = ["canary-release"] ++++ + +Linkerd's [dynamic request routing](../../features/request-routing/) allows you +to dynamically shift traffic between services. This can be used to implement +lower-risk deployment strategies like blue-green deploys and canaries. + +But simply shifting traffic from one version of a service to the next is just +the beginning. We can combine traffic splitting with [Linkerd's automatic +*golden metrics* telemetry](../../features/telemetry/) and drive traffic +decisions based on the observed metrics. For example, we can gradually shift +traffic from an old deployment to a new one while continually monitoring its +success rate. If at any point the success rate drops, we can shift traffic back +to the original deployment and back out of the release. Ideally, our users +remain happy throughout, not noticing a thing! + +In this tutorial, we'll show you how to use two different progressive delivery +tools: [Flagger](https://flagger.app/) and +[Argo Rollouts](https://argoproj.github.io/rollouts/) and how to tie Linkerd's +metrics and request routing together in a control loop, allowing for +fully-automated, metrics-aware canary deployments. + +{{< trylpt >}} + +## Prerequisites + +To use this guide, you'll need a Kubernetes cluster running: + +- Linkerd and Linkerd-Viz. If you haven't installed these yet, follow the + [Installing Linkerd Guide](../install/). + +## Flagger + +### Install Flagger + +While Linkerd will be managing the actual traffic routing, Flagger automates +the process of creating new Kubernetes resources, watching metrics and +incrementally sending users over to the new version. To add Flagger to your +cluster and have it configured to work with Linkerd, run: + +```bash +kubectl apply -k github.com/fluxcd/flagger/kustomize/linkerd +``` + +This command adds: + +- The canary + [CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) + that enables configuring how a rollout should occur. +- RBAC which grants Flagger permissions to modify all the resources that it + needs to, such as deployments and services. +- A Flagger controller configured to interact with the Linkerd control plane. + +To watch until everything is up and running, you can use `kubectl`: + +```bash +kubectl -n flagger-system rollout status deploy/flagger +``` + +### Set up the demo + +This demo consists of three components: a load generator, a deployment and a +frontend. The deployment creates a pod that returns some information such as +name. You can use the responses to watch the incremental rollout as Flagger +orchestrates it. A load generator simply makes it easier to execute the rollout +as there needs to be some kind of active traffic to complete the operation. 
+Together, these components have a topology that looks like: + +{{< fig src="/images/canary/simple-topology.svg" + title="Topology" >}} + +To add these components to your cluster and include them in the Linkerd +[data plane](../../reference/architecture/#data-plane), run: + +```bash +kubectl create ns test && \ + kubectl apply -f https://run.linkerd.io/flagger.yml +``` + +Verify that everything has started up successfully by running: + +```bash +kubectl -n test rollout status deploy podinfo +``` + +Check it out by forwarding the frontend service locally and opening +[http://localhost:8080](http://localhost:8080) locally by running: + +```bash +kubectl -n test port-forward svc/frontend 8080 +``` + +{{< note >}} +Request routing occurs on the *client* side of the connection and not the +server side. Any requests coming from outside the mesh will not be shifted and +will always be directed to the primary backend. A service of type `LoadBalancer` +will exhibit this behavior as the source is not part of the mesh. To shift +external traffic, add your ingress controller to the mesh. +{{< /note>}} + +### Configure the release + +Before changing anything, you need to configure how a release should be rolled +out on the cluster. The configuration is contained in a +[Canary](https://docs.flagger.app/tutorials/linkerd-progressive-delivery) +and MetricTemplate definition. To apply to your cluster, run: + +```bash +kubectl apply -f - < 8080/TCP 96m +podinfo ClusterIP 10.7.252.86 9898/TCP 96m +podinfo-canary ClusterIP 10.7.245.17 9898/TCP 23m +podinfo-primary ClusterIP 10.7.249.63 9898/TCP 23m +``` + +At this point, the topology looks a little like: + +{{< fig src="/images/canary/initialized.svg" + title="Initialized" >}} + +{{< note >}} +This guide barely touches all the functionality provided by Flagger. Make sure +to read the [documentation](https://docs.flagger.app/) if you're interested in +combining canary releases with HPA, working off custom metrics or doing other +types of releases such as A/B testing. +{{< /note >}} + +### Start the rollout + +As a system, Kubernetes resources have two major sections: the spec and status. +When a controller sees a spec, it tries as hard as it can to make the status of +the current system match the spec. With a deployment, if any of the pod spec +configuration is changed, a controller will kick off a rollout. By default, the +deployment controller will orchestrate a [rolling +update](https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/). + +In this example, Flagger will notice that a deployment's spec changed and start +orchestrating the canary rollout. To kick this process off, you can update the +image to a new version by running: + +```bash +kubectl -n test set image deployment/podinfo \ + podinfod=quay.io/stefanprodan/podinfo:1.7.1 +``` + +Any kind of modification to the pod's spec such as updating an environment +variable or annotation would result in the same behavior as updating the image. + +On update, the canary deployment (`podinfo`) will be scaled up. Once ready, +Flagger will begin to update the HTTPRoute incrementally. With a configured +stepWeight of 10, each increment will increase the weight of `podinfo` by 10. +For each period, the success rate will be observed and as long as it is over the +threshold of 99%, Flagger will continue the rollout. 
To watch this entire +process, run: + +```bash +kubectl -n test get ev --watch +``` + +While an update is occurring, the resources and traffic will look like this at a +high level: + +{{< fig src="/images/canary/ongoing.svg" + title="Ongoing" >}} + +After the update is complete, this picture will go back to looking just like the +figure from the previous section. + +{{< note >}} +You can toggle the image tag between `1.7.1` and `1.7.0` to start the rollout +again. +{{< /note >}} + +### Resource + +The canary resource updates with the current status and progress. You can watch +by running: + +```bash +watch kubectl -n test get canary +``` + +Behind the scenes, Flagger is splitting traffic between the primary and canary +backends by updating the HTTPRoute resource. To watch how this configuration +changes over the rollout, run: + +```bash +kubectl -n test get httproute.gateway.networking.k8s.io podinfo -o yaml +``` + +Each increment will increase the weight of `podinfo-canary` and decrease the +weight of `podinfo-primary`. Once the rollout is successful, the weight of +`podinfo-primary` will be set back to 100 and the underlying canary deployment +(`podinfo`) will be scaled down. + +### Metrics + +As traffic shifts from the primary deployment to the canary one, Linkerd +provides visibility into what is happening to the destination of requests. The +metrics show the backends receiving traffic in real time and measure the success +rate, latencies and throughput. From the CLI, you can watch this by running: + +```bash +watch linkerd viz -n test stat deploy --from deploy/load +``` + +### Browser + +Visit again [http://localhost:8080](http://localhost:8080). Refreshing the page +will show toggling between the new version and a different header color. +Alternatively, running `curl http://localhost:8080` will return a JSON response +that looks something like: + +```bash +{ + "hostname": "podinfo-primary-74459c7db8-lbtxf", + "version": "1.7.0", + "revision": "4fc593f42c7cd2e7319c83f6bfd3743c05523883", + "color": "blue", + "message": "greetings from podinfo v1.7.0", + "goos": "linux", + "goarch": "amd64", + "runtime": "go1.11.2", + "num_goroutine": "6", + "num_cpu": "8" +} +``` + +This response will slowly change as the rollout continues. + +### Cleanup + +To cleanup, remove the Flagger controller from your cluster and delete the +`test` namespace by running: + +```bash +kubectl delete -k github.com/fluxcd/flagger/kustomize/linkerd && \ + kubectl delete ns test +``` + +## Argo Rollouts + +[Argo Rollouts](https://argo-rollouts.readthedocs.io) is another tool which can +use Linkerd to perform incremental canary rollouts based on traffic metrics. + +### Install Argo Rollouts + +Similarly to Flagger, Argo Rollouts will automate the process of creating new +Kubernetes resources, watching metrics and will use Linkerd to incrementally +shift traffic to the new version. 
To install Argo Rollouts, run: + +```bash +kubectl create namespace argo-rollouts && \ + kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml +``` + +To use Argo Rollouts with Linkerd, you will also need to enable the GatewayAPI +routing plugin and grant it the necessary RBAC to ready and modify HTTPRoutes: + +```bash +kubectl apply -f - <}} + +## Generating the certificates with `step` + +### Trust anchor certificate + +First generate the root certificate with its private key (using `step` version +0.10.1): + +```bash +step certificate create root.linkerd.cluster.local ca.crt ca.key \ +--profile root-ca --no-password --insecure +``` + +This generates the `ca.crt` and `ca.key` files. The `ca.crt` file is what you +need to pass to the `--identity-trust-anchors-file` option when installing +Linkerd with the CLI, and the `identityTrustAnchorsPEM` value when installing +the `linkerd-control-plane` chart with Helm. + +Note we use `--no-password --insecure` to avoid encrypting those files with a +passphrase. + +For a longer-lived trust anchor certificate, pass the `--not-after` argument +to the step command with the desired value (e.g. `--not-after=87600h`). + +### Issuer certificate and key + +Then generate the intermediate certificate and key pair that will be used to +sign the Linkerd proxies' CSR. + +```bash +step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \ +--profile intermediate-ca --not-after 8760h --no-password --insecure \ +--ca ca.crt --ca-key ca.key +``` + +This will generate the `issuer.crt` and `issuer.key` files. + +## Passing the certificates to Linkerd + +You can finally provide these files when installing Linkerd with the CLI: + +```bash +# first, install the Linkerd CRDs +linkerd install --crds | kubectl apply -f - + +# install the Linkerd control plane, with the certificates we just generated. +linkerd install \ + --identity-trust-anchors-file ca.crt \ + --identity-issuer-certificate-file issuer.crt \ + --identity-issuer-key-file issuer.key \ + | kubectl apply -f - +``` + +Or when installing with Helm, first install the `linkerd-crds` chart: + +```bash +helm install linkerd-crds linkerd/linkerd-crds -n linkerd --create-namespace +``` + +Then install the `linkerd-control-plane` chart: + +```bash +helm install linkerd-control-plane -n linkerd \ + --set-file identityTrustAnchorsPEM=ca.crt \ + --set-file identity.issuer.tls.crtPEM=issuer.crt \ + --set-file identity.issuer.tls.keyPEM=issuer.key \ + linkerd/linkerd-control-plane +``` diff --git a/linkerd.io/content/2.16/tasks/getting-per-route-metrics.md b/linkerd.io/content/2.16/tasks/getting-per-route-metrics.md new file mode 100644 index 0000000000..971a01acc7 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/getting-per-route-metrics.md @@ -0,0 +1,24 @@ ++++ +title = "Getting Per-Route Metrics" +description = "Configure per-route metrics for your application." ++++ + +To get per-route metrics, you must create [HTTPRoute] resources. If a route has +a `parent_ref` which points to a **Service** resource, Linkerd will generate +outbound per-route traffic metrics for all HTTP traffic that it sends to that +Service. If a route has a `parent_ref` which points to a **Server** resource, +Linkerd will generate inbound per-route traffic metrcs for all HTTP traffic that +it receives on that Server. 
Note that an [HTTPRoute] can have multiple +`parent_ref`s which means that the same [HTTPRoute] resource can be used to +describe both outbound and inbound routes. + +For a tutorial that shows off per-route metrics, check out the +[books demo](../books/#service-profiles). + +{{< note >}} +Routes configured in service profiles are different from [HTTPRoute] resources. +If a [ServiceProfile](../../features/service-profiles/) is defined for a +Service, proxies will ignore any [HTTPRoute] for that Service. +{{< /note >}} + +[HTTPRoute]: ../../features/httproute/ diff --git a/linkerd.io/content/2.16/tasks/gitops.md b/linkerd.io/content/2.16/tasks/gitops.md new file mode 100644 index 0000000000..31c9620be9 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/gitops.md @@ -0,0 +1,546 @@ ++++ +title = "Using GitOps with Linkerd with Argo CD" +description = "Use Argo CD to manage Linkerd installation and upgrade lifecycle." ++++ + +GitOps is an approach to automate the management and delivery of your Kubernetes +infrastructure and applications using Git as a single source of truth. It +usually utilizes some software agents to detect and reconcile any divergence +between version-controlled artifacts in Git with what's running in a cluster. + +This guide will show you how to set up +[Argo CD](https://argoproj.github.io/argo-cd/) to manage the installation and +upgrade of Linkerd using a GitOps workflow. + +{{< trylpt >}} + +Specifically, this guide provides instructions on how to securely generate and +manage Linkerd's mTLS private keys and certificates using +[Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) and +[cert-manager](https://cert-manager.io). It will also show you how to integrate +the [auto proxy injection](../../features/proxy-injection/) feature into your +workflow. Finally, this guide conclude with steps to upgrade Linkerd to a newer +version following a GitOps workflow. + +{{< fig alt="Linkerd GitOps workflow" + title="Linkerd GitOps workflow" + src="/images/gitops/architecture.png" >}} + +The software and tools used in this guide are selected for demonstration +purposes only. Feel free to choose others that are most suited for your +requirements. + +You will need to clone this +[example repository](https://github.com/linkerd/linkerd-examples) to your local +machine and replicate it in your Kubernetes cluster following the steps defined +in the next section. + +This guide uses the [step cli](https://smallstep.com/cli/) to create certificates +used by the Linkerd clusters to enforce mTLS, so make sure you have installed +step for your environment. + +## Set up the repositories + +Clone the example repository to your local machine: + +```sh +git clone https://github.com/linkerd/linkerd-examples.git +``` + +This repository will be used to demonstrate Git operations like `add`, `commit` +and `push` later in this guide. + +Add a new remote endpoint to the repository to point to the in-cluster Git +server, which will be set up in the next section: + +```sh +cd linkerd-examples + +git remote add git-server git://localhost/linkerd-examples.git +``` + +{{< note >}} +To simplify the steps in this guide, we will be interacting with the in-cluster +Git server via port-forwarding. Hence, the remote endpoint that we just created +targets your localhost. 
+{{< /note >}} + +Deploy the Git server to the `scm` namespace in your cluster: + +```sh +kubectl apply -f gitops/resources/git-server.yaml +``` + +Later in this guide, Argo CD will be configured to watch the repositories hosted +by this Git server. + +{{< note >}} +This Git server is configured to run as a +[daemon](https://git-scm.com/book/en/v2/Git-on-the-Server-Git-Daemon) over the +`git` protocol, with unauthenticated access to the Git data. This setup is not +recommended for production use. +{{< /note >}} + +Confirm that the Git server is healthy: + +```sh +kubectl -n scm rollout status deploy/git-server +``` + +Clone the example repository to your in-cluster Git server: + +```sh +git_server=`kubectl -n scm get po -l app=git-server -oname | awk -F/ '{ print $2 }'` + +kubectl -n scm exec "${git_server}" -- \ + git clone --bare https://github.com/linkerd/linkerd-examples.git +``` + +Confirm that the remote repository is successfully cloned: + +```sh +kubectl -n scm exec "${git_server}" -- ls -al /git/linkerd-examples.git +``` + +Confirm that you can push from the local repository to the remote repository +via port-forwarding: + +```sh +kubectl -n scm port-forward "${git_server}" 9418 & + +git push git-server master +``` + +## Install the Argo CD CLI + +Before proceeding, install the Argo CD CLI in your local machine by following +the [instructions](https://argo-cd.readthedocs.io/en/stable/cli_installation/) +relevant to your OS. + +## Deploy Argo CD + +Install Argo CD: + +```sh +kubectl create ns argocd + +kubectl -n argocd apply -f \ + https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml +``` + +Confirm that all the pods are ready: + +```sh +for deploy in "dex-server" "redis" "repo-server" "server"; \ + do kubectl -n argocd rollout status deploy/argocd-${deploy}; \ +done + +kubectl -n argocd rollout status statefulset/argocd-application-controller +``` + +Use port-forward to access the Argo CD dashboard: + +```sh +kubectl -n argocd port-forward svc/argocd-server 8080:443 \ + > /dev/null 2>&1 & +``` + +The Argo CD dashboard is now accessible at +[https://localhost:8080](https://localhost:8080/), using the default `admin` +username and +[password](https://argoproj.github.io/argo-cd/getting_started/#4-login-using-the-cli). + +Authenticate the Argo CD CLI: + +```sh +password=`kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d` + +argocd login 127.0.0.1:8080 \ + --username=admin \ + --password="${password}" \ + --insecure +``` + +## Configure project access and permissions + +Set up the `demo` +[project](https://argoproj.github.io/argo-cd/user-guide/projects/) to group our +[applications](https://argoproj.github.io/argo-cd/operator-manual/declarative-setup/#applications): + +```sh +kubectl apply -f gitops/project.yaml +``` + +This project defines the list of permitted resource kinds and target clusters +that our applications can work with. + +Confirm that the project is deployed correctly: + +```sh +argocd proj get demo +``` + +On the dashboard: + +{{< fig alt="New project in Argo CD dashboard" + title="New project in Argo CD dashboard" + src="/images/gitops/dashboard-project.png" >}} + +### Deploy the applications + +Deploy the `main` application which serves as the "parent" for all the other +applications: + +```sh +kubectl apply -f gitops/main.yaml +``` + +{{< note >}} +The "app of apps" pattern is commonly used in Argo CD workflows to bootstrap +applications. 
See the Argo CD documentation for more +[information](https://argoproj.github.io/argo-cd/operator-manual/cluster-bootstrapping/#app-of-apps-pattern). +{{< /note >}} + +Confirm that the `main` application is deployed successfully: + +```sh +argocd app get main +``` + +Sync the `main` application: + +```sh +argocd app sync main +``` + +{{< fig alt="Synchronize the main application" + title="Synchronize the main application" + src="/images/gitops/dashboard-applications-main-sync.png" >}} + +Notice that only the `main` application is synchronized. + +Next, we will synchronize the remaining applications individually. + +### Deploy cert-manager + +Synchronize the `cert-manager` application: + +```sh +argocd app sync cert-manager +``` + +Confirm that cert-manager is running: + +```sh +for deploy in "cert-manager" "cert-manager-cainjector" "cert-manager-webhook"; \ + do kubectl -n cert-manager rollout status deploy/${deploy}; \ +done +``` + +{{< fig alt="Synchronize the cert-manager application" + title="Synchronize the cert-manager application" + center="true" + src="/images/gitops/dashboard-cert-manager-sync.png" >}} + +### Deploy Sealed Secrets + +Synchronize the `sealed-secrets` application: + +```sh +argocd app sync sealed-secrets +``` + +Confirm that sealed-secrets is running: + +```sh +kubectl -n kube-system rollout status deploy/sealed-secrets +``` + +{{< fig alt="Synchronize the sealed-secrets application" + title="Synchronize the sealed-secrets application" + center="true" + src="/images/gitops/dashboard-sealed-secrets-sync.png" >}} + +### Create mTLS trust anchor + +Before proceeding with deploying Linkerd, we will need to create the mTLS trust +anchor. Then we will also set up the `linkerd-bootstrap` application to manage +the trust anchor certificate. + +Create a new mTLS trust anchor private key and certificate: + +```sh +step certificate create root.linkerd.cluster.local sample-trust.crt sample-trust.key \ + --profile root-ca \ + --no-password \ + --not-after 43800h \ + --insecure +``` + +Confirm the details (encryption algorithm, expiry date, SAN etc.) of the new +trust anchor: + +```sh +step certificate inspect sample-trust.crt +``` + +Before creating the `SealedSecret`, make sure you have installed the `kubeseal` +utility, as instructed +[here](https://github.com/bitnami-labs/sealed-secrets/releases) + +Now create the `SealedSecret` resource to store the encrypted trust anchor: + +```sh +kubectl create ns linkerd +kubectl -n linkerd create secret tls linkerd-trust-anchor \ + --cert sample-trust.crt \ + --key sample-trust.key \ + --dry-run=client -oyaml | \ +kubeseal --controller-name=sealed-secrets -oyaml - | \ +kubectl patch -f - \ + -p '{"spec": {"template": {"type":"kubernetes.io/tls", "metadata": {"labels": {"linkerd.io/control-plane-component":"identity", "linkerd.io/control-plane-ns":"linkerd"}, "annotations": {"linkerd.io/created-by":"linkerd/cli stable-2.12.0"}}}}}' \ + --dry-run=client \ + --type=merge \ + --local -oyaml > gitops/resources/linkerd/trust-anchor.yaml +``` + +This will overwrite the existing `SealedSecret` resource in your local +`gitops/resources/linkerd/trust-anchor.yaml` file. We will push this change to +the in-cluster Git server. 
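For reference, the patched `SealedSecret` in
`gitops/resources/linkerd/trust-anchor.yaml` should look roughly like the sketch
below; the `encryptedData` values shown here are placeholders for the ciphertext
that `kubeseal` produces, and only the structure is meant to be illustrative:

```bash
cat gitops/resources/linkerd/trust-anchor.yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: linkerd-trust-anchor
  namespace: linkerd
spec:
  encryptedData:
    tls.crt: AgA...   # sealed certificate (placeholder)
    tls.key: AgB...   # sealed private key (placeholder)
  template:
    type: kubernetes.io/tls
    metadata:
      name: linkerd-trust-anchor
      namespace: linkerd
      labels:
        linkerd.io/control-plane-component: identity
        linkerd.io/control-plane-ns: linkerd
      annotations:
        linkerd.io/created-by: linkerd/cli stable-2.12.0
```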
+ +Confirm that only the `spec.encryptedData` is changed: + +```sh +git diff gitops/resources/linkerd/trust-anchor.yaml +``` + +Commit and push the new trust anchor secret to your in-cluster Git server: + +```sh +git add gitops/resources/linkerd/trust-anchor.yaml + +git commit -m "update encrypted trust anchor" + +git push git-server master +``` + +Confirm the commit is successfully pushed: + +```sh +kubectl -n scm exec "${git_server}" -- git --git-dir linkerd-examples.git log -1 +``` + +## Deploy linkerd-bootstrap + +Synchronize the `linkerd-bootstrap` application: + +```sh +argocd app sync linkerd-bootstrap +``` + +{{< note >}} +If the issuer and certificate resources appear in a degraded state, it's likely +that the SealedSecrets controller failed to decrypt the sealed +`linkerd-trust-anchor` secret. Check the SealedSecrets controller for error logs. + +For debugging purposes, the sealed resource can be retrieved using the +`kubectl -n linkerd get sealedsecrets linkerd-trust-anchor -oyaml` command. +Ensure that this resource matches the +`gitops/resources/linkerd/trust-anchor.yaml` file you pushed to the in-cluster +Git server earlier. +{{< /note >}} + +{{< fig alt="Synchronize the linkerd-bootstrap application" + title="Synchronize the linkerd-bootstrap application" + src="/images/gitops/dashboard-linkerd-bootstrap-sync.png" >}} + +SealedSecrets should have created a secret containing the decrypted trust +anchor. Retrieve the decrypted trust anchor from the secret: + +```sh +trust_anchor=`kubectl -n linkerd get secret linkerd-trust-anchor -ojsonpath="{.data['tls\.crt']}" | base64 -d -w 0 -` +``` + +Confirm that it matches the decrypted trust anchor certificate you created +earlier in your local `sample-trust.crt` file: + +```sh +diff -b \ + <(echo "${trust_anchor}" | step certificate inspect -) \ + <(step certificate inspect sample-trust.crt) +``` + +### Deploy Linkerd + +Now we are ready to install Linkerd. The decrypted trust anchor we just +retrieved will be passed to the installation process using the +`identityTrustAnchorsPEM` parameter. + +Prior to installing Linkerd, note that the `identityTrustAnchorsPEM` parameter +is set to an "empty" certificate string: + +```sh +argocd app get linkerd-control-plane -ojson | \ + jq -r '.spec.source.helm.parameters[] | select(.name == "identityTrustAnchorsPEM") | .value' +``` + +{{< fig alt="Empty default trust anchor" + title="Empty default trust anchor" + src="/images/gitops/dashboard-trust-anchor-empty.png" >}} + +We will override this parameter in the `linkerd` application with the value of +`${trust_anchor}`. + +Locate the `identityTrustAnchorsPEM` variable in your local +`gitops/argo-apps/linkerd-control-plane.yaml` file, and set its `value` to that +of `${trust_anchor}`. + +Ensure that the multi-line string is indented correctly. 
E.g., + +```yaml + source: + chart: linkerd-control-plane + repoURL: https://helm.linkerd.io/stable + targetRevision: 1.9.0 + helm: + parameters: + - name: identityTrustAnchorsPEM + value: | + -----BEGIN CERTIFICATE----- + MIIBlTCCATygAwIBAgIRAKQr9ASqULvXDeyWpY1LJUQwCgYIKoZIzj0EAwIwKTEn + MCUGA1UEAxMeaWRlbnRpdHkubGlua2VyZC5jbHVzdGVyLmxvY2FsMB4XDTIwMDkx + ODIwMTAxMFoXDTI1MDkxNzIwMTAxMFowKTEnMCUGA1UEAxMeaWRlbnRpdHkubGlu + a2VyZC5jbHVzdGVyLmxvY2FsMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE+PUp + IR74PsU+geheoyseycyquYyes5eeksIb5FDm8ptOXQ2xPcBpvesZkj6uIyS3k4qV + E0S9VtMmHNeycL7446NFMEMwDgYDVR0PAQH/BAQDAgEGMBIGA1UdEwEB/wQIMAYB + Af8CAQEwHQYDVR0OBBYEFHypCh7hiSLNxsKhMylQgqD9t7NNMAoGCCqGSM49BAMC + A0cAMEQCIEWhI86bXWEd4wKTnG07hBfBuVCT0bxopaYnn3wRFx7UAiAwXyh5uaVg + MwCC5xL+PM+bm3PRqtrmI6TocWH07GbMxg== + -----END CERTIFICATE----- +``` + +Confirm that only one `spec.source.helm.parameters.value` field is changed: + +```sh +git diff gitops/argo-apps/linkerd-control-plane.yaml +``` + +Commit and push the changes to the Git server: + +```sh +git add gitops/argo-apps/linkerd-control-plane.yaml + +git commit -m "set identityTrustAnchorsPEM parameter" + +git push git-server master +``` + +Synchronize the `main` application: + +```sh +argocd app sync main +``` + +Confirm that the new trust anchor is picked up by the `linkerd` application: + +```sh +argocd app get linkerd-control-plane -ojson | \ + jq -r '.spec.source.helm.parameters[] | select(.name == "identityTrustAnchorsPEM") | .value' +``` + +{{< fig alt="Override mTLS trust anchor" + title="Override mTLS trust anchor" + src="/images/gitops/dashboard-trust-anchor-override.png" >}} + +Synchronize the `linkerd-crds` and `linkerd-control-plane` applications: + +```sh +argocd app sync linkerd-crds +argocd app sync linkerd-control-plane +``` + +Check that Linkerd is ready: + +```sh +linkerd check +``` + +{{< fig alt="Synchronize Linkerd" + title="Synchronize Linkerd" + src="/images/gitops/dashboard-linkerd-sync.png" >}} + +### Test with emojivoto + +Deploy emojivoto to test auto proxy injection: + +```sh +argocd app sync emojivoto +``` + +Check that the applications are healthy: + +```sh +for deploy in "emoji" "vote-bot" "voting" "web" ; \ + do kubectl -n emojivoto rollout status deploy/${deploy}; \ +done +``` + +{{< fig alt="Synchronize emojivoto" + title="Synchronize emojivoto" + src="/images/gitops/dashboard-emojivoto-sync.png" >}} + +### Upgrade Linkerd to 2.12.1 + +(Assuming 2.12.1 has already been released ;-) ) + +Use your editor to change the `spec.source.targetRevision` field to `1.9.3` +(that's the Helm chart version corresponding to linkerd stable-2.12.1) in the +`gitops/argo-apps/linkerd-control-plane.yaml` file: + +Confirm that only the `targetRevision` field is changed: + +```sh +git diff gitops/argo-apps/linkerd-control-plane.yaml +``` + +Commit and push this change to the Git server: + +```sh +git add gitops/argo-apps/linkerd-control-plane.yaml + +git commit -m "upgrade Linkerd to 2.12.1" + +git push git-server master +``` + +Synchronize the `main` application: + +```sh +argocd app sync main +``` + +Synchronize the `linkerd-control-plane` application: + +```sh +argocd app sync linkerd-control-plane +``` + +Confirm that the upgrade completed successfully: + +```sh +linkerd check +``` + +Confirm the new version of the control plane: + +```sh +linkerd version +``` + +### Clean up + +All the applications can be removed by removing the `main` application: + +```sh +argocd app delete main --cascade=true +``` diff --git 
a/linkerd.io/content/2.16/tasks/graceful-shutdown.md b/linkerd.io/content/2.16/tasks/graceful-shutdown.md new file mode 100644 index 0000000000..c4584184ca --- /dev/null +++ b/linkerd.io/content/2.16/tasks/graceful-shutdown.md @@ -0,0 +1,164 @@ ++++ +title = "Graceful Pod Shutdown" +description = "Gracefully handle pod shutdown signal." ++++ + +When Kubernetes begins to terminate a pod, it starts by sending all containers +in that pod a TERM signal. When the Linkerd proxy sidecar receives this signal, +it will immediately begin a graceful shutdown where it refuses all new requests +and allows existing requests to complete before shutting down. + +This means that if the pod's main container attempts to make any new network +calls after the proxy has received the TERM signal, those network calls will +fail. This also has implications for clients of the terminating pod and for +job resources. + +## Graceful shutdown in Kubernetes + +[pod-lifetime]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifetime +[pod-termination]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination +[pod-forced]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced +[hook]: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks + +Pods are ephemeral in nature, and may be [killed due to a number of different +reasons][pod-lifetime], such as: + +* Being scheduled on a node that fails (in which case the pod will be deleted). +* A lack of resources on the node where the pod is scheduled (in which case the + pod is evicted). +* Manual deletion, e.g through `kubectl delete`. + +Since pods fundamentally represent processes running on nodes in a cluster, it +is important to ensure that when killed, they have enough time to clean-up and +terminate gracefully. When a pod is deleted, the [container runtime will send a +TERM signal][pod-termination] to each container running in the pod. + +By default, Kubernetes will wait [30 seconds][pod-forced] to allow processes to +handle the TERM signal. This is known as the **grace period** within which a +process may shut itself down gracefully. If the grace period time runs out, and +the process hasn't gracefully exited, the container runtime will send a KILL +signal, abruptly stopping the process. Grace periods may be overridden at a +workload level. This is useful when a process needs additional time to clean-up +(e.g making network calls, writing to disk, etc.) + +Kubernetes also allows operators of services to define lifecycle hooks for +their containers. Important in the context of graceful shutdown is the +[`preStop`][hook] hook, that will be called when a container is terminated due +to: + +* An API request. +* Liveness/Readiness probe failure. +* Resource contention. + +If a pod has a preStop hook for a container, and the pod receives a TERM signal +from the container runtime, the preStop hook will be executed, and it must +finish before the TERM signal can be propagated to the container itself. It is +worth noting in this case that the **grace period** will start when the preStop +hook is executed, not when the container first starts processing the TERM +signal. + +## Configuration options for graceful shutdown + +Linkerd offers a few options to configure pods and containers to gracefully shutdown. 
+ +* `--wait-before-exit-seconds`: can be used as an install value (either through the + CLI or through Helm), or alternatively, through a [configuration + annotation](../../reference/proxy-configuration/). This will add a + `preStop` hook to the proxy container to delay its handling of the TERM + signal. This will only work when the conditions described above are satisfied + (i.e. the container runtime sends the TERM signal). +* `config.linkerd.io/shutdown-grace-period`: is an annotation that can be used + on workloads to configure the graceful shutdown time for the _proxy_. If the + period elapses before the proxy has had a chance to gracefully shut itself + down, it will forcefully shut itself down, thereby closing all currently open + connections. By default, the shutdown grace period is 120 seconds. This grace + period will be respected regardless of where the TERM signal comes from; the + proxy may receive a shutdown signal from the container runtime, a different + process (e.g. a script that sends TERM), or from a networked request to its + shutdown endpoint (only possible on the loopback interface). The proxy will + delay its handling of the TERM signal until all of its open connections have + completed. This option is particularly useful to close long-running + connections that would otherwise prevent the proxy from shutting down + gracefully. +* `linkerd-await`: is a binary that wraps (and spawns) another process, and it + is commonly used to wait for proxy readiness. The await binary can be used + with a `--shutdown` option, in which case, after the process it has wrapped + finishes, it will send a shutdown request to the proxy. When used for + graceful shutdown, the container's entrypoint typically needs to be changed + to `linkerd-await`. + +Depending on the use case, one option (or utility) might be preferred over the +other. To aid with some common cases, suggestions are given below on what to do +when confronted with slow-updating clients and with job resources that will not +complete. + +## Slow Updating Clients + +Before Kubernetes terminates a pod, it first removes that pod from the endpoints +resource of any services that pod is a member of. This means that clients of +that service should stop sending traffic to the pod before it is terminated. +However, certain clients can be slow to receive the endpoints update and may +attempt to send requests to the terminating pod after that pod's proxy has +already received the TERM signal and begun graceful shutdown. Those requests +will fail. + +To mitigate this, use the `--wait-before-exit-seconds` flag with +`linkerd inject` to delay the Linkerd proxy's handling of the TERM signal for +a given number of seconds using a `preStop` hook. This delay gives slow clients +additional time to receive the endpoints update before beginning graceful +shutdown. To get the maximum benefit from this option, the main container should +have its own `preStop` hook that runs a sleep command for a shorter period than +the one set for the proxy sidecar, and neither delay should exceed the +`terminationGracePeriodSeconds` configured for the entire pod. + +For example, + +```yaml + # application container + lifecycle: + preStop: + exec: + command: + - /bin/bash + - -c + - sleep 20 + + # for entire pod + terminationGracePeriodSeconds: 160 +``` + +## Graceful shutdown of Job and Cronjob Resources + +Pods which are part of Job or Cronjob resources will run until all of the +containers in the pod complete.
However, the Linkerd proxy container runs +continuously until it receives a TERM signal. Since Kubernetes does not give the +proxy a means to know when the Cronjob has completed, by default, Job and +Cronjob pods which have been meshed will continue to run even once the main +container has completed. You can address this either by running Linkerd as a +native sidecar or by manually shutting down the proxy. + +### Native Sidecar + +If you use the `--set proxy.nativeSidecar=true` flag when installing Linkerd, the +Linkerd proxy will run as a [sidecar container](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) +and will automatically shutdown when the main containers in the pod terminate. +Native sidecars were added in Kubernetes v1.28 and are available by default in +Kubernetes v1.29. + +### Manual shutdown + +Alternatively, you can issue a POST to the `/shutdown` endpoint on the proxy +once the application completes (e.g. via `curl -X POST +http://localhost:4191/shutdown`). This will terminate the proxy gracefully and +allow the Job or Cronjob to complete. These shutdown requests must come on the +loopback interface, i.e. from within the same Kubernetes pod. + +One convenient way to call this endpoint is to wrap your application with the +[linkerd-await](https://github.com/linkerd/linkerd-await) utility. An +application that is called this way (e.g. via `linkerd-await -S $MYAPP`) will +automatically call the proxy's `/shutdown` endpoint when it completes. + +For security reasons, the proxy's `/shutdown` endpoint is disabled by default. +In order to be able to manually shutdown the proxy, you must enable this +endpoint by installing Linkerd with the `--set proxy.enableShutdownEndpoint=true` +flag. diff --git a/linkerd.io/content/2.16/tasks/grafana.md b/linkerd.io/content/2.16/tasks/grafana.md new file mode 100644 index 0000000000..b2f734e0eb --- /dev/null +++ b/linkerd.io/content/2.16/tasks/grafana.md @@ -0,0 +1,111 @@ ++++ +title = "Grafana" +description = "Grafana install instructions and how to link it with the Linkerd Dashboard" ++++ + +Linkerd provides a full [on-cluster metrics stack](../../features/dashboard/) +that can be leveraged by a Prometheus instance and subsequently by a Grafana +instance, in order to show both the real-time and historical behavior of these +metrics. + +First, you need to install Grafana from a variety of possible sources, and then +load the suite of Grafana dashboards that have been pre-configured to consume +the metrics exposed by Linkerd. + +{{< trylpt >}} + +## Install Prometheus + +Before installing Grafana, make sure you have a working instance of Prometheus +properly configured to consume Linkerd metrics. The Linkerd Viz extension comes +with such a pre-configured Prometheus instance, but you can also [bring your own +Prometheus](../external-prometheus/). + +## Install Grafana + +The easiest and recommended way is to install Grafana's official Helm chart: + +```bash +helm repo add grafana https://grafana.github.io/helm-charts +helm install grafana -n grafana --create-namespace grafana/grafana \ + -f https://raw.githubusercontent.com/linkerd/linkerd2/main/grafana/values.yaml +``` + +This is fed the default `values.yaml` file, which configures as a default +datasource Linkerd Viz' Prometheus instance, sets up a reverse proxy (more on +that later), and pre-loads all the Linkerd Grafana dashboards that are published +on . 
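If you'd like to review that configuration before installing, you can download
the file and inspect it. The excerpt below is an abridged, illustrative sketch
of the kind of settings it contains (a default datasource pointing at the
Linkerd Viz Prometheus and the reverse-proxy `root_url` discussed below), not
the literal upstream file:

```bash
curl -sL https://raw.githubusercontent.com/linkerd/linkerd2/main/grafana/values.yaml | less

# Abridged, illustrative excerpt:
#
# grafana.ini:
#   server:
#     root_url: '%(protocol)s://%(domain)s:/grafana/'
# datasources:
#   datasources.yaml:
#     datasources:
#       - name: prometheus
#         type: prometheus
#         url: http://prometheus.linkerd-viz.svc.cluster.local:9090
#         isDefault: true
```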
+ +{{< note >}} +The access to Linkerd Viz' Prometheus instance is restricted through the +`prometheus-admin` AuthorizationPolicy, granting access only to the +`metrics-api` ServiceAccount. In order to also grant access to Grafana, you need +to add an AuthorizationPolicy pointing to its ServiceAccount. You can apply +[authzpolicy-grafana.yaml](https://github.com/linkerd/linkerd2/blob/release/stable-2.13/grafana/authzpolicy-grafana.yaml) +which grants permission for the `grafana` ServiceAccount. +{{< /note >}} + +A more complex and production-oriented source is the [Grafana +Operator](https://github.com/grafana-operator/grafana-operator). And there are +also hosted solutions such as [Grafana +Cloud](https://grafana.com/products/cloud/). Those projects provide instructions +on how to easily import the same charts published on +. + +{{< note >}} +Grafana's official Helm chart uses an initContainer to download Linkerd's +configuration and dashboards. If you use the CNI plugin, when you add grafana's +pod into the mesh its initContainer will run before the proxy is started and the +traffic cannot flow. +You should either avoid meshing grafana's pod, skip outbound port 443 via +`config.linkerd.io/skip-outbound-ports: "443"` annotation or run the container +with the proxy's UID. +See [Allowing initContainer networking](https://linkerd.io/2.12/features/cni/#allowing-initcontainer-networking) +{{< /note >}} + +## Hook Grafana with Linkerd Viz Dashboard + +It's easy to configure Linkerd Viz dashboard and Grafana such that the former +displays Grafana icons in all the relevant items, providing direct links to the +appropriate Grafana Dashboards. For example, when looking at a list of +deployments for a given namespace, you'll be able to go straight into the +Linkerd Deployments Grafana dashboard providing the same (and more) metrics +(plus their historical behavior). + +### In-cluster Grafana instances + +In the case of in-cluster Grafana instances (such as as the one from the Grafana +Helm chart or the Grafana Operator mentioned above), make sure a reverse proxy +is set up, as shown in the sample `grafana/values.yaml` file: + +```yaml +grafana.ini: + server: + root_url: '%(protocol)s://%(domain)s:/grafana/' +``` + +Then refer the location of your Grafana service in the Linkerd Viz `values.yaml` +entry `grafana.url`. For example, if you installed the Grafana official Helm +chart in the `grafana` namespace, you can install Linkerd Viz through the +command line like so: + +```bash +linkerd viz install --set grafana.url=grafana.grafana:3000 \ + | kubectl apply -f - +``` + +### Off-cluster Grafana instances + +If you're using a hosted solution like Grafana Cloud, after having imported the +Linkerd dashboards, you need to enter the full URL of the Grafana service in the +Linkerd Viz `values.yaml` entry `grafana.externalUrl`: + +```bash +linkerd viz install --set grafana.externalUrl=https://your-co.grafana.net/ \ + | kubectl apply -f - +``` + +If that single Grafana instance is pointing to multiple Linkerd installations, +you can segregate the dashboards through different prefixes in their UIDs, which +you would configure in the `grafana.uidPrefix` setting for each Linkerd +instance. 
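For example, with two clusters pointing at the same hosted Grafana, you might
give each Linkerd installation its own prefix; the prefix values below are
purely illustrative:

```bash
# cluster 1
linkerd viz install \
  --set grafana.externalUrl=https://your-co.grafana.net/ \
  --set grafana.uidPrefix=cluster1- \
  | kubectl apply -f -

# cluster 2
linkerd viz install \
  --set grafana.externalUrl=https://your-co.grafana.net/ \
  --set grafana.uidPrefix=cluster2- \
  | kubectl apply -f -
```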
diff --git a/linkerd.io/content/2.16/tasks/install-helm.md b/linkerd.io/content/2.16/tasks/install-helm.md new file mode 100644 index 0000000000..af0249a009 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/install-helm.md @@ -0,0 +1,146 @@ ++++ +title = "Installing Linkerd with Helm" +description = "Install Linkerd onto your Kubernetes cluster using Helm." ++++ + +Linkerd can be installed via Helm rather than with the `linkerd install` +command. This is recommended for production, since it allows for repeatability. + +{{< releases >}} + +## Prerequisite: generate mTLS certificates + +To do [automatic mutual TLS](../../features/automatic-mtls/), Linkerd requires a +trust anchor certificate and an issuer certificate and key pair. When you're +using `linkerd install`, we can generate these for you. However, for Helm, you +will need to generate these yourself. + +Please follow the instructions in +[Generating your own mTLS root certificates](../generate-certificates/) to +generate these. + +## Helm install procedure + +```bash +# Add the Helm repo for Linkerd edge releases: +helm repo add linkerd-edge https://helm.linkerd.io/edge +``` + +You need to install two separate charts in succession: first `linkerd-crds` and +then `linkerd-control-plane`. + +{{< note >}} If installing Linkerd in a cluster that uses Cilium in kube-proxy +replacement mode, additional steps may be needed to ensure service discovery +works as intended. Instructions are on the +[Cilium cluster configuration](../../reference/cluster-configuration/#cilium) +page. {{< /note >}} + +### linkerd-crds + +The `linkerd-crds` chart sets up the CRDs Linkerd requires: + +```bash +helm install linkerd-crds linkerd-edge/linkerd-crds \ + -n linkerd --create-namespace +``` + +{{< note >}} This will create the `linkerd` namespace. If it already exists or +you're creating it beforehand elsewhere in your pipeline, just omit the +`--create-namespace` flag. {{< /note >}} + +{{< note >}} If you are using [Linkerd's CNI plugin](../../features/cni/), you +must also add the `--set cniEnabled=true` flag to your `helm install` command. +{{< /note >}} + +### linkerd-control-plane + +The `linkerd-control-plane` chart sets up all the control plane components: + +```bash +helm install linkerd-control-plane \ + -n linkerd \ + --set-file identityTrustAnchorsPEM=ca.crt \ + --set-file identity.issuer.tls.crtPEM=issuer.crt \ + --set-file identity.issuer.tls.keyPEM=issuer.key \ + linkerd-edge/linkerd-control-plane +``` + +{{< note >}} If you are using [Linkerd's CNI plugin](../../features/cni/), you +must also add the `--set cniEnabled=true` flag to your `helm install` command. +{{< /note >}} + +## Enabling high availability mode + +The `linkerd-control-plane` chart contains a file `values-ha.yaml` that +overrides some default values to set things up under a high-availability +scenario, analogous to the `--ha` option in `linkerd install`. Values such as +a higher number of replicas, higher memory/CPU limits, and affinities are +specified in that file. + +You can get `values-ha.yaml` by fetching the chart file: + +```bash +helm fetch --untar linkerd-edge/linkerd-control-plane +``` + +Then use the `-f` flag to provide this override file.
For example: + +```bash +helm install linkerd-control-plane \ + -n linkerd \ + --set-file identityTrustAnchorsPEM=ca.crt \ + --set-file identity.issuer.tls.crtPEM=issuer.crt \ + --set-file identity.issuer.tls.keyPEM=issuer.key \ + -f linkerd-control-plane/values-ha.yaml \ + linkerd-edge/linkerd-control-plane +``` + +## Upgrading with Helm + +First, make sure your local Helm repos are updated: + +```bash +helm repo update + +helm search repo linkerd +NAME CHART VERSION APP VERSION DESCRIPTION +linkerd-edge/linkerd-crds Linkerd gives you observability, reliability, and securit... +linkerd-edge/linkerd-control-plane {{% latestedge %}} Linkerd gives you observability, reliability, and securit... +``` + +During an upgrade, you must choose whether you want to reuse the values in the +chart or move to the values specified in the newer chart. Our advice is to use a +`values.yaml` file that stores all custom overrides that you have for your +chart. + +The `helm upgrade` command has a number of flags that allow you to customize its +behavior. Special attention should be paid to `--reuse-values` and +`--reset-values` and how they behave when charts change from version to version +and/or overrides are applied through `--set` and `--set-file`. For example: + +- `--reuse-values` with no overrides - all values are reused +- `--reuse-values` with overrides - all except the values that are overridden + are reused +- `--reset-values` with no overrides - no values are reused and all changes from + provided release are applied during the upgrade +- `--reset-values` with overrides - no values are reused and changed from + provided release are applied together with the overrides +- no flag and no overrides - `--reuse-values` will be used by default +- no flag and overrides - `--reset-values` will be used by default + +Finally, before upgrading, you can consult the +[edge chart](https://artifacthub.io/packages/helm/linkerd2-edge/linkerd-control-plane#values) +docs to check whether there are breaking changes to the chart (i.e. +renamed or moved keys, etc). If there are, make the corresponding changes to +your `values.yaml` file. Then you can use: + +```bash +# the linkerd-crds chart currently doesn't have a values.yaml file +helm upgrade linkerd-crds linkerd-edge/linkerd-crds + +# whereas linkerd-control-plane does +helm upgrade linkerd-control-plane linkerd-edge/linkerd-control-plane --reset-values -f values.yaml --atomic +``` + +The `--atomic` flag will ensure that all changes are rolled back in case the +upgrade operation fails. diff --git a/linkerd.io/content/2.16/tasks/install.md b/linkerd.io/content/2.16/tasks/install.md new file mode 100644 index 0000000000..143255c109 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/install.md @@ -0,0 +1,119 @@ ++++ +title = "Installing Linkerd" +description = "Install Linkerd onto your Kubernetes cluster." +aliases = [ + "../upgrading/", + "../installing/", + "../rbac/" +] ++++ + +Before you can use Linkerd, you'll need to install the [control +plane](../../reference/architecture/#control-plane). This page covers how to +accomplish that. + +{{< note >}} + +The Linkerd project itself only produces [edge release](/releases/) artifacts. +(For more information about the different kinds of Linkerd releases, see the +[Releases and Versions](/releases/) page.) + +As such, this page contains instructions for installing the latest edge +release of Linkerd. 
If you are using a [stable +distribution](/releases/#stable) of Linkerd, the vendor should provide +additional guidance on installing Linkerd. + +{{< /note >}} + +Linkerd's control plane can be installed in two ways: with the CLI and with +Helm. The CLI is convenient and easy, but for production use cases we recommend +Helm, which allows for repeatability. + +In either case, we recommend installing the CLI itself so that you can validate +the success of the installation. See the [Getting Started +Guide](../../getting-started/) for how to install the CLI if you haven't done +this already. + +## Requirements + +Linkerd requires a Kubernetes cluster on which to run. Where this cluster lives +is not important: it might be hosted on a cloud provider, running on your +local machine, or somewhere else entirely. + +Make sure that your Linkerd version and Kubernetes version are compatible by +checking Linkerd's [supported Kubernetes +versions](../../reference/k8s-versions/). + +Before installing the control plane, validate that this Kubernetes cluster is +configured appropriately for Linkerd by running: + +```bash +linkerd check --pre +``` + +Be sure to address any issues that the checks identify before proceeding. + +{{< note >}} +If installing Linkerd on GKE, there are some extra steps required depending on +how your cluster has been configured. If you are using any of these features, +check out the additional instructions on [GKE private +clusters](../../reference/cluster-configuration/#private-clusters). +{{< /note >}} + +{{< note >}} +If installing Linkerd in a cluster that uses Cilium in kube-proxy replacement +mode, additional steps may be needed to ensure service discovery works as +intended. Instructions are on the [Cilium cluster +configuration](../../reference/cluster-configuration/#cilium) page. +{{< /note >}} + +## Installing with the CLI + +Once you have a cluster ready, installing Linkerd is as easy as running `linkerd +install --crds`, which installs the Linkerd CRDs, followed by `linkerd install`, +which installs the Linkerd control plane. Both of these commands generate +Kubernetes manifests, which can be applied to your cluster to install Linkerd. + +For example: + +```bash +# install the CRDs first +linkerd install --crds | kubectl apply -f - + +# install the Linkerd control plane once the CRDs have been installed +linkerd install | kubectl apply -f - +``` + +This basic installation should work for most cases. However, some +configuration options are provided as flags for `install`. See the [CLI +reference documentation](../../reference/cli/install/) for a complete list of +options. You can also use [tools like Kustomize](../customize-install/) to +programmatically alter this manifest. + +## Installing via Helm + +To install Linkerd with Helm (recommended for production installations), +see the [Installing Linkerd with Helm](../install-helm/) guide. + +## Verification + +After installation (whether CLI or Helm) you can validate that Linkerd is in a +good state by running: + +```bash +linkerd check +``` + +## Next steps + +Once you've installed the control plane, you may want to install some +extensions, such as `viz`, `multicluster`, and `jaeger`. See [Using +extensions](../extensions/) for how to install them. + +Finally, once the control plane is installed, you'll need to "mesh" any services +you want Linkerd active for. See [Adding your services to +Linkerd](../../adding-your-service/) for how to do this.
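As a rough sketch of what "meshing" a workload looks like (using a hypothetical
`my-app` namespace), you can either annotate the namespace so the proxy injector
adds the sidecar on the next rollout, or pipe existing manifests through
`linkerd inject`:

```bash
# Option 1: enable automatic proxy injection for everything in the namespace,
# then restart workloads so the proxy is added
kubectl annotate namespace my-app linkerd.io/inject=enabled
kubectl -n my-app rollout restart deploy

# Option 2: add the injection annotation to existing manifests and re-apply them
kubectl -n my-app get deploy -o yaml | linkerd inject - | kubectl apply -f -
```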
+ +## Uninstalling the control plane + +See [Uninstalling Linkerd](../uninstall/). diff --git a/linkerd.io/content/2.16/tasks/installing-multicluster.md b/linkerd.io/content/2.16/tasks/installing-multicluster.md new file mode 100644 index 0000000000..02dad38e94 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/installing-multicluster.md @@ -0,0 +1,238 @@ ++++ +title = "Installing Multi-cluster Components" +description = "Allow Linkerd to manage cross-cluster communication." ++++ + +Multicluster support in Linkerd requires extra installation and configuration on +top of the default [control plane installation](../install/). This guide +walks through this installation and configuration as well as common problems +that you may encounter. For a detailed walkthrough and explanation of what's +going on, check out [getting started](../multicluster/). + +{{< trylpt >}} + +## Requirements + +- Two clusters. +- A [control plane installation](../install/) in each cluster that shares + a common + [trust anchor](../generate-certificates/#trust-anchor-certificate). + If you have an existing installation, see the + [trust anchor bundle](../installing-multicluster/#trust-anchor-bundle) + documentation to understand what is required. +- Each of these clusters should be configured as `kubectl` + [contexts](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/). +- Elevated privileges on both clusters. We'll be creating service accounts and + granting extended privileges, so you'll need to be able to do that on your + test clusters. +- Support for services of type `LoadBalancer` in the `east` cluster. Check out + the documentation for your cluster provider or take a look at + [inlets](https://blog.alexellis.io/ingress-for-your-local-kubernetes-cluster/). + This is what the `west` cluster will use to communicate with `east` via the + gateway. + +## Step 1: Install the multicluster control plane + +On each cluster, run: + +```bash +linkerd multicluster install | \ + kubectl apply -f - +``` + +To verify that everything has started up successfully, run: + +```bash +linkerd multicluster check +``` + +For a deep dive into what components are being added to your cluster and how all +the pieces fit together, check out the +[getting started documentation](../multicluster/#preparing-your-cluster). + +## Step 2: Link the clusters + +Each cluster must be linked. This consists of installing several resources in +the source cluster including a secret containing a kubeconfig that allows access +to the target cluster Kubernetes API, a service mirror control for mirroring +services, and a Link custom resource for holding configuration. To link cluster +`west` to cluster `east`, you would run: + +```bash +linkerd --context=east multicluster link --cluster-name east | + kubectl --context=west apply -f - +``` + +To verify that the credentials were created successfully and the clusters are +able to reach each other, run: + +```bash +linkerd --context=west multicluster check +``` + +You should also see the list of gateways show up by running. Note that you'll +need Linkerd's Viz extension to be installed in the source cluster to get the +list of gateways: + +```bash +linkerd --context=west multicluster gateways +``` + +For a detailed explanation of what this step does, check out the +[linking the clusters section](../multicluster/#linking-the-clusters). + +## Step 3: Export services + +Services are not automatically mirrored in linked clusters. 
By default, only +services with the `mirror.linkerd.io/exported` label will be mirrored. For each +service you would like mirrored to linked clusters, run: + +```bash +kubectl label svc foobar mirror.linkerd.io/exported=true +``` + +{{< note >}} You can configure a different label selector by using the +`--selector` flag on the `linkerd multicluster link` command or by editing +the Link resource created by the `linkerd multicluster link` command. +{{< /note >}} + +## Trust Anchor Bundle + +To secure the connections between clusters, Linkerd requires that there is a +shared trust anchor. This allows the control plane to encrypt the requests that +go between clusters and verify the identity of those requests. This identity is +used to control access to clusters, so it is critical that the trust anchor is +shared. + +The easiest way to do this is to have a single trust anchor certificate shared +between multiple clusters. If you have an existing Linkerd installation and have +thrown away the trust anchor key, it might not be possible to have a single +certificate for the trust anchor. Luckily, the trust anchor can be a bundle of +certificates as well! + +To fetch your existing cluster's trust anchor, run: + +```bash +kubectl -n linkerd get cm linkerd-config -ojsonpath="{.data.values}" | \ + yq e .identityTrustAnchorsPEM - > trustAnchor.crt +``` + +{{< note >}} This command requires [yq](https://github.com/mikefarah/yq). If you +don't have yq, feel free to extract the certificate from the `identityTrustAnchorsPEM` +field with your tool of choice. +{{< /note >}} + +Now, you'll want to create a new trust anchor and issuer for the new cluster: + +```bash +step certificate create root.linkerd.cluster.local root.crt root.key \ + --profile root-ca --no-password --insecure +step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \ + --profile intermediate-ca --not-after 8760h --no-password --insecure \ + --ca root.crt --ca-key root.key +``` + +{{< note >}} We use the [step cli](https://smallstep.com/cli/) to generate +certificates. `openssl` works just as well! {{< /note >}} + +With the old cluster's trust anchor and the new cluster's trust anchor, you can +create a bundle by running: + +```bash +cat trustAnchor.crt root.crt > bundle.crt +``` + +You'll want to upgrade your existing cluster with the new bundle. Make sure +every pod you'd like to have talk to the new cluster is restarted so that it can +use this bundle. To upgrade the existing cluster with this new trust anchor +bundle, run: + +```bash +linkerd upgrade --identity-trust-anchors-file=./bundle.crt | \ + kubectl apply -f - +``` + +Finally, you'll be able to install Linkerd on the new cluster by using the trust +anchor bundle that you just created along with the issuer certificate and key. + +```bash +# first, install the Linkerd CRDs on the new cluster +linkerd install --crds | kubectl apply -f - + +# then, install the Linkerd control plane, using the key material we created +linkerd install \ + --identity-trust-anchors-file bundle.crt \ + --identity-issuer-certificate-file issuer.crt \ + --identity-issuer-key-file issuer.key | \ + kubectl apply -f - +``` + +Make sure to verify that the cluster's have started up successfully by running +`check` on each one. + +```bash +linkerd check +``` + +## Installing the multicluster control plane components through Helm + +Linkerd's multicluster components i.e Gateway and Service Mirror can +be installed via Helm rather than the `linkerd multicluster install` command. 
+ +This not only allows advanced configuration, but also allows users to bundle the +multicluster installation as part of their existing Helm based installation +pipeline. + +### Adding Linkerd's Helm repository + +First, let's add the Linkerd's Helm repository by running + +```bash +# To add the repo for Linkerd stable releases: +helm repo add linkerd https://helm.linkerd.io/stable +``` + +### Helm multicluster install procedure + +```bash +helm install linkerd-multicluster -n linkerd-multicluster --create-namespace linkerd/linkerd-multicluster +``` + +The chart values will be picked from the chart's `values.yaml` file. + +You can override the values in that file by providing your own `values.yaml` +file passed with a `-f` option, or overriding specific values using the family of +`--set` flags. + +Full set of configuration options can be found [here](https://github.com/linkerd/linkerd2/tree/main/multicluster/charts/linkerd-multicluster#values) + +The installation can be verified by running + +```bash +linkerd multicluster check +``` + +Installation of the gateway can be disabled with the `gateway` setting. By +default this value is true. + +### Installing additional access credentials + +When the multicluster components are installed onto a target cluster with +`linkerd multicluster install`, a service account is created which source clusters +will use to mirror services. Using a distinct service account for each source +cluster can be beneficial since it gives you the ability to revoke service mirroring +access from specific source clusters. Generating additional service accounts +and associated RBAC can be done using the `linkerd multicluster allow` command +through the CLI. + +The same functionality can also be done through Helm setting the +`remoteMirrorServiceAccountName` value to a list. + +```bash + helm install linkerd-mc-source linkerd/linkerd-multicluster -n linkerd-multicluster --create-namespace \ + --set remoteMirrorServiceAccountName={source1\,source2\,source3} --kube-context target +``` + +Now that the multicluster components are installed, operations like linking, etc +can be performed by using the linkerd CLI's multicluster sub-command as per the +[multicluster task](../../features/multicluster/). diff --git a/linkerd.io/content/2.16/tasks/linkerd-smi.md b/linkerd.io/content/2.16/tasks/linkerd-smi.md new file mode 100644 index 0000000000..875a5aab63 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/linkerd-smi.md @@ -0,0 +1,218 @@ ++++ +title = "Getting started with Linkerd SMI extension" +description = "Use Linkerd SMI extension to work with Service Mesh Interface(SMI) resources." ++++ + +[Service Mesh Interface](https://smi-spec.io/) is a standard interface for +service meshes on Kubernetes. It defines a set of resources that could be +used across service meshes that implement it. +You can read more about it in the [specification](https://github.com/servicemeshinterface/smi-spec) + +Currently, Linkerd supports SMI's `TrafficSplit` specification which can be +used to perform traffic splitting across services natively. This means that +you can apply the SMI resources without any additional +components/configuration but this obviously has some downsides, as +Linkerd may not be able to add extra specific configurations specific to it, +as SMI is more like a lowest common denominator of service mesh functionality. 
+ +To get around these problems, Linkerd can instead have an adaptor that converts +SMI specifications into native Linkerd configuration that it can understand and +act on. This also removes the tight coupling between SMI resources and the +control plane, and the adaptor can move independently and have its own release +cycle. [Linkerd SMI](https://www.github.com/linkerd/linkerd-smi) +is an extension that does just that. + +This guide will walk you through installing the SMI extension and configuring +a `TrafficSplit` specification to perform traffic splitting across services. + +## Prerequisites + +- To use this guide, you'll need to have Linkerd installed on your cluster. + Follow the [Installing Linkerd Guide](../install/) if you haven't + already done this. + +## Install the Linkerd-SMI extension + +### CLI + +Install the SMI extension CLI binary by running: + +```bash +curl -sL https://linkerd.github.io/linkerd-smi/install | sh +``` + +Alternatively, you can download the CLI directly via the [releases page](https://github.com/linkerd/linkerd-smi/releases). + +The first step is installing the Linkerd-SMI extension onto your cluster. +This extension consists of an SMI-Adaptor which converts SMI resources into +native Linkerd resources. + +To install the Linkerd-SMI extension, run the command: + +```bash +linkerd smi install | kubectl apply -f - +``` + +You can verify that the Linkerd-SMI extension was installed correctly by +running: + +```bash +linkerd smi check +``` + +### Helm + +To install the `linkerd-smi` Helm chart, run: + +```bash +helm repo add l5d-smi https://linkerd.github.io/linkerd-smi +helm install l5d-smi/linkerd-smi --generate-name +``` + +## Install Sample Application + +First, let's install the sample application. + +```bash +# create a namespace for the sample application +kubectl create namespace trafficsplit-sample + +# install the sample application +linkerd inject https://raw.githubusercontent.com/linkerd/linkerd2/main/test/integration/viz/trafficsplit/testdata/application.yaml | kubectl -n trafficsplit-sample apply -f - +``` + +This installs a simple client and two server deployments. +One of the server deployments, `failing-svc`, always returns a 500 error, +and the other one, `backend-svc`, always returns a 200. + +```bash +kubectl get deployments -n trafficsplit-sample +NAME READY UP-TO-DATE AVAILABLE AGE +backend 1/1 1 1 2m29s +failing 1/1 1 1 2m29s +slow-cooker 1/1 1 1 2m29s +``` + +By default, the client will hit the `backend-svc` service. This is evident from +the `edges` subcommand. + +```bash +linkerd viz edges deploy -n trafficsplit-sample +SRC DST SRC_NS DST_NS SECURED +prometheus backend linkerd-viz trafficsplit-sample √ +prometheus failing linkerd-viz trafficsplit-sample √ +prometheus slow-cooker linkerd-viz trafficsplit-sample √ +slow-cooker backend trafficsplit-sample trafficsplit-sample √ +``` + +## Configuring a TrafficSplit + +Now, let's apply a `TrafficSplit` resource to perform traffic splitting on the +`backend-svc` to distribute load between it and the `failing-svc`.
```bash +kubectl apply -f - <<EOF +apiVersion: split.smi-spec.io/v1alpha2 +kind: TrafficSplit +metadata: +  name: backend-split +  namespace: trafficsplit-sample +spec: +  service: backend-svc +  backends: +  - service: backend-svc +    weight: 500 +  - service: failing-svc +    weight: 500 +EOF +``` + +The `smi-adaptor` watches for `TrafficSplit` resources and converts them into +`ServiceProfile` resources that Linkerd understands. Inspect the generated +`ServiceProfile`: + +```bash +kubectl -n trafficsplit-sample describe serviceprofile backend-svc.trafficsplit-sample.svc.cluster.local +Name: backend-svc.trafficsplit-sample.svc.cluster.local +Namespace: trafficsplit-sample +Labels: <none> +Annotations: <none> +API Version: linkerd.io/v1alpha2 +Kind: ServiceProfile +Metadata: + Creation Timestamp: 2021-08-02T12:42:52Z + Generation: 1 + Managed Fields: + API Version: linkerd.io/v1alpha2 + Fields Type: FieldsV1 + fieldsV1: + f:spec: + .: + f:dstOverrides: + Manager: smi-adaptor + Operation: Update + Time: 2021-08-02T12:42:52Z + Resource Version: 3542 + UID: cbcdb74f-07e0-42f0-a7a8-9bbcf5e0e54e +Spec: + Dst Overrides: + Authority: backend-svc.trafficsplit-sample.svc.cluster.local + Weight: 500 + Authority: failing-svc.trafficsplit-sample.svc.cluster.local + Weight: 500 +Events: <none> +``` + +As we can see, a relevant `ServiceProfile` with `DstOverrides` has +been created to perform the TrafficSplit. + +The traffic splitting can be verified by running the `edges` command. + +```bash +linkerd viz edges deploy -n trafficsplit-sample +SRC DST SRC_NS DST_NS SECURED +prometheus backend linkerd-viz trafficsplit-sample √ +prometheus failing linkerd-viz trafficsplit-sample √ +prometheus slow-cooker linkerd-viz trafficsplit-sample √ +slow-cooker backend trafficsplit-sample trafficsplit-sample √ +slow-cooker failing trafficsplit-sample trafficsplit-sample √ +``` + +This can also be verified by running the `stat` subcommand on the `TrafficSplit` +resource. + +```bash +linkerd viz stat ts/backend-split -n trafficsplit-sample +NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 +backend-split backend-svc backend-svc 500 100.00% 0.5rps 1ms 1ms 1ms +backend-split backend-svc failing-svc 500 0.00% 0.5rps 1ms 1ms 1ms +``` + +This can also be verified by checking the `smi-adaptor` logs. + +```bash +kubectl -n linkerd-smi logs deploy/smi-adaptor smi-adaptor +time="2021-08-04T11:04:35Z" level=info msg="Using cluster domain: cluster.local" +time="2021-08-04T11:04:35Z" level=info msg="Starting SMI Controller" +time="2021-08-04T11:04:35Z" level=info msg="Waiting for informer caches to sync" +time="2021-08-04T11:04:35Z" level=info msg="starting admin server on :9995" +time="2021-08-04T11:04:35Z" level=info msg="Starting workers" +time="2021-08-04T11:04:35Z" level=info msg="Started workers" +time="2021-08-04T11:05:17Z" level=info msg="created serviceprofile/backend-svc.trafficsplit-sample.svc.cluster.local for trafficsplit/backend-split" +time="2021-08-04T11:05:17Z" level=info msg="Successfully synced 'trafficsplit-sample/backend-split'" +``` + +## Cleanup + +Delete the `trafficsplit-sample` namespace by running + +```bash +kubectl delete namespace/trafficsplit-sample +``` + +## Conclusion + +Though Linkerd currently supports reading `TrafficSplit` resources directly, +`ServiceProfiles` always take precedence over `TrafficSplit` resources. Support +for the `TrafficSplit` resource will be removed in a future release, at which +point the `linkerd-smi` extension will be necessary to use `SMI` resources with +Linkerd. diff --git a/linkerd.io/content/2.16/tasks/manually-rotating-control-plane-tls-credentials.md b/linkerd.io/content/2.16/tasks/manually-rotating-control-plane-tls-credentials.md new file mode 100644 index 0000000000..364f4aab67 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/manually-rotating-control-plane-tls-credentials.md @@ -0,0 +1,353 @@ ++++ +title = "Manually Rotating Control Plane TLS Credentials" +description = "Update Linkerd's TLS trust anchor and issuer certificate."
+aliases = [ "rotating_identity_certificates" ] ++++ + +Linkerd's [automatic mTLS](../../features/automatic-mtls/) feature uses a set of +TLS credentials to generate TLS certificates for proxies: a trust anchor, and +an issuer certificate and private key. The trust anchor has a limited period of +validity: 365 days if generated by `linkerd install`, or a customized value if +[generated manually](../generate-certificates/). + +Thus, for clusters that are expected to outlive this lifetime, you must +manually rotate the trust anchor. In this document, we describe how to +accomplish this without downtime. + +Independent of the trust anchor, the issuer certificate and key pair can also +expire (though it is possible to [use `cert-manager` to set up automatic +rotation](../automatically-rotating-control-plane-tls-credentials/). This +document also covers how to rotate the issuer certificate and key pair without +downtime. + +## Prerequisites + +These instructions use the following CLI tools: + +- [`step`](https://smallstep.com/cli/) to manipulate certificates and keys; + +## Understanding the current state of your system + +Begin by running: + +```bash +linkerd check --proxy +``` + +If your configuration is valid and your credentials are not expiring soon, you +should see output similar to: + +```text +linkerd-identity +---------------- +√ certificate config is valid +√ trust roots are using supported crypto algorithm +√ trust roots are within their validity period +√ trust roots are valid for at least 60 days +√ issuer cert is using supported crypto algorithm +√ issuer cert is within its validity period +√ issuer cert is valid for at least 60 days +√ issuer cert is issued by the trust root + +linkerd-identity-data-plane +--------------------------- +√ data plane proxies certificate match CA +``` + +However, if you see a message warning you that your trust anchor ("trust root") +or issuer certificates are expiring soon, then you must rotate them. + +**Note that this document only applies if the trust anchor is currently valid.** +If your trust anchor has expired, follow the [Replacing Expired Certificates Guide](../replacing_expired_certificates/) +instead. (If your issuer certificate has expired but your trust anchor is still +valid, continue on with this document.) + +For example, if your issuer certificate has expired, you will see a message +similar to: + +```text +linkerd-identity +---------------- +√ certificate config is valid +√ trust roots are using supported crypto algorithm +√ trust roots are within their validity period +√ trust roots are valid for at least 60 days +√ issuer cert is using supported crypto algorithm +× issuer cert is within its validity period +issuer certificate is not valid anymore. Expired on 2019-12-19T09:02:01Z +see https://linkerd.io/checks/#l5d-identity-issuer-cert-is-time-valid for hints +``` + +If your trust anchor has expired, you will see a message similar to: + +```text +linkerd-identity +---------------- +√ certificate config is valid +√ trust roots are using supported crypto algorithm +× trust roots are within their validity period +Invalid roots: +* 79461543992952791393769540277800684467 identity.linkerd.cluster.local not valid anymore. 
Expired on 2019-12-19T09:11:30Z +see https://linkerd.io/checks/#l5d-identity-roots-are-time-valid for hints +``` + +## Rotating the trust anchor + +Rotating the trust anchor without downtime is a multi-step process: you must +generate a new trust anchor, bundle it with the old one, rotate the issuer +certificate and key pair, and finally remove the old trust anchor from the +bundle. If you simply need to rotate the issuer certificate and key pair, you +can skip directly to [Rotating the identity issuer +certificate](#rotating-the-identity-issuer-certificate) and ignore the trust +anchor rotation steps. + +## Read the current trust anchor certificate from the cluster + +To avoid downtime, you need to bundle the existing trust anchor certificate +with the newly-generated trust anchor certificate into a certificate bundle: +using the bundle allows workloads ultimately signed with either trust anchor +to work properly in the mesh. Since certificates are not sensitive information, +we can simply pull the existing trust anchor certificate directly from the +cluster. + +The following command uses `kubectl` to fetch the Linkerd config from the +`linkerd-identity-trust-roots` ConfigMap and save it in `original-trust.crt`: + +```bash +kubectl -n linkerd get cm linkerd-identity-trust-roots -o=jsonpath='{.data.ca-bundle\.crt}' > original-trust.crt +``` + +## Generate a new trust anchor + +After saving the current trust anchor certificate, generate a new trust anchor +certificate and private key: + +```bash +step certificate create root.linkerd.cluster.local ca-new.crt ca-new.key --profile root-ca --no-password --insecure +``` + +Note that we use `--no-password --insecure` to avoid encrypting these files +with a passphrase. Store the private key somewhere secure so that it can be +used in the future to [generate new issuer certificates](../generate-certificates/). + +## Bundle your original trust anchor with the new one + +Next, we need to bundle the trust anchor currently used by Linkerd together with +the new anchor. We use `step` to combine the two certificates into one bundle: + +```bash +step certificate bundle ca-new.crt original-trust.crt bundle.crt +``` + +If desired, you can `rm original-trust.crt` too. + +## Deploying the new bundle to Linkerd + +At this point you can use the `linkerd upgrade` command to instruct Linkerd to +work with the new trust bundle: + +```bash +linkerd upgrade --identity-trust-anchors-file=./bundle.crt | kubectl apply -f - +``` + +or you can also use the `helm upgrade` command: + +```bash +helm upgrade linkerd-control-plane --set-file identityTrustAnchorsPEM=./bundle.crt +``` + +Once this is done, you'll need to restart your meshed workloads so that they use +the new trust anchor. For example, doing that for the `emojivoto` namespace would +look like: + +```bash +kubectl -n emojivoto rollout restart deploy +``` + +Now you can run the `check` command to ensure that everything is ok: + +```bash +linkerd check --proxy +``` + +You might have to wait a few moments until all the pods have been restarted and +are configured with the correct trust anchor. 
Meanwhile you might observe warnings: + +```text +linkerd-identity +---------------- +√ certificate config is valid +√ trust roots are using supported crypto algorithm +√ trust roots are within their validity period +√ trust roots are valid for at least 60 days +√ issuer cert is using supported crypto algorithm +√ issuer cert is within its validity period +‼ issuer cert is valid for at least 60 days + issuer certificate will expire on 2019-12-19T09:51:19Z + see https://linkerd.io/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints +√ issuer cert is issued by the trust root + +linkerd-identity-data-plane +--------------------------- +‼ data plane proxies certificate match CA + Some pods do not have the current trust bundle and must be restarted: + * emojivoto/emoji-d8d7d9c6b-8qwfx + * emojivoto/vote-bot-588499c9f6-zpwz6 + * emojivoto/voting-8599548fdc-6v64k + * emojivoto/web-67c7599f6d-xx98n + * linkerd/linkerd-sp-validator-75f9d96dc-rch4x + * linkerd/linkerd-tap-68d8bbf64-mpzgb + * linkerd/linkerd-web-849f74b7c6-qlhwc + see https://linkerd.io/checks/#l5d-identity-data-plane-proxies-certs-match-ca for hints +``` + +When the rollout completes, your `check` command should stop warning you that +pods need to be restarted. It may still warn you, however, that your issuer +certificate is about to expire soon: + +```text +linkerd-identity +---------------- +√ certificate config is valid +√ trust roots are using supported crypto algorithm +√ trust roots are within their validity period +√ trust roots are valid for at least 60 days +√ issuer cert is using supported crypto algorithm +√ issuer cert is within its validity period +‼ issuer cert is valid for at least 60 days + issuer certificate will expire on 2019-12-19T09:51:19Z + see https://linkerd.io/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints +√ issuer cert is issued by the trust root + +linkerd-identity-data-plane +--------------------------- +√ data plane proxies certificate match CA +``` + +At this point, all meshed workloads are ready to accept connections signed +by either the old or new trust anchor, but they're all still using certificates +signed by the old trust anchor. To change that, we'll need to rotate the +issuer certificate. + +## Rotating the identity issuer certificate + +To rotate the issuer certificate and key pair, start by generating the new +identity issuer certificate and key: + +```bash +step certificate create identity.linkerd.cluster.local issuer-new.crt issuer-new.key \ +--profile intermediate-ca --not-after 8760h --no-password --insecure \ +--ca ca-new.crt --ca-key ca-new.key +``` + +This new issuer certificate is signed by our new trust anchor, which is why it +was critical to install the new trust anchor bundle (as outlined in the previous +section). 
Once the new bundle is installed and running `linkerd check` shows all
+green checks and no warnings, you can safely rotate the identity issuer certificate
+and key by using the `upgrade` command again:
+
+```bash
+linkerd upgrade \
+    --identity-issuer-certificate-file=./issuer-new.crt \
+    --identity-issuer-key-file=./issuer-new.key \
+    | kubectl apply -f -
+```
+
+or
+
+```bash
+helm upgrade linkerd-control-plane \
+    --set-file identity.issuer.tls.crtPEM=./issuer-new.crt \
+    --set-file identity.issuer.tls.keyPEM=./issuer-new.key
+```
+
+At this point you can check for the `IssuerUpdated` Kubernetes event to be certain
+that Linkerd saw the new issuer certificate:
+
+```bash
+kubectl get events --field-selector reason=IssuerUpdated -n linkerd
+
+LAST SEEN   TYPE     REASON          OBJECT                        MESSAGE
+9s          Normal   IssuerUpdated   deployment/linkerd-identity   Updated identity issuer
+```
+
+Restart all injected workloads in your cluster so that their proxies pick up
+certificates issued by the new issuer:
+
+```bash
+kubectl -n emojivoto rollout restart deploy
+```
+
+Run the `check` command to make sure that everything is going as expected:
+
+```bash
+linkerd check --proxy
+```
+
+You should see output without any certificate expiration warnings (unless an
+expired trust anchor still needs to be removed):
+
+```text
+linkerd-identity
+----------------
+√ certificate config is valid
+√ trust roots are using supported crypto algorithm
+√ trust roots are within their validity period
+√ trust roots are valid for at least 60 days
+√ issuer cert is using supported crypto algorithm
+√ issuer cert is within its validity period
+√ issuer cert is valid for at least 60 days
+√ issuer cert is issued by the trust root
+
+linkerd-identity-data-plane
+---------------------------
+√ data plane proxies certificate match CA
+```
+
+## Removing the old trust anchor
+
+Since the old trust anchor is now completely unused, we can switch Linkerd
+from the bundle we created earlier to using only the new trust anchor
+certificate:
+
+```bash
+linkerd upgrade --identity-trust-anchors-file=./ca-new.crt | kubectl apply -f -
+```
+
+or
+
+```bash
+helm upgrade linkerd-control-plane --set-file identityTrustAnchorsPEM=./ca-new.crt
+```
+
+Note that the `./ca-new.crt` file is the same trust anchor you created at the start
+of this process.
+
+Once again, explicitly restart your meshed workloads:
+
+```bash
+kubectl -n emojivoto rollout restart deploy
+linkerd check --proxy
+```
+
+And, again, the output of the `check` command should not produce any warnings or
+errors:
+
+```text
+linkerd-identity
+----------------
+√ certificate config is valid
+√ trust roots are using supported crypto algorithm
+√ trust roots are within their validity period
+√ trust roots are valid for at least 60 days
+√ issuer cert is using supported crypto algorithm
+√ issuer cert is within its validity period
+√ issuer cert is valid for at least 60 days
+√ issuer cert is issued by the trust root
+
+linkerd-identity-data-plane
+---------------------------
+√ data plane proxies certificate match CA
+```
+
+Congratulations, you have rotated your trust anchor!
🎉
diff --git a/linkerd.io/content/2.16/tasks/modifying-proxy-log-level.md b/linkerd.io/content/2.16/tasks/modifying-proxy-log-level.md
new file mode 100644
index 0000000000..1c6280c3f6
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/modifying-proxy-log-level.md
@@ -0,0 +1,45 @@
++++
+title = "Modifying the Proxy Log Level"
+description = "Linkerd proxy log levels can be modified dynamically to assist with debugging."
++++
+
+Emitting logs is an expensive operation for a network proxy, and by default,
+the Linkerd data plane proxies are configured to only log exceptional events.
+However, sometimes it is useful to increase the verbosity of proxy logs to
+assist with diagnosing proxy behavior. Happily, Linkerd allows you to modify
+these log levels dynamically.
+
+{{< note >}}
+The proxy's debug logging is distinct from the proxy's HTTP access log,
+which is configured separately. See the documentation on [enabling access
+logging](../../features/access-logging/) for details on configuring Linkerd
+proxies to emit an HTTP access log.
+{{< /note >}}
+
+The log level of a Linkerd proxy can be modified on the fly by using the proxy's
+`/proxy-log-level` endpoint on its admin port.
+
+For example, to change the proxy log level of a pod to `debug`, run the
+following, replacing `${POD:?}` with the pod name (or setting the `POD`
+environment variable accordingly):
+
+```sh
+kubectl port-forward ${POD:?} linkerd-admin
+curl -v --data 'linkerd=debug' -X PUT localhost:4191/proxy-log-level
+```
+
+where `linkerd-admin` is the name of the admin port (`4191` by default)
+of the injected sidecar proxy.
+
+The resulting logs can be viewed with `kubectl logs ${POD:?}`.
+
+If changes to the proxy log level should be retained beyond the lifetime of a
+pod, add the `config.linkerd.io/proxy-log-level` annotation to the pod template
+(the annotation can also be set on the namespace).
+
+The syntax of the proxy log level can be found in the
+[proxy log level reference](../../reference/proxy-log-level/).
+
+Note that logging has a noticeable, negative impact on proxy throughput. If the
+pod will continue to serve production traffic, you may wish to reset the log
+level once you are done.
diff --git a/linkerd.io/content/2.16/tasks/multicluster-using-statefulsets.md b/linkerd.io/content/2.16/tasks/multicluster-using-statefulsets.md
new file mode 100644
index 0000000000..6fd9b448b1
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/multicluster-using-statefulsets.md
@@ -0,0 +1,336 @@
++++
+title = "Multi-cluster communication with StatefulSets"
+description = "Cross-cluster communication to and from headless services."
++++
+
+Linkerd's multi-cluster extension works by "mirroring" service information
+between clusters. By default, every exported service in a target cluster will
+be mirrored as a `clusterIP` service in the source cluster. When running
+workloads that require a headless service, such as
+[StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/),
+Linkerd's multi-cluster extension can be configured with support for headless
+services to preserve the service type. Exported services that are headless will
+be mirrored in a source cluster as headless, preserving functionality such as
+DNS record creation and the ability to address an individual pod.
+
+This guide will walk you through installing and configuring Linkerd and the
+multi-cluster extension with support for headless services and will show
+how a StatefulSet can be deployed in a target cluster.
After deploying, we will
+also look at how to communicate with an arbitrary pod from the target cluster's
+StatefulSet from a client in the source cluster. For a more detailed overview
+of how multi-cluster support for headless services works, check out
+[multi-cluster communication](../../features/multicluster/).
+
+## Prerequisites
+
+- Two Kubernetes clusters. They will be referred to as `east` and `west`, with
+  `east` being the source cluster and `west` the target cluster.
+  These can be in any cloud or local environment; this guide will make use of
+  [k3d](https://github.com/rancher/k3d/releases/tag/v4.1.1) to configure two
+  local clusters.
+- The [`step` CLI](https://github.com/smallstep/cli/releases) to generate
+  certificates for the Linkerd installation.
+- A [recent Linkerd release](https://github.com/linkerd/linkerd2/releases)
+  (2.11 or newer).
+
+To help with cluster creation and installation, there is a demo repository
+available. Throughout the guide, we will be using the scripts from the
+repository, but you can follow along without cloning or using the scripts.
+
+## Install Linkerd multi-cluster with headless support
+
+To start our demo and see everything in practice, we will go through a
+multi-cluster scenario where a pod in the `east` cluster will try to communicate
+with an arbitrary pod from the `west` cluster.
+
+The first step is to clone the demo
+repository on your local machine.
+
+```sh
+# clone example repository
+$ git clone git@github.com:mateiidavid/l2d-k3d-statefulset.git
+$ cd l2d-k3d-statefulset
+```
+
+The second step consists of creating two `k3d` clusters named `east` and
+`west`, where the `east` cluster is the source and the `west` cluster is the
+target. When creating our clusters, we need a shared trust root. Luckily, the
+repository you have just cloned includes a handful of scripts that will greatly
+simplify everything.
+
+```sh
+# create k3d clusters
+$ ./create.sh
+
+# list the clusters
+$ k3d cluster list
+NAME   SERVERS   AGENTS   LOADBALANCER
+east   1/1       0/0      true
+west   1/1       0/0      true
+```
+
+Once our clusters are created, we will install Linkerd and the multi-cluster
+extension. Finally, once both are installed, we need to link the two clusters
+together so their services may be mirrored. To enable support for headless
+services, we will pass an additional `--set "enableHeadlessServices=true"` flag
+to `linkerd multicluster link`. As before, these steps are automated through
+the provided scripts, but feel free to have a look!
+
+```sh
+# Install Linkerd and multicluster, output to check should be a success
+$ ./install.sh
+
+# Next, link the two clusters together
+$ ./link.sh
+```
+
+Perfect! If you've made it this far with no errors, that's a good sign. In
+the next section, we'll deploy some services and look at how communication
+works.
+
+## Pod-to-Pod: from east, to west
+
+With our install steps out of the way, we can now focus on our pod-to-pod
+communication. First, we will deploy our pods and services:
+
+- We will mesh the default namespaces in `east` and `west`.
+- In `west`, we will deploy an nginx StatefulSet with its own headless
+  service, `nginx-svc` (a simplified sketch of its manifests is shown below).
+- In `east`, our script will deploy a `curl` pod that will then be used to
+  curl the nginx service.
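+
+For reference, the headless service and StatefulSet that the script creates in
+`west` look roughly like the sketch below. This is a simplified version, not
+the exact manifests from the demo repository; the important parts are
+`clusterIP: None` on the service and the matching `serviceName` on the
+StatefulSet:
+
+```sh
+kubectl --context=k3d-west apply -f - <<EOF
+apiVersion: v1
+kind: Service
+metadata:
+  name: nginx-svc
+spec:
+  clusterIP: None        # headless: no virtual IP, one DNS record per pod
+  selector:
+    app: nginx
+  ports:
+  - name: http
+    port: 80
+---
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: nginx-set
+spec:
+  serviceName: nginx-svc # gives pods stable DNS names like nginx-set-0.nginx-svc
+  replicas: 3
+  selector:
+    matchLabels:
+      app: nginx
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - name: nginx
+        image: nginx
+        ports:
+        - containerPort: 80
+EOF
+```
+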
+ +```sh +# deploy services and mesh namespaces +$ ./deploy.sh + +# verify both clusters +# +# verify east +$ kubectl --context=k3d-east get pods +NAME READY STATUS RESTARTS AGE +curl-56dc7d945d-96r6p 2/2 Running 0 7s + +# verify west has headless service +$ kubectl --context=k3d-west get services +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +kubernetes ClusterIP 10.43.0.1 443/TCP 10m +nginx-svc ClusterIP None 80/TCP 8s + +# verify west has statefulset +# +# this may take a while to come up +$ kubectl --context=k3d-west get pods +NAME READY STATUS RESTARTS AGE +nginx-set-0 2/2 Running 0 53s +nginx-set-1 2/2 Running 0 43s +nginx-set-2 2/2 Running 0 36s +``` + +Before we go further, let's have a look at the endpoints object for the +`nginx-svc`: + +```sh +$ kubectl --context=k3d-west get endpoints nginx-svc -o yaml +... +subsets: +- addresses: + - hostname: nginx-set-0 + ip: 10.42.0.31 + nodeName: k3d-west-server-0 + targetRef: + kind: Pod + name: nginx-set-0 + namespace: default + resourceVersion: "114743" + uid: 7049f1c1-55dc-4b7b-a598-27003409d274 + - hostname: nginx-set-1 + ip: 10.42.0.32 + nodeName: k3d-west-server-0 + targetRef: + kind: Pod + name: nginx-set-1 + namespace: default + resourceVersion: "114775" + uid: 60df15fd-9db0-4830-9c8f-e682f3000800 + - hostname: nginx-set-2 + ip: 10.42.0.33 + nodeName: k3d-west-server-0 + targetRef: + kind: Pod + name: nginx-set-2 + namespace: default + resourceVersion: "114808" + uid: 3873bc34-26c4-454d-bd3d-7c783de16304 +``` + +We can see, based on the endpoints object that the service has three endpoints, +with each endpoint having an address (or IP) whose hostname corresponds to a +StatefulSet pod. If we were to do a curl to any of these endpoints directly, we +would get an answer back. We can test this out by applying the curl pod to the +`west` cluster: + +```sh +$ kubectl --context=k3d-west apply -f east/curl.yml +$ kubectl --context=k3d-west get pods +NAME READY STATUS RESTARTS AGE +nginx-set-0 2/2 Running 0 5m8s +nginx-set-1 2/2 Running 0 4m58s +nginx-set-2 2/2 Running 0 4m51s +curl-56dc7d945d-s4n8j 0/2 PodInitializing 0 4s + +$ kubectl --context=k3d-west exec -it curl-56dc7d945d-s4n8j -c curl -- bin/sh +/$ # prompt for curl pod +``` + +If we now curl one of these instances, we will get back a response. + +```sh +# exec'd on the pod +/ $ curl nginx-set-0.nginx-svc.default.svc.west.cluster.local +" + + +Welcome to nginx! + + + +

+Welcome to nginx!
+
+If you see this page, the nginx web server is successfully installed and
+working. Further configuration is required.
+
+For online documentation and support please refer to nginx.org.
+Commercial support is available at nginx.com.
+
+Thank you for using nginx.
+ +" +``` + +Now, let's do the same, but this time from the `east` cluster. We will first +export the service. + +```sh +$ kubectl --context=k3d-west label service nginx-svc mirror.linkerd.io/exported="true" +service/nginx-svc labeled + +$ kubectl --context=k3d-east get services +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +kubernetes ClusterIP 10.43.0.1 443/TCP 20h +nginx-svc-west ClusterIP None 80/TCP 29s +nginx-set-0-west ClusterIP 10.43.179.60 80/TCP 29s +nginx-set-1-west ClusterIP 10.43.218.18 80/TCP 29s +nginx-set-2-west ClusterIP 10.43.245.244 80/TCP 29s +``` + +If we take a look at the endpoints object, we will notice something odd, the +endpoints for `nginx-svc-west` will have the same hostnames, but each hostname +will point to one of the services we see above: + +```sh +$ kubectl --context=k3d-east get endpoints nginx-svc-west -o yaml +subsets: +- addresses: + - hostname: nginx-set-0 + ip: 10.43.179.60 + - hostname: nginx-set-1 + ip: 10.43.218.18 + - hostname: nginx-set-2 + ip: 10.43.245.244 +``` + +This is what we outlined at the start of the tutorial. Each pod from the target +cluster (`west`), will be mirrored as a clusterIP service. We will see in a +second why this matters. + +```sh +$ kubectl --context=k3d-east get pods +NAME READY STATUS RESTARTS AGE +curl-56dc7d945d-96r6p 2/2 Running 0 23m + +# exec and curl +$ kubectl --context=k3d-east exec pod curl-56dc7d945d-96r6p -it -c curl -- bin/sh +# we want to curl the same hostname we see in the endpoints object above. +# however, the service and cluster domain will now be different, since we +# are in a different cluster. +# +/ $ curl nginx-set-0.nginx-svc-west.default.svc.east.cluster.local + + + +Welcome to nginx! + + + +

+Welcome to nginx!
+
+If you see this page, the nginx web server is successfully installed and
+working. Further configuration is required.
+
+For online documentation and support please refer to nginx.org.
+Commercial support is available at nginx.com.
+
+Thank you for using nginx.
+ + +``` + +As you can see, we get the same response back! But, nginx is in a different +cluster. So, what happened behind the scenes? + + 1. When we mirrored the headless service, we created a clusterIP service for + each pod. Since services create DNS records, naming each endpoint with the + hostname from the target gave us these pod FQDNs + (`nginx-set-0.(...).cluster.local`). + 2. Curl resolved the pod DNS name to an IP address. In our case, this IP + would be `10.43.179.60`. + 3. Once the request is in-flight, the linkerd2-proxy intercepts it. It looks + at the IP address and associates it with our `clusterIP` service. The + service itself points to the gateway, so the proxy forwards the request to + the target cluster gateway. This is the usual multi-cluster scenario. + 4. The gateway in the target cluster looks at the request and looks-up the + original destination address. In our case, since this is an "endpoint + mirror", it knows it has to go to `nginx-set-0.nginx-svc` in the same + cluster. + 5. The request is again forwarded by the gateway to the pod, and the response + comes back. + +And that's it! You can now send requests to pods across clusters. Querying any +of the 3 StatefulSet pods should have the same results. + +{{< note >}} + +To mirror a headless service as headless, the service's endpoints +must also have at least one named address (e.g a hostname for an IP), +otherwise, there will be no endpoints to mirror so the service will be mirrored +as `clusterIP`. A headless service may under normal conditions also be created +without exposing a port; the mulit-cluster service-mirror does not support +this, however, since the lack of ports means we cannot create a service that +passes Kubernetes validation. + +{{< /note >}} + +## Cleanup + +To clean-up, you can remove both clusters entirely using the k3d CLI: + +```sh +$ k3d cluster delete east +cluster east deleted +$ k3d cluster delete west +cluster west deleted +``` diff --git a/linkerd.io/content/2.16/tasks/multicluster.md b/linkerd.io/content/2.16/tasks/multicluster.md new file mode 100644 index 0000000000..0230ff04e3 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/multicluster.md @@ -0,0 +1,536 @@ ++++ +title = "Multi-cluster communication" +description = "Allow Linkerd to manage cross-cluster communication." ++++ + +This guide will walk you through installing and configuring Linkerd so that two +clusters can talk to services hosted on both. There are a lot of moving parts +and concepts here, so it is valuable to read through our +[introduction](../../features/multicluster/) that explains how this works beneath +the hood. By the end of this guide, you will understand how to split traffic +between services that live on different clusters. + +At a high level, you will: + +1. [Install Linkerd and Linkerd Viz](#install-linkerd) on two clusters with a + shared trust anchor. +1. [Prepare](#preparing-your-cluster) the clusters. +1. [Link](#linking-the-clusters) the clusters. +1. [Install](#installing-the-test-services) the demo. +1. [Export](#exporting-the-services) the demo services, to control visibility. +1. [Gain visibility](#visibility) in your linked clusters. +1. [Verify](#security) the security of your clusters. +1. [Split traffic](#traffic-splitting) from pods on the source cluster (`west`) + to the target cluster (`east`) + +## Prerequisites + +- Two clusters. We will refer to them as `east` and `west` in this guide. 
Follow + along with the + [blog post](/2020/02/25/multicluster-kubernetes-with-service-mirroring/) as + you walk through this guide! The easiest way to do this for development is + running a [kind](https://kind.sigs.k8s.io/docs/user/quick-start/) or + [k3d](https://github.com/rancher/k3d#usage) cluster locally on your laptop and + one remotely on a cloud provider, such as + [AKS](https://azure.microsoft.com/en-us/services/kubernetes-service/). +- Each of these clusters should be configured as `kubectl` + [contexts](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/). + We'd recommend you use the names `east` and `west` so that you can follow + along with this guide. It is easy to + [rename contexts](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#-em-rename-context-em-) + with `kubectl`, so don't feel like you need to keep it all named this way + forever. +- Elevated privileges on both clusters. We'll be creating service accounts and + granting extended privileges, so you'll need to be able to do that on your + test clusters. +- Support for services of type `LoadBalancer` in the `east` cluster. Check out + the documentation for your cluster provider or take a look at + [inlets](https://blog.alexellis.io/ingress-for-your-local-kubernetes-cluster/). + This is what the `west` cluster will use to communicate with `east` via the + gateway. + +## Install Linkerd and Linkerd Viz + +{{< fig + alt="install" + title="Two Clusters" + center="true" + src="/images/multicluster/install.svg" >}} + +Linkerd requires a shared +[trust anchor](../generate-certificates/#trust-anchor-certificate) +to exist between the installations in all clusters that communicate with each +other. This is used to encrypt the traffic between clusters and authorize +requests that reach the gateway so that your cluster is not open to the public +internet. Instead of letting `linkerd` generate everything, we'll need to +generate the credentials and use them as configuration for the `install` +command. + +We like to use the [step](https://smallstep.com/cli/) CLI to generate these +certificates. If you prefer `openssl` instead, feel free to use that! To +generate the trust anchor with step, you can run: + +```bash +step certificate create root.linkerd.cluster.local root.crt root.key \ + --profile root-ca --no-password --insecure +``` + +This certificate will form the common base of trust between all your clusters. +Each proxy will get a copy of this certificate and use it to validate the +certificates that it receives from peers as part of the mTLS handshake. With a +common base of trust, we now need to generate a certificate that can be used in +each cluster to issue certificates to the proxies. If you'd like to get a deeper +picture into how this all works, check out the +[deep dive](../../features/automatic-mtls/#how-does-it-work). + +The trust anchor that we've generated is a self-signed certificate which can be +used to create new certificates (a certificate authority). To generate the +[issuer credentials](../generate-certificates/#issuer-certificate-and-key) +using the trust anchor, run: + +```bash +step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \ + --profile intermediate-ca --not-after 8760h --no-password --insecure \ + --ca root.crt --ca-key root.key +``` + +An `identity` service in your cluster will use the certificate and key that you +generated here to generate the certificates that each individual proxy uses. 
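+
+If you'd like to double-check what you just generated, `step` can print a short
+summary of each certificate; this is purely an optional sanity check:
+
+```bash
+# the first should be a root CA, the second an intermediate CA issued by it
+step certificate inspect root.crt --short
+step certificate inspect issuer.crt --short
+```
+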
+While we will be using the same issuer credentials on each cluster for this +guide, it is a good idea to have separate ones for each cluster. Read through +the [certificate documentation](../generate-certificates/) for more +details. + +With a valid trust anchor and issuer credentials, we can install Linkerd on your +`west` and `east` clusters now. + +```bash +# first, install the Linkerd CRDs in both clusters +linkerd install --crds \ + | tee \ + >(kubectl --context=west apply -f -) \ + >(kubectl --context=east apply -f -) + +# then install the Linkerd control plane in both clusters +linkerd install \ + --identity-trust-anchors-file root.crt \ + --identity-issuer-certificate-file issuer.crt \ + --identity-issuer-key-file issuer.key \ + | tee \ + >(kubectl --context=west apply -f -) \ + >(kubectl --context=east apply -f -) +``` + +And then Linkerd Viz: + +```bash +for ctx in west east; do + linkerd --context=${ctx} viz install | \ + kubectl --context=${ctx} apply -f - || break +done +``` + +The output from `install` will get applied to each cluster and come up! You can +verify that everything has come up successfully with `check`. + +```bash +for ctx in west east; do + echo "Checking cluster: ${ctx} ........." + linkerd --context=${ctx} check || break + echo "-------------" +done +``` + +## Preparing your cluster + +{{< fig + alt="preparation" + title="Preparation" + center="true" + src="/images/multicluster/prep-overview.svg" >}} + +In order to route traffic between clusters, Linkerd leverages Kubernetes +services so that your application code does not need to change and there is +nothing new to learn. This requires a gateway component that routes incoming +requests to the correct internal service. The gateway will be exposed to the +public internet via a `Service` of type `LoadBalancer`. Only requests verified +through Linkerd's mTLS (with a shared trust anchor) will be allowed through this +gateway. If you're interested, we go into more detail as to why this is +important in [architecting for multicluster Kubernetes](/2020/02/17/architecting-for-multicluster-kubernetes/#requirement-i-support-hierarchical-networks). + +To install the multicluster components on both `west` and `east`, you can run: + +```bash +for ctx in west east; do + echo "Installing on cluster: ${ctx} ........." + linkerd --context=${ctx} multicluster install | \ + kubectl --context=${ctx} apply -f - || break + echo "-------------" +done +``` + +{{< fig + alt="install" + title="Components" + center="true" + src="/images/multicluster/components.svg" >}} + +Installed into the `linkerd-multicluster` namespace, the gateway is a simple +[pause container](https://github.com/linkerd/linkerd2/blob/main/multicluster/charts/linkerd-multicluster/templates/gateway.yaml#L3) +which has been injected with the Linkerd proxy. On the inbound side, Linkerd +takes care of validating that the connection uses a TLS certificate that is part +of the trust anchor, then handles the outbound connection. At this point, the +Linkerd proxy is operating like any other in the data plane and forwards the +requests to the correct service. Make sure the gateway comes up successfully by +running: + +```bash +for ctx in west east; do + echo "Checking gateway on cluster: ${ctx} ........." 
+ kubectl --context=${ctx} -n linkerd-multicluster \ + rollout status deploy/linkerd-gateway || break + echo "-------------" +done +``` + +Double check that the load balancer was able to allocate a public IP address by +running: + +```bash +for ctx in west east; do + printf "Checking cluster: ${ctx} ........." + while [ "$(kubectl --context=${ctx} -n linkerd-multicluster get service -o 'custom-columns=:.status.loadBalancer.ingress[0].ip' --no-headers)" = "" ]; do + printf '.' + sleep 1 + done + printf "\n" +done +``` + +Every cluster is now running the multicluster control plane and ready to start +mirroring services. We'll want to link the clusters together now! + +## Linking the clusters + +{{< fig + alt="link-clusters" + title="Link" + center="true" + src="/images/multicluster/link-flow.svg" >}} + +For `west` to mirror services from `east`, the `west` cluster needs to have +credentials so that it can watch for services in `east` to be exported. You'd +not want anyone to be able to introspect what's running on your cluster after +all! The credentials consist of a service account to authenticate the service +mirror as well as a `ClusterRole` and `ClusterRoleBinding` to allow watching +services. In total, the service mirror component uses these credentials to watch +services on `east` or the target cluster and add/remove them from itself +(`west`). There is a default set added as part of +`linkerd multicluster install`, but if you would like to have separate +credentials for every cluster you can run `linkerd multicluster allow`. + +The next step is to link `west` to `east`. This will create a credentials +secret, a Link resource, and a service-mirror controller. The credentials secret +contains a kubeconfig which can be used to access the target (`east`) cluster's +Kubernetes API. The Link resource is custom resource that configures service +mirroring and contains things such as the gateway address, gateway identity, +and the label selector to use when determining which services to mirror. The +service-mirror controller uses the Link and the secret to find services on +the target cluster that match the given label selector and copy them into +the source (local) cluster. + + To link the `west` cluster to the `east` one, run: + +```bash +linkerd --context=east multicluster link --cluster-name east | + kubectl --context=west apply -f - +``` + +Linkerd will look at your current `east` context, extract the `cluster` +configuration which contains the server location as well as the CA bundle. It +will then fetch the `ServiceAccount` token and merge these pieces of +configuration into a kubeconfig that is a secret. + +Running `check` again will make sure that the service mirror has discovered this +secret and can reach `east`. + +```bash +linkerd --context=west multicluster check +``` + +Additionally, the `east` gateway should now show up in the list: + +```bash +linkerd --context=west multicluster gateways +``` + +{{< note >}} `link` assumes that the two clusters will connect to each other +with the same configuration as you're using locally. If this is not the case, +you'll want to use the `--api-server-address` flag for `link`.{{< /note >}} + +## Installing the test services + +{{< fig + alt="test-services" + title="Topology" + center="true" + src="/images/multicluster/example-topology.svg" >}} + +It is time to test this all out! The first step is to add some services that we +can mirror. 
To add these to both clusters, you can run: + +```bash +for ctx in west east; do + echo "Adding test services on cluster: ${ctx} ........." + kubectl --context=${ctx} apply \ + -n test -k "github.com/linkerd/website/multicluster/${ctx}/" + kubectl --context=${ctx} -n test \ + rollout status deploy/podinfo || break + echo "-------------" +done +``` + +You'll now have a `test` namespace running two deployments in each cluster - +frontend and podinfo. `podinfo` has been configured slightly differently in each +cluster with a different name and color so that we can tell where requests are +going. + +To see what it looks like from the `west` cluster right now, you can run: + +```bash +kubectl --context=west -n test port-forward svc/frontend 8080 +``` + +{{< fig + alt="west-podinfo" + title="West Podinfo" + center="true" + src="/images/multicluster/west-podinfo.gif" >}} + +With the podinfo landing page available at +[http://localhost:8080](http://localhost:8080), you can see how it looks in the +`west` cluster right now. Alternatively, running `curl http://localhost:8080` +will return a JSON response that looks something like: + +```json +{ + "hostname": "podinfo-5c8cf55777-zbfls", + "version": "4.0.2", + "revision": "b4138fdb4dce7b34b6fc46069f70bb295aa8963c", + "color": "#6c757d", + "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif", + "message": "greetings from west", + "goos": "linux", + "goarch": "amd64", + "runtime": "go1.14.3", + "num_goroutine": "8", + "num_cpu": "4" +} +``` + +Notice that the `message` references the `west` cluster name. + +## Exporting the services + +To make sure sensitive services are not mirrored and cluster performance is +impacted by the creation or deletion of services, we require that services be +explicitly exported. For the purposes of this guide, we will be exporting the +`podinfo` service from the `east` cluster to the `west` cluster. To do this, we +must first export the `podinfo` service in the `east` cluster. You can do this +by adding the `mirror.linkerd.io/exported` label: + +```bash +kubectl --context=east label svc -n test podinfo mirror.linkerd.io/exported=true +``` + +{{< note >}} You can configure a different label selector by using the +`--selector` flag on the `linkerd multicluster link` command or by editting +the Link resource created by the `linkerd multicluster link` command. +{{< /note >}} + +Check out the service that was just created by the service mirror controller! + +```bash +kubectl --context=west -n test get svc podinfo-east +``` + +From the +[architecture](https://linkerd.io/2020/02/25/multicluster-kubernetes-with-service-mirroring/#step-2-endpoint-juggling), +you'll remember that the service mirror component is doing more than just moving +services over. It is also managing the endpoints on the mirrored service. To +verify that is setup correctly, you can check the endpoints in `west` and verify +that they match the gateway's public IP address in `east`. + +```bash +kubectl --context=west -n test get endpoints podinfo-east \ + -o 'custom-columns=ENDPOINT_IP:.subsets[*].addresses[*].ip' +kubectl --context=east -n linkerd-multicluster get svc linkerd-gateway \ + -o "custom-columns=GATEWAY_IP:.status.loadBalancer.ingress[*].ip" +``` + +At this point, we can hit the `podinfo` service in `east` from the `west` +cluster. 
This requires the client to be meshed, so let's run `curl` from within +the frontend pod: + +```bash +kubectl --context=west -n test exec -c nginx -it \ + $(kubectl --context=west -n test get po -l app=frontend \ + --no-headers -o custom-columns=:.metadata.name) \ + -- /bin/sh -c "apk add curl && curl http://podinfo-east:9898" +``` + +You'll see the `greeting from east` message! Requests from the `frontend` pod +running in `west` are being transparently forwarded to `east`. Assuming that +you're still port forwarding from the previous step, you can also reach this +with `curl http://localhost:8080/east`. Make that call a couple times and +you'll be able to get metrics from `linkerd viz stat` as well. + +```bash +linkerd --context=west -n test viz stat --from deploy/frontend svc +``` + +We also provide a grafana dashboard to get a feel for what's going on here (see +the [grafana install instructions](../grafana/) first to have a working grafana +provisioned with Linkerd dashboards). You can get to it by running `linkerd +--context=west viz dashboard` and going to + +{{< fig + alt="grafana-dashboard" + title="Grafana" + center="true" + src="/images/multicluster/grafana-dashboard.png" >}} + +## Security + +By default, requests will be going across the public internet. Linkerd extends +its [automatic mTLS](../../features/automatic-mtls/) across clusters to make sure +that the communication going across the public internet is encrypted. If you'd +like to have a deep dive on how to validate this, check out the +[docs](../securing-your-service/). To quickly check, however, you can run: + +```bash +linkerd --context=west -n test viz tap deploy/frontend | \ + grep "$(kubectl --context=east -n linkerd-multicluster get svc linkerd-gateway \ + -o "custom-columns=GATEWAY_IP:.status.loadBalancer.ingress[*].ip")" +``` + +`tls=true` tells you that the requests are being encrypted! + +{{< note >}} As `linkerd viz edges` works on concrete resources and cannot see +two clusters at once, it is not currently able to show the edges between pods in +`east` and `west`. This is the reason we're using `tap` to validate mTLS here. +{{< /note >}} + +In addition to making sure all your requests are encrypted, it is important to +block arbitrary requests coming into your cluster. We do this by validating that +requests are coming from clients in the mesh. To do this validation, we rely on +a shared trust anchor between clusters. To see what happens when a client is +outside the mesh, you can run: + +```bash +kubectl --context=west -n test run -it --rm --image=alpine:3 test -- \ + /bin/sh -c "apk add curl && curl -vv http://podinfo-east:9898" +``` + +## Traffic Splitting + +{{< fig + alt="with-split" + title="Traffic Split" + center="true" + src="/images/multicluster/with-split.svg" >}} + +It is pretty useful to have services automatically show up in clusters and be +able to explicitly address them, however that only covers one use case for +operating multiple clusters. Another scenario for multicluster is failover. In a +failover scenario, you don't have time to update the configuration. Instead, you +need to be able to leave the application alone and just change the routing. If +this sounds a lot like how we do [canary](../canary-release/) deployments, +you'd be correct! + +`TrafficSplit` allows us to define weights between multiple services and split +traffic between them. 
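+
+As a rough sketch, a `TrafficSplit` that sends half of the traffic for
+`podinfo` to the local service and half to the mirrored `podinfo-east` service
+could look like the following. The exact `apiVersion` depends on which version
+of the SMI `TrafficSplit` CRD is installed in your cluster, so treat this as
+illustrative rather than copy-paste ready:
+
+```bash
+kubectl --context=west apply -f - <<EOF
+apiVersion: split.smi-spec.io/v1alpha2   # match the TrafficSplit CRD version you have installed
+kind: TrafficSplit
+metadata:
+  name: podinfo
+  namespace: test
+spec:
+  service: podinfo
+  backends:
+  - service: podinfo
+    weight: 50
+  - service: podinfo-east
+    weight: 50
+EOF
+```
+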
In a failover scenario, you want to do this slowly as to +make sure you don't overload the other cluster or trip any SLOs because of the +added latency. To get this all working with our scenario, let's split between +the `podinfo` service in `west` and `east`. To configure this, you'll run: + +```bash +kubectl --context=west apply -f - <}} + +You can also watch what's happening with metrics. To see the source side of +things (`west`), you can run: + +```bash +linkerd --context=west -n test viz stat trafficsplit +``` + +It is also possible to watch this from the target (`east`) side by running: + +```bash +linkerd --context=east -n test viz stat \ + --from deploy/linkerd-gateway \ + --from-namespace linkerd-multicluster \ + deploy/podinfo +``` + +There's even a dashboard! Run `linkerd viz dashboard` and send your browser to +[localhost:50750](http://localhost:50750/namespaces/test/trafficsplits/podinfo). + +{{< fig + alt="podinfo-split" + title="Cross Cluster Podinfo" + center="true" + src="/images/multicluster/ts-dashboard.png" >}} + +## Cleanup + +To cleanup the multicluster control plane, you can run: + +```bash +linkerd --context=west multicluster unlink --cluster-name east | \ + kubectl --context=west delete -f - +for ctx in west east; do \ + kubectl --context=${ctx} delete ns test; \ + linkerd --context=${ctx} multicluster uninstall | kubectl --context=${ctx} delete -f - ; \ +done +``` + +If you'd also like to remove your Linkerd installation, run: + +```bash +for ctx in west east; do + linkerd --context=${ctx} viz uninstall | kubectl --context=${ctx} delete -f - + linkerd --context=${ctx} uninstall | kubectl --context=${ctx} delete -f - +done +``` diff --git a/linkerd.io/content/2.16/tasks/per-request-policy.md b/linkerd.io/content/2.16/tasks/per-request-policy.md new file mode 100644 index 0000000000..95dfcdd3bc --- /dev/null +++ b/linkerd.io/content/2.16/tasks/per-request-policy.md @@ -0,0 +1,34 @@ ++++ +title = "Per-Request Policy" +description = "Using HTTP headers to specify per-request policy" +aliases = [] ++++ + +[Retries](../configuring-retries/) and [timeouts](../configuring-timeouts/) can +be configured by annotating Service, HTTPRoute, or GRPCRoute resources. This +will apply the retry or timeout policy to all requests that are sent to that +service/route. + +Additionally, retry and timeout policy can be configured for individual HTTP +requests by adding special HTTP headers to those requests. + +## Enabling Per-Request Policy + +In order to enable per-request policy, Linkerd must be installed with the +`--set policyController.additionalArgs="--allow-l5d-request-headers"` flag or +the corresponding Helm value. Enabling per-request policy is **not** +recommended if your application accepts requests from untrusted sources (e.g. +if it is an ingress) since this allows untrusted clients to specify Linkerd +policy. 
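+
+For example, on an existing CLI-managed installation you could enable it with
+something like the following sketch, which simply re-renders the control plane
+manifests with the flag described above:
+
+```bash
+linkerd upgrade \
+  --set policyController.additionalArgs="--allow-l5d-request-headers" \
+  | kubectl apply -f -
+```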
+
+## Per-Request Policy Headers
+
+Once per-request policy is enabled, the following HTTP headers can be added to
+a request to set or override retry and/or timeout policy for that request:
+
++ `l5d-retry-http`: Overrides the `retry.linkerd.io/http` annotation
++ `l5d-retry-grpc`: Overrides the `retry.linkerd.io/grpc` annotation
++ `l5d-retry-limit`: Overrides the `retry.linkerd.io/limit` annotation
++ `l5d-retry-timeout`: Overrides the `retry.linkerd.io/timeout` annotation
++ `l5d-timeout`: Overrides the `timeout.linkerd.io/request` annotation
++ `l5d-response-timeout`: Overrides the `timeout.linkerd.io/response` annotation
diff --git a/linkerd.io/content/2.16/tasks/pod-to-pod-multicluster.md b/linkerd.io/content/2.16/tasks/pod-to-pod-multicluster.md
new file mode 100644
index 0000000000..5a85ef50eb
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/pod-to-pod-multicluster.md
@@ -0,0 +1,307 @@
++++
+title = "Pod-to-Pod Multi-cluster communication"
+description = "Multi-Cluster Communication for Flat Networks"
++++
+
+By default, Linkerd's [multicluster extension](../multicluster/) works by
+sending all cross-cluster traffic through a gateway on the target cluster.
+However, when multiple Kubernetes clusters are deployed on a flat network where
+pods from one cluster can communicate directly with pods on another, Linkerd
+can export multicluster services in *pod-to-pod* mode where cross-cluster
+traffic does not go through the gateway, but instead goes directly to the
+target pods.
+
+This guide will walk you through exporting multicluster services in pod-to-pod
+mode, setting up authorization policies, and monitoring the traffic.
+
+## Prerequisites
+
+- Two clusters. We will refer to them as `east` and `west` in this guide.
+- The clusters must be on a *flat network*. In other words, pods from one
+  cluster must be able to address and connect to pods in the other cluster.
+- Each of these clusters should be configured as `kubectl`
+  [contexts](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/).
+  We'd recommend you use the names `east` and `west` so that you can follow
+  along with this guide. It is easy to
+  [rename contexts](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#-em-rename-context-em-)
+  with `kubectl`, so don't feel like you need to keep it all named this way
+  forever.
+
+## Step 1: Installing Linkerd and Linkerd-Viz
+
+First, install Linkerd and Linkerd-Viz into both clusters, as described in
+the [multicluster guide](../multicluster/#install-linkerd-and-linkerd-viz).
+Make sure that both clusters share a common trust anchor.
+
+## Step 2: Installing Linkerd-Multicluster
+
+We will install the multicluster extension into both clusters. We can install
+without the gateway because we will be using direct pod-to-pod communication.
+
+```console
+> linkerd --context east multicluster install --gateway=false | kubectl --context east apply -f -
+> linkerd --context east check
+
+> linkerd --context west multicluster install --gateway=false | kubectl --context west apply -f -
+> linkerd --context west check
+```
+
+## Step 3: Linking the Clusters
+
+We use the `linkerd multicluster link` command to link our two clusters
+together. This is exactly the same as in the regular
+[Multicluster guide](../multicluster/#linking-the-clusters) except that we pass
+the `--gateway=false` flag to create a Link which doesn't require a gateway.
+ +```console +> linkerd --context east multicluster link --cluster-name=target --gateway=false | kubectl --context west apply -f - +``` + +## Step 4: Deploy and Exporting a Service + +For our guide, we'll deploy the [bb](https://github.com/BuoyantIO/bb) service, +which is a simple server that just returns a static response. We deploy it +into the target cluster: + +```bash +> cat < kubectl --context west create ns mc-demo +``` + +and set a label on the target service to export it. Notice that instead of the +usual `mirror.linkerd.io/exported=true` label, we are setting +`mirror.linkerd.io/exported=remote-discovery` which means that the service +should be exported in remote discovery mode, which skips the gateway and allows +pods from different clusters to talk to each other directly. + +```console +> kubectl --context east -n mc-demo label svc/bb mirror.linkerd.io/exported=remote-discovery +``` + +You should immediately see a mirror service created in the source cluster: + +```console +> kubectl --context west -n mc-demo get svc +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +bb-target ClusterIP 10.43.56.245 8080/TCP 114s +``` + +## Step 5: Send some traffic! + +We'll use [slow-cooker](https://github.com/BuoyantIO/slow_cooker) as our load +generator in the source cluster to send to the `bb` service in the target +cluster. Notice that we configure slow-cooker to send to our `bb-target` mirror +service. + +```bash +> cat < linkerd --context east viz stat -n mc-demo deploy +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +bb 1/1 100.00% 10.3rps 1ms 1ms 1ms 3 +``` + +## Step 6: Authorization Policy + +One advantage of direct pod-to-pod communication is that the server can use +authorization policies which allow only certain clients to connect. This is +not possible when using the gateway, because client identity is lost when going +through the gateway. For more background on how authorization policies work, +see: [Restricting Access To Services](../restricting-access/). + +Let's demonstrate that by creating an authorization policy which only allows +the `slow-cooker` service account to connect to `bb`: + +```bash +> kubectl --context east apply -f - < linkerd --context east viz authz -n mc-demo deploy +ROUTE SERVER AUTHORIZATION UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 +default bb authorizationpolicy/bb-authz 0.0rps 100.00% 10.0rps 1ms 1ms 1ms +default default:all-unauthenticated default/all-unauthenticated 0.0rps 100.00% 0.1rps 1ms 1ms 1ms +probe default:all-unauthenticated default/probe 0.0rps 100.00% 0.2rps 1ms 1ms 1ms +``` + +To demonstrate that `slow-cooker` is the *only* service account which is allowed +to send to `bb`, we'll create a second load generator called `slow-cooker-evil` +which uses a different service account and which should be denied. 
+
+```bash
+> cat < linkerd --context east viz authz -n mc-demo deploy
+ROUTE     SERVER                       AUTHORIZATION                 UNAUTHORIZED  SUCCESS      RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99
+default   bb                                                               10.0rps    0.00%   0.0rps          0ms          0ms          0ms
+default   bb                           authorizationpolicy/bb-authz         0.0rps  100.00%  10.0rps          1ms          1ms          1ms
+default   default:all-unauthenticated  default/all-unauthenticated          0.0rps  100.00%   0.1rps          1ms          1ms          1ms
+probe     default:all-unauthenticated  default/probe                        0.0rps  100.00%   0.2rps          1ms          1ms          1ms
+```
diff --git a/linkerd.io/content/2.16/tasks/replacing_expired_certificates.md b/linkerd.io/content/2.16/tasks/replacing_expired_certificates.md
new file mode 100644
index 0000000000..b822b3335b
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/replacing_expired_certificates.md
@@ -0,0 +1,124 @@
++++
+title = "Replacing expired certificates"
+description = "Follow this workflow if any of your TLS certs have expired."
++++
+
+If any of your TLS certs are approaching expiry and you are not relying on an
+external certificate management solution such as `cert-manager`, you can follow
+[Manually Rotating Control Plane TLS Credentials](../rotating_identity_certificates/)
+to update them without incurring downtime. However, if any of your certificates
+have already expired, your mesh is already in an invalid state and any measures
+to avoid downtime are not guaranteed to give good results. Instead, you need to
+replace the expired certificates with valid certificates.
+
+## Replacing only the issuer certificate
+
+It might be the case that your issuer certificate has expired. If this is true,
+running `linkerd check --proxy` will produce output similar to:
+
+```bash
+linkerd-identity
+----------------
+√ certificate config is valid
+√ trust roots are using supported crypto algorithm
+√ trust roots are within their validity period
+√ trust roots are valid for at least 60 days
+√ issuer cert is using supported crypto algorithm
+× issuer cert is within its validity period
+    issuer certificate is not valid anymore. Expired on 2019-12-19T09:21:08Z
+    see https://linkerd.io/checks/#l5d-identity-issuer-cert-is-time-valid for hints
+```
+
+In this situation, if you have installed Linkerd with a manually supplied trust
+root and you have its key, you can follow the instructions to
+[rotate your identity issuer certificate](../manually-rotating-control-plane-tls-credentials/#rotating-the-identity-issuer-certificate)
+to update your expired certificate.
+
+## Replacing the root and issuer certificates
+
+If your root certificate is expired or you do not have its key, you need to
+replace both your root and issuer certificates at the same time. If your root
+has expired, `linkerd check` will indicate that by outputting an error similar
+to:
+
+```bash
+linkerd-identity
+----------------
+√ certificate config is valid
+√ trust roots are using supported crypto algorithm
+× trust roots are within their validity period
+    Invalid roots:
+        * 272080721524060688352608293567629376512 identity.linkerd.cluster.local not valid anymore. Expired on 2019-12-19T10:05:31Z
+    see https://linkerd.io/checks/#l5d-identity-roots-are-time-valid for hints
+```
+
+You can follow [Generating your own mTLS root certificates](../generate-certificates/#generating-the-certificates-with-step)
+to create new root and issuer certificates.
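+
+With `step`, that typically comes down to commands like these (file names
+chosen to match the `upgrade` command below):
+
+```bash
+# new trust anchor
+step certificate create root.linkerd.cluster.local ca-new.crt ca-new.key \
+  --profile root-ca --no-password --insecure
+
+# new issuer certificate and key, signed by the new trust anchor
+step certificate create identity.linkerd.cluster.local issuer-new.crt issuer-new.key \
+  --profile intermediate-ca --not-after 8760h --no-password --insecure \
+  --ca ca-new.crt --ca-key ca-new.key
+```
+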
Then use the `linkerd upgrade` +command: + +```bash +linkerd upgrade \ + --identity-issuer-certificate-file=./issuer-new.crt \ + --identity-issuer-key-file=./issuer-new.key \ + --identity-trust-anchors-file=./ca-new.crt \ + --force \ + | kubectl apply -f - +``` + +Usually `upgrade` will prevent you from using an issuer certificate that +will not work with the roots your meshed pods are using. At that point we +do not need this check as we are updating both the root and issuer certs at +the same time. Therefore we use the `--force` flag to ignore this error. + +If you run `linkerd check --proxy` while pods are restarting after the trust +bundle is updated, you will probably see warnings about pods not having the +current trust bundle: + +```bash +linkerd-identity +---------------- +√ certificate config is valid +√ trust roots are using supported crypto algorithm +√ trust roots are within their validity period +√ trust roots are valid for at least 60 days +√ issuer cert is using supported crypto algorithm +√ issuer cert is within its validity period +√ issuer cert is valid for at least 60 days +√ issuer cert is issued by the trust root + +linkerd-identity-data-plane +--------------------------- +‼ data plane proxies certificate match CA + Some pods do not have the current trust bundle and must be restarted: + * linkerd/linkerd-controller-5b69fd4fcc-7skqb + * linkerd/linkerd-destination-749df5c74-brchg + * linkerd/linkerd-prometheus-74cb4f4b69-kqtss + * linkerd/linkerd-proxy-injector-cbd5545bd-rblq5 + * linkerd/linkerd-sp-validator-6ff949649f-gjgfl + * linkerd/linkerd-tap-7b5bb954b6-zl9w6 + * linkerd/linkerd-web-84c555f78-v7t44 + see https://linkerd.io/checks/#l5d-identity-data-plane-proxies-certs-match-ca for hints + +``` + +These warnings will disappear as restarts complete. Once they do, you can use +`kubectl rollout restart` to restart your meshed workloads to bring their +configuration up to date. After that is done, `linkerd check` should run with +no warnings or errors: + +```bash +linkerd-identity +---------------- +√ certificate config is valid +√ trust roots are using supported crypto algorithm +√ trust roots are within their validity period +√ trust roots are valid for at least 60 days +√ issuer cert is using supported crypto algorithm +√ issuer cert is within its validity period +√ issuer cert is valid for at least 60 days +√ issuer cert is issued by the trust root + +linkerd-identity-data-plane +--------------------------- +√ data plane proxies certificate match CA +``` diff --git a/linkerd.io/content/2.16/tasks/restricting-access.md b/linkerd.io/content/2.16/tasks/restricting-access.md new file mode 100644 index 0000000000..61f619ff17 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/restricting-access.md @@ -0,0 +1,186 @@ ++++ +title = "Restricting Access To Services" +description = "Use Linkerd policy to restrict access to a service." ++++ + +Linkerd policy resources can be used to restrict which clients may access a +service. In this example, we'll use Emojivoto to show how to restrict access +to the Voting service so that it may only be called from the Web service. + +For a more comprehensive description of the policy resources, see the +[Policy reference docs](../../reference/authorization-policy/). + +## Prerequisites + +To use this guide, you'll need to have Linkerd installed on your cluster, along +with its Viz extension. Follow the [Installing Linkerd Guide](../install/) +if you haven't already done this. 
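+
+Before continuing, you can quickly confirm that both the control plane and the
+Viz extension are healthy:
+
+```bash
+linkerd check
+linkerd viz check
+```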
+ +## Setup + +Inject and install the Emojivoto application: + +```bash +$ linkerd inject https://run.linkerd.io/emojivoto.yml | kubectl apply -f - +... +$ linkerd check -n emojivoto --proxy -o short +... +``` + +## Creating a Server resource + +We start by creating a `Server` resource for the Voting service. A `Server` +is a Linkerd custom resource which describes a specific port of a workload. +Once the `Server` resource has been created, only clients which have been +authorized may access it (we'll see how to authorize clients in a moment). + +```bash +kubectl apply -f - < linkerd viz authz -n emojivoto deploy/voting +ROUTE SERVER AUTHORIZATION UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 +default default:all-unauthenticated default/all-unauthenticated 0.0rps 100.00% 0.1rps 1ms 1ms 1ms +probe default:all-unauthenticated default/probe 0.0rps 100.00% 0.2rps 1ms 1ms 1ms +default voting-grpc 1.0rps 0.00% 0.0rps 0ms 0ms 0ms +``` + +## Creating a ServerAuthorization resource + +A `ServerAuthorization` grants a set of clients access to a set of `Servers`. +Here we will create a `ServerAuthorization` which grants the Web service access +to the Voting `Server` we created above. Note that meshed mTLS uses +`ServiceAccounts` as the basis for identity, thus our authorization will also +be based on `ServiceAccounts`. + +```bash +kubectl apply -f - < linkerd viz authz -n emojivoto deploy/voting +ROUTE SERVER AUTHORIZATION UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 +default default:all-unauthenticated default/all-unauthenticated 0.0rps 100.00% 0.1rps 1ms 1ms 1ms +probe default:all-unauthenticated default/probe 0.0rps 100.00% 0.2rps 1ms 1ms 1ms +default voting-grpc serverauthorization/voting-grpc 0.0rps 83.87% 1.0rps 1ms 1ms 1ms +``` + +We can also test that request from other pods will be rejected by creating a +`grpcurl` pod and attempting to access the Voting service from it: + +```bash +> kubectl run grpcurl --rm -it --image=networld/grpcurl --restart=Never --command -- ./grpcurl -plaintext voting-svc.emojivoto:8080 emojivoto.v1.VotingService/VoteDog +Error invoking method "emojivoto.v1.VotingService/VoteDog": failed to query for service descriptor "emojivoto.v1.VotingService": rpc error: code = PermissionDenied desc = +pod "grpcurl" deleted +pod default/grpcurl terminated (Error) +``` + +Because this client has not been authorized, this request gets rejected with a +`PermissionDenied` error. + +You can create as many `ServerAuthorization` resources as you like to authorize +many different clients. You can also specify whether to authorize +unauthenticated (i.e. unmeshed) client, any authenticated client, or only +authenticated clients with a particular identity. For more details, please see +the [Policy reference docs](../../reference/authorization-policy/). + +## Setting a Default Policy + +To further lock down a cluster, you can set a default policy which will apply +to all ports which do not have a Server resource defined. 
Linkerd uses the +following logic when deciding whether to allow a request: + +* If the port has a Server resource and the client matches a ServerAuthorization + resource for it: ALLOW +* If the port has a Server resource but the client does not match any + ServerAuthorizations for it: DENY +* If the port does not have a Server resource: use the default policy + +We can set the default policy to `deny` using the `linkerd upgrade` command: + +```bash +> linkerd upgrade --default-inbound-policy deny | kubectl apply -f - +``` + +Alternatively, default policies can be set on individual workloads or namespaces +by setting the `config.linkerd.io/default-inbound-policy` annotation. See the +[Policy reference docs](../../reference/authorization-policy/) for more details. + +If a port does not have a Server defined, Linkerd will automatically use a +default Server which allows readiness and liveness probes. However, if you +create a Server resource for a port which handles probes, you will need to +explicitly create an authorization to allow those probe requests. For more +information about adding route-scoped authorizations, see +[Configuring Per-Route Policy](../configuring-per-route-policy/). + +## Further Considerations - Audit Mode + +You may have noticed that there was a period of time after we created the +`Server` resource but before we created the `ServerAuthorization` where all +requests were being rejected. To avoid this situation in live systems, we +recommend that you enable [audit mode](../../features/server-policy/#audit-mode) +in the `Server` resource (via `accessPolicy:audit`) and check the proxy +logs/metrics in the target services to see if traffic would get inadvertently +denied. Afterwards, when you're sure about your policy rules, you can fully +enable them by resetting `accessPolicy` back to `deny`. + +## Per-Route Policy + +In addition to service-level authorization policy, authorization policy can also +be configured for individual HTTP routes. To learn more about per-route policy, +see the documentation on [configuring per-route +policy](../configuring-per-route-policy/). diff --git a/linkerd.io/content/2.16/tasks/rotating_webhooks_certificates.md b/linkerd.io/content/2.16/tasks/rotating_webhooks_certificates.md new file mode 100644 index 0000000000..24b286f35e --- /dev/null +++ b/linkerd.io/content/2.16/tasks/rotating_webhooks_certificates.md @@ -0,0 +1,104 @@ ++++ +title = "Rotating webhooks certificates" +description = "Follow these steps to rotate your Linkerd webhooks certificates." ++++ + +Linkerd uses the +[Kubernetes admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks) +and +[extension API server](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) +to implement some of its core features like +[automatic proxy injection](../../features/proxy-injection/) and +[service profiles validation](../../features/service-profiles/). + +Also, the viz extension uses a webhook to make pods tappable, as does the jaeger +extension to turn on tracing on pods. + +To secure the connections between the Kubernetes API server and the +webhooks, all the webhooks are TLS-enabled. The x509 certificates used by these +webhooks are issued by the self-signed CA certificates embedded in the webhooks +configuration. + +By default, these certificates have a validity period of 365 days. 
They are
stored in the following secrets:

- In the `linkerd` namespace: `linkerd-policy-validator-k8s-tls`,
  `linkerd-proxy-injector-k8s-tls` and `linkerd-sp-validator-k8s-tls`
- In the `linkerd-viz` namespace: `tap-injector-k8s-tls`
- In the `linkerd-jaeger` namespace: `jaeger-injector-k8s-tls`

The rest of this documentation provides instructions on how to renew these
certificates.

## Renewing the webhook certificates

To check the validity of all the TLS secrets
(using [`step`](https://smallstep.com/cli/)):

```bash
# assuming you have viz and jaeger installed, otherwise trim down these arrays
# accordingly
SECRETS=("linkerd-policy-validator-k8s-tls" "linkerd-proxy-injector-k8s-tls" "linkerd-sp-validator-k8s-tls" "tap-injector-k8s-tls" "jaeger-injector-k8s-tls")
NS=("linkerd" "linkerd" "linkerd" "linkerd-viz" "linkerd-jaeger")
for idx in "${!SECRETS[@]}"; do \
  kubectl -n "${NS[$idx]}" get secret "${SECRETS[$idx]}" -ojsonpath='{.data.tls\.crt}' | \
  base64 --decode - | \
  step certificate inspect - | \
  grep -iA2 validity; \
done
```

Manually delete these secrets and use `upgrade`/`install` to recreate them:

```bash
for idx in "${!SECRETS[@]}"; do \
  kubectl -n "${NS[$idx]}" delete secret "${SECRETS[$idx]}"; \
done

linkerd upgrade | kubectl apply -f -
linkerd viz install | kubectl apply -f -
linkerd jaeger install | kubectl apply -f -
```

The above commands will recreate the secrets without restarting Linkerd.

{{< note >}}
For Helm users, use the `helm upgrade` command to recreate the deleted secrets.

If you render the helm charts externally and apply them with `kubectl apply`
(e.g. in a CI/CD pipeline), you do not need to delete the secrets manually,
as they will be overwritten by a new cert and key generated by the helm chart.
{{< /note >}}

Confirm that the secrets are recreated with new certificates:

```bash
for idx in "${!SECRETS[@]}"; do \
  kubectl -n "${NS[$idx]}" get secret "${SECRETS[$idx]}" -ojsonpath='{.data.tls\.crt}' | \
  base64 --decode - | \
  step certificate inspect - | \
  grep -iA2 validity; \
done
```

Ensure that Linkerd remains healthy:

```bash
linkerd check
```

Restarting the pods that implement the webhooks and API services is usually not
necessary. But if the cluster is large, or has a high pod churn, it may be
advisable to restart the pods manually, to avoid cascading failures.

If you observe certificate expiry errors or mismatched CA certs, restart the
affected pods with:

```sh
kubectl -n linkerd rollout restart deploy \
  linkerd-proxy-injector \
  linkerd-sp-validator

kubectl -n linkerd-viz rollout restart deploy tap tap-injector
kubectl -n linkerd-jaeger rollout restart deploy jaeger-injector
```

diff --git a/linkerd.io/content/2.16/tasks/securing-linkerd-tap.md b/linkerd.io/content/2.16/tasks/securing-linkerd-tap.md
new file mode 100644
index 0000000000..fc50c312d0
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/securing-linkerd-tap.md
@@ -0,0 +1,227 @@
+++
title = "Securing Linkerd Tap"
description = "Best practices for securing Linkerd's tap feature."
aliases = [
    "../tap-rbac/",
    "securing-your-cluster/",
]
+++

Linkerd provides a powerful tool called `tap` which allows users
to introspect live traffic in real time. While powerful, this feature can
potentially expose sensitive data such as request and response headers, which may
contain confidential information.
To mitigate this risk, Linkerd has a configuration +field called `tap.ignoreHeaders` which can be used to exclude specific headers from +being captured by `tap`. Access to `tap` is controlled using +[role-based access control (RBAC)](https://kubernetes.io/docs/reference/access-authn-authz/rbac/). +This page illustrates best practices to enable this introspection in a secure +way. + +## Tap + +Linkerd's Viz extension includes Tap support. This feature is available via the +following commands: + +- [`linkerd viz tap`](../../reference/cli/viz/#tap) +- [`linkerd viz top`](../../reference/cli/viz/#top) +- [`linkerd viz profile --tap`](../../reference/cli/viz/#profile) +- [`linkerd viz dashboard`](../../reference/cli/viz/#dashboard) + +Depending on your RBAC setup, you may need to perform additional steps to enable +your user(s) to perform Tap actions. + +{{< note >}} +If you are on GKE, skip to the [GKE section below](#gke). +{{< /note >}} + +### Check for Tap access + +Use `kubectl` to determine whether your user is authorized to perform tap +actions. For more information, see the +[Kubernetes docs on authorization](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#checking-api-access). + +To determine if you can watch pods in all namespaces: + +```bash +kubectl auth can-i watch pods.tap.linkerd.io --all-namespaces +``` + +To determine if you can watch deployments in the emojivoto namespace: + +```bash +kubectl auth can-i watch deployments.tap.linkerd.io -n emojivoto +``` + +To determine if a specific user can watch deployments in the emojivoto namespace: + +```bash +kubectl auth can-i watch deployments.tap.linkerd.io -n emojivoto --as $(whoami) +``` + +You can also use the Linkerd CLI's `--as` flag to confirm: + +```bash +$ linkerd viz tap -n linkerd deploy/linkerd-controller --as $(whoami) +Cannot connect to Linkerd Viz: namespaces is forbidden: User "XXXX" cannot list resource "namespaces" in API group "" at the cluster scope +Validate the install with: linkerd viz check +... +``` + +### Enabling Tap access + +If the above commands indicate you need additional access, you can enable access +with as much granularity as you choose. + +#### Granular Tap access + +To enable tap access to all resources in all namespaces, you may bind your user +to the `linkerd-linkerd-tap-admin` ClusterRole, installed by default: + +```bash +$ kubectl describe clusterroles/linkerd-linkerd-viz-tap-admin +Name: linkerd-linkerd-viz-tap-admin +Labels: component=tap + linkerd.io/extension=viz +Annotations: kubectl.kubernetes.io/last-applied-configuration: + {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"component=tap... 
+PolicyRule: + Resources Non-Resource URLs Resource Names Verbs + --------- ----------------- -------------- ----- + *.tap.linkerd.io [] [] [watch] +``` + +{{< note >}} +This ClusterRole name includes the Linkerd Viz namespace, so it may vary if you +installed Viz into a non-default namespace: +`linkerd-[LINKERD_VIZ_NAMESPACE]-tap-admin` +{{< /note >}} + +To bind the `linkerd-linkerd-viz-tap-admin` ClusterRole to a particular user: + +```bash +kubectl create clusterrolebinding \ + $(whoami)-tap-admin \ + --clusterrole=linkerd-linkerd-viz-tap-admin \ + --user=$(whoami) +``` + +You can verify you now have tap access with: + +```bash +$ linkerd viz tap -n linkerd deploy/linkerd-controller --as $(whoami) +req id=3:0 proxy=in src=10.244.0.1:37392 dst=10.244.0.13:9996 tls=not_provided_by_remote :method=GET :authority=10.244.0.13:9996 :path=/ping +... +``` + +#### Cluster admin access + +To simply give your user cluster-admin access: + +```bash +kubectl create clusterrolebinding \ + $(whoami)-cluster-admin \ + --clusterrole=cluster-admin \ + --user=$(whoami) +``` + +{{< note >}} +Not recommended for production, only do this for testing/development. +{{< /note >}} + +### GKE + +Google Kubernetes Engine (GKE) provides access to your Kubernetes cluster via +Google Cloud IAM. See the +[GKE IAM Docs](https://cloud.google.com/kubernetes-engine/docs/how-to/iam) for +more information. + +Because GCloud provides this additional level of access, there are cases where +`kubectl auth can-i` will report you have Tap access when your RBAC user may +not. To validate this, check whether your GCloud user has Tap access: + +```bash +$ kubectl auth can-i watch pods.tap.linkerd.io --all-namespaces +yes +``` + +And then validate whether your RBAC user has Tap access: + +```bash +$ kubectl auth can-i watch pods.tap.linkerd.io --all-namespaces --as $(gcloud config get-value account) +no - no RBAC policy matched +``` + +If the second command reported you do not have access, you may enable access +with: + +```bash +kubectl create clusterrolebinding \ + $(whoami)-tap-admin \ + --clusterrole=linkerd-linkerd-viz-tap-admin \ + --user=$(gcloud config get-value account) +``` + +To simply give your user cluster-admin access: + +```bash +kubectl create clusterrolebinding \ + $(whoami)-cluster-admin \ + --clusterrole=cluster-admin \ + --user=$(gcloud config get-value account) +``` + +{{< note >}} +Not recommended for production, only do this for testing/development. +{{< /note >}} + +### Linkerd Dashboard tap access + +By default, the [Linkerd dashboard](../../features/dashboard/) has the RBAC +privileges necessary to tap resources. + +To confirm: + +```bash +$ kubectl auth can-i watch pods.tap.linkerd.io --all-namespaces --as system:serviceaccount:linkerd-viz:web +yes +``` + +This access is enabled via a `linkerd-linkerd-viz-web-admin` ClusterRoleBinding: + +```bash +$ kubectl describe clusterrolebindings/linkerd-linkerd-viz-web-admin +Name: linkerd-linkerd-viz-web-admin +Labels: component=web + linkerd.io/extensions=viz +Annotations: kubectl.kubernetes.io/last-applied-configuration: + {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"labels":{"component=web... +Role: + Kind: ClusterRole + Name: linkerd-linkerd-viz-tap-admin +Subjects: + Kind Name Namespace + ---- ---- --------- + ServiceAccount web linkerd-viz +``` + +If you would like to restrict the Linkerd dashboard's tap access. 
You may
install Linkerd viz with the `--set dashboard.restrictPrivileges` flag:

```bash
linkerd viz install --set dashboard.restrictPrivileges
```

This will omit the `linkerd-linkerd-viz-web-admin` ClusterRoleBinding. If you
have already installed Linkerd, you may simply delete the ClusterRoleBinding
manually:

```bash
kubectl delete clusterrolebindings/linkerd-linkerd-viz-web-admin
```

To confirm:

```bash
$ kubectl auth can-i watch pods.tap.linkerd.io --all-namespaces --as system:serviceaccount:linkerd-viz:web
no
```

diff --git a/linkerd.io/content/2.16/tasks/setting-up-service-profiles.md b/linkerd.io/content/2.16/tasks/setting-up-service-profiles.md
new file mode 100644
index 0000000000..e2b8364bec
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/setting-up-service-profiles.md
@@ -0,0 +1,148 @@
+++
title = "Setting Up Service Profiles"
description = "Create a service profile that provides more details for Linkerd to build on."
+++

[Service profiles](../../features/service-profiles/) provide Linkerd additional
information about a service and how to handle requests for a service.

When an HTTP (not HTTPS) request is received by a Linkerd proxy,
the `destination service` of that request is identified. If a
service profile for that destination service exists, then that
service profile is used
to provide [per-route metrics](../getting-per-route-metrics/),
[retries](../configuring-retries/) and
[timeouts](../configuring-timeouts/).

The `destination service` for a request is computed by selecting
the value of the first of the following headers to exist:
`l5d-dst-override`, `:authority`, and `Host`. The port component,
if present, is stripped, including the colon. That value is mapped
to the fully qualified DNS name. When the `destination service`
matches the name of a service profile in the namespace of the
sender or the receiver, Linkerd will use that to provide [per-route
metrics](../getting-per-route-metrics/),
[retries](../configuring-retries/) and
[timeouts](../configuring-timeouts/).

There are times when you may need to define a service profile for
a service which resides in a namespace that you do not control. To
accomplish this, simply create a service profile as before, but
edit the namespace of the service profile to the namespace of the
pod which is calling the service. When Linkerd proxies a request
to a service, a service profile in the source namespace will take
priority over a service profile in the destination namespace.

Your `destination service` may be an [ExternalName
service](https://kubernetes.io/docs/concepts/services-networking/service/#externalname).
In that case, use the `metadata.name` and the
`metadata.namespace` values to name your ServiceProfile. For
example,

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: prod
spec:
  type: ExternalName
  externalName: my.database.example.com
```

use the name `my-service.prod.svc.cluster.local` for the ServiceProfile.

Note that at present, you cannot view statistics gathered for routes
in this ServiceProfile in the web dashboard. You can get the
statistics using the CLI.

For a complete demo walkthrough, check out the
[books](../books/#service-profiles) demo.

There are a couple of different ways to use `linkerd profile` to create service
profiles.

{{< pagetoc >}}

Requests which have been associated with a route will have a `rt_route`
annotation.
To manually verify if the requests are being associated correctly, +run `tap` on your own deployment: + +```bash +linkerd viz tap -o wide | grep req +``` + +The output will stream the requests that `deploy/webapp` is receiving in real +time. A sample is: + +```bash +req id=0:1 proxy=in src=10.1.3.76:57152 dst=10.1.3.74:7000 tls=disabled :method=POST :authority=webapp.default:7000 :path=/books/2878/edit src_res=deploy/traffic src_ns=foobar dst_res=deploy/webapp dst_ns=default rt_route=POST /books/{id}/edit +``` + +Conversely, if `rt_route` is not present, a request has *not* been associated +with any route. Try running: + +```bash +linkerd viz tap -o wide | grep req | grep -v rt_route +``` + +## Swagger + +If you have an [OpenAPI (Swagger)](https://swagger.io/docs/specification/about/) +spec for your service, you can use the `--open-api` flag to generate a service +profile from the OpenAPI spec file. + +```bash +linkerd profile --open-api webapp.swagger webapp +``` + +This generates a service profile from the `webapp.swagger` OpenAPI spec file +for the `webapp` service. The resulting service profile can be piped directly +to `kubectl apply` and will be installed into the service's namespace. + +```bash +linkerd profile --open-api webapp.swagger webapp | kubectl apply -f - +``` + +## Protobuf + +If you have a [protobuf](https://developers.google.com/protocol-buffers/) format +for your service, you can use the `--proto` flag to generate a service profile. + +```bash +linkerd profile --proto web.proto web-svc +``` + +This generates a service profile from the `web.proto` format file for the +`web-svc` service. The resulting service profile can be piped directly to +`kubectl apply` and will be installed into the service's namespace. + +## Auto-Creation + +It is common to not have an OpenAPI spec or a protobuf format. You can also +generate service profiles from watching live traffic. This is based off tap data +and is a great way to understand what service profiles can do for you. To start +this generation process, you can use the `--tap` flag: + +```bash +linkerd viz profile -n emojivoto web-svc --tap deploy/web --tap-duration 10s +``` + +This generates a service profile from the traffic observed to +`deploy/web` over the 10 seconds that this command is running. The resulting service +profile can be piped directly to `kubectl apply` and will be installed into the +service's namespace. + +## Template + +Alongside all the methods for automatically creating service profiles, you can +get a template that allows you to add routes manually. To generate the template, +run: + +```bash +linkerd profile -n emojivoto web-svc --template +``` + +This generates a service profile template with examples that can be manually +updated. Once you've updated the service profile, use `kubectl apply` to get it +installed into the service's namespace on your cluster. diff --git a/linkerd.io/content/2.16/tasks/traffic-shifting.md b/linkerd.io/content/2.16/tasks/traffic-shifting.md new file mode 100644 index 0000000000..a31cacb76e --- /dev/null +++ b/linkerd.io/content/2.16/tasks/traffic-shifting.md @@ -0,0 +1,247 @@ ++++ +title = "Traffic Shifting" +description = "Dynamically split and shift traffic between backends" ++++ + +Traffic splitting and shifting are powerful features that enable operators to +dynamically shift traffic to different backend Services. This can be used to +implement A/B experiments, red/green deploys, canary rollouts, +[fault injection](../fault-injection/) and more. 
+ +Linkerd supports two different ways to configure traffic shifting: you can +use the [Linkerd SMI extension](../linkerd-smi/) and +[TrafficSplit](https://github.com/servicemeshinterface/smi-spec/blob/main/apis/traffic-split/v1alpha2/traffic-split.md/) +resources, or you can use [HTTPRoute](../../features/httproute/) resources which +Linkerd natively supports. While certain integrations such as +[Flagger](../flagger/) rely on the SMI and `TrafficSplit` approach, using +`HTTPRoute` is the preferred method going forward. + +{{< trylpt >}} + +## Prerequisites + +To use this guide, you'll need a Kubernetes cluster running: + +- Linkerd and Linkerd-Viz. If you haven't installed these yet, follow the + [Installing Linkerd Guide](../install/). + +## Set up the demo + +We will set up a minimal demo which involves a load generator and two backends +called `v1` and `v2` respectively. You could imagine that these represent two +different versions of a service and that we would like to test `v2` on a small +sample of traffic before rolling it out completely. + +For load generation we'll use +[Slow-Cooker](https://github.com/BuoyantIO/slow_cooker) +and for the backends we'll use [BB](https://github.com/BuoyantIO/bb). + +To add these components to your cluster and include them in the Linkerd +[data plane](../../reference/architecture/#data-plane), run: + +```bash +cat < linkerd viz -n traffic-shift-demo stat --from deploy/slow-cooker deploy +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +v1 1/1 100.00% 10.1rps 1ms 1ms 8ms 1 +``` + +## Shifting Traffic + +Now let's create an HTTPRoute and split 10% of traffic to the v2 backend: + +```bash +cat < linkerd viz -n traffic-shift-demo stat --from deploy/slow-cooker deploy +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +v1 1/1 100.00% 9.0rps 1ms 1ms 1ms 1 +v2 1/1 100.00% 1.0rps 1ms 1ms 1ms 1 +``` + +From here, we can continue to tweak the weights in the HTTPRoute to gradually +shift traffic over to the `bb-v2` Service or shift things back if it's looking +dicey. To conclude this demo, let's shift 100% of traffic over to `bb-v2`: + +```bash +cat < linkerd viz -n traffic-shift-demo stat --from deploy/slow-cooker deploy +NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN +v1 1/1 - - - - - - +v2 1/1 100.00% 10.0rps 1ms 1ms 2ms 1 +``` diff --git a/linkerd.io/content/2.16/tasks/troubleshooting.md b/linkerd.io/content/2.16/tasks/troubleshooting.md new file mode 100644 index 0000000000..d07d4e89b0 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/troubleshooting.md @@ -0,0 +1,2321 @@ ++++ +title = "Troubleshooting" +description = "Troubleshoot issues with your Linkerd installation." ++++ + +This section provides resolution steps for common problems reported with the +`linkerd check` command. + +## The "pre-kubernetes-cluster-setup" checks {#pre-k8s-cluster} + +These checks only run when the `--pre` flag is set. This flag is intended for +use prior to running `linkerd install`, to verify your cluster is prepared for +installation. + +### √ control plane namespace does not already exist {#pre-ns} + +Example failure: + +```bash +× control plane namespace does not already exist + The "linkerd" namespace already exists +``` + +By default `linkerd install` will create a `linkerd` namespace. Prior to +installation, that namespace should not exist. 
To check with a different +namespace, run: + +```bash +linkerd check --pre --linkerd-namespace linkerd-test +``` + +### √ can create Kubernetes resources {#pre-k8s-cluster-k8s} + +The subsequent checks in this section validate whether you have permission to +create the Kubernetes resources required for Linkerd installation, specifically: + +```bash +√ can create Namespaces +√ can create ClusterRoles +√ can create ClusterRoleBindings +√ can create CustomResourceDefinitions +``` + +## The "pre-kubernetes-setup" checks {#pre-k8s} + +These checks only run when the `--pre` flag is set This flag is intended for use +prior to running `linkerd install`, to verify you have the correct RBAC +permissions to install Linkerd. + +```bash +√ can create Namespaces +√ can create ClusterRoles +√ can create ClusterRoleBindings +√ can create CustomResourceDefinitions +√ can create PodSecurityPolicies +√ can create ServiceAccounts +√ can create Services +√ can create Deployments +√ can create ConfigMaps +``` + +### √ no clock skew detected {#pre-k8s-clock-skew} + +This check detects any differences between the system running the +`linkerd install` command and the Kubernetes nodes (known as clock skew). Having +a substantial clock skew can cause TLS validation problems because a node may +determine that a TLS certificate is expired when it should not be, or vice +versa. + +Linkerd version edge-20.3.4 and later check for a difference of at most 5 +minutes and older versions of Linkerd (including stable-2.7) check for a +difference of at most 1 minute. If your Kubernetes node heartbeat interval is +longer than this difference, you may experience false positives of this check. +The default node heartbeat interval was increased to 5 minutes in Kubernetes +1.17 meaning that users running Linkerd versions prior to edge-20.3.4 on +Kubernetes 1.17 or later are likely to experience these false positives. If this +is the case, you can upgrade to Linkerd edge-20.3.4 or later. If you choose to +ignore this error, we strongly recommend that you verify that your system clocks +are consistent. + +## The "pre-kubernetes-capability" checks {#pre-k8s-capability} + +These checks only run when the `--pre` flag is set. This flag is intended for +use prior to running `linkerd install`, to verify you have the correct +Kubernetes capability permissions to install Linkerd. + +## The "pre-linkerd-global-resources" checks {#pre-l5d-existence} + +These checks only run when the `--pre` flag is set. This flag is intended for +use prior to running `linkerd install`, to verify you have not already installed +the Linkerd control plane. 
+ +```bash +√ no ClusterRoles exist +√ no ClusterRoleBindings exist +√ no CustomResourceDefinitions exist +√ no MutatingWebhookConfigurations exist +√ no ValidatingWebhookConfigurations exist +√ no PodSecurityPolicies exist +``` + +## The "pre-kubernetes-single-namespace-setup" checks {#pre-single} + +If you do not expect to have the permission for a full cluster install, try the +`--single-namespace` flag, which validates if Linkerd can be installed in a +single namespace, with limited cluster access: + +```bash +linkerd check --pre --single-namespace +``` + +## The "kubernetes-api" checks {#k8s-api} + +Example failures: + +```bash +× can initialize the client + error configuring Kubernetes API client: stat badconfig: no such file or directory +× can query the Kubernetes API + Get https://8.8.8.8/version: dial tcp 8.8.8.8:443: i/o timeout +``` + +Ensure that your system is configured to connect to a Kubernetes cluster. +Validate that the `KUBECONFIG` environment variable is set properly, and/or +`~/.kube/config` points to a valid cluster. + +For more information see these pages in the Kubernetes Documentation: + +- [Accessing Clusters](https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/) +- [Configure Access to Multiple Clusters](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/) + +Also verify that these command works: + +```bash +kubectl config view +kubectl cluster-info +kubectl version +``` + +Another example failure: + +```bash +✘ can query the Kubernetes API + Get REDACTED/version: x509: certificate signed by unknown authority +``` + +As an (unsafe) workaround to this, you may try: + +```bash +kubectl config set-cluster ${KUBE_CONTEXT} --insecure-skip-tls-verify=true \ + --server=${KUBE_CONTEXT} +``` + +## The "kubernetes-version" checks + +### √ is running the minimum Kubernetes API version {#k8s-version} + +Example failure: + +```bash +× is running the minimum Kubernetes API version + Kubernetes is on version [1.7.16], but version [1.13.0] or more recent is required +``` + +Linkerd requires at least version `1.13.0`. Verify your cluster version with: + +```bash +kubectl version +``` + +### √ is running the minimum kubectl version {#kubectl-version} + +Example failure: + +```bash +× is running the minimum kubectl version + kubectl is on version [1.9.1], but version [1.13.0] or more recent is required + see https://linkerd.io/checks/#kubectl-version for hints +``` + +Linkerd requires at least version `1.13.0`. Verify your kubectl version with: + +```bash +kubectl version --client --short +``` + +To fix please update kubectl version. + +For more information on upgrading Kubernetes, see the page in the Kubernetes +Documentation. + +## The "linkerd-config" checks {#l5d-config} + +This category of checks validates that Linkerd's cluster-wide RBAC and related +resources have been installed. + +### √ control plane Namespace exists {#l5d-existence-ns} + +Example failure: + +```bash +× control plane Namespace exists + The "foo" namespace does not exist + see https://linkerd.io/checks/#l5d-existence-ns for hints +``` + +Ensure the Linkerd control plane namespace exists: + +```bash +kubectl get ns +``` + +The default control plane namespace is `linkerd`. 
If you installed Linkerd into +a different namespace, specify that in your check command: + +```bash +linkerd check --linkerd-namespace linkerdtest +``` + +### √ control plane ClusterRoles exist {#l5d-existence-cr} + +Example failure: + +```bash +× control plane ClusterRoles exist + missing ClusterRoles: linkerd-linkerd-identity + see https://linkerd.io/checks/#l5d-existence-cr for hints +``` + +Ensure the Linkerd ClusterRoles exist: + +```bash +$ kubectl get clusterroles | grep linkerd +linkerd-linkerd-destination 9d +linkerd-linkerd-identity 9d +linkerd-linkerd-proxy-injector 9d +linkerd-policy 9d +``` + +Also ensure you have permission to create ClusterRoles: + +```bash +$ kubectl auth can-i create clusterroles +yes +``` + +### √ control plane ClusterRoleBindings exist {#l5d-existence-crb} + +Example failure: + +```bash +× control plane ClusterRoleBindings exist + missing ClusterRoleBindings: linkerd-linkerd-identity + see https://linkerd.io/checks/#l5d-existence-crb for hints +``` + +Ensure the Linkerd ClusterRoleBindings exist: + +```bash +$ kubectl get clusterrolebindings | grep linkerd +linkerd-linkerd-destination 9d +linkerd-linkerd-identity 9d +linkerd-linkerd-proxy-injector 9d +linkerd-destination-policy 9d +``` + +Also ensure you have permission to create ClusterRoleBindings: + +```bash +$ kubectl auth can-i create clusterrolebindings +yes +``` + +### √ control plane ServiceAccounts exist {#l5d-existence-sa} + +Example failure: + +```bash +× control plane ServiceAccounts exist + missing ServiceAccounts: linkerd-identity + see https://linkerd.io/checks/#l5d-existence-sa for hints +``` + +Ensure the Linkerd ServiceAccounts exist: + +```bash +$ kubectl -n linkerd get serviceaccounts +NAME SECRETS AGE +default 1 14m +linkerd-destination 1 14m +linkerd-heartbeat 1 14m +linkerd-identity 1 14m +linkerd-proxy-injector 1 14m +``` + +Also ensure you have permission to create ServiceAccounts in the Linkerd +namespace: + +```bash +$ kubectl -n linkerd auth can-i create serviceaccounts +yes +``` + +### √ control plane CustomResourceDefinitions exist {#l5d-existence-crd} + +Example failure: + +```bash +× control plane CustomResourceDefinitions exist + missing CustomResourceDefinitions: serviceprofiles.linkerd.io + see https://linkerd.io/checks/#l5d-existence-crd for hints +``` + +Ensure the Linkerd CRD exists: + +```bash +$ kubectl get customresourcedefinitions +NAME CREATED AT +serviceprofiles.linkerd.io 2019-04-25T21:47:31Z +``` + +Also ensure you have permission to create CRDs: + +```bash +$ kubectl auth can-i create customresourcedefinitions +yes +``` + +### √ control plane MutatingWebhookConfigurations exist {#l5d-existence-mwc} + +Example failure: + +```bash +× control plane MutatingWebhookConfigurations exist + missing MutatingWebhookConfigurations: linkerd-proxy-injector-webhook-config + see https://linkerd.io/checks/#l5d-existence-mwc for hints +``` + +Ensure the Linkerd MutatingWebhookConfigurations exists: + +```bash +$ kubectl get mutatingwebhookconfigurations | grep linkerd +linkerd-proxy-injector-webhook-config 2019-07-01T13:13:26Z +``` + +Also ensure you have permission to create MutatingWebhookConfigurations: + +```bash +$ kubectl auth can-i create mutatingwebhookconfigurations +yes +``` + +### √ control plane ValidatingWebhookConfigurations exist {#l5d-existence-vwc} + +Example failure: + +```bash +× control plane ValidatingWebhookConfigurations exist + missing ValidatingWebhookConfigurations: linkerd-sp-validator-webhook-config + see 
https://linkerd.io/checks/#l5d-existence-vwc for hints +``` + +Ensure the Linkerd ValidatingWebhookConfiguration exists: + +```bash +$ kubectl get validatingwebhookconfigurations | grep linkerd +linkerd-sp-validator-webhook-config 2019-07-01T13:13:26Z +``` + +Also ensure you have permission to create ValidatingWebhookConfigurations: + +```bash +$ kubectl auth can-i create validatingwebhookconfigurations +yes +``` + +### √ proxy-init container runs as root if docker container runtime is used {#l5d-proxy-init-run-as-root} + +Example failure: + +```bash +× proxy-init container runs as root user if docker container runtime is used + there are nodes using the docker container runtime and proxy-init container must run as root user. +try installing linkerd via --set proxyInit.runAsRoot=true + see https://linkerd.io/2.11/checks/#l5d-proxy-init-run-as-root for hints +``` + +Kubernetes nodes running with docker as the container runtime +([CRI](https://kubernetes.io/docs/concepts/architecture/cri/)) require the init +container to run as root for iptables. + +Newer distributions of managed k8s use containerd where this is not an issue. + +Without root in the init container you might get errors such as: + +```bash +time="2021-11-15T04:41:31Z" level=info msg="iptables-save -t nat" +Error: exit status 1 +time="2021-11-15T04:41:31Z" level=info msg="iptables-save v1.8.7 (legacy): Cannot initialize: Permission denied (you must be root)\n\n" +``` + +See [linkerd/linkerd2#7283](https://github.com/linkerd/linkerd2/issues/7283) and +[linkerd/linkerd2#7308](https://github.com/linkerd/linkerd2/issues/7308) for +further details. + +## The "linkerd-existence" checks {#l5d-existence} + +### √ 'linkerd-config' config map exists {#l5d-existence-linkerd-config} + +Example failure: + +```bash +× 'linkerd-config' config map exists + missing ConfigMaps: linkerd-config + see https://linkerd.io/checks/#l5d-existence-linkerd-config for hints +``` + +Ensure the Linkerd ConfigMap exists: + +```bash +$ kubectl -n linkerd get configmap/linkerd-config +NAME DATA AGE +linkerd-config 3 61m +``` + +Also ensure you have permission to create ConfigMaps: + +```bash +$ kubectl -n linkerd auth can-i create configmap +yes +``` + +### √ control plane replica sets are ready {#l5d-existence-replicasets} + +This failure occurs when one of Linkerd's ReplicaSets fails to schedule a pod. + +For more information, see the Kubernetes documentation on +[Failed Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment). + +### √ no unschedulable pods {#l5d-existence-unschedulable-pods} + +Example failure: + +```bash +× no unschedulable pods + linkerd-prometheus-6b668f774d-j8ncr: 0/1 nodes are available: 1 Insufficient cpu. + see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints +``` + +For more information, see the Kubernetes documentation on the +[Unschedulable Pod Condition](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions). 
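
As a starting point, you can inspect the pending pod and your nodes' remaining
capacity directly. The commands below are only a sketch: substitute the pod
name reported by `linkerd check` (the one used here is taken from the example
output above).

```bash
# List any control plane pods stuck in Pending
kubectl -n linkerd get pods --field-selector=status.phase=Pending

# See why the scheduler could not place the pod
kubectl -n linkerd describe pod linkerd-prometheus-6b668f774d-j8ncr | grep -A5 Events

# Check how much CPU and memory is already committed on each node
kubectl describe nodes | grep -A5 "Allocated resources"
```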
+ +## The "linkerd-identity" checks {#l5d-identity} + +### √ certificate config is valid {#l5d-identity-cert-config-valid} + +Example failures: + +```bash +× certificate config is valid + key ca.crt containing the trust anchors needs to exist in secret linkerd-identity-issuer if --identity-external-issuer=true + see https://linkerd.io/checks/#l5d-identity-cert-config-valid +``` + +```bash +× certificate config is valid + key crt.pem containing the issuer certificate needs to exist in secret linkerd-identity-issuer if --identity-external-issuer=false + see https://linkerd.io/checks/#l5d-identity-cert-config-valid +``` + +Ensure that your `linkerd-identity-issuer` secret contains the correct keys for +the `scheme` that Linkerd is configured with. If the scheme is +`kubernetes.io/tls` your secret should contain the `tls.crt`, `tls.key` and +`ca.crt` keys. Alternatively if your scheme is `linkerd.io/tls`, the required +keys are `crt.pem` and `key.pem`. + +### √ trust roots are using supported crypto algorithm {#l5d-identity-trustAnchors-use-supported-crypto} + +Example failure: + +```bash +× trust roots are using supported crypto algorithm + Invalid roots: + * 165223702412626077778653586125774349756 identity.linkerd.cluster.local must use P-256 curve for public key, instead P-521 was used + see https://linkerd.io/checks/#l5d-identity-trustAnchors-use-supported-crypto +``` + +You need to ensure that all of your roots use ECDSA P-256 for their public key +algorithm. + +### √ trust roots are within their validity period {#l5d-identity-trustAnchors-are-time-valid} + +Example failure: + +```bash +× trust roots are within their validity period + Invalid roots: + * 199607941798581518463476688845828639279 identity.linkerd.cluster.local not valid anymore. Expired on 2019-12-19T13:08:18Z + see https://linkerd.io/checks/#l5d-identity-trustAnchors-are-time-valid for hints +``` + +Failures of such nature indicate that your roots have expired. If that is the +case you will have to update both the root and issuer certificates at once. You +can follow the process outlined in +[Replacing Expired Certificates](../replacing_expired_certificates/) to get your +cluster back to a stable state. + +### √ trust roots are valid for at least 60 days {#l5d-identity-trustAnchors-not-expiring-soon} + +Example warnings: + +```bash +‼ trust roots are valid for at least 60 days + Roots expiring soon: + * 66509928892441932260491975092256847205 identity.linkerd.cluster.local will expire on 2019-12-19T13:30:57Z + see https://linkerd.io/checks/#l5d-identity-trustAnchors-not-expiring-soon for hints +``` + +This warning indicates that the expiry of some of your roots is approaching. In +order to address this problem without incurring downtime, you can follow the +process outlined in +[Rotating your identity certificates](../rotating_identity_certificates/). + +### √ issuer cert is using supported crypto algorithm {#l5d-identity-issuer-cert-uses-supported-crypto} + +Example failure: + +```bash +× issuer cert is using supported crypto algorithm + issuer certificate must use P-256 curve for public key, instead P-521 was used + see https://linkerd.io/checks/#5d-identity-issuer-cert-uses-supported-crypto for hints +``` + +You need to ensure that your issuer certificate uses ECDSA P-256 for its public +key algorithm. You can refer to +[Generating your own mTLS root certificates](../generate-certificates/#generating-the-certificates-with-step) +to see how you can generate certificates that will work with Linkerd. 
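
If you want to confirm which algorithm your current issuer certificate uses,
one approach (a sketch, assuming the default `linkerd.io/tls` scheme, where the
issuer certificate is stored under `crt.pem`; with the `kubernetes.io/tls`
scheme the key is `tls.crt` instead) is to inspect the secret with
[`step`](https://smallstep.com/cli/):

```bash
# Print the public key algorithm of the current issuer certificate
kubectl -n linkerd get secret linkerd-identity-issuer -ojsonpath='{.data.crt\.pem}' | \
  base64 --decode | \
  step certificate inspect - | \
  grep -i "public key"
```

A healthy issuer certificate should report an ECDSA public key on the P-256
curve.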
+ +### √ issuer cert is within its validity period {#l5d-identity-issuer-cert-is-time-valid} + +Example failure: + +```bash +× issuer cert is within its validity period + issuer certificate is not valid anymore. Expired on 2019-12-19T13:35:49Z + see https://linkerd.io/checks/#l5d-identity-issuer-cert-is-time-valid +``` + +This failure indicates that your issuer certificate has expired. In order to +bring your cluster back to a valid state, follow the process outlined in +[Replacing Expired Certificates](../replacing_expired_certificates/). + +### √ issuer cert is valid for at least 60 days {#l5d-identity-issuer-cert-not-expiring-soon} + +Example warning: + +```bash +‼ issuer cert is valid for at least 60 days + issuer certificate will expire on 2019-12-19T13:35:49Z + see https://linkerd.io/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints +``` + +This warning means that your issuer certificate is expiring soon. If you do not +rely on external certificate management solution such as `cert-manager`, you can +follow the process outlined in +[Rotating your identity certificates](../rotating_identity_certificates/) + +### √ issuer cert is issued by the trust root {#l5d-identity-issuer-cert-issued-by-trust-anchor} + +Example error: + +```bash +× issuer cert is issued by the trust root + x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "identity.linkerd.cluster.local") + see https://linkerd.io/checks/#l5d-identity-issuer-cert-issued-by-trust-anchor for hints +``` + +This error indicates that the issuer certificate that is in the +`linkerd-identity-issuer` secret cannot be verified with any of the roots that +Linkerd has been configured with. Using the CLI install process, this should +never happen. If Helm was used for installation or the issuer certificates are +managed by a malfunctioning certificate management solution, it is possible for +the cluster to end up in such an invalid state. At that point the best to do is +to use the upgrade command to update your certificates: + +```bash +linkerd upgrade \ + --identity-issuer-certificate-file=./your-new-issuer.crt \ + --identity-issuer-key-file=./your-new-issuer.key \ + --identity-trust-anchors-file=./your-new-roots.crt \ + --force | kubectl apply -f - +``` + +Once the upgrade process is over, the output of `linkerd check --proxy` should +be: + +```bash +linkerd-identity +---------------- +√ certificate config is valid +√ trust roots are using supported crypto algorithm +√ trust roots are within their validity period +√ trust roots are valid for at least 60 days +√ issuer cert is using supported crypto algorithm +√ issuer cert is within its validity period +√ issuer cert is valid for at least 60 days +√ issuer cert is issued by the trust root + +linkerd-identity-data-plane +--------------------------- +√ data plane proxies certificate match CA +``` + +## The "linkerd-webhooks-and-apisvc-tls" checks {#l5d-webhook} + +### √ proxy-injector webhook has valid cert {#l5d-proxy-injector-webhook-cert-valid} + +Example failure: + +```bash +× proxy-injector webhook has valid cert + secrets "linkerd-proxy-injector-tls" not found + see https://linkerd.io/checks/#l5d-proxy-injector-webhook-cert-valid for hints +``` + +Ensure that the `linkerd-proxy-injector-k8s-tls` secret exists and contains the +appropriate `tls.crt` and `tls.key` data entries. 
For versions before 2.9, the +secret is named `linkerd-proxy-injector-tls` and it should contain the `crt.pem` +and `key.pem` data entries. + +```bash +× proxy-injector webhook has valid cert + cert is not issued by the trust anchor: x509: certificate is valid for xxxxxx, not linkerd-proxy-injector.linkerd.svc + see https://linkerd.io/checks/#l5d-proxy-injector-webhook-cert-valid for hints +``` + +Here you need to make sure the certificate was issued specifically for +`linkerd-proxy-injector.linkerd.svc`. + +### √ proxy-injector cert is valid for at least 60 days {#l5d-proxy-injector-webhook-cert-not-expiring-soon} + +Example failure: + +```bash +‼ proxy-injector cert is valid for at least 60 days + certificate will expire on 2020-11-07T17:00:07Z + see https://linkerd.io/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints +``` + +This warning indicates that the expiry of proxy-injnector webhook cert is +approaching. In order to address this problem without incurring downtime, you +can follow the process outlined in +[Automatically Rotating your webhook TLS Credentials](../automatically-rotating-webhook-tls-credentials/). + +### √ sp-validator webhook has valid cert {#l5d-sp-validator-webhook-cert-valid} + +Example failure: + +```bash +× sp-validator webhook has valid cert + secrets "linkerd-sp-validator-tls" not found + see https://linkerd.io/checks/#l5d-sp-validator-webhook-cert-valid for hints +``` + +Ensure that the `linkerd-sp-validator-k8s-tls` secret exists and contains the +appropriate `tls.crt` and `tls.key` data entries. For versions before 2.9, the +secret is named `linkerd-sp-validator-tls` and it should contain the `crt.pem` +and `key.pem` data entries. + +```bash +× sp-validator webhook has valid cert + cert is not issued by the trust anchor: x509: certificate is valid for xxxxxx, not linkerd-sp-validator.linkerd.svc + see https://linkerd.io/checks/#l5d-sp-validator-webhook-cert-valid for hints +``` + +Here you need to make sure the certificate was issued specifically for +`linkerd-sp-validator.linkerd.svc`. + +### √ sp-validator cert is valid for at least 60 days {#l5d-sp-validator-webhook-cert-not-expiring-soon} + +Example failure: + +```bash +‼ sp-validator cert is valid for at least 60 days + certificate will expire on 2020-11-07T17:00:07Z + see https://linkerd.io/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints +``` + +This warning indicates that the expiry of sp-validator webhook cert is +approaching. In order to address this problem without incurring downtime, you +can follow the process outlined in +[Automatically Rotating your webhook TLS Credentials](../automatically-rotating-webhook-tls-credentials/). + +### √ policy-validator webhook has valid cert {#l5d-policy-validator-webhook-cert-valid} + +Example failure: + +```bash +× policy-validator webhook has valid cert + secrets "linkerd-policy-validator-tls" not found + see https://linkerd.io/checks/#l5d-policy-validator-webhook-cert-valid for hints +``` + +Ensure that the `linkerd-policy-validator-k8s-tls` secret exists and contains +the appropriate `tls.crt` and `tls.key` data entries. + +```bash +× policy-validator webhook has valid cert + cert is not issued by the trust anchor: x509: certificate is valid for xxxxxx, not linkerd-policy-validator.linkerd.svc + see https://linkerd.io/checks/#l5d-policy-validator-webhook-cert-valid for hints +``` + +Here you need to make sure the certificate was issued specifically for +`linkerd-policy-validator.linkerd.svc`. 
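
To double-check which names a webhook certificate is actually valid for, you
can inspect the secret directly. The sketch below uses the policy validator
secret; the same pattern applies to the proxy-injector and sp-validator
secrets.

```bash
# Show the DNS names (SANs) the certificate was issued for
kubectl -n linkerd get secret linkerd-policy-validator-k8s-tls -ojsonpath='{.data.tls\.crt}' | \
  base64 --decode | \
  step certificate inspect - | \
  grep -iA1 "subject alternative name"
```

The output should include `linkerd-policy-validator.linkerd.svc`.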
+ +### √ policy-validator cert is valid for at least 60 days {#l5d-policy-validator-webhook-cert-not-expiring-soon} + +Example failure: + +```bash +‼ policy-validator cert is valid for at least 60 days + certificate will expire on 2020-11-07T17:00:07Z + see https://linkerd.io/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints +``` + +This warning indicates that the expiry of policy-validator webhook cert is +approaching. In order to address this problem without incurring downtime, you +can follow the process outlined in +[Automatically Rotating your webhook TLS Credentials](../automatically-rotating-webhook-tls-credentials/). + +## The "linkerd-identity-data-plane" checks {#l5d-identity-data-plane} + +### √ data plane proxies certificate match CA {#l5d-identity-data-plane-proxies-certs-match-ca} + +Example warning: + +```bash +‼ data plane proxies certificate match CA + Some pods do not have the current trust bundle and must be restarted: + * emojivoto/emoji-d8d7d9c6b-8qwfx + * emojivoto/vote-bot-588499c9f6-zpwz6 + * emojivoto/voting-8599548fdc-6v64k + see https://linkerd.io/checks/{#l5d-identity-data-plane-proxies-certs-match-ca for hints +``` + +Observing this warning indicates that some of your meshed pods have proxies that +have stale certificates. This is most likely to happen during `upgrade` +operations that deal with cert rotation. In order to solve the problem you can +use `rollout restart` to restart the pods in question. That should cause them to +pick the correct certs from the `linkerd-config` configmap. When `upgrade` is +performed using the `--identity-trust-anchors-file` flag to modify the roots, +the Linkerd components are restarted. While this operation is in progress the +`check --proxy` command may output a warning, pertaining to the Linkerd +components: + +```bash +‼ data plane proxies certificate match CA + Some pods do not have the current trust bundle and must be restarted: + * linkerd/linkerd-sp-validator-75f9d96dc-rch4x + * linkerd-viz/tap-68d8bbf64-mpzgb + * linkerd-viz/web-849f74b7c6-qlhwc + see https://linkerd.io/checks/{#l5d-identity-data-plane-proxies-certs-match-ca for hints +``` + +If that is the case, simply wait for the `upgrade` operation to complete. The +stale pods should terminate and be replaced by new ones, configured with the +correct certificates. + +## The "linkerd-api" checks {#l5d-api} + +### √ control plane pods are ready {#l5d-api-control-ready} + +Example failure: + +```bash +× control plane pods are ready + No running pods for "linkerd-sp-validator" +``` + +Verify the state of the control plane pods with: + +```bash +$ kubectl -n linkerd get po +NAME READY STATUS RESTARTS AGE +linkerd-destination-5fd7b5d466-szgqm 2/2 Running 1 12m +linkerd-identity-54df78c479-hbh5m 2/2 Running 0 12m +linkerd-proxy-injector-67f8cf65f7-4tvt5 2/2 Running 1 12m +``` + +### √ cluster networks can be verified {#l5d-cluster-networks-verified} + +Example failure: + +```bash +‼ cluster networks can be verified + the following nodes do not expose a podCIDR: + node-0 + see https://linkerd.io/2/checks/#l5d-cluster-networks-verified for hints +``` + +Linkerd has a `clusterNetworks` setting which allows it to differentiate between +intra-cluster and egress traffic. Through each Node's `podCIDR` field, Linkerd +can verify that all possible Pod IPs are included in the `clusterNetworks` +setting. 
When a Node is missing the `podCIDR` field, Linkerd can not verify +this, and it's possible that the Node creates a Pod with an IP outside of +`clusterNetworks`; this may result in it not being meshed properly. + +Nodes are not required to expose a `podCIDR` field which is why this results in +a warning. Getting a Node to expose this field depends on the specific +distribution being used. + +### √ cluster networks contains all node podCIDRs {#l5d-cluster-networks-cidr} + +Example failure: + +```bash +× cluster networks contains all node podCIDRs + node has podCIDR(s) [10.244.0.0/24] which are not contained in the Linkerd clusterNetworks. + Try installing linkerd via --set clusterNetworks=10.244.0.0/24 + see https://linkerd.io/2/checks/#l5d-cluster-networks-cidr for hints +``` + +Linkerd has a `clusterNetworks` setting which allows it to differentiate between +intra-cluster and egress traffic. This warning indicates that the cluster has a +podCIDR which is not included in Linkerd's `clusterNetworks`. Traffic to pods in +this network may not be meshed properly. To remedy this, update the +`clusterNetworks` setting to include all pod networks in the cluster. + +### √ cluster networks contains all pods {#l5d-cluster-networks-pods} + +Example failures: + +```bash +× the Linkerd clusterNetworks [10.244.0.0/24] do not include pod default/foo (104.21.63.202) + see https://linkerd.io/2/checks/#l5d-cluster-networks-pods for hints +``` + +```bash +× the Linkerd clusterNetworks [10.244.0.0/24] do not include svc default/bar (10.96.217.194) + see https://linkerd.io/2/checks/#l5d-cluster-networks-pods for hints +``` + +Linkerd has a `clusterNetworks` setting which allows it to differentiate between +intra-cluster and egress traffic. This warning indicates that the cluster has a +pod or ClusterIP service which is not included in Linkerd's `clusterNetworks`. +Traffic to pods or services in this network may not be meshed properly. To +remedy this, update the `clusterNetworks` setting to include all pod and service +networks in the cluster. + +## The "linkerd-version" checks {#l5d-version} + +### √ can determine the latest version {#l5d-version-latest} + +Example failure: + +```bash +× can determine the latest version + Get https://versioncheck.linkerd.io/version.json?version=edge-19.1.2&uuid=test-uuid&source=cli: context deadline exceeded +``` + +Ensure you can connect to the Linkerd version check endpoint from the +environment the `linkerd` cli is running: + +```bash +$ curl "https://versioncheck.linkerd.io/version.json?version=edge-19.1.2&uuid=test-uuid&source=cli" +{"stable":"stable-2.1.0","edge":"edge-19.1.2"} +``` + +### √ cli is up-to-date {#l5d-version-cli} + +Example failures: + + + +**unsupported version channel** + + + +```bash +‼ cli is up-to-date + unsupported version channel: stable-2.14.10 +``` + +As of February 2024, the Linkerd project itself only produces [edge +release](/releases/) artifacts. For more details, read the [Releases and +Versions](/releases/) page. + + + +**is running version X but the latest version is Y** + + + +```bash +‼ cli is up-to-date + is running version 19.1.1 but the latest edge version is 19.1.2 +``` + +There is a newer version of the `linkerd` cli. See the page on +[Upgrading Linkerd](../../upgrade/). 
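
To see exactly which version of the CLI you are running (and, for comparison,
which version the control plane reports), you can use `linkerd version`; the
`--client` flag prints only the CLI version:

```bash
# CLI version only
linkerd version --client

# CLI and control plane versions together
linkerd version
```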
+ +## The "control-plane-version" checks {#l5d-version-control} + +### √ control plane is up-to-date {#l5d-version-control-up-to-date} + +Example failures: + + + +**unsupported version channel** + + + +```bash +‼ control plane is up-to-date + unsupported version channel: stable-2.14.10 +``` + +As of February 2024, the Linkerd project itself only produces [edge +release](/releases/) artifacts. For more details, read the [Releases and +Versions](/releases/) page. + + + +**is running version X but the latest version is Y** + + + +```bash +‼ control plane is up-to-date + is running version 19.1.1 but the latest edge version is 19.1.2 +``` + +There is a newer version of the control plane. See the page on +[Upgrading Linkerd](../../upgrade/). + +### √ control plane and cli versions match {#l5d-version-control-mismatched} + +Example failure: + +```bash +‼ control plane and cli versions match + mismatched channels: running stable-2.1.0 but retrieved edge-19.1.2 +``` + +Your CLI and your control plane are running different types of releases. This +may cause issues. + +## The "linkerd-control-plane-proxy" checks {#linkerd-control-plane-proxy} + +### √ control plane proxies are healthy {#l5d-cp-proxy-healthy} + +This error indicates that the proxies running in the Linkerd control plane are +not healthy. Ensure that Linkerd has been installed with all of the correct +setting or re-install Linkerd as necessary. + +### √ control plane proxies are up-to-date {#l5d-cp-proxy-version} + +This warning indicates the proxies running in the Linkerd control plane are +running an old version. We recommend downloading the latest Linkerd release and +[Upgrading Linkerd](../../upgrade/). + +### √ control plane proxies and cli versions match {#l5d-cp-proxy-cli-version} + +This warning indicates that the proxies running in the Linkerd control plane are +running a different version from the Linkerd CLI. We recommend keeping this +versions in sync by updating either the CLI or the control plane as necessary. + +## The "linkerd-data-plane" checks {#l5d-data-plane} + +These checks only run when the `--proxy` flag is set. This flag is intended for +use after running `linkerd inject`, to verify the injected proxies are operating +normally. + +### √ data plane namespace exists {#l5d-data-plane-exists} + +Example failure: + +```bash +$ linkerd check --proxy --namespace foo +... +× data plane namespace exists + The "foo" namespace does not exist +``` + +Ensure the `--namespace` specified exists, or, omit the parameter to check all +namespaces. + +### √ data plane proxies are ready {#l5d-data-plane-ready} + +Example failure: + +```bash +× data plane proxies are ready + No "linkerd-proxy" containers found +``` + +Ensure you have injected the Linkerd proxy into your application via the +`linkerd inject` command. + +For more information on `linkerd inject`, see +[Step 5: Install the demo app](../../getting-started/#step-5-install-the-demo-app) +in our [Getting Started](../../getting-started/) guide. + +### √ data plane is up-to-date {#l5d-data-plane-version} + +Example failure: + +```bash +‼ data plane is up-to-date + linkerd/linkerd-prometheus-74d66f86f6-6t6dh: is running version 19.1.2 but the latest edge version is 19.1.3 +``` + +See the page on [Upgrading Linkerd](../../upgrade/). 
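
One way to see which proxy version each meshed pod is actually running (a
sketch; adjust the namespace to match your workloads) is to list the
`linkerd-proxy` container images:

```bash
# Print each pod name together with its linkerd-proxy image
kubectl get pods -n emojivoto \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="linkerd-proxy")].image}{"\n"}{end}'
```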
+ +### √ data plane and cli versions match {#l5d-data-plane-cli-version} + +```bash +‼ data plane and cli versions match + linkerd/linkerd-identity-5f6c45d6d9-9hd9j: is running version 19.1.2 but the latest edge version is 19.1.3 +``` + +See the page on [Upgrading Linkerd](../../upgrade/). + +### √ data plane pod labels are configured correctly {#l5d-data-plane-pod-labels} + +Example failure: + +```bash +‼ data plane pod labels are configured correctly + Some labels on data plane pods should be annotations: + * emojivoto/voting-ff4c54b8d-tv9pp + linkerd.io/inject +``` + +`linkerd.io/inject`, `config.linkerd.io/*` or `config.alpha.linkerd.io/*` should +be annotations in order to take effect. + +### √ data plane service labels are configured correctly {#l5d-data-plane-services-labels} + +Example failure: + +```bash +‼ data plane service labels and annotations are configured correctly + Some labels on data plane services should be annotations: + * emojivoto/emoji-svc + config.linkerd.io/control-port +``` + +`config.linkerd.io/*` or `config.alpha.linkerd.io/*` should be annotations in +order to take effect. + +### √ data plane service annotations are configured correctly {#l5d-data-plane-services-annotations} + +Example failure: + +```bash +‼ data plane service annotations are configured correctly + Some annotations on data plane services should be labels: + * emojivoto/emoji-svc + mirror.linkerd.io/exported +``` + +`mirror.linkerd.io/exported` should be a label in order to take effect. + +### √ opaque ports are properly annotated {#linkerd-opaque-ports-definition} + +Example failure: + +```bash +× opaque ports are properly annotated + * service emoji-svc targets the opaque port 8080 through 8080; add 8080 to its config.linkerd.io/opaque-ports annotation + see https://linkerd.io/2/checks/#linkerd-opaque-ports-definition for hints +``` + +If a Pod marks a port as opaque by using the `config.linkerd.io/opaque-ports` +annotation, then any Service which targets that port must also use the +`config.linkerd.io/opaque-ports` annotation to mark that port as opaque. Having +a port marked as opaque on the Pod but not the Service (or vice versa) can cause +inconsistent behavior depending on if traffic is sent to the Pod directly (for +example with a headless Service) or through a ClusterIP Service. This error can +be remedied by adding the `config.linkerd.io/opaque-ports` annotation to both +the Pod and Service. See +[Protocol Detection](../../features/protocol-detection/) for more information. + +## The "linkerd-ha-checks" checks {#l5d-ha} + +These checks are ran if Linkerd has been installed in HA mode. + +### √ multiple replicas of control plane pods {#l5d-control-plane-replicas} + +Example warning: + +```bash +‼ multiple replicas of control plane pods + not enough replicas available for [linkerd-identity] + see https://linkerd.io/checks/#l5d-control-plane-replicas for hints +``` + +This happens when one of the control plane pods doesn't have at least two +replicas running. This is likely caused by insufficient node resources. + +### The "extensions" checks {#extensions} + +When any [Extensions](../extensions/) are installed, The Linkerd binary tries to +invoke `check --output json` on the extension binaries. It is important that the +extension binaries implement it. 
For more information, See +[Extension developer docs](https://github.com/linkerd/linkerd2/blob/main/EXTENSIONS.md) + +Example error: + +```bash +invalid extension check output from \"jaeger\" (JSON object expected) +``` + +Make sure that the extension binary implements `check --output json` which +returns the healthchecks in the +[expected json format](https://github.com/linkerd/linkerd2/blob/main/EXTENSIONS.md#linkerd-name-check). + +Example error: + +```bash +× Linkerd command jaeger exists +``` + +Make sure that relevant binary exists in `$PATH`. + +For more information about Linkerd extensions. See +[Extension developer docs](https://github.com/linkerd/linkerd2/blob/main/EXTENSIONS.md) + +## The "linkerd-cni-plugin" checks {#l5d-cni} + +These checks run if Linkerd has been installed with the `--linkerd-cni-enabled` +flag. Alternatively they can be run as part of the pre-checks by providing the +`--linkerd-cni-enabled` flag. Most of these checks verify that the required +resources are in place. If any of them are missing, you can use +`linkerd install-cni | kubectl apply -f -` to re-install them. + +### √ cni plugin ConfigMap exists {#cni-plugin-cm-exists} + +Example error: + +```bash +× cni plugin ConfigMap exists + configmaps "linkerd-cni-config" not found + see https://linkerd.io/checks/#cni-plugin-cm-exists for hints +``` + +Ensure that the linkerd-cni-config ConfigMap exists in the CNI namespace: + +```bash +$ kubectl get cm linkerd-cni-config -n linkerd-cni +NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES +linkerd-linkerd-cni-cni false RunAsAny RunAsAny RunAsAny RunAsAny false hostPath,secret +``` + +Also ensure you have permission to create ConfigMaps: + +```bash +$ kubectl auth can-i create ConfigMaps +yes +``` + +### √ cni plugin ClusterRole exist {#cni-plugin-cr-exists} + +Example error: + +```bash +× cni plugin ClusterRole exists + missing ClusterRole: linkerd-cni + see https://linkerd.io/checks/#cni-plugin-cr-exists for hints +``` + +Ensure that the cluster role exists: + +```bash +$ kubectl get clusterrole linkerd-cni +NAME AGE +linkerd-cni 54m +``` + +Also ensure you have permission to create ClusterRoles: + +```bash +$ kubectl auth can-i create ClusterRoles +yes +``` + +### √ cni plugin ClusterRoleBinding exist {#cni-plugin-crb-exists} + +Example error: + +```bash +× cni plugin ClusterRoleBinding exists + missing ClusterRoleBinding: linkerd-cni + see https://linkerd.io/checks/#cni-plugin-crb-exists for hints +``` + +Ensure that the cluster role binding exists: + +```bash +$ kubectl get clusterrolebinding linkerd-cni +NAME AGE +linkerd-cni 54m +``` + +Also ensure you have permission to create ClusterRoleBindings: + +```bash +$ kubectl auth can-i create ClusterRoleBindings +yes +``` + +### √ cni plugin ServiceAccount exists {#cni-plugin-sa-exists} + +Example error: + +```bash +× cni plugin ServiceAccount exists + missing ServiceAccount: linkerd-cni + see https://linkerd.io/checks/#cni-plugin-sa-exists for hints +``` + +Ensure that the CNI service account exists in the CNI namespace: + +```bash +$ kubectl get ServiceAccount linkerd-cni -n linkerd-cni +NAME SECRETS AGE +linkerd-cni 1 45m +``` + +Also ensure you have permission to create ServiceAccount: + +```bash +$ kubectl auth can-i create ServiceAccounts -n linkerd-cni +yes +``` + +### √ cni plugin DaemonSet exists {#cni-plugin-ds-exists} + +Example error: + +```bash +× cni plugin DaemonSet exists + missing DaemonSet: linkerd-cni + see https://linkerd.io/checks/#cni-plugin-ds-exists for hints +``` 

Ensure that the CNI daemonset exists in the CNI namespace:

```bash
$ kubectl get ds -n linkerd-cni
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
linkerd-cni   1         1         1       1            1           beta.kubernetes.io/os=linux   14m
```

Also ensure you have permission to create DaemonSets:

```bash
$ kubectl auth can-i create DaemonSets -n linkerd-cni
yes
```

### √ cni plugin pod is running on all nodes {#cni-plugin-ready}

Example failure:

```bash
‼ cni plugin pod is running on all nodes
    number ready: 2, number scheduled: 3
    see https://linkerd.io/checks/#cni-plugin-ready
```

Ensure that all the CNI pods are running:

```bash
$ kubectl get po -n linkerd-cni
NAME                READY   STATUS    RESTARTS   AGE
linkerd-cni-rzp2q   1/1     Running   0          9m20s
linkerd-cni-mf564   1/1     Running   0          9m22s
linkerd-cni-p5670   1/1     Running   0          9m25s
```

Ensure that all pods have finished the deployment of the CNI config and binary:

```bash
$ kubectl logs linkerd-cni-rzp2q -n linkerd-cni
Wrote linkerd CNI binaries to /host/opt/cni/bin
Created CNI config /host/etc/cni/net.d/10-kindnet.conflist
Done configuring CNI. Sleep=true
```

## The "linkerd-multicluster" checks {#l5d-multicluster}

These checks run if the service mirroring controller has been installed.
Additionally, they can be run with `linkerd multicluster check`. Most of these
checks verify that the service mirroring controllers are working correctly
along with the remote gateways. Furthermore, the checks ensure that end-to-end
TLS is possible between paired clusters.

### √ Link CRD exists {#l5d-multicluster-link-crd-exists}

Example error:

```bash
× Link CRD exists
    multicluster.linkerd.io/Link CRD is missing
    see https://linkerd.io/checks/#l5d-multicluster-link-crd-exists for hints
```

Make sure the multicluster extension is correctly installed and that the
`links.multicluster.linkerd.io` CRD is present:

```bash
$ kubectl get crds | grep multicluster
NAME                            CREATED AT
links.multicluster.linkerd.io   2021-03-10T09:58:10Z
```

### √ Link resources are valid {#l5d-multicluster-links-are-valid}

Example error:

```bash
× Link resources are valid
    failed to parse Link east
    see https://linkerd.io/checks/#l5d-multicluster-links-are-valid for hints
```

Make sure all the Link objects are specified in the expected format.

### √ remote cluster access credentials are valid {#l5d-smc-target-clusters-access}

Example error:

```bash
× remote cluster access credentials are valid
    * secret [east/east-config]: could not find east-config secret
    see https://linkerd.io/checks/#l5d-smc-target-clusters-access for hints
```

Make sure the kubeconfig for the specific target cluster, with the relevant
permissions, is correctly present as a secret.

### √ clusters share trust anchors {#l5d-multicluster-clusters-share-anchors}

Example errors:

```bash
× clusters share trust anchors
    Problematic clusters:
    * remote
    see https://linkerd.io/checks/#l5d-multicluster-clusters-share-anchors for hints
```

The error above indicates that your trust anchors are not compatible. In order
to fix that, you need to ensure that both your anchors contain identical sets
of certificates.
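
One way to compare the anchors is to dump the trust bundle from each cluster
and diff the two. The sketch below is illustrative only: it assumes the
`linkerd-identity-trust-roots` ConfigMap that recent Linkerd versions use to
hold the trust bundle, and uses `west`/`east` as placeholder context names:

```bash
# Diff the trust bundles of two linked clusters; any output means the anchors
# are not identical. Context names are placeholders for your own clusters.
diff \
  <(kubectl --context=west -n linkerd get cm linkerd-identity-trust-roots \
      -o jsonpath='{.data.ca-bundle\.crt}') \
  <(kubectl --context=east -n linkerd get cm linkerd-identity-trust-roots \
      -o jsonpath='{.data.ca-bundle\.crt}')
```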
+ +```bash +× clusters share trust anchors + Problematic clusters: + * remote: cannot parse trust anchors + see https://linkerd.io/checks/#l5d-multicluster-clusters-share-anchors for hints +``` + +Such an error indicates that there is a problem with your anchors on the cluster +named `remote` You need to make sure the identity config aspect of your Linkerd +installation on the `remote` cluster is ok. You can run `check` against the +remote cluster to verify that: + +```bash +linkerd --context=remote check +``` + +### √ service mirror controller has required permissions {#l5d-multicluster-source-rbac-correct} + +Example error: + +```bash +× service mirror controller has required permissions + missing Service mirror ClusterRole linkerd-service-mirror-access-local-resources: unexpected verbs expected create,delete,get,list,update,watch, got create,delete,get,update,watch + see https://linkerd.io/checks/#l5d-multicluster-source-rbac-correct for hints +``` + +This error indicates that the local RBAC permissions of the service mirror +service account are not correct. In order to ensure that you have the correct +verbs and resources you can inspect your ClusterRole and Role object and look at +the rules section. + +Expected rules for `linkerd-service-mirror-access-local-resources` cluster role: + +```bash +$ kubectl --context=local get clusterrole linkerd-service-mirror-access-local-resources -o yaml +kind: ClusterRole +metadata: + labels: + linkerd.io/control-plane-component: linkerd-service-mirror + name: linkerd-service-mirror-access-local-resources +rules: +- apiGroups: + - "" + resources: + - endpoints + - services + verbs: + - list + - get + - watch + - create + - delete + - update +- apiGroups: + - "" + resources: + - namespaces + verbs: + - create + - list + - get + - watch +``` + +Expected rules for `linkerd-service-mirror-read-remote-creds` role: + +```bash +$ kubectl --context=local get role linkerd-service-mirror-read-remote-creds -n linkerd-multicluster -o yaml +kind: Role +metadata: + labels: + linkerd.io/control-plane-component: linkerd-service-mirror + name: linkerd-service-mirror-read-remote-creds + namespace: linkerd-multicluster + rules: +- apiGroups: + - "" + resources: + - secrets + verbs: + - list + - get + - watch +``` + +### √ service mirror controllers are running {#l5d-multicluster-service-mirror-running} + +Example error: + +```bash +× service mirror controllers are running + Service mirror controller is not present + see https://linkerd.io/checks/#l5d-multicluster-service-mirror-running for hints +``` + +Note, it takes a little bit for pods to be scheduled, images to be pulled and +everything to start up. If this is a permanent error, you'll want to validate +the state of the controller pod with: + +```bash +$ kubectl --all-namespaces get po --selector linkerd.io/control-plane-component=linkerd-service-mirror +NAME READY STATUS RESTARTS AGE +linkerd-service-mirror-7bb8ff5967-zg265 2/2 Running 0 50m +``` + +### √ all gateway mirrors are healthy {#l5d-multicluster-gateways-endpoints} + +Example errors: + +```bash +‼ all gateway mirrors are healthy + Some gateway mirrors do not have endpoints: + linkerd-gateway-gke.linkerd-multicluster mirrored from cluster [gke] + see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints +``` + +The error above indicates that some gateway mirror services in the source +cluster do not have associated endpoints resources. 
These endpoints are created +by the Linkerd service mirror controller on the source cluster whenever a link +is established with a target cluster. + +Such an error indicates that there could be a problem with the creation of the +resources by the service mirror controller or the external IP of the gateway +service in target cluster. + +### √ all mirror services have endpoints {#l5d-multicluster-services-endpoints} + +Example errors: + +```bash +‼ all mirror services have endpoints + Some mirror services do not have endpoints: + voting-svc-gke.emojivoto mirrored from cluster [gke] (gateway: [linkerd-multicluster/linkerd-gateway]) + see https://linkerd.io/checks/#l5d-multicluster-services-endpoints for hints +``` + +The error above indicates that some mirror services in the source cluster do not +have associated endpoints resources. These endpoints are created by the Linkerd +service mirror controller when creating a mirror service with endpoints values +as the remote gateway's external IP. + +Such an error indicates that there could be a problem with the creation of the +mirror resources by the service mirror controller or the mirror gateway service +in the source cluster or the external IP of the gateway service in target +cluster. + +### √ all mirror services are part of a Link {#l5d-multicluster-orphaned-services} + +Example errors: + +```bash +‼ all mirror services are part of a Link + mirror service voting-east.emojivoto is not part of any Link + see https://linkerd.io/checks/#l5d-multicluster-orphaned-services for hints +``` + +The error above indicates that some mirror services in the source cluster do not +have associated link. These mirror services are created by the Linkerd service +mirror controller when a remote service is marked to be mirrored. + +Make sure services are marked to be mirrored correctly at remote, and delete if +there are any unnecessary ones. + +### √ multicluster extension proxies are healthy {#l5d-multicluster-proxy-healthy} + +This error indicates that the proxies running in the multicluster extension are +not healthy. Ensure that linkerd-multicluster has been installed with all of the +correct setting or re-install as necessary. + +### √ multicluster extension proxies are up-to-date {#l5d-multicluster-proxy-cp-version} + +This warning indicates the proxies running in the multicluster extension are +running an old version. We recommend downloading the latest linkerd-multicluster +and upgrading. + +### √ multicluster extension proxies and cli versions match {#l5d-multicluster-proxy-cli-version} + +This warning indicates that the proxies running in the multicluster extension +are running a different version from the Linkerd CLI. We recommend keeping this +versions in sync by updating either the CLI or linkerd-multicluster as +necessary. + +## The "linkerd-viz" checks {#l5d-viz} + +These checks only run when the `linkerd-viz` extension is installed. This check +is intended to verify the installation of linkerd-viz extension which comprises +of `tap`, `web`, `metrics-api` and optional `grafana` and `prometheus` instances +along with `tap-injector` which injects the specific tap configuration to the +proxies. + +### √ linkerd-viz Namespace exists {#l5d-viz-ns-exists} + +This is the basic check used to verify if the linkerd-viz extension namespace is +installed or not. 
The extension can be installed by running the following +command: + +```bash +linkerd viz install | kubectl apply -f - +``` + +The installation can be configured by using the `--set`, `--values`, +`--set-string` and `--set-file` flags. See +[Linkerd Viz Readme](https://www.github.com/linkerd/linkerd2/tree/main/viz/charts/linkerd-viz/README.md) +for a full list of configurable fields. + +### √ linkerd-viz ClusterRoles exist {#l5d-viz-cr-exists} + +Example failure: + +```bash +× linkerd-viz ClusterRoles exist + missing ClusterRoles: linkerd-linkerd-viz-metrics-api + see https://linkerd.io/checks/#l5d-viz-cr-exists for hints +``` + +Ensure the linkerd-viz extension ClusterRoles exist: + +```bash +$ kubectl get clusterroles | grep linkerd-viz +linkerd-linkerd-viz-metrics-api 2021-01-26T18:02:17Z +linkerd-linkerd-viz-prometheus 2021-01-26T18:02:17Z +linkerd-linkerd-viz-tap 2021-01-26T18:02:17Z +linkerd-linkerd-viz-tap-admin 2021-01-26T18:02:17Z +linkerd-linkerd-viz-web-check 2021-01-26T18:02:18Z +``` + +Also ensure you have permission to create ClusterRoles: + +```bash +$ kubectl auth can-i create clusterroles +yes +``` + +### √ linkerd-viz ClusterRoleBindings exist {#l5d-viz-crb-exists} + +Example failure: + +```bash +× linkerd-viz ClusterRoleBindings exist + missing ClusterRoleBindings: linkerd-linkerd-viz-metrics-api + see https://linkerd.io/checks/#l5d-viz-crb-exists for hints +``` + +Ensure the linkerd-viz extension ClusterRoleBindings exist: + +```bash +$ kubectl get clusterrolebindings | grep linkerd-viz +linkerd-linkerd-viz-metrics-api ClusterRole/linkerd-linkerd-viz-metrics-api 18h +linkerd-linkerd-viz-prometheus ClusterRole/linkerd-linkerd-viz-prometheus 18h +linkerd-linkerd-viz-tap ClusterRole/linkerd-linkerd-viz-tap 18h +linkerd-linkerd-viz-tap-auth-delegator ClusterRole/system:auth-delegator 18h +linkerd-linkerd-viz-web-admin ClusterRole/linkerd-linkerd-viz-tap-admin 18h +linkerd-linkerd-viz-web-check ClusterRole/linkerd-linkerd-viz-web-check 18h +``` + +Also ensure you have permission to create ClusterRoleBindings: + +```bash +$ kubectl auth can-i create clusterrolebindings +yes +``` + +### √ viz extension proxies are healthy {#l5d-viz-proxy-healthy} + +This error indicates that the proxies running in the viz extension are not +healthy. Ensure that linkerd-viz has been installed with all of the correct +setting or re-install as necessary. + +### √ viz extension proxies are up-to-date {#l5d-viz-proxy-cp-version} + +This warning indicates the proxies running in the viz extension are running an +old version. We recommend downloading the latest linkerd-viz and upgrading. + +### √ viz extension proxies and cli versions match {#l5d-viz-proxy-cli-version} + +This warning indicates that the proxies running in the viz extension are running +a different version from the Linkerd CLI. We recommend keeping this versions in +sync by updating either the CLI or linkerd-viz as necessary. + +### √ tap API server has valid cert {#l5d-tap-cert-valid} + +Example failure: + +```bash +× tap API server has valid cert + secrets "tap-k8s-tls" not found + see https://linkerd.io/checks/#l5d-tap-cert-valid for hints +``` + +Ensure that the `tap-k8s-tls` secret exists and contains the appropriate +`tls.crt` and `tls.key` data entries. For versions before 2.9, the secret is +named `linkerd-tap-tls` and it should contain the `crt.pem` and `key.pem` data +entries. 
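
To see what the tap API server certificate actually contains, you can decode it
from the secret and inspect it with a reasonably recent `openssl`; a rough
sketch for 2.9+ installations, using the secret and namespace names shown
above:

```bash
# Print the subject, SANs, and expiry of the tap API server certificate.
kubectl -n linkerd-viz get secret tap-k8s-tls -o jsonpath='{.data.tls\.crt}' \
  | base64 -d \
  | openssl x509 -noout -subject -ext subjectAltName -enddate
```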
+ +```bash +× tap API server has valid cert + cert is not issued by the trust anchor: x509: certificate is valid for xxxxxx, not tap.linkerd-viz.svc + see https://linkerd.io/checks/#l5d-tap-cert-valid for hints +``` + +Here you need to make sure the certificate was issued specifically for +`tap.linkerd-viz.svc`. + +### √ tap API server cert is valid for at least 60 days {#l5d-tap-cert-not-expiring-soon} + +Example failure: + +```bash +‼ tap API server cert is valid for at least 60 days + certificate will expire on 2020-11-07T17:00:07Z + see https://linkerd.io/checks/#l5d-webhook-cert-not-expiring-soon for hints +``` + +This warning indicates that the expiry of the tap API Server webhook cert is +approaching. In order to address this problem without incurring downtime, you +can follow the process outlined in +[Automatically Rotating your webhook TLS Credentials](../automatically-rotating-webhook-tls-credentials/). + +### √ tap api service is running {#l5d-tap-api} + +Example failure: + +```bash +× FailedDiscoveryCheck: no response from https://10.233.31.133:443: Get https://10.233.31.133:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) +``` + +tap uses the +[kubernetes Aggregated Api-Server model](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) +to allow users to have k8s RBAC on top. This model has the following specific +requirements in the cluster: + +- tap Server must be + [reachable from kube-apiserver](https://kubernetes.io/docs/concepts/architecture/master-node-communication/#master-to-cluster) +- The kube-apiserver must be correctly configured to + [enable an aggregation layer](https://kubernetes.io/docs/tasks/access-kubernetes-api/configure-aggregation-layer/) + +### √ linkerd-viz pods are injected {#l5d-viz-pods-injection} + +```bash +× linkerd-viz extension pods are injected + could not find proxy container for tap-59f5595fc7-ttndp pod + see https://linkerd.io/checks/#l5d-viz-pods-injection for hints +``` + +Ensure all the linkerd-viz pods are injected + +```bash +$ kubectl -n linkerd-viz get pods +NAME READY STATUS RESTARTS AGE +grafana-68cddd7cc8-nrv4h 2/2 Running 3 18h +metrics-api-77f684f7c7-hnw8r 2/2 Running 2 18h +prometheus-5f6898ff8b-s6rjc 2/2 Running 2 18h +tap-59f5595fc7-ttndp 2/2 Running 2 18h +web-78d6588d4-pn299 2/2 Running 2 18h +tap-injector-566f7ff8df-vpcwc 2/2 Running 2 18h +``` + +Make sure that the `proxy-injector` is working correctly by running +`linkerd check` + +### √ viz extension pods are running {#l5d-viz-pods-running} + +```bash +× viz extension pods are running + container linkerd-proxy in pod tap-59f5595fc7-ttndp is not ready + see https://linkerd.io/checks/#l5d-viz-pods-running for hints +``` + +Ensure all the linkerd-viz pods are running with 2/2 + +```bash +$ kubectl -n linkerd-viz get pods +NAME READY STATUS RESTARTS AGE +grafana-68cddd7cc8-nrv4h 2/2 Running 3 18h +metrics-api-77f684f7c7-hnw8r 2/2 Running 2 18h +prometheus-5f6898ff8b-s6rjc 2/2 Running 2 18h +tap-59f5595fc7-ttndp 2/2 Running 2 18h +web-78d6588d4-pn299 2/2 Running 2 18h +tap-injector-566f7ff8df-vpcwc 2/2 Running 2 18h +``` + +Make sure that the `proxy-injector` is working correctly by running +`linkerd check` + +### √ prometheus is installed and configured correctly {#l5d-viz-prometheus} + +```bash +× prometheus is installed and configured correctly + missing ClusterRoles: linkerd-linkerd-viz-prometheus + see https://linkerd.io/checks/#l5d-viz-cr-exists for hints +``` + +Ensure 
all the prometheus related resources are present and running correctly. + +```bash +❯ kubectl -n linkerd-viz get deploy,cm | grep prometheus +deployment.apps/prometheus 1/1 1 1 3m18s +configmap/prometheus-config 1 3m18s +❯ kubectl get clusterRoleBindings | grep prometheus +linkerd-linkerd-viz-prometheus ClusterRole/linkerd-linkerd-viz-prometheus 3m37s +❯ kubectl get clusterRoles | grep prometheus +linkerd-linkerd-viz-prometheus 2021-02-26T06:03:11Zh +``` + +### √ can initialize the client {#l5d-viz-existence-client} + +Example failure: + +```bash +× can initialize the client + Failed to get deploy for pod metrics-api-77f684f7c7-hnw8r: not running +``` + +Verify that the metrics API pod is running correctly + +```bash +❯ kubectl -n linkerd-viz get pods +NAME READY STATUS RESTARTS AGE +metrics-api-7bb8cb8489-cbq4m 2/2 Running 0 4m58s +tap-injector-6b9bc6fc4-cgbr4 2/2 Running 0 4m56s +tap-5f6ddcc684-k2fd6 2/2 Running 0 4m57s +web-cbb846484-d987n 2/2 Running 0 4m56s +grafana-76fd8765f4-9rg8q 2/2 Running 0 4m58s +prometheus-7c5c48c466-jc27g 2/2 Running 0 4m58s +``` + +### √ viz extension self-check {#l5d-viz-metrics-api} + +Example failure: + +```bash +× viz extension self-check + No results returned +``` + +Check the logs on the viz extensions's metrics API: + +```bash +kubectl -n linkerd-viz logs deploy/metrics-api metrics-api +``` + +### √ prometheus is authorized to scrape data plane pods {#l5d-viz-data-plane-prom-authz} + +Example failure: + +```bash + +‼ prometheus is authorized to scrape data plane pods + prometheus may not be authorized to scrape the following pods: + * emojivoto/voting-5f46cbcdc6-p5dhn + * emojivoto/emoji-54f8786975-6qc8s + * emojivoto/vote-bot-85dfbf8996-86c44 + * emojivoto/web-79db6f4548-4mzkg + consider running `linkerd viz allow-scrapes` to authorize prometheus scrapes + see https://linkerd.io/2/checks/#l5d-viz-data-plane-prom-authz for hints +``` + +This warning indicates that the listed pods have the +[`deny` default inbound policy](../../features/server-policy/#policy-annotations), +which may prevent the `linkerd-viz` Prometheus instance from scraping the data +plane proxies in those pods. If Prometheus cannot scrape a data plane pod, +`linkerd viz` commands targeting that pod will return no data. + +This may be resolved by running the `linkerd viz allow-scrapes` command, which +generates [policy resources](../../features/server-policy/) authorizing +Prometheus to scrape the data plane proxies in a namespace: + +```bash +linkerd viz allow-scrapes --namespace emojivoto | kubectl apply -f - +``` + +Note that this warning _only_ checks for the existence of the policy resources +generated by `linkerd viz allow-scrapes` in namespaces that contain pods with +the `deny` default inbound policy. In some cases, Prometheus scrapes may also be +authorized by other, user-generated authorization policies. If metrics from the +listed pods are present in Prometheus, this warning is a false positive and can +be safely disregarded. + +### √ data plane proxy metrics are present in Prometheus {#l5d-data-plane-prom} + +Example failure: + +```bash +× data plane proxy metrics are present in Prometheus + Data plane metrics not found for linkerd/linkerd-identity-b8c4c48c8-pflc9. +``` + +Ensure Prometheus can connect to each `linkerd-proxy` via the Prometheus +dashboard: + +```bash +kubectl -n linkerd-viz port-forward svc/prometheus 9090 +``` + +...and then browse to +[http://localhost:9090/targets](http://localhost:9090/targets), validate the +`linkerd-proxy` section. 
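
If you prefer the command line to the dashboard, a rough equivalent is to query
the Prometheus targets API directly. This sketch assumes the port-forward above
is still running, the default scrape configuration (which labels proxy targets
with `job="linkerd-proxy"`), and the `jq` utility:

```bash
# List the data plane proxy targets Prometheus knows about and their health.
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[]
      | select(.labels.job == "linkerd-proxy")
      | "\(.labels.namespace)/\(.labels.pod)\t\(.health)"'
```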
+ +You should see all your pods here. If they are not: + +- Prometheus might be experiencing connectivity issues with the k8s api server. + Check out the logs and delete the pod to flush any possible transient errors. + +## The "linkerd-jaeger" checks {#l5d-jaeger} + +These checks only run when the `linkerd-jaeger` extension is installed. This +check is intended to verify the installation of linkerd-jaeger extension which +comprises of open-census collector and jaeger components along with +`jaeger-injector` which injects the specific trace configuration to the proxies. + +### √ linkerd-jaeger extension Namespace exists {#l5d-jaeger-ns-exists} + +This is the basic check used to verify if the linkerd-jaeger extension namespace +is installed or not. The extension can be installed by running the following +command + +```bash +linkerd jaeger install | kubectl apply -f - +``` + +The installation can be configured by using the `--set`, `--values`, +`--set-string` and `--set-file` flags. See +[Linkerd Jaeger Readme](https://www.github.com/linkerd/linkerd2/tree/main/jaeger/charts/linkerd-jaeger/README.md) +for a full list of configurable fields. + +### √ jaeger extension proxies are healthy {#l5d-jaeger-proxy-healthy} + +This error indicates that the proxies running in the jaeger extension are not +healthy. Ensure that linkerd-jaeger has been installed with all of the correct +setting or re-install as necessary. + +### √ jaeger extension proxies are up-to-date {#l5d-jaeger-proxy-cp-version} + +This warning indicates the proxies running in the jaeger extension are running +an old version. We recommend downloading the latest linkerd-jaeger and +upgrading. + +### √ jaeger extension proxies and cli versions match {#l5d-jaeger-proxy-cli-version} + +This warning indicates that the proxies running in the jaeger extension are +running a different version from the Linkerd CLI. We recommend keeping this +versions in sync by updating either the CLI or linkerd-jaeger as necessary. + +### √ jaeger extension pods are injected {#l5d-jaeger-pods-injection} + +```bash +× jaeger extension pods are injected + could not find proxy container for jaeger-6f98d5c979-scqlq pod + see https://linkerd.io/checks/#l5d-jaeger-pods-injections for hints +``` + +Ensure all the jaeger pods are injected + +```bash +$ kubectl -n linkerd-jaeger get pods +NAME READY STATUS RESTARTS AGE +collector-69cc44dfbc-rhpfg 2/2 Running 0 11s +jaeger-6f98d5c979-scqlq 2/2 Running 0 11s +jaeger-injector-6c594f5577-cz75h 2/2 Running 0 10s +``` + +Make sure that the `proxy-injector` is working correctly by running +`linkerd check` + +### √ jaeger extension pods are running {#l5d-jaeger-pods-running} + +```bash +× jaeger extension pods are running + container linkerd-proxy in pod jaeger-59f5595fc7-ttndp is not ready + see https://linkerd.io/checks/#l5d-jaeger-pods-running for hints +``` + +Ensure all the linkerd-jaeger pods are running with 2/2 + +```bash +$ kubectl -n linkerd-jaeger get pods +NAME READY STATUS RESTARTS AGE +jaeger-injector-548684d74b-bcq5h 2/2 Running 0 5s +collector-69cc44dfbc-wqf6s 2/2 Running 0 5s +jaeger-6f98d5c979-vs622 2/2 Running 0 5sh +``` + +Make sure that the `proxy-injector` is working correctly by running +`linkerd check` + +## The "linkerd-buoyant" checks {#l5d-buoyant} + +These checks only run when the `linkerd-buoyant` extension is installed. 
This +check is intended to verify the installation of linkerd-buoyant extension which +comprises `linkerd-buoyant` CLI, the `buoyant-cloud-agent` Deployment, and the +`buoyant-cloud-metrics` DaemonSet. + +### √ Linkerd extension command linkerd-buoyant exists + +```bash +‼ Linkerd extension command linkerd-buoyant exists + exec: "linkerd-buoyant": executable file not found in $PATH + see https://linkerd.io/2/checks/#extensions for hints +``` + +Ensure you have the `linkerd-buoyant` cli installed: + +```bash +linkerd-buoyant check +``` + +To install the CLI: + +```bash +curl https://buoyant.cloud/install | sh +``` + +### √ linkerd-buoyant can determine the latest version + +```bash +‼ linkerd-buoyant can determine the latest version + Get "https://buoyant.cloud/version.json": dial tcp: lookup buoyant.cloud: no such host + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure you can connect to the Linkerd Buoyant version check endpoint from the +environment the `linkerd` cli is running: + +```bash +$ curl https://buoyant.cloud/version.json +{"linkerd-buoyant":"v0.4.4"} +``` + +### √ linkerd-buoyant cli is up-to-date + +```bash +‼ linkerd-buoyant cli is up-to-date + CLI version is v0.4.3 but the latest is v0.4.4 + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +To update to the latest version of the `linkerd-buoyant` CLI: + +```bash +curl https://buoyant.cloud/install | sh +``` + +### √ buoyant-cloud Namespace exists + +```bash +× buoyant-cloud Namespace exists + namespaces "buoyant-cloud" not found + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure the `buoyant-cloud` namespace exists: + +```bash +kubectl get ns/buoyant-cloud +``` + +If the namespace does not exist, the `linkerd-buoyant` installation may be +missing or incomplete. To install the extension: + +```bash +linkerd-buoyant install | kubectl apply -f - +``` + +### √ buoyant-cloud Namespace has correct labels + +```bash +× buoyant-cloud Namespace has correct labels + missing app.kubernetes.io/part-of label + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +The `linkerd-buoyant` installation may be missing or incomplete. 
To install the +extension: + +```bash +linkerd-buoyant install | kubectl apply -f - +``` + +### √ buoyant-cloud-agent ClusterRole exists + +```bash +× buoyant-cloud-agent ClusterRole exists + missing ClusterRole: buoyant-cloud-agent + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure that the cluster role exists: + +```bash +$ kubectl get clusterrole buoyant-cloud-agent +NAME CREATED AT +buoyant-cloud-agent 2020-11-13T00:59:50Z +``` + +Also ensure you have permission to create ClusterRoles: + +```bash +$ kubectl auth can-i create ClusterRoles +yes +``` + +### √ buoyant-cloud-agent ClusterRoleBinding exists + +```bash +× buoyant-cloud-agent ClusterRoleBinding exists + missing ClusterRoleBinding: buoyant-cloud-agent + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure that the cluster role binding exists: + +```bash +$ kubectl get clusterrolebinding buoyant-cloud-agent +NAME ROLE AGE +buoyant-cloud-agent ClusterRole/buoyant-cloud-agent 301d +``` + +Also ensure you have permission to create ClusterRoleBindings: + +```bash +$ kubectl auth can-i create ClusterRoleBindings +yes +``` + +### √ buoyant-cloud-agent ServiceAccount exists + +```bash +× buoyant-cloud-agent ServiceAccount exists + missing ServiceAccount: buoyant-cloud-agent + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure that the service account exists: + +```bash +$ kubectl -n buoyant-cloud get serviceaccount buoyant-cloud-agent +NAME SECRETS AGE +buoyant-cloud-agent 1 301d +``` + +Also ensure you have permission to create ServiceAccounts: + +```bash +$ kubectl -n buoyant-cloud auth can-i create ServiceAccount +yes +``` + +### √ buoyant-cloud-id Secret exists + +```bash +× buoyant-cloud-id Secret exists + missing Secret: buoyant-cloud-id + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure that the secret exists: + +```bash +$ kubectl -n buoyant-cloud get secret buoyant-cloud-id +NAME TYPE DATA AGE +buoyant-cloud-id Opaque 4 301d +``` + +Also ensure you have permission to create ServiceAccounts: + +```bash +$ kubectl -n buoyant-cloud auth can-i create ServiceAccount +yes +``` + +### √ buoyant-cloud-agent Deployment exists + +```bash +× buoyant-cloud-agent Deployment exists + deployments.apps "buoyant-cloud-agent" not found + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure the `buoyant-cloud-agent` Deployment exists: + +```bash +kubectl -n buoyant-cloud get deploy/buoyant-cloud-agent +``` + +If the Deployment does not exist, the `linkerd-buoyant` installation may be +missing or incomplete. To reinstall the extension: + +```bash +linkerd-buoyant install | kubectl apply -f - +``` + +### √ buoyant-cloud-agent Deployment is running + +```bash +× buoyant-cloud-agent Deployment is running + no running pods for buoyant-cloud-agent Deployment + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Note, it takes a little bit for pods to be scheduled, images to be pulled and +everything to start up. 
If this is a permanent error, you'll want to validate +the state of the `buoyant-cloud-agent` Deployment with: + +```bash +$ kubectl -n buoyant-cloud get po --selector app=buoyant-cloud-agent +NAME READY STATUS RESTARTS AGE +buoyant-cloud-agent-6b8c6888d7-htr7d 2/2 Running 0 156m +``` + +Check the agent's logs with: + +```bash +kubectl logs -n buoyant-cloud buoyant-cloud-agent-6b8c6888d7-htr7d buoyant-cloud-agent +``` + +### √ buoyant-cloud-agent Deployment is injected + +```bash +× buoyant-cloud-agent Deployment is injected + could not find proxy container for buoyant-cloud-agent-6b8c6888d7-htr7d pod + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure the `buoyant-cloud-agent` pod is injected, the `READY` column should show +`2/2`: + +```bash +$ kubectl -n buoyant-cloud get pods --selector app=buoyant-cloud-agent +NAME READY STATUS RESTARTS AGE +buoyant-cloud-agent-6b8c6888d7-htr7d 2/2 Running 0 161m +``` + +Make sure that the `proxy-injector` is working correctly by running +`linkerd check`. + +### √ buoyant-cloud-agent Deployment is up-to-date + +```bash +‼ buoyant-cloud-agent Deployment is up-to-date + incorrect app.kubernetes.io/version label: v0.4.3, expected: v0.4.4 + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Check the version with: + +```bash +$ linkerd-buoyant version +CLI version: v0.4.4 +Agent version: v0.4.4 +``` + +To update to the latest version: + +```bash +linkerd-buoyant install | kubectl apply -f - +``` + +### √ buoyant-cloud-agent Deployment is running a single pod + +```bash +× buoyant-cloud-agent Deployment is running a single pod + expected 1 buoyant-cloud-agent pod, found 2 + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +`buoyant-cloud-agent` should run as a singleton. Check for other pods: + +```bash +kubectl get po -A --selector app=buoyant-cloud-agent +``` + +### √ buoyant-cloud-metrics DaemonSet exists + +```bash +× buoyant-cloud-metrics DaemonSet exists + deployments.apps "buoyant-cloud-metrics" not found + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure the `buoyant-cloud-metrics` DaemonSet exists: + +```bash +kubectl -n buoyant-cloud get daemonset/buoyant-cloud-metrics +``` + +If the DaemonSet does not exist, the `linkerd-buoyant` installation may be +missing or incomplete. To reinstall the extension: + +```bash +linkerd-buoyant install | kubectl apply -f - +``` + +### √ buoyant-cloud-metrics DaemonSet is running + +```bash +× buoyant-cloud-metrics DaemonSet is running + no running pods for buoyant-cloud-metrics DaemonSet + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Note, it takes a little bit for pods to be scheduled, images to be pulled and +everything to start up. 
If this is a permanent error, you'll want to validate +the state of the `buoyant-cloud-metrics` DaemonSet with: + +```bash +$ kubectl -n buoyant-cloud get po --selector app=buoyant-cloud-metrics +NAME READY STATUS RESTARTS AGE +buoyant-cloud-metrics-kt9mv 2/2 Running 0 163m +buoyant-cloud-metrics-q8jhj 2/2 Running 0 163m +buoyant-cloud-metrics-qtflh 2/2 Running 0 164m +buoyant-cloud-metrics-wqs4k 2/2 Running 0 163m +``` + +Check the agent's logs with: + +```bash +kubectl logs -n buoyant-cloud buoyant-cloud-metrics-kt9mv buoyant-cloud-metrics +``` + +### √ buoyant-cloud-metrics DaemonSet is injected + +```bash +× buoyant-cloud-metrics DaemonSet is injected + could not find proxy container for buoyant-cloud-agent-6b8c6888d7-htr7d pod + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Ensure the `buoyant-cloud-metrics` pods are injected, the `READY` column should +show `2/2`: + +```bash +$ kubectl -n buoyant-cloud get pods --selector app=buoyant-cloud-metrics +NAME READY STATUS RESTARTS AGE +buoyant-cloud-metrics-kt9mv 2/2 Running 0 166m +buoyant-cloud-metrics-q8jhj 2/2 Running 0 166m +buoyant-cloud-metrics-qtflh 2/2 Running 0 166m +buoyant-cloud-metrics-wqs4k 2/2 Running 0 166m +``` + +Make sure that the `proxy-injector` is working correctly by running +`linkerd check`. + +### √ buoyant-cloud-metrics DaemonSet is up-to-date + +```bash +‼ buoyant-cloud-metrics DaemonSet is up-to-date + incorrect app.kubernetes.io/version label: v0.4.3, expected: v0.4.4 + see https://linkerd.io/checks#l5d-buoyant for hints +``` + +Check the version with: + +```bash +$ kubectl -n buoyant-cloud get daemonset/buoyant-cloud-metrics -o jsonpath='{.metadata.labels}' +{"app.kubernetes.io/name":"metrics","app.kubernetes.io/part-of":"buoyant-cloud","app.kubernetes.io/version":"v0.4.4"} +``` + +To update to the latest version: + +```bash +linkerd-buoyant install | kubectl apply -f - +``` diff --git a/linkerd.io/content/2.16/tasks/uninstall-multicluster.md b/linkerd.io/content/2.16/tasks/uninstall-multicluster.md new file mode 100644 index 0000000000..205cda90c8 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/uninstall-multicluster.md @@ -0,0 +1,41 @@ ++++ +title = "Uninstalling Multicluster" +description = "Unlink and uninstall Linkerd multicluster." ++++ + +The Linkerd multicluster components allow for sending traffic from one cluster +to another. For more information on how to set this up, see [installing multicluster](../installing-multicluster/). + +## Unlinking + +Unlinking a cluster will delete all resources associated with that link +including: + +* the service mirror controller +* the Link resource +* the credentials secret +* mirror services + +It is recommended that you use the `unlink` command rather than deleting any +of these resources individually to help ensure that all mirror services get +cleaned up correctly and are not left orphaned. + +To unlink, run the `linkerd multicluster unlink` command and pipe the output +to `kubectl delete`: + +```bash +linkerd multicluster unlink --cluster-name=target | kubectl delete -f - +``` + +## Uninstalling + +Uninstalling the multicluster components will remove all components associated +with Linkerd's multicluster functionality including the gateway and service +account. Before you can uninstall, you must remove all existing links as +described above. Once all links have been removed, run: + +```bash +linkerd multicluster uninstall | kubectl delete -f - +``` + +Attempting to uninstall while at least one link remains will result in an error. 
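
Because the uninstall fails while any link remains, it can be useful to confirm
that no Link resources are left before running it. A minimal check, using the
Link CRD name from the multicluster extension:

```bash
# Should report "No resources found" once every link has been removed.
kubectl get links.multicluster.linkerd.io --all-namespaces
```
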
diff --git a/linkerd.io/content/2.16/tasks/uninstall.md b/linkerd.io/content/2.16/tasks/uninstall.md new file mode 100644 index 0000000000..742afa7e56 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/uninstall.md @@ -0,0 +1,52 @@ ++++ +title = "Uninstalling Linkerd" +description = "Linkerd can be easily removed from a Kubernetes cluster." ++++ + +Removing Linkerd from a Kubernetes cluster requires a few steps: removing any +data plane proxies, removing all the extensions and then removing the core +control plane. + +## Removing Linkerd data plane proxies + +To remove the Linkerd data plane proxies, you should remove any [Linkerd proxy +injection annotations](../../features/proxy-injection/) and roll the deployments. +When Kubernetes recreates the pods, they will not have the Linkerd data plane +attached. + +## Removing extensions + +To remove any extension, call its `uninstall` subcommand and pipe it to `kubectl +delete -f -`. For the bundled extensions that means: + +```bash +# To remove Linkerd Viz +linkerd viz uninstall | kubectl delete -f - + +# To remove Linkerd Jaeger +linkerd jaeger uninstall | kubectl delete -f - + +# To remove Linkerd Multicluster +linkerd multicluster uninstall | kubectl delete -f - +``` + +## Removing the control plane + +{{< note >}} +Uninstallating the control plane requires cluster-wide permissions. +{{< /note >}} + +To remove the [control plane](../../reference/architecture/#control-plane), run: + +```bash +linkerd uninstall | kubectl delete -f - +``` + +The `linkerd uninstall` command outputs the manifest for all of the Kubernetes +resources necessary for the control plane, including namespaces, service +accounts, CRDs, and more; `kubectl delete` then deletes those resources. + +This command can also be used to remove control planes that have been partially +installed. Note that `kubectl delete` will complain about any resources that it +was asked to delete that hadn't been created, but these errors can be safely +ignored. diff --git a/linkerd.io/content/2.16/tasks/upgrade.md b/linkerd.io/content/2.16/tasks/upgrade.md new file mode 100644 index 0000000000..dec5d3e1de --- /dev/null +++ b/linkerd.io/content/2.16/tasks/upgrade.md @@ -0,0 +1,685 @@ ++++ +title = "Upgrading Linkerd" +description = "Perform zero-downtime upgrades for Linkerd." +aliases = [ + "../upgrade/", + "../update/" +] ++++ + +In this guide, we'll walk you through how to perform zero-downtime upgrades for +Linkerd. + +{{< note >}} + +This page contains instructions for upgrading to the latest edge release of +Linkerd. If you have installed a [stable distribution](/releases/#stable) of +Linkerd, the vendor may have alternative guidance on how to upgrade. You can +find more information about the different kinds of Linkerd releases on the +[Releases and Versions](/releases/) page. + +{{< /note >}} + +Read through this guide carefully. Additionally, before starting a specific +upgrade, please read through the version-specific upgrade notices below, which +may contain important information about your version. 
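
To know which of the notices below apply to you, first confirm exactly which
versions you are currently running, for example:

```bash
# Print the CLI and control plane versions, plus the data plane proxy versions.
linkerd version --proxy
```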
+ +- [Upgrade notice: 2.15 and beyond](#upgrade-notice-stable-215-and-beyond) +- [Upgrade notice: stable-2.14.0](#upgrade-notice-stable-2140) +- [Upgrade notice: stable-2.13.0](#upgrade-notice-stable-2130) +- [Upgrade notice: stable-2.12.0](#upgrade-notice-stable-2120) +- [Upgrade notice: stable-2.11.0](#upgrade-notice-stable-2110) +- [Upgrade notice: stable-2.10.0](#upgrade-notice-stable-2100) + +## Version numbering + +### Stable releases + +For stable releases, Linkerd follows a version numbering scheme of the form +`2..`. In other words, "2" is a static prefix, followed by the +major version, then the minor. + +Changes in minor versions are intended to be backwards compatible with the +previous version. Changes in major version *may* introduce breaking changes, +although we try to avoid that whenever possible. + +### Edge releases + +For edge releases, Linkerd issues explicit [guidance about each +release](../../../releases/#edge-release-guidance). Be sure to consult this +guidance before installing any release artifact. + +{{< note >}} + +Edge releases are **not** semantically versioned; the edge release number +itself does not give you any assurance about breaking changes, +incompatibilities, etc. Instead, this information is available in the [release +notes](https://github.com/linkerd/linkerd2/releases). + +{{< /note >}} + +## Upgrade paths + +The following upgrade paths are generally safe. However, before starting a +deploy, it is important to check the upgrade notes before +proceeding—occasionally, specific minor releases may have additional +restrictions. + +**Stable within the same major version**. It is usually safe to upgrade to the +latest minor version within the same major version. In other words, if you are +currently running version *2.x.y*, upgrading to *2.x.z*, where *z* is the +latest minor version for major version *x*, is safe. This is true even if you +would skip intermediate intermediate minor versions, i.e. it is still safe +even if *z* > *y + 1*. + +**Stable to the next major version**. It is usually safe to upgrade to the +latest minor version of the *next* major version. In other words, if you are +currently running version *2.x.y*, upgrading to *2.x + 1.w* will be safe, +where *w* is the latest minor version available for major version *x + 1*. + +**Stable to a later major version**. Upgrades that skip one or more major +versions are not supported. Instead, you should upgrade major versions +incrementally. + +**Edge release to a later edge release**. This is generally safe unless +the `Cautions` for the later edge release indicate otherwise. + +Again, please check the upgrade notes or release guidance for the specific +version you are upgrading *to* for any version-specific caveats. + +## Data plane vs control plane version skew + +Since a Linkerd upgrade always starts by upgrading the control plane, there is +a period during which the control plane is running the new version, but the +data plane is still running the older version. The extent to which this skew +can be supported depends on what kind of release you're running. Note that new +features introduced by the release may not be available for workloads with +older data planes. + +### Stable releases + +For stable releases, it is usually safe to upgrade one major version at a +time. This is independent of minor version, i.e. a *2.x.y* data plane and a +*2.x + 1.z* control plane will work regardless of *y* and *z*. Please check +the version-specific upgrade notes before proceeding. 
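
One way to gauge how much skew an upgrade would introduce is to list the proxy
image versions your workloads are currently running; a sketch (the namespace is
a placeholder):

```bash
# Show the linkerd-proxy image (and therefore version) for each pod in a
# namespace.
kubectl -n emojivoto get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="linkerd-proxy")].image}{"\n"}{end}'
```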
+ +### Edge releases + +For edge releases, it is also usually safe to upgrade one major version at a +time. The major version of an edge release is included in the release notes +for each edge release: for example, `edge-24.4.1` is part of Linkerd 2.15, so +it should be safe to upgrade from `edge-24.4.1` to any edge release within +Linkerd 2.15 or Linkerd 2.16. + +For any situation where this is not the case, the edge release guidance will +have more information. + +## Overall upgrade process + +There are four components that need to be upgraded: + +- [The CLI](#upgrade-the-cli) +- [The control plane](#upgrade-the-control-plane) +- [The control plane extensions](#upgrade-extensions) +- [The data plane](#upgrade-the-data-plane) + +These steps should be performed in sequence. + +## Before upgrading + +Before you commence an upgrade, you should ensure that the current state +of Linkerd is healthy, e.g. by using `linkerd check`. For major version +upgrades, you should also ensure that your data plane is up-to-date, e.g. +with `linkerd check --proxy`, to avoid unintentional version skew. + +Make sure that your Linkerd version and Kubernetes version are compatible by +checking Linkerd's [supported Kubernetes +versions](../../reference/k8s-versions/). + +## Upgrading the CLI + +The CLI can be used to validate whether Linkerd was installed correctly. + +### Stable releases + +Consult the upgrade instructions from the vendor supplying your stable release +for information about how to upgrade the CLI. + +### Edge releases + +To upgrade the CLI, run: + +```bash +curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install-edge | sh +``` + +Alternatively, you can download the CLI directly via the [Linkerd releases +page](https://github.com/linkerd/linkerd2/releases/). + +Verify the CLI is installed and running the expected version with: + +```bash +linkerd version --client +``` + +## Upgrading the control plane + +### Upgrading the control plane with the CLI + +For users who have installed Linkerd via the CLI, the `linkerd upgrade` command +will upgrade the control plane. This command ensures that all of the control +plane's existing configuration and TLS secrets are retained. Linkerd's CRDs +should be upgraded first, using the `--crds` flag, followed by upgrading the +control plane. + +(If you are using a stable release, your vendor's upgrade instructions may +have more information.) + +```bash +linkerd upgrade --crds | kubectl apply -f - +linkerd upgrade | kubectl apply -f - +``` + +Next, we use the `linkerd prune` command to remove any resources that were +present in the previous version but should not be present in this one. + +```bash +linkerd prune | kubectl delete -f - +``` + +### Upgrading the control plane with Helm + +For Helm control plane installations, please follow the instructions at [Helm +upgrade procedure](../install-helm/#helm-upgrade-procedure). + +### Verifying the control plane upgrade + +Once the upgrade process completes, check to make sure everything is healthy +by running: + +```bash +linkerd check +``` + +This will run through a set of checks against your control plane and make sure +that it is operating correctly. + +To verify the Linkerd control plane version, run: + +```bash +linkerd version +``` + +Which should display the latest versions for both client and server. + +## Upgrading extensions + +[Linkerd's extensions](../extensions/) provide additional functionality to +Linkerd in a modular way. 
Generally speaking, extensions are versioned +separately from Linkerd releases and follow their own schedule; however, some +extensions are updated alongside Linkerd releases and you may wish to update +them as part of the same process. + +Each extension can be upgraded independently. If using Helm, the procedure is +similar to the control plane upgrade, using the respective charts. For the CLI, +the extension CLI commands don't provide `upgrade` subcommands, but using +`install` again is fine. For example: + +```bash +linkerd viz install | kubectl apply -f - +linkerd multicluster install | kubectl apply -f - +linkerd jaeger install | kubectl apply -f - +``` + +Most extensions also include a `prune` command for removing resources which +were present in the previous version but should not be present in the current +version. For example: + +```bash +linkerd viz prune | kubectl delete -f - +``` + +### Upgrading the multicluster extension + +Upgrading the multicluster extension doesn't cause downtime in the traffic going +through the mirrored services, unless otherwise noted in the version-specific +notes below. Note however that for the service mirror *deployments* (which +control the creation of the mirrored services) to be updated, you need to +re-link your clusters through `linkerd multicluster link`. + +## Upgrading the data plane + +Upgrading the data plane requires updating the proxy added to each meshed +workload. Since pods are immutable in Kubernetes, Linkerd is unable to simply +update the proxies in place. Thus, the standard option is to restart each +workload, allowing the proxy injector to inject the latest version of the proxy +as they come up. + +For example, you can use the `kubectl rollout restart` command to restart a +meshed deployment: + +```bash +kubectl -n rollout restart deploy +``` + +As described earlier, a skew of one major version between data plane and control +plane is always supported. Thus, for some systems it is possible to do this data +plane upgrade "lazily", and simply allow workloads to pick up the newest proxy +as they are restarted for other reasons (e.g. for new code rollouts). However, +newer features may only be available on workloads with the latest proxy. + +A skew of more than one major version between data plane and control plane is +not supported. + +### Verify the data plane upgrade + +Check to make sure everything is healthy by running: + +```bash +linkerd check --proxy +``` + +This will run through a set of checks to verify that the data plane is +operating correctly, and will list any pods that are still running older +versions of the proxy. + +Congratulation! You have successfully upgraded your Linkerd to the newer +version. + +## Upgrade notices + +This section contains release-specific information about upgrading. + +### Upgrade notice: stable-2.15 and beyond + +As of February 2024, the Linkerd project itself only produces [edge +release](/releases/) artifacts. The [Releases and Versions](/releases/) page +contains more information about the different kinds of Linkerd releases. + +### Upgrade notice: stable-2.14.0 + +For this release, if you're using the multicluster extension, you should re-link +your clusters after upgrading to stable-2.14.0, as explained +[above](#upgrading-the-multicluster-extension). Not doing so immediately won't +cause any downtime in cross-cluster connections, but `linkerd multicluster +check` will not succeed until the clusters are re-linked. + +There are no other extra steps for upgrading to 2.14.0. 
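
For reference, re-linking follows the same pattern as the original link step; a
sketch, with `target` and `source` as placeholder context and cluster names:

```bash
# Regenerate the link from the target cluster's credentials and re-apply it on
# the source cluster; this also updates the service mirror deployment.
linkerd --context=target multicluster link --cluster-name target \
  | kubectl --context=source apply -f -
```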

### Upgrade notice: stable-2.13.0

Please be sure to read the [Linkerd 2.13.0 release
notes](https://github.com/linkerd/linkerd2/releases/tag/stable-2.13.0).

There are no other extra steps for upgrading to 2.13.0.

### Upgrade notice: stable-2.12.0

Please be sure to read the [Linkerd 2.12.0 release
notes](https://github.com/linkerd/linkerd2/releases/tag/stable-2.12.0).

There are a couple of important changes that affect the upgrade process for
2.12.0:

- The minimum Kubernetes version supported is `v1.21.0`.
- The TrafficSplit CRD has been moved to the Linkerd SMI extension.
- Support for Helm v2 has been removed.
- The viz extension no longer installs Grafana due to licensing concerns.
- The linkerd2 Helm chart has been split into two charts: linkerd-crds and
  linkerd-control-plane.
- The viz, multicluster, jaeger, and linkerd2-cni Helm charts now rely on a
  post-install hook to add the required metadata to their namespaces.

Read on for how to handle these changes as part of the upgrade process.

#### Upgrading to 2.12.0 using the CLI

If you installed Linkerd `2.11.x` with the CLI _and_ are using the
`TrafficSplit` CRD, you need to take an extra step to avoid losing your
`TrafficSplit` CRs. (If you're not using `TrafficSplit`, then you can
perform the usual CLI upgrade as [described above](#with-linkerd-cli).)

The `TrafficSplit` CRD has been moved to the SMI extension. But before
installing that extension, you need to add the following annotations and label
to the CRD so that the `linkerd-smi` chart can adopt it:

```bash
kubectl annotate --overwrite crd/trafficsplits.split.smi-spec.io \
  meta.helm.sh/release-name=linkerd-smi \
  meta.helm.sh/release-namespace=linkerd-smi
kubectl label crd/trafficsplits.split.smi-spec.io \
  app.kubernetes.io/managed-by=Helm
```

Now you can install the SMI extension, e.g. via Helm:

```bash
helm repo add l5d-smi https://linkerd.github.io/linkerd-smi
helm install linkerd-smi -n linkerd-smi --create-namespace l5d-smi/linkerd-smi
```

Finally, you can proceed with the usual [CLI upgrade
instructions](#with-linkerd-cli), but avoid using the `--prune` flag when
applying the output of `linkerd upgrade --crds`, so that the `TrafficSplit`
CRD is not removed.

#### Upgrading to 2.12.0 using Helm

Note that support for Helm v2 has been dropped in the Linkerd 2.12.0 release.

This section provides instructions on how to perform a migration from Linkerd
`2.11.x` to `2.12.0` without control plane downtime, when your existing Linkerd
instance was installed via Helm. There were several changes to the Linkerd Helm
charts as part of this release, so this upgrade process is a little more
involved than usual.

##### Retrieving existing customization and PKI setup

The `linkerd2` chart has been replaced by two charts: `linkerd-crds` and
`linkerd-control-plane` (and optionally `linkerd-smi` if you're using
`TrafficSplit`). To migrate to this new setup, we need to ensure your
customization values, including TLS certificates and keys, are migrated
to the new charts.

Find the release name you used for the `linkerd2` chart, and the namespace where
this release stored its config:

```bash
$ helm ls -A
NAME      NAMESPACE   REVISION   UPDATED                                   STATUS     CHART             APP VERSION
linkerd   default     1          2021-11-22 17:14:50.751436374 -0500 -05   deployed   linkerd2-2.11.1   stable-2.11.1
```

(The example output above matches the default case.)
Note that even if Linkerd is +installed in the `linkerd` namespace, the Helm config should have been installed +in the `default` namespace, unless you specified something different in the +`namespace` value when you installed. Take note of this release name (linkerd) +and namespace (default) to use in the commands that follow. + +Next, retrieve all your chart values customizations, especially your trust +root and issuer keys (`identityTrustAnchorsPEM`, `identity.issuer.tls.crtPEM` +and `identity.issuer.tls.keyPEM`). These values will need to be fed again into +the `helm install` command below for the `linkerd-control-plane` chart. These +values can be retrieved with the following command: + +```bash +helm get -n default values linkerd +``` + +##### Migrate resources to the new charts + +Next, we need to prepare these values for use with the new charts. Note that the +examples below use the [yq](https://github.com/mikefarah/yq) utility. + +The following snippets will change the `meta.helm.sh/release-name` and +`meta.helm.sh/release-namespace` annotations for each resource in the `linkerd` +release (use your own name as explained above), so that they can be adopted by +the `linkerd-crds`, `linkerd-control-plane` and `linkerd-smi` charts: + +```bash +# First migrate the CRDs +$ helm -n default get manifest linkerd | \ + yq 'select(.kind == "CustomResourceDefinition") | .metadata.name' | \ + grep -v '\-\-\-' | \ + xargs -n1 sh -c \ + 'kubectl annotate --overwrite crd/$0 meta.helm.sh/release-name=linkerd-crds meta.helm.sh/release-namespace=linkerd' + +# Special case for TrafficSplit (only use if you have TrafficSplit CRs) +$ kubectl annotate --overwrite crd/trafficsplits.split.smi-spec.io \ + meta.helm.sh/release-name=linkerd-smi meta.helm.sh/release-namespace=linkerd-smi + +# Now migrate all the other resources +$ helm -n default get manifest linkerd | \ + yq 'select(.kind != "CustomResourceDefinition")' | \ + yq '.kind, .metadata.name, .metadata.namespace' | \ + grep -v '\-\-\-' | + xargs -n3 sh -c 'kubectl annotate --overwrite -n $2 $0/$1 meta.helm.sh/release-name=linkerd-control-plane meta.helm.sh/release-namespace=linkerd' +``` + +##### Installing the new charts + +Next, we need to install the new charts using our customization values +prepared above. + +```bash +# First make sure you update the helm repo +$ helm repo up + +# Install the linkerd-crds chart +$ helm install linkerd-crds -n linkerd --create-namespace linkerd/linkerd-crds + +# Install the linkerd-control-plane chart +# (remember to add any customizations you retrieved above) +$ helm install linkerd-control-plane \ + -n linkerd \ + --set-file identityTrustAnchorsPEM=ca.crt \ + --set-file identity.issuer.tls.crtPEM=issuer.crt \ + --set-file identity.issuer.tls.keyPEM=issuer.key \ + linkerd/linkerd-control-plane + +# Optional: if using TrafficSplit CRs +$ helm repo add l5d-smi https://linkerd.github.io/linkerd-smi +$ helm install linkerd-smi -n linkerd-smi --create-namespace l5d-smi/linkerd-smi +``` + +##### Cleaning up the old linkerd2 Helm release + +After installing the new charts, we need to clean up the old Helm chart. 
The +`helm delete` command would delete all the linkerd resources, so instead we just +remove the Helm release config for the old `linkerd2` chart (assuming you used +the "Secret" storage backend, which is the default): + +```bash +$ kubectl -n default delete secret \ + --field-selector type=helm.sh/release.v1 \ + -l name=linkerd,owner=helm +``` + +##### Upgrading extension Helm charts + +Finally, we need to upgrade our extensions. In Linkerd 2.12.0 the viz, +multicluster, jaeger, and linkerd2-cni extensions no longer install their +namespaces, instead leaving that to the `helm` command (or to a previous step in +your CD pipeline) and relying on an post-install hook to add the required +metadata into that namespace. Therefore the Helm upgrade path for these +extensions is to delete and reinstall them. + +For example, for the viz extension: + +```bash +# update the helm repo +helm repo up + +# delete your current instance +# (assuming you didn't use the -n flag when installing) +helm delete linkerd-viz + +# install the new chart version +helm install linkerd-viz -n linkerd-viz --create-namespace linkerd/linkerd-viz +``` + +##### Upgrading the multicluster extension with Helm + +Note that reinstalling the multicluster extension via Helm as explained above +will result in the recreation of the `linkerd-multicluster` namespace, thus +deleting all the `Link` resources that associate the source cluster with any +target clusters. The mirrored services, which live on their respective +namespaces, won't be deleted so there won't be any downtime. So after finishing +the upgrade, make sure you re-link your clusters again with `linkerd +multicluster link`. This will also bring the latest versions of the service +mirror deployments. + +##### Adding Grafana + +The viz extension no longer installs a Grafana instance due to licensing +concerns. Instead we recommend you install it directly from the [Grafana +official Helm +chart](https://github.com/grafana/helm-charts/tree/main/charts/grafana) or the +[Grafana Operator](https://github.com/grafana-operator/grafana-operator). +Linkerd's Grafana dashboards have been published in +, and the new [Grafana +docs](../grafana/) provide detailed instructions on how to load them. + +### Upgrade notice: stable-2.11.0 + +The minimum Kubernetes version supported is now `v1.17.0`. + +There are two breaking changes in the 2.11.0 release: pods in `ingress` no +longer support non-HTTP traffic to meshed workloads; and the proxy no longer +forwards traffic to ports that are bound only to localhost. + +Users of the multi-cluster extension will need to re-link their cluster after +upgrading. + +The Linkerd proxy container is now the *first* container in the pod. This may +affect tooling that assumed the application was the first container in the pod. + +#### Control plane changes + +The `controller` pod has been removed from the control plane. All configuration +options that previously applied to it are no longer valid (e.g +`publicAPIResources` and all of its nested fields). Additionally, the +destination pod has a new `policy` container that runs the policy controller. + +#### Data plane changes + +In order to fix a class of startup race conditions, the container ordering +within meshed pods has changed so that the Linkerd proxy container is now the +*first* container in the pod, the application container now waits to start until +the proxy is ready. This may affect tooling that assumed the application +container was the first container in the pod. 
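
To see the resulting container order for one of your meshed pods, you can list
its containers as they appear in the pod spec; a sketch with placeholder names:

```bash
# Containers are printed in spec order; after a 2.11 injection the
# linkerd-proxy container should be listed first.
kubectl -n my-namespace get pod my-app-xxxxxxxxxx-xxxxx \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}'
```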
Using [linkerd-await](https://github.com/linkerd/linkerd-await) to enforce
container startup ordering is thus no longer necessary. (However, using
`linkerd-await -S` to ensure proxy shutdown in Jobs and CronJobs is still
valid.)

#### Routing breaking changes

There are two breaking changes to be aware of when it comes to how traffic is
routed.

First, when the proxy runs in ingress mode (`config.linkerd.io/inject:
ingress`), non-HTTP traffic to meshed pods is no longer supported. To get
around this, you will need to use the `config.linkerd.io/skip-outbound-ports`
annotation on your ingress controller pod. In many cases, ingress mode is no
longer necessary. Before upgrading, it may be worth revisiting [how to use
ingress](../using-ingress/) with Linkerd.

Second, the proxy will no longer forward traffic to ports bound only on
localhost, such as `127.0.0.1:8080`. Services that want to receive traffic from
other pods should now be bound to a public interface (e.g. `0.0.0.0:8080`). This
change prevents ports from being accidentally exposed outside of the pod.

#### Multicluster

The gateway component has been changed to use a `pause` container instead of
`nginx`. This change should reduce the footprint of the extension; the proxy
routes traffic internally and does not need to rely on `nginx` to receive or
forward traffic. While this will not cause any downtime when upgrading
multicluster, it does affect probing: `linkerd multicluster gateways` will
falsely advertise the target cluster gateway as being down until the clusters
are re-linked.

Multicluster now supports `NodePort` type services for the gateway. To support
this change, the configuration options in the Helm values file are now grouped
under the `gateway` field. If you have installed the extension with options
other than the provided defaults, you will need to update your `values.yaml`
file to reflect this change in field grouping.

#### Other changes

Besides the breaking changes described above, there are other minor changes to
be aware of when upgrading from `stable-2.10.x`:

- `PodSecurityPolicy` (PSP) resources are no longer installed by default as a
  result of their deprecation in Kubernetes v1.21 and above. The control plane
  and core extensions will now be shipped without PSPs; they can be enabled
  through a new install option, `enablePSP: true`.
- The `tcp_connection_duration_ms` metric has been removed.
- Opaque ports changes: `443` is no longer included in the default opaque ports
  list. Ports `4444`, `6379` and `9300`, corresponding to Galera, Redis and
  Elasticsearch respectively (all server-speaks-first protocols), have been
  added to the default opaque ports list. The default ignore inbound ports list
  has also been changed to include ports `4567` and `4568`.

### Upgrade notice: stable-2.10.0

If you are currently running Linkerd 2.9.0, 2.9.1, 2.9.2, or 2.9.3 (but *not*
2.9.4), and you *upgraded* to that release using the `--prune` flag (as opposed
to installing it fresh), you will need to use the `linkerd repair` command as
outlined in the [Linkerd 2.9.3 upgrade notes](#upgrade-notice-stable-2-9-3)
before you can upgrade to Linkerd 2.10.

Additionally, there are two changes in the 2.10.0 release that may affect you.
First, the handling of certain ports and protocols has changed. Please read
through our [ports and protocols in 2.10 upgrade
guide](../upgrading-2.10-ports-and-protocols/) for the repercussions.
+ +Second, we've introduced [extensions](../extensions/) and moved the +default visualization components into a Linkerd-Viz extension. Read on for what +this means for you. + +#### Visualization components moved to Linkerd-Viz extension + +With the introduction of [extensions](../extensions/), all of the +Linkerd control plane components related to visibility (including Prometheus, +Grafana, Web, and Tap) have been removed from the main Linkerd control plane +and moved into the Linkerd-Viz extension. This means that when you upgrade to +stable-2.10.0, these components will be removed from your cluster and you will +not be able to run commands such as `linkerd stat` or +`linkerd dashboard`. To restore this functionality, you must install the +Linkerd-Viz extension by running `linkerd viz install | kubectl apply -f -` +and then invoke those commands through `linkerd viz stat`, +`linkerd viz dashboard`, etc. + +```bash +# Upgrade the control plane (this will remove viz components). +linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f - +# Prune cluster-scoped resources +linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd \ + --prune-allowlist=rbac.authorization.k8s.io/v1/clusterrole \ + --prune-allowlist=rbac.authorization.k8s.io/v1/clusterrolebinding \ + --prune-allowlist=apiregistration.k8s.io/v1/apiservice -f - +# Install the Linkerd-Viz extension to restore viz functionality. +linkerd viz install | kubectl apply -f - +``` + +Helm users should note that configuration values related to these visibility +components have moved to the Linkerd-Viz chart. Please update any values +overrides you have and use these updated overrides when upgrading the Linkerd +chart or installing the Linkerd-Viz chart. See below for a complete list of +values which have moved. + +```bash +helm repo update +# Upgrade the control plane (this will remove viz components). +helm upgrade linkerd2 linkerd/linkerd2 --reset-values -f values.yaml --atomic +# Install the Linkerd-Viz extension to restore viz functionality. +helm install linkerd2-viz linkerd/linkerd2-viz -f viz-values.yaml +``` + +The following values were removed from the Linkerd2 chart. Most of the removed +values have been moved to the Linkerd-Viz chart or the Linkerd-Jaeger chart. + +- `dashboard.replicas` moved to Linkerd-Viz as `dashboard.replicas` +- `tap` moved to Linkerd-Viz as `tap` +- `tapResources` moved to Linkerd-Viz as `tap.resources` +- `tapProxyResources` moved to Linkerd-Viz as `tap.proxy.resources` +- `webImage` moved to Linkerd-Viz as `dashboard.image` +- `webResources` moved to Linkerd-Viz as `dashboard.resources` +- `webProxyResources` moved to Linkerd-Viz as `dashboard.proxy.resources` +- `grafana` moved to Linkerd-Viz as `grafana` +- `grafana.proxy` moved to Linkerd-Viz as `grafana.proxy` +- `prometheus` moved to Linkerd-Viz as `prometheus` +- `prometheus.proxy` moved to Linkerd-Viz as `prometheus.proxy` +- `global.proxy.trace.collectorSvcAddr` moved to Linkerd-Jaeger as `webhook.collectorSvcAddr` +- `global.proxy.trace.collectorSvcAccount` moved to Linkerd-Jaeger as `webhook.collectorSvcAccount` +- `tracing.enabled` removed +- `tracing.collector` moved to Linkerd-Jaeger as `collector` +- `tracing.jaeger` moved to Linkerd-Jaeger as `jaeger` + +Also please note the global scope from the Linkerd2 chart values has been +dropped, moving the config values underneath it into the root scope. 
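As a rough sketch of what this means for a values override file (your actual
overrides will differ), a setting that used to be nested under `global` now
sits at the root:

```yaml
# Old layout (values nested under the "global" scope):
# global:
#   identityTrustAnchorsPEM: |
#     -----BEGIN CERTIFICATE-----
#     ...
#
# New layout (the same value at the root scope):
identityTrustAnchorsPEM: |
  -----BEGIN CERTIFICATE-----
  ...
```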
Any values you had customized under `global` will need to be migrated in this
way; in particular `identityTrustAnchorsPEM`, in order to preserve the value
you set during install.

diff --git a/linkerd.io/content/2.16/tasks/using-custom-domain.md b/linkerd.io/content/2.16/tasks/using-custom-domain.md
new file mode 100644
index 0000000000..e26c283f41
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/using-custom-domain.md
@@ -0,0 +1,35 @@
+++
title = "Using a Custom Cluster Domain"
description = "Use Linkerd with a custom cluster domain."
+++

For Kubernetes clusters that use a [custom cluster domain](https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/),
Linkerd must be installed using the `--cluster-domain` option:

```bash
# first, install the Linkerd CRDs:
linkerd install --crds | kubectl apply -f -

# next, install the Linkerd control plane, using the custom cluster domain:
linkerd install --cluster-domain=example.org \
  --identity-trust-domain=example.org \
  | kubectl apply -f -

# The Linkerd Viz extension also requires a similar setting:
linkerd viz install --set clusterDomain=example.org | kubectl apply -f -

# And so does the Multicluster extension:
linkerd multicluster install --set identityTrustDomain=example.org | kubectl apply -f -
```

This ensures that Linkerd handles all service discovery, routing, service
profile, and traffic split resources using the `example.org` domain.

{{< note >}}
Note that the identity trust domain must match the cluster domain for mTLS to
work.
{{< /note >}}

{{< note >}}
Changing the cluster domain while upgrading Linkerd isn't supported.
{{< /note >}}

diff --git a/linkerd.io/content/2.16/tasks/using-debug-endpoints.md b/linkerd.io/content/2.16/tasks/using-debug-endpoints.md
new file mode 100644
index 0000000000..2302d30a26
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/using-debug-endpoints.md
@@ -0,0 +1,64 @@
+++
title = "Control Plane Debug Endpoints"
description = "Linkerd's control plane components provide debug endpoints."
+++

All of the control plane components expose runtime profiling information through
the path `/debug/pprof`, using Go's
[pprof](https://golang.org/pkg/net/http/pprof/) package. This endpoint is
disabled by default but can be enabled to gather profiling data.

You can consume the provided data with `go tool pprof` to generate output in
many formats (PDF, DOT, PNG, etc.).

The following diagnostics are provided (a summary with links is provided at
`/debug/pprof`):

- `allocs`: A sampling of all past memory allocations
- `block`: Stack traces that led to blocking on synchronization primitives
- `cmdline`: The command line invocation of the current program
- `goroutine`: Stack traces of all current goroutines
- `heap`: A sampling of memory allocations of live objects. You can specify the
  `gc` GET parameter to run GC before taking the heap sample.
- `mutex`: Stack traces of holders of contended mutexes
- `profile`: CPU profile. You can specify the duration in the `seconds` GET
  parameter. After you get the profile file, use the `go tool pprof` command to
  investigate the profile.
- `threadcreate`: Stack traces that led to the creation of new OS threads
- `trace`: A trace of execution of the current program. You can specify the
  duration in the `seconds` GET parameter. After you get the trace file, use
  the `go tool trace` command to investigate the trace.
+ +## Example Usage + +The pprof endpoint can be enabled by setting the `--set enablePprof=true` flag +when installing or upgrading Linkerd or by setting the `enablePprof=true` Helm +value. + +This data is served over the `admin-http` port. +To find this port, you can examine the pod's yaml, or for the identity pod for +example, issue a command like so: + +```bash +kubectl -n linkerd get po \ + $(kubectl -n linkerd get pod -l linkerd.io/control-plane-component=identity \ + -o jsonpath='{.items[0].metadata.name}') \ + -o=jsonpath='{.spec.containers[*].ports[?(@.name=="admin-http")].containerPort}' +``` + +Then use the `kubectl port-forward` command to access that port from outside +the cluster (in this example the port is 9990): + +```bash +kubectl -n linkerd port-forward \ + $(kubectl -n linkerd get pod -l linkerd.io/control-plane-component=identity \ + -o jsonpath='{.items[0].metadata.name}') \ + 9990 +``` + +It is now possible to use `go tool` to inspect this data. For example to +generate a graph in a PDF file describing memory allocations: + +```bash +go tool pprof -seconds 5 -pdf http://localhost:9990/debug/pprof/allocs +``` diff --git a/linkerd.io/content/2.16/tasks/using-ingress.md b/linkerd.io/content/2.16/tasks/using-ingress.md new file mode 100644 index 0000000000..8f96da0e34 --- /dev/null +++ b/linkerd.io/content/2.16/tasks/using-ingress.md @@ -0,0 +1,728 @@ ++++ +title = "Handling ingress traffic" +description = "Linkerd can work alongside your ingress controller of choice." ++++ + +Ingress traffic refers to traffic that comes into your cluster from outside the +cluster. For reasons of simplicity and composability, Linkerd itself doesn't +provide a built-in ingress solution for handling traffic coming into the +cluster. Instead, Linkerd is designed to work with the many existing Kubernetes +ingress options. + +Combining Linkerd and your ingress solution of choice requires two things: + +1. Configuring your ingress to support Linkerd (if necessary). +2. Meshing your ingress pods. + +Strictly speaking, meshing your ingress pods is not required to allow traffic +into the cluster. However, it is recommended, as it allows Linkerd to provide +features like L7 metrics and mutual TLS the moment the traffic enters the +cluster. + +## Handling external TLS + +One common job for ingress controllers is to terminate TLS from the outside +world, e.g. HTTPS calls. + +Like all pods, traffic to a meshed ingress has both an inbound and an outbound +component. If your ingress terminates TLS, Linkerd will treat this inbound TLS +traffic as an opaque TCP stream, and will only be able to provide byte-level +metrics for this side of the connection. + +Once the ingress controller terminates the TLS connection and issues the +corresponding HTTP or gRPC traffic to internal services, these outbound calls +will have the full set of metrics and mTLS support. + +## Ingress mode {#ingress-mode} + +Most ingress controllers can be meshed like any other service, i.e. by +applying the `linkerd.io/inject: enabled` annotation at the appropriate level. +(See [Adding your services to Linkerd](../adding-your-service/) for more.) + +However, some ingress options need to be meshed in a special "ingress" mode, +using the `linkerd.io/inject: ingress` annotation. + +The instructions below will describe, for each ingress, whether it requires this +mode of operation. 
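For reference, enabling ingress mode on an ingress controller's workload looks
roughly like the following fragment of a Deployment manifest (the names here
are placeholders; see your ingress's section below for specifics):

```yaml
# Only the relevant part of the Deployment is shown; names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ingress-controller
  namespace: my-ingress-namespace
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: ingress   # mesh this workload in ingress mode
```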
If you're using "ingress" mode, we recommend that you set this ingress
annotation at the workload level rather than at the namespace level, so that
other resources in the ingress namespace are meshed normally.

{{< warning id=open-relay-warning >}}
When an ingress is meshed in ingress mode, you _must_ configure it to remove
the `l5d-dst-override` header to avoid creating an open relay to cluster-local
and external endpoints.
{{< /warning >}}

{{< note >}}
Linkerd versions 2.13.0 through 2.13.4 had a bug whereby the `l5d-dst-override`
header was *required* in ingress mode, or the request would fail. This bug was
fixed in 2.13.5, and was not present prior to 2.13.0.
{{< /note >}}

For more on ingress mode and why it's necessary, see [Ingress
details](#ingress-details) below.

## Common ingress options for Linkerd

Common ingress options that Linkerd has been used with include:

- [Ambassador (aka Emissary)](#ambassador)
- [Nginx (community version)](#nginx-community-version)
- [Nginx (F5 NGINX version)](#nginx-f5-nginx-version)
- [Traefik](#traefik)
  - [Traefik 1.x](#traefik-1x)
  - [Traefik 2.x](#traefik-2x)
- [GCE](#gce)
- [Gloo](#gloo)
- [Contour](#contour)
- [Kong](#kong)
- [Haproxy](#haproxy)
- [EnRoute](#enroute)
- [ngrok](#ngrok)

For a quick start guide to using a particular ingress, please visit the section
for that ingress below. If your ingress is not on that list, never fear: it
likely works anyway. See [Ingress details](#ingress-details) below.

## Emissary-Ingress (aka Ambassador) {#ambassador}

Emissary-Ingress can be meshed normally: it does not require the [ingress
mode](#ingress-mode) annotation. An example manifest for configuring
Ambassador / Emissary is as follows:

```yaml
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: web-ambassador-mapping
  namespace: emojivoto
spec:
  hostname: "*"
  prefix: /
  service: http://web-svc.emojivoto.svc.cluster.local:80
```

For a more detailed guide, we recommend reading [Installing the Emissary ingress
with the Linkerd service
mesh](https://buoyant.io/2021/05/24/emissary-and-linkerd-the-best-of-both-worlds/).

## Nginx (community version)

This section refers to the Kubernetes community version
of the Nginx ingress controller
[kubernetes/ingress-nginx](https://github.com/kubernetes/ingress-nginx).

Nginx can be meshed normally: it does not require the [ingress
mode](#ingress-mode) annotation.

The
[`nginx.ingress.kubernetes.io/service-upstream`](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#service-upstream)
annotation should be set to `"true"`. For example:

```yaml
# apiVersion: networking.k8s.io/v1beta1 # for k8s < v1.19
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: emojivoto-web-ingress
  namespace: emojivoto
  annotations:
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  ingressClassName: nginx
  defaultBackend:
    service:
      name: web-svc
      port:
        number: 80
```

If using [the ingress-nginx Helm
chart](https://artifacthub.io/packages/helm/ingress-nginx/ingress-nginx), note
that the namespace containing the ingress controller should NOT be annotated
with `linkerd.io/inject: enabled`. Instead, you should annotate the `kind:
Deployment` (`.spec.template.metadata.annotations`). For example:

```yaml
controller:
  podAnnotations:
    linkerd.io/inject: enabled
...
+``` + +The reason is because this Helm chart defines (among other things) two +Kubernetes resources: + +1) `kind: ValidatingWebhookConfiguration`. This creates a short-lived pod named + something like `ingress-nginx-admission-create-XXXXX` which quickly terminates. + +2) `kind: Deployment`. This creates a long-running pod named something like +`ingress-nginx-controller-XXXX` which contains the Nginx docker + container. + +Setting the injection annotation at the namespace level would mesh the +short-lived pod, which would prevent it from terminating as designed. + +## Nginx (F5 NGINX version) + +This section refers to the Nginx ingress controller +developed and maintained by F5 NGINX +[nginxinc/kubernetes-ingress](https://github.com/nginxinc/kubernetes-ingress). + +This version of Nginx can also be meshed normally +and does not require the [ingress mode](#ingress-mode) annotation. + +The [VirtualServer/VirtualServerRoute CRD resource](https://docs.nginx.com/nginx-ingress-controller/configuration/virtualserver-and-virtualserverroute-resources/#virtualserverroute) +should be used in favor of the `ingress` resource (see +[this Github issue](https://github.com/nginxinc/kubernetes-ingress/issues/2529) +for more information). + +The `use-cluster-ip` field should be set to `true`. For example: + +```yaml +apiVersion: k8s.nginx.org/v1 +kind: VirtualServer +metadata: + name: emojivoto-web-ingress + namespace: emojivoto +spec: + ingressClassName: nginx + upstreams: + - name: web + service: web-svc + port: 80 + use-cluster-ip: true + routes: + - path: / + action: + pass: web +``` + +## Traefik + +Traefik should be meshed with [ingress mode enabled](#ingress-mode), i.e. with +the `linkerd.io/inject: ingress` annotation rather than the default `enabled`. + +Instructions differ for 1.x and 2.x versions of Traefik. + +### Traefik 1.x {#traefik-1x} + +The simplest way to use Traefik 1.x as an ingress for Linkerd is to configure a +Kubernetes `Ingress` resource with the +`ingress.kubernetes.io/custom-request-headers` like this: + +```yaml +# apiVersion: networking.k8s.io/v1beta1 # for k8s < v1.19 +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: web-ingress + namespace: emojivoto + annotations: + ingress.kubernetes.io/custom-request-headers: l5d-dst-override:web-svc.emojivoto.svc.cluster.local:80 +spec: + ingressClassName: traefik + rules: + - host: example.com + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: web-svc + port: + number: 80 +``` + +The important annotation here is: + +```yaml +ingress.kubernetes.io/custom-request-headers: l5d-dst-override:web-svc.emojivoto.svc.cluster.local:80 +``` + +Traefik will add a `l5d-dst-override` header to instruct Linkerd what service +the request is destined for. You'll want to include both the Kubernetes service +FQDN (`web-svc.emojivoto.svc.cluster.local`) *and* the destination +`servicePort`. + +To test this, you'll want to get the external IP address for your controller. If +you installed Traefik via Helm, you can get that IP address by running: + +```bash +kubectl get svc --all-namespaces \ + -l app=traefik \ + -o='custom-columns=EXTERNAL-IP:.status.loadBalancer.ingress[0].ip' +``` + +You can then use this IP with curl: + +```bash +curl -H "Host: example.com" http://external-ip +``` + +{{< note >}} +This solution won't work if you're using Traefik's service weights as +Linkerd will always send requests to the service name in `l5d-dst-override`. 
A workaround is to use `traefik.frontend.passHostHeader: "false"` instead.
{{< /note >}}

### Traefik 2.x {#traefik-2x}

Traefik 2.x adds support for path-based request routing with a Custom Resource
Definition (CRD) called
[`IngressRoute`](https://docs.traefik.io/providers/kubernetes-crd/).

If you choose to use `IngressRoute` instead of the default Kubernetes `Ingress`
resource, then you'll also need to use Traefik's
[`Middleware`](https://docs.traefik.io/middlewares/headers/) Custom Resource
Definition to add the `l5d-dst-override` header.

The YAML below uses the Traefik CRDs to produce the same results for the
`emojivoto` application, as described above.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: l5d-header-middleware
  namespace: traefik
spec:
  headers:
    customRequestHeaders:
      l5d-dst-override: "web-svc.emojivoto.svc.cluster.local:80"
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
  creationTimestamp: null
  name: emojivoto-web-ingress-route
  namespace: emojivoto
spec:
  entryPoints: []
  routes:
  - kind: Rule
    match: PathPrefix(`/`)
    priority: 0
    middlewares:
    - name: l5d-header-middleware
    services:
    - kind: Service
      name: web-svc
      port: 80
```

## GCE

The GCE ingress should be meshed with [ingress mode
enabled](#ingress-mode), i.e. with the `linkerd.io/inject: ingress`
annotation rather than the default `enabled`.

This example shows how to use a [Google Cloud Static External IP
Address](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address)
and TLS with a [Google-managed
certificate](https://cloud.google.com/load-balancing/docs/ssl-certificates#managed-certs).

```yaml
# apiVersion: networking.k8s.io/v1beta1 # for k8s < v1.19
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: emojivoto
  annotations:
    ingress.kubernetes.io/custom-request-headers: "l5d-dst-override: web-svc.emojivoto.svc.cluster.local:80"
    ingress.gcp.kubernetes.io/pre-shared-cert: "managed-cert-name"
    kubernetes.io/ingress.global-static-ip-name: "static-ip-name"
spec:
  ingressClassName: gce
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-svc
            port:
              number: 80
```

To use this example definition, substitute `managed-cert-name` and
`static-ip-name` with the short names defined in your project (n.b. use the name
for the IP address, not the address itself).

The managed certificate will take about 30-60 minutes to provision, but the
status of the ingress should be healthy within a few minutes. Once the managed
certificate is provisioned, the ingress should be visible to the Internet.

## Gloo

Gloo should be meshed with [ingress mode enabled](#ingress-mode), i.e. with the
`linkerd.io/inject: ingress` annotation rather than the default `enabled`.

As of Gloo v0.13.20, Gloo has native integration with Linkerd, so that the
required Linkerd headers are added automatically. Assuming you installed Gloo
to the default location, you can enable the native integration by running:

```bash
kubectl patch settings -n gloo-system default \
  -p '{"spec":{"linkerd":true}}' --type=merge
```

Gloo will now automatically add the `l5d-dst-override` header to every
Kubernetes upstream.
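To double-check that the integration is turned on, you can read the field back
from the Settings resource (a quick sanity check, under the same assumptions as
the patch command above):

```bash
# Should print "true" once the Linkerd integration is enabled
kubectl get settings -n gloo-system default \
  -o jsonpath='{.spec.linkerd}'
```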
Now simply add a route to the upstream, e.g.:

```bash
glooctl add route --path-prefix=/ --dest-name booksapp-webapp-7000
```

## Contour

Contour should be meshed with [ingress mode enabled](#ingress-mode), i.e. with
the `linkerd.io/inject: ingress` annotation rather than the default `enabled`.

The following example uses the
[Contour getting started](https://projectcontour.io/getting-started/) documentation
to demonstrate how to set the required header manually.

Contour's Envoy DaemonSet doesn't auto-mount the service account token, which
is required for the Linkerd proxy to do mTLS between pods. So first we need to
install Contour uninjected, patch the DaemonSet with
`automountServiceAccountToken: true`, and then inject it. Optionally you can
create a dedicated service account to avoid using the `default` one.

```bash
# install Contour
kubectl apply -f https://projectcontour.io/quickstart/contour.yaml

# create a service account (optional)
kubectl apply -f - << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: envoy
  namespace: projectcontour
EOF

# add service account to envoy (optional)
kubectl patch daemonset envoy -n projectcontour --type json -p='[{"op": "add", "path": "/spec/template/spec/serviceAccount", "value": "envoy"}]'

# auto mount the service account token (required)
kubectl patch daemonset envoy -n projectcontour --type json -p='[{"op": "replace", "path": "/spec/template/spec/automountServiceAccountToken", "value": true}]'

# inject linkerd first into the DaemonSet
kubectl -n projectcontour get daemonset -oyaml | linkerd inject - | kubectl apply -f -

# inject linkerd into the Deployment
kubectl -n projectcontour get deployment -oyaml | linkerd inject - | kubectl apply -f -
```

Verify that your Contour and Envoy installation has a running Linkerd sidecar.

Next, we'll deploy a demo service:

```bash
linkerd inject https://projectcontour.io/examples/kuard.yaml | kubectl apply -f -
```

To route external traffic to your service you'll need to provide an HTTPProxy:

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: kuard
  namespace: default
spec:
  routes:
  - requestHeadersPolicy:
      set:
      - name: l5d-dst-override
        value: kuard.default.svc.cluster.local:80
    services:
    - name: kuard
      port: 80
  virtualhost:
    fqdn: 127.0.0.1.nip.io
```

Notice the `l5d-dst-override` header is explicitly set to the target `service`.

Finally, you can test your working service mesh:

```bash
kubectl port-forward svc/envoy -n projectcontour 3200:80
http://127.0.0.1.nip.io:3200
```

{{< note >}}
You should annotate the pod spec with `config.linkerd.io/skip-outbound-ports:
8001`. The Envoy pod will try to connect to the Contour pod at port 8001
through TLS, which is not supported under this ingress mode, so you need to
have the proxy skip that outbound port.
{{< /note >}}

{{< note >}}
If you are using Contour with [flagger](https://github.com/weaveworks/flagger),
the `l5d-dst-override` header will be set automatically.
{{< /note >}}

## Kong

Kong should be meshed with [ingress mode enabled](#ingress-mode), i.e. with the
`linkerd.io/inject: ingress` annotation rather than the default `enabled`.

This example will use the following elements:

- The [Kong chart](https://github.com/Kong/charts)
- The [emojivoto](../../getting-started/) example application

Before installing emojivoto, install Linkerd and Kong on your cluster.
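For example, if Kong was installed with Helm into a `kong` namespace, injecting
its deployment in ingress mode might look roughly like this (the deployment
name depends on your Helm release, so treat `kong-kong` as a placeholder):

```bash
# Re-inject the Kong deployment in ingress mode; "kong-kong" is a placeholder
# for whatever your Helm release named the deployment
kubectl -n kong get deploy kong-kong -o yaml \
  | linkerd inject --ingress - \
  | kubectl apply -f -
```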
When injecting the Kong deployment, use the `--ingress` flag (or the
equivalent annotation).

We also need to declare KongPlugin (a Kong CRD) and Ingress resources:

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: set-l5d-header
  namespace: emojivoto
plugin: request-transformer
config:
  remove:
    headers:
    - l5d-dst-override # Prevents open relay
  add:
    headers:
    - l5d-dst-override:$(headers.host).svc.cluster.local
---
# apiVersion: networking.k8s.io/v1beta1 # for k8s < v1.19
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: emojivoto
  annotations:
    konghq.com/plugins: set-l5d-header
spec:
  ingressClassName: kong
  rules:
  - http:
      paths:
      - path: /api/vote
        pathType: Prefix
        backend:
          service:
            name: web-svc
            port:
              name: http
      - path: /api/list
        pathType: Prefix
        backend:
          service:
            name: web-svc
            port:
              name: http
```

Here we are explicitly setting the `l5d-dst-override` header in the
`KongPlugin`. Using [templates as
values](https://docs.konghq.com/hub/kong-inc/request-transformer/#template-as-value),
we can use the `host` header from requests and set the `l5d-dst-override` value
based on that.

Finally, install emojivoto so that its `deploy/vote-bot` targets the
ingress and includes a `host` header value for the `web-svc.emojivoto` service.

Before applying the injected emojivoto application, make the following changes
to the `vote-bot` Deployment:

```yaml
env:
# Target the Kong ingress instead of the Emojivoto web service
- name: WEB_HOST
  value: kong-proxy.kong:80
# Override the host header on requests so that it can be used to set the l5d-dst-override header
- name: HOST_OVERRIDE
  value: web-svc.emojivoto
```

## Haproxy

{{< note >}}
There are two different haproxy-based ingress controllers. This example is for
the [kubernetes-ingress controller by
haproxytech](https://www.haproxy.com/documentation/kubernetes/latest/) and not
the [haproxy-ingress controller](https://haproxy-ingress.github.io/).
{{< /note >}}

Haproxy should be meshed with [ingress mode enabled](#ingress-mode), i.e. with
the `linkerd.io/inject: ingress` annotation rather than the default `enabled`.

The simplest way to use Haproxy as an ingress for Linkerd is to configure a
Kubernetes `Ingress` resource with the
`haproxy.org/request-set-header` annotation like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: emojivoto
  annotations:
    kubernetes.io/ingress.class: haproxy
    haproxy.org/request-set-header: |
      l5d-dst-override web-svc.emojivoto.svc.cluster.local:80
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-svc
            port:
              number: 80
```

Unfortunately, there is currently no support for setting this header
dynamically in a global config map by using the service name, namespace and
port as variables. This also means that you can't combine more than one service
ingress rule in an ingress manifest, as each one needs its own
`haproxy.org/request-set-header` annotation with a hard-coded value.
## EnRoute OneStep {#enroute}

Meshing EnRoute with Linkerd involves only setting one flag globally:

```yaml
apiVersion: enroute.saaras.io/v1
kind: GlobalConfig
metadata:
  labels:
    app: web
  name: enable-linkerd
  namespace: default
spec:
  name: linkerd-global-config
  type: globalconfig_globals
  config: |
    {
      "linkerd_enabled": true
    }
```

EnRoute can now be meshed by injecting the Linkerd proxy into the EnRoute pods.
Using the `linkerd` utility, we can update the EnRoute deployment to inject the
Linkerd proxy:

```bash
kubectl get -n enroute-demo deploy -o yaml | linkerd inject - | kubectl apply -f -
```

The `linkerd_enabled` flag automatically sets the `l5d-dst-override` header.
The flag also delegates endpoint selection for routing to Linkerd.

More details and customization options can be found in [End to End encryption
using EnRoute with
Linkerd](https://getenroute.io/blog/end-to-end-encryption-mtls-linkerd-enroute/).

## ngrok

ngrok can be meshed normally: it does not require the
[ingress mode](#ingress-mode) annotation.

After signing up for a [free ngrok account](https://ngrok.com/signup) and
running through the [installation steps for the ngrok Ingress
controller](https://github.com/ngrok/kubernetes-ingress-controller#installation),
you can add ingress by configuring an ingress object for your service and
applying it with `kubectl apply -f ingress.yaml`.

This is an example for the emojivoto app used in the Linkerd getting started
guide. You will need to replace the `host` value with your
[free static domain](https://dashboard.ngrok.com/cloud-edge/domains) available
in your ngrok account. If you have a paid ngrok account, you can configure this
the same way you would use the [`--domain`
flag](https://ngrok.com/docs/secure-tunnels/ngrok-agent/reference/ngrok/) on
the ngrok agent.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: emojivoto-ingress
  namespace: emojivoto
spec:
  ingressClassName: ngrok
  rules:
  - host: [YOUR STATIC DOMAIN.ngrok-free.app]
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-svc
            port:
              number: 80
```

Your emojivoto app should be available to anyone in the world at your static
domain.

## Ingress details

In this section we cover how Linkerd interacts with ingress controllers in
general.

In order for Linkerd to properly apply L7 features such as route-based metrics
and dynamic traffic routing, Linkerd needs the ingress controller to connect
to the IP/port of the destination Kubernetes Service. However, by default,
many ingresses do their own endpoint selection and connect directly to the
IP/port of the destination Pod, rather than the Service.

Thus, combining an ingress with Linkerd takes one of two forms:

1. Configure the ingress to connect to the IP and port of the Service as the
   destination, i.e. to skip its own endpoint selection. (E.g. see
   [Nginx](#nginx-community-version) above.)

2. Alternatively, configure the ingress to pass the Service IP/port in a
   header such as `l5d-dst-override`, `Host`, or `:authority`, and configure
   Linkerd in *ingress* mode. In this mode, it will read from one of those
   headers instead.

The most common approach in form #2 is to use the explicit `l5d-dst-override`
header.

{{< note >}}
Some ingress controllers support sticky sessions. For session stickiness, the
ingress controller has to do its own endpoint selection.
This means that Linkerd will not be able to connect to the IP/port of the
Kubernetes Service, and will instead establish a direct connection to a pod.
Therefore, sticky sessions and `ServiceProfiles` are mutually exclusive.
{{< /note >}}

{{< note >}}
If requests experience a 2-3 second delay after injecting your ingress
controller, it is likely that this is because the service of `type:
LoadBalancer` is obscuring the client source IP. You can fix this by setting
`externalTrafficPolicy: Local` in the ingress' service definition.
{{< /note >}}

{{< note >}}
While the Kubernetes Ingress API definition allows a `backend`'s `servicePort`
to be a string value, only numeric `servicePort` values can be used with
Linkerd. If a string value is encountered, Linkerd will default to using port
80.
{{< /note >}}

diff --git a/linkerd.io/content/2.16/tasks/using-psp.md b/linkerd.io/content/2.16/tasks/using-psp.md
new file mode 100644
index 0000000000..5e06e3a15f
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/using-psp.md
@@ -0,0 +1,11 @@
+++
title = "Linkerd and Pod Security Policies (PSP)"
description = "Using Linkerd with pod security policies enabled."
+++

[Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/)
have been deprecated in Kubernetes v1.21 and removed in v1.25. However, for
users who still want them, the Linkerd control plane comes with its own
minimally privileged Pod Security Policy and the associated RBAC resources,
which can optionally be created by passing the `--set enablePSP=true` flag
during Linkerd install or upgrade, or by using the `enablePSP` Helm value.

diff --git a/linkerd.io/content/2.16/tasks/using-the-debug-container.md b/linkerd.io/content/2.16/tasks/using-the-debug-container.md
new file mode 100644
index 0000000000..63b6824ea8
--- /dev/null
+++ b/linkerd.io/content/2.16/tasks/using-the-debug-container.md
@@ -0,0 +1,104 @@
+++
title = "Using the Debug Sidecar"
description = "Inject the debug container to capture network packets."
+++

Debugging a service mesh can be hard. When something just isn't working, is
the problem with the proxy? With the application? With the client? With the
underlying network? Sometimes, nothing beats looking at raw network data.

In cases where you need network-level visibility into packets entering and
leaving your application, Linkerd provides a *debug sidecar* with some helpful
tooling. Similar to how [proxy sidecar
injection](../../features/proxy-injection/) works, you add a debug sidecar to
a pod by setting the `config.linkerd.io/enable-debug-sidecar: "true"` annotation
at pod creation time. For convenience, the `linkerd inject` command provides an
`--enable-debug-sidecar` option that adds this annotation for you.

(Note that the set of containers in a Kubernetes pod is not mutable, so simply
adding this annotation to a pre-existing pod will not work. It must be present
at pod *creation* time.)

{{< trylpt >}}

The debug sidecar image contains
[`tshark`](https://www.wireshark.org/docs/man-pages/tshark.html), `tcpdump`,
`lsof`, and `iproute2`. Once installed, it starts automatically logging all
incoming and outgoing traffic with `tshark`, which can then be viewed with
`kubectl logs`. Alternatively, you can use `kubectl exec` to access the
container and run commands directly.
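If you'd rather set the annotation directly instead of using the CLI flag, it
goes on the workload's pod template; a minimal sketch (the deployment name is a
placeholder):

```yaml
# Fragment of a Deployment pod template; "my-app" is a placeholder name.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/enable-debug-sidecar: "true"
```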
+ +For instance, if you've gone through the [Linkerd Getting +Started](../../getting-started/) guide and installed the +*emojivoto* application, and wish to debug traffic to the *voting* service, you +could run: + +```bash +kubectl -n emojivoto get deploy/voting -o yaml \ + | linkerd inject --enable-debug-sidecar - \ + | kubectl apply -f - +``` + +to deploy the debug sidecar container to all pods in the *voting* service. +(Note that there's only one pod in this deployment, which will be recreated +to do this--see the note about pod mutability above.) + +You can confirm that the debug container is running by listing +all the containers in pods with the `voting-svc` label: + +```bash +kubectl get pods -n emojivoto -l app=voting-svc \ + -o jsonpath='{.items[*].spec.containers[*].name}' +``` + +Then, you can watch live tshark output from the logs by simply running: + +```bash +kubectl -n emojivoto logs deploy/voting linkerd-debug -f +``` + +If that's not enough, you can exec to the container and run your own commands +in the context of the network. For example, if you want to inspect the HTTP headers +of the requests, you could run something like this: + +```bash +kubectl -n emojivoto exec -it \ + $(kubectl -n emojivoto get pod -l app=voting-svc \ + -o jsonpath='{.items[0].metadata.name}') \ + -c linkerd-debug -- tshark -i any -f "tcp" -V -Y "http.request" +``` + +A real-world error message written by the proxy that the debug sidecar is +effective in troubleshooting is a `Connection Refused` error like this one: + + ```log +ERR! [