Add some documentation about the webhook #2158

Merged 17 commits on Dec 13, 2019
4 changes: 3 additions & 1 deletion docs/operator-config.asciidoc
@@ -68,6 +68,8 @@ to:
[id="{p}-ns-config"]
=== Namespace and role configuration

The `operator-roles` and `namespaces` flags have some intricacies that are worth discussing. A fully functioning operator will *require* both `global` and `namespace` roles running in the cluster (though potentially in different operator deployments). That is to say, with `--operator-roles=global,namespace` (or `--operator-roles=all`). If you want to limit the operator to a specific set of namespaces, you must set the `namespaces` flag as well. For example `--operator-roles=global,namespace --namespaces=my-namespace1,mynamespace2`. To have it manage all namespaces, you can simply omit the `namespaces` flag.
The `operator-roles` and `namespaces` flags have some intricacies that are worth discussing. A fully functioning operator will *require* both `global` and `namespace` roles running in the cluster (though potentially in different operator deployments). That is to say, with `--operator-roles=global,namespace` (or `--operator-roles=all`). To limit the operator to a specific set of namespaces, set the `namespaces` flag as well. For example `--operator-roles=global,namespace --namespaces=my-namespace1,mynamespace2`. To make the operator manage all namespaces, omit the `namespaces` flag.

The global role acts across namespaces and is not related to a specific deployment of the Elastic stack. The global operator deployed cluster-wide is responsible for high-level cross-cluster features.

include::webhook.asciidoc[]
2 changes: 2 additions & 0 deletions docs/troubleshooting.asciidoc
@@ -17,6 +17,7 @@ When things don't work as expected, you can investigate by taking the following
- <<{p}-pause-controllers,Pause ECK controllers>>
- <<{p}-get-k8s-events,Get Kubernetes events>>
- <<{p}-exec-into-containers,Exec into containers>>
- <<{p}-webhook-troubleshooting,Webhook troubleshooting>>
- <<{p}-ask-for-help,Ask for help>>

[float]
@@ -258,6 +259,7 @@ This can also be done for Kibana and APM Server.

On startup, the operator deploys an https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/[admission webhook] that points to the operator's service. If this service is inaccessible, you may see errors in your Kubernetes API server logs indicating that it cannot reach the service. Common causes are that the operator pods are failing to start, or that the control plane is isolated from the operator pod (for instance by network policies, or because the control plane runs externally, as in https://github.com/elastic/cloud-on-k8s/issues/896#issuecomment-507224945[issue #896] and https://github.com/elastic/cloud-on-k8s/issues/1369[issue #1369]).

You can also change the https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy[`failurePolicy`] of the webhook configuration to `Fail`, which will cause creations and updates to error out if there is an error contacting the webhook.
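
The checks described here can be sketched with `kubectl`. This is a minimal sketch rather than an official procedure: the function names are hypothetical, and the webhook configuration name, Service name, and namespace are passed as arguments because they depend on your installation.

```shell
# Hypothetical helpers; resource names are arguments because they vary by installation.

# Print the failurePolicy of every webhook in a ValidatingWebhookConfiguration.
# $1: name of the ValidatingWebhookConfiguration
webhook_failure_policy() {
  kubectl get validatingwebhookconfigurations "$1" \
    -o jsonpath='{.webhooks[*].failurePolicy}'
}

# List the endpoints behind the webhook Service. An empty list suggests that
# no ready operator pod is backing the Service.
# $1: Service name, $2: namespace
webhook_endpoints() {
  kubectl -n "$2" get endpoints "$1"
}

# Set failurePolicy to Fail on the first webhook of the configuration.
# $1: name of the ValidatingWebhookConfiguration
webhook_set_fail() {
  kubectl patch validatingwebhookconfigurations "$1" --type=json \
    -p '[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Fail"}]'
}
```

For example, `webhook_endpoints elastic-webhook-server elastic-system` would apply if the operator is installed with the default Service name in the `elastic-system` namespace.
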
[float]
[id="{p}-ask-for-help"]
=== Ask for help
7 changes: 0 additions & 7 deletions docs/uninstalling-eck.asciidoc
@@ -22,10 +22,3 @@ Then, you can uninstall the operator:
----
kubectl delete -f https://download.elastic.co/downloads/eck/{eck_version}/all-in-one.yaml
----

And remove the webhook configuration:

[source,shell]
----
kubectl delete validatingwebhookconfigurations validating-webhook-configuration
----
109 changes: 109 additions & 0 deletions docs/webhook.asciidoc
@@ -0,0 +1,109 @@
[id="{p}-webhook"]
=== Validating webhook

A validating webhook provides additional validation of Elasticsearch resources. It gives immediate feedback on the Elasticsearch manifests you submit, so you can catch errors before ECK even tries to fulfill your request.

[float]
=== Architecture
The webhook is composed of four main components. The following descriptions explain how they interact, how they are named, and how they are managed.

. A `ValidatingWebhookConfiguration` object that defines the validating webhook, named `elastic-es-validation.k8s.elastic.co`. It must be created before starting the operator. The `caBundle` field can be automatically managed as part of the automatic certificate management _(see below)_.
. A Kubernetes Service named `elastic-webhook-server` that exposes the webhook server. It is in the same namespace as the webhook server.
. A webhook server that actually validates the submitted resources. In ECK it is the operator itself when it is configured with the `webhook` role. See <<{p}-operator-config,Configuring ECK>> for more information about the `operator-roles` flag.
. A Secret containing the required certificates to secure the connection between the API server and the webhook server.
Like the `ValidatingWebhookConfiguration`, it must be created before starting the operator, even if it is empty. By default its name is `elastic-webhook-server-cert`.
The content of this Secret and the lifecycle of the certificates are automatically managed for you. ECK generates a dedicated and separate certificate authority and ensures that all components are rotated before the expiration date. The certificate authority is also used to configure the `caBundle` field of the `ValidatingWebhookConfiguration`. You can disable this feature if you want to manage the certificates yourself or with https://github.com/jetstack/cert-manager[cert-manager]. See an example of the latter below.
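
To see how the four components map onto cluster objects, they can be listed with `kubectl`. This is a minimal sketch that assumes the default Service and Secret names given above and an operator installed in the `elastic-system` namespace. The function name `list_webhook_components` and its arguments are hypothetical.

```shell
# Hypothetical helper: list the objects that make up the validating webhook.
# $1: namespace of the operator (assumed default: elastic-system)
# $2: optional name of the ValidatingWebhookConfiguration (depends on your manifest)
list_webhook_components() {
  ns="${1:-elastic-system}"
  # 1. The cluster-scoped webhook configuration (all of them if no name is given).
  kubectl get validatingwebhookconfigurations ${2:+"$2"}
  # 2. The Service that exposes the webhook server.
  kubectl -n "$ns" get service elastic-webhook-server
  # 3. The webhook server, that is, the operator pods.
  kubectl -n "$ns" get pods
  # 4. The Secret that holds the webhook certificates.
  kubectl -n "$ns" get secret elastic-webhook-server-cert
}
```
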

[float]
=== Managing the webhook certificate with cert-manager

If ECK is currently running, you must first make sure that the automatic certificate management feature is disabled. To do so, update the operator deployment manifest and add the `--manage-webhook-certs=false` flag.
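
One way to confirm that the flag is in place is to print the arguments of the operator container. This is a sketch only: it assumes the operator runs as a StatefulSet named `elastic-operator` in `elastic-system`, with the operator as the first container, which may not match your installation. The names `operator_args` and `webhook_certs_disabled` are hypothetical.

```shell
# Print the arguments of the first container of the operator workload.
# $1: kind/name (assumed default: statefulset/elastic-operator)
# $2: namespace (assumed default: elastic-system)
operator_args() {
  kubectl -n "${2:-elastic-system}" get "${1:-statefulset/elastic-operator}" \
    -o jsonpath='{.spec.template.spec.containers[0].args}'
}

# Succeeds when automatic webhook certificate management has been disabled.
webhook_certs_disabled() {
  operator_args "$@" | grep -q -e '--manage-webhook-certs=false'
}
```

For example, `webhook_certs_disabled && echo "automatic certificate management is off"`.
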

Then, cert-manager v0.11+ must be installed as described in the https://docs.cert-manager.io/en/latest/getting-started/install/[cert-manager documentation].

The following example shows how to create all the resources that a webhook requires to function.

[source,yaml,subs="attributes,+macros"]
----
cat $$<<$$EOF | kubectl apply -f -
---
# this configures
# - a self signed cert-manager issuer
# - a service to point to the webhook
# - a self signed certificate for the webhook service
# - a validating webhook configuration
apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: elastic-system
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: elastic-webhook
  namespace: elastic-system
spec:
  # the DNS names must match the name of the Service defined below
  commonName: elastic-webhook-server.elastic-system.svc
  dnsNames:
  - elastic-webhook-server.elastic-system.svc.cluster.local
  - elastic-webhook-server.elastic-system.svc
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: elastic-webhook-server-cert
---
apiVersion: v1
kind: Service
metadata:
  name: elastic-webhook-server
  namespace: elastic-system
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: 9443
  selector:
    control-plane: elastic-operator
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: elastic-webhook.k8s.elastic.co
  annotations:
    # <namespace>/<name> of the Certificate defined above
    cert-manager.io/inject-ca-from: elastic-system/elastic-webhook
webhooks:
- clientConfig:
    caBundle: Cg==
    service:
      # must reference the Service defined above
      name: elastic-webhook-server
      namespace: elastic-system
      # this is the path controller-runtime automatically generates
      path: /validate-elasticsearch-k8s-elastic-co-{eck_crd_version}-elasticsearch
  failurePolicy: Ignore
  name: elastic-es-validation.k8s.elastic.co
  sideEffects: None
  rules:
  - apiGroups:
    - elasticsearch.k8s.elastic.co
    apiVersions:
    - {eck_crd_version}
    operations:
    - CREATE
    - UPDATE
    resources:
    - elasticsearches
EOF
sebgl (Contributor) commented on Dec 12, 2019:

:(
I hate that we have to duplicate and maintain up to date the operator manifest here. But I don't see an easy way out. Could we still try to display a minimal subset of the operator manifest instead with only non-default fields?

anyasabo (Contributor) commented on Dec 12, 2019:

Agreed on the duplication. When I first wrote this we did not have the main operator yaml configured for webhooks, so it seemed useful since there were a lot of changes necessary. Now we have most of them in the all-in-one though. So maybe we can just omit the sset entirely here and direct people to disable cert management in the operator? But keep the cert-manager resource examples

Contributor (PR author) commented:

It makes me realize that the sample is wrong here, we should include --manage-webhook-certs=false in the args of the container. I will remove the sset since it is clearly stated that "you first must ensure that the automatic certificate management feature is disabled"

----

NOTE: This example assumes that you have installed the operator in the `elastic-system` namespace.
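
After applying the manifest, it can be useful to check that cert-manager issued the certificate and injected the CA into the webhook configuration. This is a minimal sketch that uses the resource names from the example above. The function names are hypothetical, and the comparison with `Cg==` only detects that the placeholder value was replaced.

```shell
# Show the Certificate from the example; the READY column reports whether
# cert-manager has issued it.
webhook_certificate_status() {
  kubectl -n elastic-system get certificate elastic-webhook
}

# Succeeds once the caBundle is non-empty and differs from the placeholder
# "Cg==" used in the manifest, which suggests that a CA has been injected.
webhook_ca_injected() {
  bundle=$(kubectl get validatingwebhookconfigurations elastic-webhook.k8s.elastic.co \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}')
  [ -n "$bundle" ] && [ "$bundle" != "Cg==" ]
}
```
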

[float]
=== Troubleshooting

Webhooks require network connectivity between the Kubernetes API server and the operator.
See <<{p}-webhook-troubleshooting,Webhook troubleshooting>> for more information about known problems with some Kubernetes providers.