From 3a9c43ce3682bc2e38a038a7b3ff747806addc57 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kamil=20Bregu=C5=82a?= Date: Sat, 6 Mar 2021 16:16:41 +0100 Subject: [PATCH 1/3] Create a new documentation package for Helm Chart --- chart/README.md | 422 +-------------- chart/values.schema.json | 16 +- docs/conf.py | 3 + .../docs_build/dev_index_template.html.jinja2 | 9 + docs/exts/docs_build/docs_builder.py | 2 +- docs/helm-chart/airflow-configuration.rst | 64 +++ docs/helm-chart/external-redis.rst | 33 ++ docs/helm-chart/index.rst | 83 +++ docs/helm-chart/keda.rst | 72 +++ docs/helm-chart/manage-dags-files.rst | 74 +++ docs/helm-chart/parameters-ref.rst | 487 ++++++++++++++++++ docs/helm-chart/quick-start.rst | 95 ++++ docs/spelling_wordlist.txt | 7 + 13 files changed, 943 insertions(+), 424 deletions(-) create mode 100644 docs/helm-chart/airflow-configuration.rst create mode 100644 docs/helm-chart/external-redis.rst create mode 100644 docs/helm-chart/index.rst create mode 100644 docs/helm-chart/keda.rst create mode 100644 docs/helm-chart/manage-dags-files.rst create mode 100644 docs/helm-chart/parameters-ref.rst create mode 100644 docs/helm-chart/quick-start.rst diff --git a/chart/README.md b/chart/README.md index 0a7d700ae2731d..7ae8612e5026df 100644 --- a/chart/README.md +++ b/chart/README.md @@ -19,6 +19,8 @@ # Helm Chart for Apache Airflow +> :warning: **This Helm Chart has yet to be released**. We are working to [release it officially](https://github.com/apache/airflow/issues/10752) as soon as possible. + [Apache Airflow](https://airflow.apache.org/) is a platform to programmatically author, schedule and monitor workflows. ## Introduction @@ -32,423 +34,13 @@ cluster using the [Helm](https://helm.sh) package manager. 
- Helm 2.11+ or Helm 3.0+ - PV provisioner support in the underlying infrastructure -## Configuring Airflow - -All Airflow configuration parameters (equivalent of `airflow.cfg`) are stored in [values.yaml](https://github.com/apache/airflow/blob/master/chart/values.yaml) under the `config` key . The following code demonstrates how one would allow webserver users to view the config from within the webserver application. See the bottom line of the example: - -```yaml -# Config settings to go into the mounted airflow.cfg -# -# Please note that these values are passed through the `tpl` function, so are -# all subject to being rendered as go templates. If you need to include a -# literal `{{` in a value, it must be expressed like this: -# -# a: '{{ "{{ not a template }}" }}' -# -# yamllint disable rule:line-length -config: - core: - dags_folder: '{{ include "airflow_dags" . }}' - load_examples: 'False' - executor: '{{ .Values.executor }}' - # For Airflow 1.10, backward compatibility - colored_console_log: 'False' - remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}' - # Authentication backend used for the experimental API - api: - auth_backend: airflow.api.auth.backend.deny_all - logging: - remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}' - colored_console_log: 'False' - logging_level: DEBUG - metrics: - statsd_on: '{{ ternary "True" "False" .Values.statsd.enabled }}' - statsd_port: 9125 - statsd_prefix: airflow - statsd_host: '{{ printf "%s-statsd" .Release.Name }}' - webserver: - enable_proxy_fix: 'True' - expose_config: 'True' # <<<<<<<<<< BY DEFAULT THIS IS 'False' BUT WE CHANGE IT TO 'True' PRIOR TO INSTALLING THE CHART -``` - -Generally speaking, it is useful to familiarize oneself with the Airflow configuration prior to installing and deploying the service. 
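Conceptually, each top-level key under `config` becomes an `[section]` of `airflow.cfg`, and each nested key becomes an option in that section. The chart does this rendering with Go templates. The Python below is only a hypothetical sketch of that mapping, applied to values whose templates have already been rendered. The function name `render_airflow_cfg` is an assumption and is not part of the chart:

```python
import configparser
import io


def render_airflow_cfg(config: dict) -> str:
    """Render a nested {section: {option: value}} dict as airflow.cfg-style INI text.

    Hypothetical helper for illustration only; the chart itself renders
    airflow.cfg with Go templates, not with this function.
    """
    # interpolation=None keeps literal '%' characters (e.g. '%s-statsd') intact.
    parser = configparser.ConfigParser(interpolation=None)
    for section, options in config.items():
        # Each top-level key becomes a [section]; values are stored as strings.
        parser[section] = {key: str(value) for key, value in options.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_airflow_cfg({
        "webserver": {"enable_proxy_fix": "True", "expose_config": "True"},
        "metrics": {"statsd_port": 9125, "statsd_prefix": "airflow"},
    }))
```

`interpolation=None` matters here. With the default `BasicInterpolation`, a value such as `%s-statsd` would be rejected as invalid interpolation syntax.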
- -## Installing the Chart - -To install this repository from source (using helm 3) - -```bash -kubectl create namespace airflow -helm repo add stable https://charts.helm.sh/stable/ -helm dep update -helm install airflow . --namespace airflow -``` - -The command deploys Airflow on the Kubernetes cluster in the default configuration. The [Parameters](#parameters) -section lists the parameters that can be configured during installation. - -> **Tip**: List all releases using `helm list` - -## Upgrading the Chart - -To upgrade the chart with the release name `airflow`: - -```bash -helm upgrade airflow . --namespace airflow -``` - -## Uninstalling the Chart - -To uninstall/delete the `airflow` deployment: - -```bash -helm delete airflow --namespace airflow -``` - -The command removes all the Kubernetes components associated with the chart and deletes the release. - -## Updating DAGs - -The recommended way to update your DAGs with this chart is to build a new docker image with the latest DAG code (`docker build -t my-company/airflow:8a0da78 .`), push it to an accessible registry (`docker push my-company/airflow:8a0da78`), then update the Airflow pods with that image: - -```bash -helm upgrade airflow . \ - --set images.airflow.repository=my-company/airflow \ - --set images.airflow.tag=8a0da78 -``` - -For local development purpose you can also build the image locally and use it via deployment method described by Breeze. - -## Mounting DAGS using Git-Sync side car with Persistence enabled - -This option will use a Persistent Volume Claim with an accessMode of `ReadWriteMany`. The scheduler pod will sync DAGs from a git repository onto the PVC every configured number of seconds. The other pods will read the synced DAGs. Not all volume plugins have support for `ReadWriteMany` accessMode. Refer [Persistent Volume Access Modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) for details - -```bash -helm upgrade airflow . 
\ - --set dags.persistence.enabled=true \ - --set dags.gitSync.enabled=true - # you can also override the other persistence or gitSync values - # by setting the dags.persistence.* and dags.gitSync.* values - # Please refer to values.yaml for details -``` - -## Mounting DAGS using Git-Sync side car without Persistence - -This option will use an always running Git-Sync side car on every scheduler, webserver and worker pods. The Git-Sync side car containers will sync DAGs from a git repository every configured number of seconds. If you are using the KubernetesExecutor, Git-sync will run as an initContainer on your worker pods. - -```bash -helm upgrade airflow . \ - --set dags.persistence.enabled=false \ - --set dags.gitSync.enabled=true - # you can also override the other gitSync values - # by setting the dags.gitSync.* values - # Refer values.yaml for details -``` - -## Mounting DAGS from an externally populated PVC - -In this approach, Airflow will read the DAGs from a PVC which has `ReadOnlyMany` or `ReadWriteMany` accessMode. You will have to ensure that the PVC is populated/updated with the required DAGs(this won't be handled by the chart). You can pass in the name of the volume claim to the chart - -```bash -helm upgrade airflow . \ - --set dags.persistence.enabled=true \ - --set dags.persistence.existingClaim=my-volume-claim - --set dags.gitSync.enabled=false -``` - - -## Parameters - -The following tables lists the configurable parameters of the Airflow chart and their default values. 
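Each dotted parameter name in the table below addresses a nested key in `values.yaml`, and `--set` uses the same paths. For example, `workers.keda.enabled` is `enabled` under `keda` under `workers`. Below is a hypothetical, simplified Python sketch of that mapping. The name `apply_set` is an assumption. The sketch ignores parts of Helm's real `--set` parser, such as comma-separated pairs, escaped dots, list indices and `null`:

```python
def apply_set(values: dict, dotted_key: str, raw_value: str) -> dict:
    """Apply one simplified `--set key=value` assignment to a nested dict.

    Hypothetical illustration only: handles dotted paths plus true/false and
    integer literals, and treats everything else as a plain string.
    """
    *parents, leaf = dotted_key.split(".")
    node = values
    for part in parents:
        # Create intermediate mappings on demand, like nested YAML keys.
        node = node.setdefault(part, {})
    if raw_value in ("true", "false"):
        value = raw_value == "true"
    else:
        try:
            value = int(raw_value)
        except ValueError:
            value = raw_value
    node[leaf] = value
    return values


if __name__ == "__main__":
    values: dict = {}
    apply_set(values, "executor", "CeleryExecutor")
    apply_set(values, "workers.keda.enabled", "true")
    apply_set(values, "workers.persistence.enabled", "false")
    print(values)
```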
- -| Parameter | Description | Default | -| ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ | ------------------------------------------------- | -| `uid` | UID to run airflow pods under | `50000` | -| `gid` | GID to run airflow pods under | `50000` | -| `nodeSelector` | Node labels for pod assignment | `{}` | -| `affinity` | Affinity labels for pod assignment | `{}` | -| `tolerations` | Toleration labels for pod assignment | `[]` | -| `labels` | Common labels to add to all objects defined in this chart | `{}` | -| `privateRegistry.enabled` | Enable usage of a private registry for Airflow base image | `false` | -| `privateRegistry.repository` | Repository where base image lives (eg: quay.io) | `~` | -| `ingress.enabled` | Enable Kubernetes Ingress support | `false` | -| `ingress.web.*` | Configs for the Ingress of the web Service | Please refer to `values.yaml` | -| `ingress.flower.*` | Configs for the Ingress of the flower Service | Please refer to `values.yaml` | -| `networkPolicies.enabled` | Enable Network Policies to restrict traffic | `true` | -| `airflowHome` | Location of airflow home directory | `/opt/airflow` | -| `rbacEnabled` | Deploy pods with Kubernetes RBAC enabled | `true` | -| `executor` | Airflow executor (eg SequentialExecutor, LocalExecutor, CeleryExecutor, KubernetesExecutor) | `KubernetesExecutor` | -| `allowPodLaunching` | Allow airflow pods to talk to Kubernetes API to launch more pods | `true` | -| `defaultAirflowRepository` | Fallback docker repository to pull airflow image from | `apache/airflow` | -| `defaultAirflowTag` | Fallback docker image tag to deploy | `1.10.10.1-alpha2-python3.6` | -| `images.airflow.repository` | Docker repository to pull image from. Update this to deploy a custom image | `~` | -| `images.airflow.tag` | Docker image tag to pull image from. 
Update this to deploy a new custom image tag | `~` | -| `images.airflow.pullPolicy` | PullPolicy for airflow image | `IfNotPresent` | -| `images.flower.repository` | Docker repository to pull image from. Update this to deploy a custom image | `~` | -| `images.flower.tag` | Docker image tag to pull image from. Update this to deploy a new custom image tag | `~` | -| `images.flower.pullPolicy` | PullPolicy for flower image | `IfNotPresent` | -| `images.statsd.repository` | Docker repository to pull image from. Update this to deploy a custom image | `apache/airflow` | -| `images.statsd.tag` | Docker image tag to pull image from. Update this to deploy a new custom image tag | `airflow-statsd-exporter-2020.09.05-v0.17.0` | -| `images.statsd.pullPolicy` | PullPolicy for statsd-exporter image | `IfNotPresent` | -| `images.redis.repository` | Docker repository to pull image from. Update this to deploy a custom image | `redis` | -| `images.redis.tag` | Docker image tag to pull image from. Update this to deploy a new custom image tag | `6-buster` | -| `images.redis.pullPolicy` | PullPolicy for redis image | `IfNotPresent` | -| `images.pgbouncer.repository` | Docker repository to pull image from. Update this to deploy a custom image | `apache/airflow` | -| `images.pgbouncer.tag` | Docker image tag to pull image from. Update this to deploy a new custom image tag | `airflow-pgbouncer-2020.09.05-1.14.0` | -| `images.pgbouncer.pullPolicy` | PullPolicy for pgbouncer image | `IfNotPresent` | -| `images.pgbouncerExporter.repository` | Docker repository to pull image from. Update this to deploy a custom image | `apache/airflow` | -| `images.pgbouncerExporter.tag` | Docker image tag to pull image from. 
Update this to deploy a new custom image tag | `airflow-pgbouncer-exporter-2020.09.25-0.5.0` | -| `images.pgbouncerExporter.pullPolicy` | PullPolicy for pgbouncer-exporter image | `IfNotPresent` | -| `env` | Environment variables key/values to mount into Airflow pods (deprecated, prefer using extraEnv) | `[]` | -| `secret` | Secret name/key pairs to mount into Airflow pods | `[]` | -| `extraEnv` | Extra env 'items' that will be added to the definition of airflow containers | `~` | -| `extraEnvFrom` | Extra envFrom 'items' that will be added to the definition of airflow containers | `~` | -| `extraSecrets` | Extra Secrets that will be managed by the chart | `{}` | -| `extraConfigMaps` | Extra ConfigMaps that will be managed by the chart | `{}` | -| `data.metadataSecretName` | Secret name to mount Airflow connection string from | `~` | -| `data.resultBackendSecretName` | Secret name to mount Celery result backend connection string from | `~` | -| `data.brokerUrlSecretName` | Secret name to mount redis connection url string from | `~` | -| `data.metadataConection` | Field separated connection data (alternative to secret name) | `{}` | -| `data.resultBackendConnection` | Field separated connection data (alternative to secret name) | `{}` | -| `data.brokerUrl` | String containing the redis broker url (if you are using an "external" redis) | `{}` | -| `fernetKey` | String representing an Airflow Fernet key | `~` | -| `fernetKeySecretName` | Secret name for Airflow Fernet key | `~` | -| `kerberos.enabled` | Enable kerberos support for workers | `false` | -| `kerberos.ccacheMountPath` | Location of the ccache volume | `/var/kerberos-ccache` | -| `kerberos.ccacheFileName` | Name of the ccache file | `ccache` | -| `kerberos.configPath` | Path for the Kerberos config file | `/etc/krb5.conf` | -| `kerberos.keytabPath` | Path for the Kerberos keytab file | `/etc/airflow.keytab` | -| `kerberos.principal` | Name of the Kerberos principal | `airflow` | -| 
`kerberos.reinitFrequency` | Frequency of reinitialization of the Kerberos token | `3600` | -| `kerberos.config` | Content of the configuration file for kerberos (might be templated using Helm templates) | `` | -| `workers.replicas` | Replica count for Celery workers (if applicable) | `1` | -| `workers.keda.enabled` | Enable KEDA autoscaling features | `false` | -| `workers.keda.pollingInverval` | How often KEDA should poll the backend database for metrics in seconds | `5` | -| `workers.keda.cooldownPeriod` | How often KEDA should wait before scaling down in seconds | `30` | -| `workers.keda.maxReplicaCount` | Maximum number of Celery workers KEDA can scale to | `10` | -| `workers.kerberosSidecar.enabled` | Enable Kerberos sidecar for the worker | `false` | -| `workers.kerberosSidecar.resources.limits.cpu` | CPU Limit of Kerberos sidecar for the worker | `~` | -| `workers.kerberosSidecar.resources.limits.memory` | Memory Limit of Kerberos sidecar for the worker | `~` | -| `workers.kerberosSidecar.resources.requests.cpu` | CPU Request of Kerberos sidecar for the worker | `~` | -| `workers.kerberosSidecar.resources.requests.memory` | Memory Request of Kerberos sidecar for the worker | `~` | -| `workers.persistence.enabled` | Enable log persistence in workers via StatefulSet | `false` | -| `workers.persistence.size` | Size of worker volumes if enabled | `100Gi` | -| `workers.persistence.storageClassName` | StorageClass worker volumes should use if enabled | `default` | -| `workers.resources.limits.cpu` | CPU Limit of workers | `~` | -| `workers.resources.limits.memory` | Memory Limit of workers | `~` | -| `workers.resources.requests.cpu` | CPU Request of workers | `~` | -| `workers.resources.requests.memory` | Memory Request of workers | `~` | -| `workers.terminationGracePeriodSeconds` | How long Kubernetes should wait for Celery workers to gracefully drain before force killing | `600` | -| `workers.safeToEvict` | Allow Kubernetes to evict worker pods if needed (node 
downscaling) | `true` | -| `workers.serviceAccountAnnotations` | Annotations to add to worker kubernetes service account | `{}` | -| `workers.extraVolumes` | Mount additional volumes into worker | `[]` | -| `workers.extraVolumeMounts` | Mount additional volumes into worker | `[]` | -| `workers.nodeSelector` | Node labels for pod assignment | `{}` | -| `workers.affinity` | Affinity labels for pod assignment | `{}` | -| `workers.tolerations` | Toleration labels for pod assignment | `[]` | -| `scheduler.podDisruptionBudget.enabled` | Enable PDB on Airflow scheduler | `false` | -| `scheduler.podDisruptionBudget.config.maxUnavailable` | MaxUnavailable pods for scheduler | `1` | -| `scheduler.replicas` | # of parallel schedulers (Airflow 2.0 using Mysql 8+ or Postgres only) | `1` | -| `scheduler.resources.limits.cpu` | CPU Limit of scheduler | `~` | -| `scheduler.resources.limits.memory` | Memory Limit of scheduler | `~` | -| `scheduler.resources.requests.cpu` | CPU Request of scheduler | `~` | -| `scheduler.resources.requests.memory` | Memory Request of scheduler | `~` | -| `scheduler.airflowLocalSettings` | Custom Airflow local settings python file | `~` | -| `scheduler.safeToEvict` | Allow Kubernetes to evict scheduler pods if needed (node downscaling) | `true` | -| `scheduler.serviceAccountAnnotations` | Annotations to add to scheduler kubernetes service account | `{}` | -| `scheduler.extraVolumes` | Mount additional volumes into scheduler | `[]` | -| `scheduler.extraVolumeMounts` | Mount additional volumes into scheduler | `[]` | -| `scheduler.nodeSelector` | Node labels for pod assignment | `{}` | -| `scheduler.affinity` | Affinity labels for pod assignment | `{}` | -| `scheduler.tolerations` | Toleration labels for pod assignment | `[]` | -| `webserver.livenessProbe.initialDelaySeconds` | Webserver LivenessProbe initial delay | `15` | -| `webserver.livenessProbe.timeoutSeconds` | Webserver LivenessProbe timeout seconds | `30` | -| 
`webserver.livenessProbe.failureThreshold` | Webserver LivenessProbe failure threshold | `20` | -| `webserver.livenessProbe.periodSeconds` | Webserver LivenessProbe period seconds | `5` | -| `webserver.readinessProbe.initialDelaySeconds` | Webserver ReadinessProbe initial delay | `15` | -| `webserver.readinessProbe.timeoutSeconds` | Webserver ReadinessProbe timeout seconds | `30` | -| `webserver.readinessProbe.failureThreshold` | Webserver ReadinessProbe failure threshold | `20` | -| `webserver.readinessProbe.periodSeconds` | Webserver ReadinessProbe period seconds | `5` | -| `webserver.replicas` | How many Airflow webserver replicas should run | `1` | -| `webserver.resources.limits.cpu` | CPU Limit of webserver | `~` | -| `webserver.resources.limits.memory` | Memory Limit of webserver | `~` | -| `webserver.resources.requests.cpu` | CPU Request of webserver | `~` | -| `webserver.resources.requests.memory` | Memory Request of webserver | `~` | -| `webserver.service.annotations` | Annotations to be added to the webserver service | `{}` | -| `webserver.defaultUser` | Optional default airflow user information | `{}` | -| `webserver.nodeSelector` | Node labels for pod assignment | `{}` | -| `webserver.affinity` | Affinity labels for pod assignment | `{}` | -| `webserver.tolerations` | Toleration labels for pod assignment | `[]` | -| `flower.enabled` | Enable flower | `true` | -| `flower.nodeSelector` | Node labels for pod assignment | `{}` | -| `flower.affinity` | Affinity labels for pod assignment | `{}` | -| `flower.tolerations` | Toleration labels for pod assignment | `[]` | -| `statsd.nodeSelector` | Node labels for pod assignment | `{}` | -| `statsd.affinity` | Affinity labels for pod assignment | `{}` | -| `statsd.tolerations` | Toleration labels for pod assignment | `[]` | -| `statsd.extraMappings` | Additional mappings for statsd exporter | `[]` | -| `pgbouncer.nodeSelector` | Node labels for pod assignment | `{}` | -| `pgbouncer.affinity` | Affinity labels for 
pod assignment | `{}` | -| `pgbouncer.tolerations` | Toleration labels for pod assignment | `[]` | -| `redis.enabled` | Enable the redis provisioned by the chart | `true` | -| `redis.terminationGracePeriodSeconds` | Grace period for tasks to finish after SIGTERM is sent from Kubernetes. | `600` | -| `redis.persistence.enabled` | Enable persistent volumes. | `true` | -| `redis.persistence.size` | Volume size for redis StatefulSet. | `1Gi` | -| `redis.persistence.storageClassName` | If using a custom storageClass, pass name ref to all StatefulSets here. | `default` | -| `redis.resources.limits.cpu` | CPU Limit of redis | `~` | -| `redis.resources.limits.memory` | Memory Limit of redis | `~` | -| `redis.resources.requests.cpu` | CPU Request of redis | `~` | -| `redis.resources.requests.memory` | Memory Request of redis | `~` | -| `redis.passwordSecretName` | Redis password secret. | `~` | -| `redis.password` | If password is set, create secret with it, else generate a new one on install. | `~` | -| `redis.safeToEvict` | This setting tells Kubernetes that its ok to evict when it wants to scale a node down. 
| `true` | -| `redis.nodeSelector` | Node labels for pod assignment | `{}` | -| `redis.affinity` | Affinity labels for pod assignment | `{}` | -| `redis.tolerations` | Toleration labels for pod assignment | `[]` | -| `cleanup.nodeSelector` | Node labels for pod assignment | `{}` | -| `cleanup.affinity` | Affinity labels for pod assignment | `{}` | -| `cleanup.tolerations` | Toleration labels for pod assignment | `[]` | -| `dags.persistence.*` | Dag persistence configuration | Please refer to `values.yaml` | -| `dags.gitSync.*` | Git sync configuration | Please refer to `values.yaml` | -| `multiNamespaceMode` | Whether the KubernetesExecutor can launch pods in multiple namespaces | `False` | -| `serviceAccountAnnottions.*` | Map of annotations for worker, webserver, scheduler kubernetes service accounts | {} | - - -Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example, - -```bash -helm install --name my-release \ - --set executor=CeleryExecutor \ - --set enablePodLaunching=false . -``` - -## Autoscaling with KEDA - -*This feature is still experimental.* - -KEDA stands for Kubernetes Event Driven Autoscaling. [KEDA](https://github.com/kedacore/keda) is a custom controller that allows users to create custom bindings -to the Kubernetes [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/). -We've built a scaler that allows users to create scalers based on postgreSQL queries and shared it with the community. This enables us to scale the number of airflow workers deployed on Kubernetes by this chart depending on the number of task that are `queued` or `running`. 
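The scaling rule is described by the SQL query later in this section. That query counts `running` and `queued` task instances, divides by `config.celery.worker_concurrency`, and rounds up. The arithmetic can be sketched in Python. The helper name `desired_celery_workers` is hypothetical. The cap assumes that the Horizontal Pod Autoscaler driven by KEDA never exceeds `workers.keda.maxReplicaCount`, whose default is `10` in the table above:

```python
def desired_celery_workers(running: int, queued: int, worker_concurrency: int,
                           max_replica_count: int = 10) -> int:
    """Hypothetical mirror of the KEDA query:
    ceil((running + queued) / worker_concurrency), capped at max_replica_count.
    """
    if worker_concurrency <= 0:
        raise ValueError("worker_concurrency must be a positive integer")
    active_tasks = running + queued
    # Integer ceiling division, equivalent to SQL ceil(count::decimal / concurrency).
    needed = -(-active_tasks // worker_concurrency)
    # The autoscaler never goes above the configured maximum replica count.
    return min(needed, max_replica_count)


if __name__ == "__main__":
    print(desired_celery_workers(running=20, queued=15, worker_concurrency=16))  # prints 3
```

This is also why `worker_concurrency` should be set through the chart value rather than elsewhere. The divisor in the query and the real per-worker concurrency have to agree.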
- -```bash -helm repo add kedacore https://kedacore.github.io/charts - -helm repo update - -helm install \ - --set image.keda=docker.io/kedacore/keda:1.2.0 \ - --set image.metricsAdapter=docker.io/kedacore/keda-metrics-adapter:1.2.0 \ - --namespace keda --name keda kedacore/keda -``` - -Once KEDA is installed (which should be pretty quick since there is only one pod). You can try out KEDA autoscaling -on this chart by setting `workers.keda.enabled=true` your helm command or in the `values.yaml`. -(Note: KEDA does not support StatefulSets so you need to set `worker.persistence.enabled` to `false`) - -```bash -kubectl create namespace airflow - -helm install airflow . \ - --namespace airflow \ - --set executor=CeleryExecutor \ - --set workers.keda.enabled=true \ - --set workers.persistence.enabled=false -``` - -KEDA will derive the desired number of celery workers by querying Airflow metadata database: - -```sql -SELECT - ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) -FROM task_instance -WHERE state='running' OR state='queued' -``` - -You should set celery worker concurrency through the helm value `config.celery.worker_concurrency` (i.e. instead of airflow.cfg or environment variables) so that the KEDA trigger will be consistent with the worker concurrency setting. - -## Using an external redis instance - -When using the `CeleryExecutor` or the `CeleryKubernetesExecutor` the chart will by default create a redis Deployment/StatefulSet alongside airflow. -You can also use "your own" redis instance by providing the `data.brokerUrl` (or `data.borkerUrlSecretName`) value directly: - -```bash -helm install airflow . 
\ - --namespace airflow \ - --set executor=CeleryExecutor \ - --set redis.enabled=false \ - --set data.brokerUrl=redis://redis-user:password@redis-host:6379/0 -``` - -## Walkthrough using kind - -**Install kind, and create a cluster:** - -We recommend testing with Kubernetes 1.15, as this image doesn't support Kubernetes 1.16+ for CeleryExecutor presently. - -``` -kind create cluster \ - --image kindest/node:v1.15.7@sha256:e2df133f80ef633c53c0200114fce2ed5e1f6947477dbc83261a6a921169488d -``` - -Confirm it's up: - -``` -kubectl cluster-info --context kind-kind -``` - - -**Create namespace + install the chart:** - -``` -kubectl create namespace airflow -helm install airflow --n airflow . -``` - -It may take a few minutes. Confirm the pods are up: - -``` -kubectl get pods --all-namespaces -helm list -n airflow -``` - -Run `kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow` -to port-forward the Airflow UI to http://localhost:8080/ to confirm Airflow is working. - -**Build a Docker image from your DAGs:** - -1. Create a project - - ```shell script - mkdir my-airflow-project && cd my-airflow-project - mkdir dags # put dags here - cat <<EOM > Dockerfile - FROM apache/airflow - COPY . . - EOM - ``` - -2. Then build the image: - - ```shell script - docker build -t my-dags:0.0.1 . - ``` - -3. Load the image into kind: - - ```shell script - kind load docker-image my-dags:0.0.1 - ``` +## Documentation -4. Upgrade Helm deployment: +Documentation can be found in the [../docs/helm-chart](/docs/helm-chart) directory. - ```shell script - # from airflow chart directory - helm upgrade airflow -n airflow \ - --set images.airflow.repository=my-dags \ - --set images.airflow.tag=0.0.1 \ - .
- ``` +The latest development version is published on: +[http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/helm-chart/latest/index.html](http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/helm-chart/latest/index.html) ## Contributing -Check out [our contributing guide!](../CONTRIBUTING.rst) +Want to help build Apache Airflow? Check out our [contributing documentation](../CONTRIBUTING.rst). diff --git a/chart/values.schema.json b/chart/values.schema.json index 1478619f0c0284..618706cee794c5 100644 --- a/chart/values.schema.json +++ b/chart/values.schema.json @@ -280,37 +280,37 @@ } }, "pgbouncer": { - "description": "Configuration of the pgbouncer image.", + "description": "Configuration of the PgBouncer image.", "type": "object", "properties": { "repository": { - "description": "The pgbouncer image repository.", + "description": "The PgBouncer image repository.", "type": "string" }, "tag": { - "description": "The pgbouncer image tag.", + "description": "The PgBouncer image tag.", "type": "string" }, "pullPolicy": { - "description": "The pgbouncer image pull policy.", + "description": "The PgBouncer image pull policy.", "type": "string" } } }, "pgbouncerExporter": { - "description": "Configuration of the pgbouncerExporter image.", + "description": "Configuration of the PgBouncer exporter image.", "type": "object", "properties": { "repository": { - "description": "The pgbouncerExporter image repository.", + "description": "The PgBouncer exporter image repository.", "type": "string" }, "tag": { - "description": "The pgbouncerExporter image tag.", + "description": "The PgBouncer exporter image tag.", "type": "string" }, "pullPolicy": { - "description": "The pgbouncerExporter image pull policy.", + "description": "The PgBouncer exporter image pull policy.", "type": "string" } } diff --git a/docs/conf.py b/docs/conf.py index 353b2eb2f57730..45b1f297c68885 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -145,6 +145,9 @@ 
'providers_packages_ref', ] ) +elif PACKAGE_NAME == "helm-chart": + # No extra extensions + pass else: extensions.append('autoapi.extension') # List of patterns, relative to source directory, that match files and diff --git a/docs/exts/docs_build/dev_index_template.html.jinja2 b/docs/exts/docs_build/dev_index_template.html.jinja2 index 0de5879307adef..0a92dc7c5941ba 100644 --- a/docs/exts/docs_build/dev_index_template.html.jinja2 +++ b/docs/exts/docs_build/dev_index_template.html.jinja2 @@ -67,6 +67,15 @@ +
+
+

Helm Chart

+

+ It will help you set up your own Airflow on a cloud/on-prem k8s environment and leverage its scalable nature to support a large group of users. Thanks to Kubernetes, we are not tied to a specific cloud provider. +

+
+
+ diff --git a/docs/exts/docs_build/docs_builder.py b/docs/exts/docs_build/docs_builder.py index b2867ce4a185f3..42e9ad92d642e0 100644 --- a/docs/exts/docs_build/docs_builder.py +++ b/docs/exts/docs_build/docs_builder.py @@ -241,4 +241,4 @@ def get_available_providers_packages(): def get_available_packages(): """Get list of all available packages to build.""" provider_package_names = get_available_providers_packages() - return ["apache-airflow", *provider_package_names, "apache-airflow-providers"] + return ["apache-airflow", *provider_package_names, "apache-airflow-providers", "helm-chart"] diff --git a/docs/helm-chart/airflow-configuration.rst b/docs/helm-chart/airflow-configuration.rst new file mode 100644 index 00000000000000..dbbc9e777eab8d --- /dev/null +++ b/docs/helm-chart/airflow-configuration.rst @@ -0,0 +1,64 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Configuring Airflow +------------------- + +All Airflow configuration parameters (the equivalent of ``airflow.cfg``) are +stored in +`values.yaml <https://github.com/apache/airflow/blob/master/chart/values.yaml>`__ +under the ``config`` key. The following example shows how to +allow webserver users to view the configuration from within the webserver +application; see the last line of the example: + +.. 
code-block:: yaml + + # Config settings to go into the mounted airflow.cfg + # + # Please note that these values are passed through the `tpl` function, so are + # all subject to being rendered as Go templates. If you need to include a + # literal `{{` in a value, it must be expressed like this: + # + # a: '{{ "{{ not a template }}" }}' + # + # yamllint disable rule:line-length + config: + core: + dags_folder: '{{ include "airflow_dags" . }}' + load_examples: 'False' + executor: '{{ .Values.executor }}' + # For Airflow 1.10, backward compatibility + colored_console_log: 'False' + remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}' + # Authentication backend used for the experimental API + api: + auth_backend: airflow.api.auth.backend.deny_all + logging: + remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}' + colored_console_log: 'False' + logging_level: DEBUG + metrics: + statsd_on: '{{ ternary "True" "False" .Values.statsd.enabled }}' + statsd_port: 9125 + statsd_prefix: airflow + statsd_host: '{{ printf "%s-statsd" .Release.Name }}' + webserver: + enable_proxy_fix: 'True' + expose_config: 'True' # The default is 'False'; set it to 'True' before installing the chart + +It is worth familiarizing yourself with the Airflow +configuration before installing and deploying the service. diff --git a/docs/helm-chart/external-redis.rst b/docs/helm-chart/external-redis.rst new file mode 100644 index 00000000000000..90a9d1daf0aaa9 --- /dev/null +++ b/docs/helm-chart/external-redis.rst @@ -0,0 +1,33 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License.
You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +External Redis +-------------- + +When using the ``CeleryExecutor`` or the ``CeleryKubernetesExecutor``, +the chart creates a Redis Deployment/StatefulSet by default +alongside Airflow. You can also use your own Redis instance by +providing the ``data.brokerUrl`` (or ``data.brokerUrlSecretName``) value +directly: + +.. code-block:: bash + + helm install airflow . \ + --namespace airflow \ + --set executor=CeleryExecutor \ + --set redis.enabled=false \ + --set data.brokerUrl=redis://redis-user:password@redis-host:6379/0 diff --git a/docs/helm-chart/index.rst b/docs/helm-chart/index.rst new file mode 100644 index 00000000000000..95a07ded78adb0 --- /dev/null +++ b/docs/helm-chart/index.rst @@ -0,0 +1,83 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Helm Chart for Apache Airflow +============================= + +..
toctree:: + :hidden: + + Home + quick-start + airflow-configuration + manage-dags-files + keda + external-redis + +.. toctree:: + :hidden: + :caption: References + + Parameters + + +This chart will bootstrap an `Airflow `__ +deployment on a `Kubernetes `__ cluster using the +`Helm `__ package manager. + +Prerequisites +------------- + +- Kubernetes 1.14+ cluster +- Helm 2.11+ or Helm 3.0+ +- PV provisioner support in the underlying infrastructure + +Installing the Chart +-------------------- + +To install the chart from source (using Helm 3): + +.. code-block:: bash + + kubectl create namespace airflow + helm dep update + helm install airflow . --namespace airflow + +The command deploys Airflow on the Kubernetes cluster in the default configuration. The :doc:`parameters-ref` +section lists the parameters that can be configured during installation. + +.. tip:: List all releases using ``helm list``. + +Upgrading the Chart +------------------- + +To upgrade the chart with the release name ``airflow``: + +.. code-block:: bash + + helm upgrade airflow . --namespace airflow + +Uninstalling the Chart +---------------------- + +To uninstall/delete the ``airflow`` deployment: + +.. code-block:: bash + + helm delete airflow --namespace airflow + +The command removes all the Kubernetes components associated with the chart and deletes the release. diff --git a/docs/helm-chart/keda.rst b/docs/helm-chart/keda.rst new file mode 100644 index 00000000000000..7fc9b666c1c5b9 --- /dev/null +++ b/docs/helm-chart/keda.rst @@ -0,0 +1,72 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + ..
http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Autoscaling with KEDA +--------------------- + +*This feature is still experimental.* + +KEDA stands for Kubernetes Event Driven Autoscaling. +`KEDA `__ is a custom controller that +allows users to create custom bindings to the Kubernetes `Horizontal Pod +Autoscaler `__. +We have built a scaler that allows users to create scalers based on +PostgreSQL queries and shared it with the community. This enables us to +scale the number of airflow workers deployed on Kubernetes by this chart +depending on the number of task that are ``queued`` or ``running``. + +.. code-block:: bash + + helm repo add kedacore https://kedacore.github.io/charts + + helm repo update + + helm install \ + --set image.keda=docker.io/kedacore/keda:1.2.0 \ + --set image.metricsAdapter=docker.io/kedacore/keda-metrics-adapter:1.2.0 \ + --namespace keda --name keda kedacore/keda + +Once KEDA is installed (which should be quick, since there is only +one pod), you can try out KEDA autoscaling on this chart by setting +``workers.keda.enabled=true`` in your helm command or in the +``values.yaml``. (Note: KEDA does not support StatefulSets, so you need +to set ``workers.persistence.enabled`` to ``false``.) + +.. code-block:: bash + + kubectl create namespace airflow + + helm install airflow . \ + --namespace airflow \ + --set executor=CeleryExecutor \ + --set workers.keda.enabled=true \ + --set workers.persistence.enabled=false + +KEDA will derive the desired number of Celery workers by querying +the Airflow metadata database: + +..
code-block:: none + + SELECT + ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) + FROM task_instance + WHERE state='running' OR state='queued' + +You should set the Celery worker concurrency through the Helm value +``config.celery.worker_concurrency`` (i.e. not in ``airflow.cfg`` or +through environment variables) so that the KEDA trigger stays consistent with +the worker concurrency setting. diff --git a/docs/helm-chart/manage-dags-files.rst b/docs/helm-chart/manage-dags-files.rst new file mode 100644 index 00000000000000..a9d46cc467845b --- /dev/null +++ b/docs/helm-chart/manage-dags-files.rst @@ -0,0 +1,74 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Manage DAG files +================= + +When you create new DAG files or modify existing ones, you need to deploy them into the environment. This section describes some basic techniques you can use. + +Bake DAGs in Docker image +------------------------- + +The recommended way to update your DAGs with this chart is to build a new Docker image with the latest DAG code (``docker build -t my-company/airflow:8a0da78 .``), push it to an accessible registry (``docker push my-company/airflow:8a0da78``), then update the Airflow pods with that image: + +..
code-block:: bash + + helm upgrade airflow . \ + --set images.airflow.repository=my-company/airflow \ + --set images.airflow.tag=8a0da78 + +For local development purposes, you can also build the image locally and use it via the deployment method described by Breeze. + +Mounting DAGs using Git-Sync sidecar with Persistence enabled +------------------------------------------------------------- + +This option will use a Persistent Volume Claim with an access mode of ``ReadWriteMany``. The scheduler pod will sync DAGs from a git repository onto the PVC at a configurable interval (in seconds). The other pods will read the synced DAGs. Not all volume plugins support the ``ReadWriteMany`` access mode. Refer to `Persistent Volume Access Modes `__ for details. + +.. code-block:: bash + + helm upgrade airflow . \ + --set dags.persistence.enabled=true \ + --set dags.gitSync.enabled=true + # you can also override the other persistence or gitSync values + # by setting the dags.persistence.* and dags.gitSync.* values + # Please refer to values.yaml for details + +Mounting DAGs using Git-Sync sidecar without Persistence +-------------------------------------------------------- + +This option will run an always-on Git-Sync sidecar in every scheduler, webserver, and worker pod. The Git-Sync sidecar containers will sync DAGs from a git repository at a configurable interval (in seconds). If you are using the ``KubernetesExecutor``, Git-Sync will run as an init container on your worker pods. + +.. code-block:: bash + + helm upgrade airflow . \ + --set dags.persistence.enabled=false \ + --set dags.gitSync.enabled=true + # you can also override the other gitSync values + # by setting the dags.gitSync.* values + # Please refer to values.yaml for details + +Mounting DAGs from an externally populated PVC +---------------------------------------------- + +In this approach, Airflow will read the DAGs from a PVC that has the ``ReadOnlyMany`` or ``ReadWriteMany`` access mode.
You will have to ensure that the PVC is populated/updated with the required DAGs (this won't be handled by the chart). You can pass in the name of the volume claim to the chart: + +.. code-block:: bash + + helm upgrade airflow . \ + --set dags.persistence.enabled=true \ + --set dags.persistence.existingClaim=my-volume-claim \ + --set dags.gitSync.enabled=false diff --git a/docs/helm-chart/parameters-ref.rst b/docs/helm-chart/parameters-ref.rst new file mode 100644 index 00000000000000..8fac0af0f1e673 --- /dev/null +++ b/docs/helm-chart/parameters-ref.rst @@ -0,0 +1,487 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Parameters reference +==================== + +The following table lists the configurable parameters of the Airflow chart and their default values. + +..
list-table:: + :widths: 15 30 10 + :header-rows: 1 + + * - Parameter + - Description + - Default + * - ``uid`` + - UID to run airflow pods under + - ``1`` + * - ``gid`` + - GID to run airflow pods under + - ``1`` + * - ``nodeSelector`` + - Node labels for pod assignment + - ``1`` + * - ``affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``labels`` + - Common labels to add to all objects defined in this chart + - ``1`` + * - ``privateRegistry.enabled`` + - Enable usage of a private registry for Airflow base image + - ``1`` + * - ``privateRegistry.repository`` + - Repository where base image lives (e.g. quay.io) + - ``1`` + * - ``ingress.enabled`` + - Enable Kubernetes Ingress support + - ``1`` + * - ``ingress.web.*`` + - Configs for the Ingress of the web Service + - Please refer to ``values.yaml`` + * - ``ingress.flower.*`` + - Configs for the Ingress of the flower Service + - Please refer to ``values.yaml`` + * - ``networkPolicies.enabled`` + - Enable Network Policies to restrict traffic + - ``1`` + * - ``airflowHome`` + - Location of airflow home directory + - ``1`` + * - ``rbacEnabled`` + - Deploy pods with Kubernetes RBAC enabled + - ``1`` + * - ``executor`` + - Airflow executor (e.g. SequentialExecutor, LocalExecutor, CeleryExecutor, KubernetesExecutor) + - ``1`` + * - ``allowPodLaunching`` + - Allow airflow pods to talk to Kubernetes API to launch more pods + - ``1`` + * - ``defaultAirflowRepository`` + - Fallback docker repository to pull airflow image from + - ``1`` + * - ``defaultAirflowTag`` + - Fallback docker image tag to deploy + - ``1`` + * - ``images.airflow.repository`` + - Docker repository to pull image from. Update this to deploy a custom image + - ``1`` + * - ``images.airflow.tag`` + - Docker image tag to pull image from.
Update this to deploy a new custom image tag + - ``1`` + * - ``images.airflow.pullPolicy`` + - PullPolicy for airflow image + - ``1`` + * - ``images.flower.repository`` + - Docker repository to pull image from. Update this to deploy a custom image + - ``1`` + * - ``images.flower.tag`` + - Docker image tag to pull image from. Update this to deploy a new custom image tag + - ``1`` + * - ``images.flower.pullPolicy`` + - PullPolicy for flower image + - ``1`` + * - ``images.statsd.repository`` + - Docker repository to pull image from. Update this to deploy a custom image + - ``1`` + * - ``images.statsd.tag`` + - Docker image tag to pull image from. Update this to deploy a new custom image tag + - ``1`` + * - ``images.statsd.pullPolicy`` + - PullPolicy for statsd-exporter image + - ``1`` + * - ``images.redis.repository`` + - Docker repository to pull image from. Update this to deploy a custom image + - ``1`` + * - ``images.redis.tag`` + - Docker image tag to pull image from. Update this to deploy a new custom image tag + - ``1`` + * - ``images.redis.pullPolicy`` + - PullPolicy for redis image + - ``1`` + * - ``images.pgbouncer.repository`` + - Docker repository to pull image from. Update this to deploy a custom image + - ``1`` + * - ``images.pgbouncer.tag`` + - Docker image tag to pull image from. Update this to deploy a new custom image tag + - ``1`` + * - ``images.pgbouncer.pullPolicy`` + - PullPolicy for PgBouncer image + - ``1`` + * - ``images.pgbouncerExporter.repository`` + - Docker repository to pull image from. Update this to deploy a custom image + - ``1`` + * - ``images.pgbouncerExporter.tag`` + - Docker image tag to pull image from. 
Update this to deploy a new custom image tag + - ``1`` + * - ``images.pgbouncerExporter.pullPolicy`` + - PullPolicy for ``pgbouncer-exporter`` image + - ``1`` + * - ``env`` + - Environment variables key/values to mount into Airflow pods (deprecated, prefer using ``extraEnv``) + - ``1`` + * - ``secret`` + - Secret name/key pairs to mount into Airflow pods + - ``1`` + * - ``extraEnv`` + - Extra env 'items' that will be added to the definition of airflow containers + - ``1`` + * - ``extraEnvFrom`` + - Extra envFrom 'items' that will be added to the definition of airflow containers + - ``1`` + * - ``extraSecrets`` + - Extra Secrets that will be managed by the chart + - ``1`` + * - ``extraConfigMaps`` + - Extra ConfigMaps that will be managed by the chart + - ``1`` + * - ``data.metadataSecretName`` + - Secret name to mount Airflow connection string from + - ``1`` + * - ``data.resultBackendSecretName`` + - Secret name to mount Celery result backend connection string from + - ``1`` + * - ``data.brokerUrlSecretName`` + - Secret name to mount Redis connection URL string from + - ``1`` + * - ``data.metadataConnection`` + - Field separated connection data (alternative to secret name) + - ``1`` + * - ``data.resultBackendConnection`` + - Field separated connection data (alternative to secret name) + - ``1`` + * - ``data.brokerUrl`` + - String containing the Redis broker URL (if you are using an external Redis) + - ``1`` + * - ``fernetKey`` + - String representing an Airflow Fernet key + - ``1`` + * - ``fernetKeySecretName`` + - Secret name for Airflow Fernet key + - ``1`` + * - ``kerberos.enabled`` + - Enable Kerberos support for workers + - ``1`` + * - ``kerberos.ccacheMountPath`` + - Location of the ccache volume + - ``1`` + * - ``kerberos.ccacheFileName`` + - Name of the ccache file + - ``1`` + * - ``kerberos.configPath`` + - Path for the Kerberos config file + - ``1`` + * - ``kerberos.keytabPath`` + - Path for the Kerberos keytab file + - ``1`` + * - ``kerberos.principal``
+ - Name of the Kerberos principal + - ``1`` + * - ``kerberos.reinitFrequency`` + - Frequency of reinitialization of the Kerberos token + - ``1`` + * - ``kerberos.config`` + - Content of the configuration file for Kerberos (may be templated using Helm templates) + - ``1`` + * - ``workers.replicas`` + - Replica count for Celery workers (if applicable) + - ``1`` + * - ``workers.keda.enabled`` + - Enable KEDA autoscaling features + - ``1`` + * - ``workers.keda.pollingInterval`` + - How often KEDA should poll the backend database for metrics, in seconds + - ``1`` + * - ``workers.keda.cooldownPeriod`` + - How long KEDA should wait before scaling down, in seconds + - ``1`` + * - ``workers.keda.maxReplicaCount`` + - Maximum number of Celery workers KEDA can scale to + - ``1`` + * - ``workers.kerberosSidecar.enabled`` + - Enable Kerberos sidecar for the worker + - ``1`` + * - ``workers.kerberosSidecar.resources.limits.cpu`` + - CPU Limit of Kerberos sidecar for the worker + - ``1`` + * - ``workers.kerberosSidecar.resources.limits.memory`` + - Memory Limit of Kerberos sidecar for the worker + - ``1`` + * - ``workers.kerberosSidecar.resources.requests.cpu`` + - CPU Request of Kerberos sidecar for the worker + - ``1`` + * - ``workers.kerberosSidecar.resources.requests.memory`` + - Memory Request of Kerberos sidecar for the worker + - ``1`` + * - ``workers.persistence.enabled`` + - Enable log persistence in workers via StatefulSet + - ``1`` + * - ``workers.persistence.size`` + - Size of worker volumes if enabled + - ``1`` + * - ``workers.persistence.storageClassName`` + - Storage class worker volumes should use if enabled + - ``1`` + * - ``workers.resources.limits.cpu`` + - CPU Limit of workers + - ``1`` + * - ``workers.resources.limits.memory`` + - Memory Limit of workers + - ``1`` + * - ``workers.resources.requests.cpu`` + - CPU Request of workers + - ``1`` + * - ``workers.resources.requests.memory`` + - Memory Request of workers + - ``1`` + * -
``workers.terminationGracePeriodSeconds`` + - How long Kubernetes should wait for Celery workers to gracefully drain before force-killing them + - ``1`` + * - ``workers.safeToEvict`` + - Allow Kubernetes to evict worker pods if needed (node downscaling) + - ``1`` + * - ``workers.serviceAccountAnnotations`` + - Annotations to add to worker kubernetes service account + - ``1`` + * - ``workers.extraVolumes`` + - Mount additional volumes into worker + - ``1`` + * - ``workers.extraVolumeMounts`` + - Mount additional volumes into worker + - ``1`` + * - ``workers.nodeSelector`` + - Node labels for pod assignment + - ``1`` + * - ``workers.affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``workers.tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``scheduler.podDisruptionBudget.enabled`` + - Enable PDB on Airflow scheduler + - ``1`` + * - ``scheduler.podDisruptionBudget.config.maxUnavailable`` + - MaxUnavailable pods for scheduler + - ``1`` + * - ``scheduler.replicas`` + - Number of parallel schedulers (Airflow 2.0 with MySQL 8+ or Postgres only) + - ``1`` + * - ``scheduler.resources.limits.cpu`` + - CPU Limit of scheduler + - ``1`` + * - ``scheduler.resources.limits.memory`` + - Memory Limit of scheduler + - ``1`` + * - ``scheduler.resources.requests.cpu`` + - CPU Request of scheduler + - ``1`` + * - ``scheduler.resources.requests.memory`` + - Memory Request of scheduler + - ``1`` + * - ``scheduler.airflowLocalSettings`` + - Custom Airflow local settings python file + - ``1`` + * - ``scheduler.safeToEvict`` + - Allow Kubernetes to evict scheduler pods if needed (node downscaling) + - ``1`` + * - ``scheduler.serviceAccountAnnotations`` + - Annotations to add to scheduler kubernetes service account + - ``1`` + * - ``scheduler.extraVolumes`` + - Mount additional volumes into scheduler + - ``1`` + * - ``scheduler.extraVolumeMounts`` + - Mount additional volumes into scheduler + - ``1`` + * - ``scheduler.nodeSelector`` + - Node labels for pod
assignment + - ``1`` + * - ``scheduler.affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``scheduler.tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``webserver.livenessProbe.initialDelaySeconds`` + - Webserver LivenessProbe initial delay + - ``1`` + * - ``webserver.livenessProbe.timeoutSeconds`` + - Webserver LivenessProbe timeout seconds + - ``1`` + * - ``webserver.livenessProbe.failureThreshold`` + - Webserver LivenessProbe failure threshold + - ``1`` + * - ``webserver.livenessProbe.periodSeconds`` + - Webserver LivenessProbe period seconds + - ``1`` + * - ``webserver.readinessProbe.initialDelaySeconds`` + - Webserver ReadinessProbe initial delay + - ``1`` + * - ``webserver.readinessProbe.timeoutSeconds`` + - Webserver ReadinessProbe timeout seconds + - ``1`` + * - ``webserver.readinessProbe.failureThreshold`` + - Webserver ReadinessProbe failure threshold + - ``1`` + * - ``webserver.readinessProbe.periodSeconds`` + - Webserver ReadinessProbe period seconds + - ``1`` + * - ``webserver.replicas`` + - How many Airflow webserver replicas should run + - ``1`` + * - ``webserver.resources.limits.cpu`` + - CPU Limit of webserver + - ``1`` + * - ``webserver.resources.limits.memory`` + - Memory Limit of webserver + - ``1`` + * - ``webserver.resources.requests.cpu`` + - CPU Request of webserver + - ``1`` + * - ``webserver.resources.requests.memory`` + - Memory Request of webserver + - ``1`` + * - ``webserver.service.annotations`` + - Annotations to be added to the webserver service + - ``1`` + * - ``webserver.defaultUser`` + - Optional default airflow user information + - ``1`` + * - ``webserver.nodeSelector`` + - Node labels for pod assignment + - ``1`` + * - ``webserver.affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``webserver.tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``flower.enabled`` + - Enable flower + - ``1`` + * - ``flower.nodeSelector`` + - Node labels for pod assignment + - 
``1`` + * - ``flower.affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``flower.tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``statsd.nodeSelector`` + - Node labels for pod assignment + - ``1`` + * - ``statsd.affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``statsd.tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``statsd.extraMappings`` + - Additional mappings for statsd exporter + - ``1`` + * - ``pgbouncer.nodeSelector`` + - Node labels for pod assignment + - ``1`` + * - ``pgbouncer.affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``pgbouncer.tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``redis.enabled`` + - Enable the Redis instance provisioned by the chart + - ``1`` + * - ``redis.terminationGracePeriodSeconds`` + - Grace period for tasks to finish after SIGTERM is sent from Kubernetes. + - ``1`` + * - ``redis.persistence.enabled`` + - Enable persistent volumes. + - ``1`` + * - ``redis.persistence.size`` + - Volume size for Redis StatefulSet. + - ``1Gi`` + * - ``redis.persistence.storageClassName`` + - If using a custom storage class, pass name ref to all StatefulSets here. + - ``1`` + * - ``redis.resources.limits.cpu`` + - CPU Limit of Redis + - ``1`` + * - ``redis.resources.limits.memory`` + - Memory Limit of Redis + - ``1`` + * - ``redis.resources.requests.cpu`` + - CPU Request of Redis + - ``1`` + * - ``redis.resources.requests.memory`` + - Memory Request of Redis + - ``1`` + * - ``redis.passwordSecretName`` + - Redis password secret. + - ``1`` + * - ``redis.password`` + - If password is set, create secret with it, else generate a new one on install. + - ``1`` + * - ``redis.safeToEvict`` + - This setting tells Kubernetes that it's OK to evict the Redis pod when it wants to scale a node down.
+ - ``1`` + * - ``redis.nodeSelector`` + - Node labels for pod assignment + - ``1`` + * - ``redis.affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``redis.tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``cleanup.nodeSelector`` + - Node labels for pod assignment + - ``1`` + * - ``cleanup.affinity`` + - Affinity labels for pod assignment + - ``1`` + * - ``cleanup.tolerations`` + - Toleration labels for pod assignment + - ``1`` + * - ``dags.persistence.*`` + - DAG persistence configuration + - Please refer to ``values.yaml`` + * - ``dags.gitSync.*`` + - Git-Sync configuration + - Please refer to ``values.yaml`` + * - ``multiNamespaceMode`` + - Whether the KubernetesExecutor can launch pods in multiple namespaces + - ``1`` + * - ``serviceAccountAnnotations.*`` + - Map of annotations for worker, webserver, scheduler kubernetes service accounts + - ``{}`` + + + + +Specify each parameter using the ``--set key=value[,key=value]`` argument to ``helm install``. For example: + +.. code-block:: bash + + helm install --name my-release \ + --set executor=CeleryExecutor \ + --set allowPodLaunching=false . diff --git a/docs/helm-chart/quick-start.rst b/docs/helm-chart/quick-start.rst new file mode 100644 index 00000000000000..7caccc41f2d2f5 --- /dev/null +++ b/docs/helm-chart/quick-start.rst @@ -0,0 +1,95 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + ..
Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Quick start with kind +===================== + +This article shows you how to install Airflow using the Helm Chart on `Kind `__. + +Install kind and create a cluster +---------------------------------- + +We recommend testing with Kubernetes 1.15, as this image does not currently support +Kubernetes 1.16+ for the ``CeleryExecutor``. + +.. code-block:: bash + + kind create cluster \ + --image kindest/node:v1.15.7@sha256:e2df133f80ef633c53c0200114fce2ed5e1f6947477dbc83261a6a921169488d + +Confirm it’s up: + +.. code-block:: bash + + kubectl cluster-info --context kind-kind + +Create namespace and install the chart +-------------------------------------- + +.. code-block:: bash + + kubectl create namespace airflow + helm install airflow -n airflow . + +It may take a few minutes. Confirm the pods are up: + +.. code-block:: bash + + kubectl get pods --all-namespaces + helm list -n airflow + +Run ``kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow`` +to port-forward the Airflow UI to http://localhost:8080/ to confirm +Airflow is working. + +Build a Docker image from your DAGs +----------------------------------- + +1. Create a project: + + .. code-block:: bash + + mkdir my-airflow-project && cd my-airflow-project + mkdir dags # put dags here + cat <<EOM > Dockerfile + FROM apache/airflow + COPY . . + EOM + + +2. Then build the image: + + .. code-block:: bash + + docker build -t my-dags:0.0.1 . + + +3. Load the image into kind: + + .. code-block:: bash + + kind load docker-image my-dags:0.0.1 + +4. Upgrade the Helm deployment: + + ..
code-block:: bash + + # from airflow chart directory + helm upgrade airflow -n airflow \ + --set images.airflow.repository=my-dags \ + --set images.airflow.tag=0.0.1 \ + . diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt index 3f2871a5738b84..78b157776c04a0 100644 --- a/docs/spelling_wordlist.txt +++ b/docs/spelling_wordlist.txt @@ -345,6 +345,8 @@ Sqoop Stackdriver Standarization Standish +StatefulSet +StatefulSets StatsD Statsd StoredInfoType @@ -690,6 +692,7 @@ docstrings doesn doesnt donot +downscaling dropdown druidHook ds @@ -1110,6 +1113,7 @@ projectId projectid proto protobuf +provisioner psql psycopg pty @@ -1155,6 +1159,7 @@ reformats regexes reidentify reinit +reinitialization relativedelta renewer replicaSet @@ -1193,6 +1198,8 @@ sasl savedModel scalability scalable +scaler +scalers sched schedulable schedulername From 982e514dc9933fe1c90c126d2857b9391b1b910a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kamil=20Bregu=C5=82a?= Date: Sat, 6 Mar 2021 18:06:37 +0100 Subject: [PATCH 2/3] Update docs/helm-chart/keda.rst Co-authored-by: Tomek Urbaszek --- docs/helm-chart/keda.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/helm-chart/keda.rst b/docs/helm-chart/keda.rst index 7fc9b666c1c5b9..eef1df2b0e785a 100644 --- a/docs/helm-chart/keda.rst +++ b/docs/helm-chart/keda.rst @@ -24,7 +24,7 @@ KEDA stands for Kubernetes Event Driven Autoscaling. `KEDA `__ is a custom controller that allows users to create custom bindings to the Kubernetes `Horizontal Pod Autoscaler `__. -We have built a scaler that allows users to create scalers based on +We have built a scalers that allows users to create scalers based on PostgreSQL queries and shared it with the community. This enables us to scale the number of airflow workers deployed on Kubernetes by this chart depending on the number of task that are ``queued`` or ``running``. 
From 93521fc3a6a28dc5a5cd44651699df1797bc29ec Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kamil=20Bregu=C5=82a?= Date: Sat, 6 Mar 2021 19:47:00 +0100 Subject: [PATCH 3/3] Update docs/helm-chart/keda.rst --- docs/helm-chart/keda.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/helm-chart/keda.rst b/docs/helm-chart/keda.rst index eef1df2b0e785a..97a3421daf38bf 100644 --- a/docs/helm-chart/keda.rst +++ b/docs/helm-chart/keda.rst @@ -24,7 +24,7 @@ KEDA stands for Kubernetes Event Driven Autoscaling. `KEDA `__ is a custom controller that allows users to create custom bindings to the Kubernetes `Horizontal Pod Autoscaler `__. -We have built a scalers that allows users to create scalers based on +We have built scalers that allows users to create scalers based on PostgreSQL queries and shared it with the community. This enables us to scale the number of airflow workers deployed on Kubernetes by this chart depending on the number of task that are ``queued`` or ``running``.