[stable/airflow] Add a feature to run extra init scripts in the scheduler after initdb #21047

NBardelot · 2020-02-26T11:27:55Z

Adds a new configuration value to the Helm chart for Airflow:

If set, .Values.airflow.extraStartupScripts must be the name of a configmap that contains at least one entry whose key is run.sh. That entry is a shell script that is executed in the scheduler container after initdb and the other standard startup scripts have finished (i.e. just before the scheduler starts).

Other entries provided in the same configmap are also mounted in the same directory (using the key name as their filename). This allows you to split initialization tasks into several scripts that you call from the main run.sh script, for example.
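As an illustration of that split, here is a minimal sketch of a main run.sh that runs its sibling scripts in name order. It is not taken from the PR: the function name run_extra_scripts and the numbered file names in the usage note are hypothetical, and the default directory is the mount path that appears in the logs of this conversation.

```shell
#!/usr/bin/env bash
# Sketch of a main run.sh that dispatches to the other scripts of the configmap.
set -eu

# Mount path as seen in the logs of this PR; override SCRIPTS_DIR to try it elsewhere.
SCRIPTS_DIR="${SCRIPTS_DIR:-/usr/local/extra-startup-scripts}"

run_extra_scripts() {
  local dir="$1" script name
  # The glob expands in sorted order, so numeric prefixes control the sequence.
  for script in "$dir"/*.sh; do
    # Skip the unexpanded pattern when the directory is empty or missing.
    [ -f "$script" ] || continue
    name="$(basename "$script")"
    # Do not call the entry point itself, which would recurse forever.
    if [ "$name" = "run.sh" ]; then
      continue
    fi
    echo "running $name"
    # Invoking through bash only needs read permission on the script file.
    bash "$script"
  done
}

run_extra_scripts "$SCRIPTS_DIR"
```

With entries named, say, 10-variables.sh and 20-connections.sh next to run.sh, the first would run before the second. Because of set -e, a failing script stops the sequence.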

This is needed because initContainers run before the scheduler executes airflow initdb at startup. initContainers therefore remain viable for tasks that do not require the DB to be ready, but this PR makes it possible to automate 'airflow' commands that could not be automated before.

See also, by @javamonkey79:

Issue #20568 (which started the idea)
PR #20593 (which was aborted because it was centered around the idea of doing the same thing in the wrong place)

k8s-ci-robot · 2020-02-26T11:28:10Z

Hi @NBardelot. Thanks for your PR.

I'm waiting for a helm member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

javamonkey79

/lgtm

k8s-ci-robot · 2020-02-26T19:33:41Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: javamonkey79, NBardelot
To complete the pull request process, please assign gsemet
You can assign the PR to them by writing /assign @gsemet in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

stable/airflow/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2020-02-26T19:33:46Z

@javamonkey79: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

javamonkey79 · 2020-02-26T23:26:14Z

stable/airflow/values.yaml

+  #   the scheduler restarts
+  # - if those scripts use sensitive data, you should provide the sensitive data using secrets
+  #   (see .Values.airflow.extraConfigmapMounts and .Values.airflow.extraEnv)
+  extraStartupScripts:


OK, perhaps I'm missing something, but is there a way to do this with just Helm, or does it require a configmap to be created on the cluster before installing the Helm chart?
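The question is not answered in this thread. Since the PR description says the value is the name of a configmap rather than its content, one plausible reading is that the configmap has to exist before the release is installed. Under that assumption, a sketch of the two-step deployment could look like the following; the function name and all argument values are hypothetical, and the chart reference stable/airflow assumes the stable repository is already configured.

```shell
#!/usr/bin/env bash
set -eu

# Create the configmap from a local directory of scripts (one key per file),
# then point the chart at it through airflow.extraStartupScripts.
deploy_with_startup_scripts() {
  local release="$1" configmap="$2" scripts_dir="$3"
  kubectl create configmap "$configmap" --from-file="$scripts_dir"
  helm upgrade --install "$release" stable/airflow \
    --set airflow.extraStartupScripts="$configmap"
}

# Example call (not executed here):
#   deploy_with_startup_scripts my-airflow airflow-startup-scripts ./startup-scripts
```

The directory passed to --from-file would need to contain at least run.sh, as required by the PR description.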

javamonkey79 · 2020-02-26T23:53:40Z

stable/airflow/templates/deployments-scheduler.yaml

+        - name: extra-startup-scripts
+          configMap:
+            name: {{ .Values.airflow.extraStartupScripts }}
+            defaultMode: 0740


I got permission denied. I think this needs to be more permissive, like 755:

[2020-02-26 23:48:53,468] {{__init__.py:51}} INFO - Using executor CeleryExecutor
1 of 1 variables successfully updated.
running extra startup scripts
bash: /usr/local/extra-startup-scripts/run.sh: Permission denied

NBardelot · 2020-02-28

A 755 would not change anything since it would mean u=rwx,g=rw,o=rw. Normally u=rwx,g=r,o= (i.e. 740) should've sufficed.

NBardelot · 2020-02-28

I'll change it to 770 then; maybe the files are executed via the group and not the user, OpenShift style...

javamonkey79 · 2020-02-28

OK, I pulled your changes and 770 does not work either:

[2020-02-28 15:21:03,242] {{__init__.py:51}} INFO - Using executor CeleryExecutor
4 of 4 variables successfully updated.
running extra startup scripts
bash: /usr/local/extra-startup-scripts/run.sh: Permission denied

javamonkey79 · 2020-03-03

@NBardelot did you try it yourself? If you don't get the same permissions error, I'd be curious why... I would think you'd get the same error though.

NBardelot · 2020-03-03

I've taken a more thorough look at it and...

you were absolutely right about 755 (which is u=rwx,g=rx,o=rx and not what I said... stupid me)

o=rx is needed because the airflow user is not included in the root group as it should be (it is a good practice for Docker/Kubernetes and especially useful when running OpenShift)

I'll fix this right away. Sorry for that.

javamonkey79 · 2020-03-03

Hey @NBardelot, no worries! I have been running this branch out, so I knew there was something up. I appreciate your efforts.

Now, it would be good to get it merged in :)

NBardelot · 2020-03-03

I have force-pushed the fix, so the pull request is ready now. If you can confirm that this version of the commit works for you, that would be great.

javamonkey79 · 2020-03-03

Confirmed (I already had exactly the same change locally).
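For readers following the 740 / 770 / 755 discussion, the short sketch below prints the symbolic permissions each octal mode produces. It only illustrates the mode arithmetic on a temporary local file; the helper name mode_string is hypothetical and nothing here touches a cluster.

```shell
#!/usr/bin/env bash
set -eu

# Print the symbolic permission string that a given octal mode produces.
mode_string() {
  local file
  file="$(mktemp)"
  chmod "$1" "$file"
  # The first 10 characters of ls -l are the file type and the u/g/o bits.
  ls -l "$file" | cut -c1-10
  rm -f "$file"
}

for mode in 740 770 755; do
  printf '%s -> %s\n' "$mode" "$(mode_string "$mode")"
done
# 740 -> -rwxr-----
# 770 -> -rwxrwx---
# 755 -> -rwxr-xr-x
```

Only 755 grants read and execute to "other", which is the class the airflow user falls into when it is neither the file owner nor a member of the owning group, as concluded in the thread.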

…uler Signed-off-by: Noël Bardelot <[email protected]>

NBardelot · 2020-03-05T10:16:20Z

See also:

https://issues.apache.org/jira/browse/AIRFLOW-6987
apache/airflow#7629

NBardelot · 2020-03-09T11:30:18Z

Hi @maver1ck, I understand that the Airflow team is going to redo the Helm chart soon (how soon? that I don't know :D), and that this may mean a little less attention to PRs for the project at the moment. But this PR is especially useful in the meantime because it is the only way to automate post-deployment tasks properly. Would you be OK to give it a review?

My use case:

I run a vault-agent as an init container to fetch the Airflow connections (including RSA private keys for SSH, and passwords) from a Vault; it populates a volume with the sensitive data
I need the feature from this PR to then run a script after initdb that creates the connections (and removes the sensitive data from the volume)

Thanks for your time.
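The use case above could be sketched as the following startup script. This is an assumption-laden illustration, not the author's script: the directory /var/run/airflow-secrets, the one-file-per-connection layout (file name as connection id, file content as connection URI), and the AIRFLOW_BIN override are all hypothetical, and the connections --add syntax is that of the Airflow 1.10 command line.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical directory populated by the init container: one file per
# connection, file name = connection id, file content = connection URI.
SECRETS_DIR="${SECRETS_DIR:-/var/run/airflow-secrets}"
# Overridable so the sketch can be dry-run with a stub instead of the real CLI.
AIRFLOW_BIN="${AIRFLOW_BIN:-airflow}"

create_connections() {
  local dir="$1" file conn_id conn_uri
  for file in "$dir"/*; do
    # Skip the unexpanded pattern when the directory is empty or missing.
    [ -f "$file" ] || continue
    conn_id="$(basename "$file")"
    conn_uri="$(cat "$file")"
    # Airflow 1.10 CLI syntax; requires the metadata DB to be initialized.
    "$AIRFLOW_BIN" connections --add --conn_id "$conn_id" --conn_uri "$conn_uri"
    # Remove the secret only after the connection was created (set -e aborts earlier).
    rm -f "$file"
  done
}

create_connections "$SECRETS_DIR"
```

Passing the URI as a command-line argument exposes it to other processes in the container while the command runs; whether that is acceptable depends on the deployment.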

javamonkey79 · 2020-03-18T23:02:39Z

@NBardelot looks like there are conflicts now.

NBardelot · 2020-03-19T09:56:20Z

Yes, it seems @maver1ck has been less active on the Helm charts project since the beginning of this year. But I don't know what their rhythm of integrating PRs is, or whether there are other reviewers who could take a look.

@maver1ck is there anything we can do to help you review this PR?

stale · 2020-04-18T10:01:22Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale · 2020-05-02T10:29:33Z

This issue is being automatically closed due to inactivity.

NBardelot · 2020-05-04T08:56:13Z

/reopen

helm-bot added the size/M label Feb 26, 2020

k8s-ci-robot requested a review from maver1ck February 26, 2020 11:27

k8s-ci-robot added the needs-ok-to-test label Feb 26, 2020

NBardelot force-pushed the airflow/startupScripts branch from 8555df8 to fdc0fa0 Compare February 26, 2020 11:33

helm-bot added the Contribution Allowed label Feb 26, 2020

This was referenced Feb 26, 2020

[stable/airflow] (ISSUE 20568) implemented feature to remove default connections #21018

Closed

[stable/airflow] ISSUE:20568 - added startupScript option #20593

Closed

javamonkey79 approved these changes Feb 26, 2020

javamonkey79 reviewed Feb 26, 2020

NBardelot force-pushed the airflow/startupScripts branch from fdc0fa0 to de4f8f2 Compare February 28, 2020 10:48

helm-bot removed the Contribution Allowed label Mar 3, 2020

NBardelot force-pushed the airflow/startupScripts branch from 2a24fce to 21bd7a1 Compare March 3, 2020 14:32

helm-bot added the Contribution Allowed label Mar 3, 2020

[stable/airflow] Add a feature to run extra init scripts in the sched…

7b2207d

…uler Signed-off-by: Noël Bardelot <[email protected]>

NBardelot force-pushed the airflow/startupScripts branch from 21bd7a1 to 7b2207d Compare March 4, 2020 13:01

NBardelot mentioned this pull request Mar 5, 2020

[AIRFLOW-6987] Avoid creating default connections apache/airflow#7629

Merged

stale bot added the lifecycle/stale label Apr 18, 2020

stale bot closed this May 2, 2020
