Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6949] Respect explicit spark.kubernetes.namespace conf to SparkSubmitOperator #7575

Merged
merged 1 commit into from
Feb 28, 2020

Conversation

ashb
Copy link
Member

@ashb ashb commented Feb 27, 2020

This means the value from the Operator/dag file takes precedence over
the connection

The previous behaviour was to emit one line from the conf arg, but then
a later one from the connection:

Example operator from original reporter:

   worker_events_job = SparkSubmitOperator(
        task_id="worker_events_job",
        application="s3a://spark-jobs/twilio/worker_events.py",
        conn_id="spark-default",
        driver_memory="2g",
        executor_cores="4",
        executor_memory="4g",
        conf={
            "spark.kubernetes.authenticate.driver.serviceAccountName": "spark",
            "spark.kubernetes.container.image": "1234.dkr.ecr.us-east-1.amazonaws.com/spark:0.0.1",
            "spark.kubernetes.container.image.pullPolicy": "Always",
            "spark.kubernetes.namespace": "airflow",
            "spark.driver.cores": "2"
        }
    )

and it produced this (wrong) spark-submit command:

--conf spark.kubernetes.namespace=airflow \
--conf spark.kubernetes.namespace=default \

Issue link: AIRFLOW-6949

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

…SparkSubmitOperator

This means the value from the Operator/dag file takes precedence over
the connection

The previous behaviour was to emit one line from the conf arg, but then
a later one from the connection:

```
--conf spark.kubernetes.namespace=airflow \
--conf spark.kubernetes.namespace=default \
```
@codecov-io
Copy link

Codecov Report

Merging #7575 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7575      +/-   ##
==========================================
+ Coverage   86.56%   86.56%   +<.01%     
==========================================
  Files         896      896              
  Lines       42622    42623       +1     
==========================================
+ Hits        36896    36897       +1     
  Misses       5726     5726
Impacted Files Coverage Δ
...rflow/providers/apache/spark/hooks/spark_submit.py 84.67% <100%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d031f84...c2a16db. Read the comment docs.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Fokko Fokko merged commit b59042b into apache:master Feb 28, 2020
@Fokko
Copy link
Contributor

Fokko commented Feb 28, 2020

Also pushed this onto 1.10 branch: 7978bf7 f8a83a718

Cherry-picking was impossible due to the moving of the files.

ashb added a commit to ashb/airflow that referenced this pull request Feb 28, 2020
…SparkSubmitOperator (apache#7575)

This means the value from the Operator/dag file takes precedence over
the connection

The previous behaviour was to emit one line from the conf arg, but then
a later one from the connection:

```
--conf spark.kubernetes.namespace=airflow \
--conf spark.kubernetes.namespace=default \
```

(cherry picked from commit b59042b)
@ashb ashb deleted the spark-submit-conf-namespace branch February 28, 2020 11:15
@ashb
Copy link
Member Author

ashb commented Feb 28, 2020

Thanks Fokko! I usually use git commit --reuse-message=$orig-sha when doing this so message, author and timestamp are kept. (I've force pushed v1-10 with that updated)

galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
…SparkSubmitOperator (apache#7575)

This means the value from the Operator/dag file takes precedence over
the connection

The previous behaviour was to emit one line from the conf arg, but then
a later one from the connection:

```
--conf spark.kubernetes.namespace=airflow \
--conf spark.kubernetes.namespace=default \
```
kaxil pushed a commit that referenced this pull request Mar 17, 2020
kaxil pushed a commit that referenced this pull request Mar 19, 2020
kaxil pushed a commit to astronomer/airflow that referenced this pull request Mar 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants