Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5280] conn: Remove aws_default's default region name #5879

Merged
merged 14 commits into from
Sep 30, 2019

Conversation

mrshu
Copy link
Contributor

@mrshu mrshu commented Aug 22, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-5280
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:
  • The aws_default by default specifies the region_name to be
    us-east-1 in its extra field. This causes trouble when the desired
    AWS account uses a different region as this default value has priority
    over the $AWS_REGION and $AWS_DEFAULT_REGION environment variables,
    gets passed directly to botocore and does not seem to be documented.

  • This commit removes the default region name from the aws_default's
    extra field. This means that it will have to be set manually, which
    would follow the "explicit is better than implicit" philosophy.

Signed-off-by: mr.Shu [email protected]

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

I am not sure how to provide a test here, but I am happy to follow any guidance on this matter.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from 41f4bd8 to 7dfb15b Compare August 22, 2019 07:21
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least please update the docs about Managing AWS connections to mention this.

@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from 7dfb15b to f8a154f Compare August 22, 2019 11:13
@mrshu
Copy link
Contributor Author

mrshu commented Aug 22, 2019

@ashb I am also happy to just update the docs if you think that would be better -- it's just that I haven't found this documented anywhere and do not see a significant advantage in keeping it the way it currently is. My scope is just quite limited, so I am happy to be proved wrong :)

Thanks!

@ashb
Copy link
Member

ashb commented Aug 22, 2019

https://airflow.apache.org/howto/connection/aws.html

(which should be docs/source/howto/connection/aws.rst)

* The `aws_default` by default specifies the `region_name` to be
  `us-east-1` in its `extra` field. This causes trouble when the desired
  AWS account uses a different region as this default value has priority
  over the $AWS_REGION and $AWS_DEFAULT_REGION environment variables,
  gets passed directly to `botocore` and does not seem to be documented.

* This commit removes the default region name from the `aws_default`'s
  extra field. This means that it will have to be set manually, which
  would follow the "explicit is better than implicit" philosophy.

* The AWS_DEFAULT_REGION variable has been added to the test executing
  scripts, so that they would still pass.

* A quick note on this change has been added to the AWS connection
  documentation as well.

Signed-off-by: mr.Shu <[email protected]>
@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from 2822f13 to 599fe2d Compare August 23, 2019 20:00
@mrshu
Copy link
Contributor Author

mrshu commented Aug 23, 2019

@ashb I got the tests to pass once AWS_DEFAULT_REGION has been exported in proper places. I also added a quick note on this change to the AWS connection docs.

Let me try to reiterate once again why do I think this change should be applied: When you install Airflow right now and use either the AIRFLOW_CONN_ environment variable or the ~/.aws/credentials to set the AWS connection parameters, the region setting gets ignored and us-east-1 gets used. The only way of finding that out is diving deep into botocore and trying to debug the values it gets passed by Airflow, which is what prompted me to create this Pull Request. Just documenting the fact would be certainly helpful, but I believe this change would be much better: in the case I outlined above you would receive a NoRegionSet error which is much more informational than the current state of silently falling back to us-east-1.

That said, if you do not agree with this change for any reason, I am happy to close this PR and create a new one that would only add the documentation bit.

Thanks again!

@mik-laj mik-laj added the provider:amazon-aws AWS/Amazon - related issues label Aug 25, 2019
docs/howto/connection/aws.rst Outdated Show resolved Hide resolved
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a sensible change - following the convention for people familiar with AWS tools is the right thing to do.

We should add a note about this to UPDATING.md - note that anyone existing install won't have the connection edited, so this isn't a "breaking change" but I think it is worth calling out in the updating instructions.

mrshu and others added 4 commits September 3, 2019 20:57
Co-Authored-By: Ash Berlin-Taylor <[email protected]>
…/remove_aws_default_conn_region_name

Signed-off-by: mr.Shu <[email protected]>
…m:mrshu/airflow into mrshu/remove_aws_default_conn_region_name
* Mention the change of aws_default region in `UPDATING.md`

Signed-off-by: mr.Shu <[email protected]>
@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from 19f33b5 to a38caa2 Compare September 4, 2019 16:52
@mrshu
Copy link
Contributor Author

mrshu commented Sep 5, 2019

Thanks @ashb, I've added a few lines to UPDATING.md -- once the tests pass (which they should momentarily), this should be good to go.

Note that the AWS_DEFAULT_REGION env variables that are exported before execution of tests now inherit from the shell they are being executed in (e.g. when AWS_DEFAULT_REGION is already exported, the value will be reused, otherwise us-east-1 will be used).

The tests seem to be failing here sporadically, but mostly because the docker run -i --rm -v /home/travis/build/apache/airflow/scripts/ci/kubernetes/minikube:/build ubuntu:14.04 step does not produce any output in 10 minutes. Would you mind advising me how to deal with that?

Thanks!

UPDATING.md Outdated
### Changes to FileSensor
FileSensor is now takes a glob pattern, not just a filename. If the filename you are looking for has `*`, `?`, or `[` in it then you should replace these with `[*]`, `[?]`, and `[[]`.

### Change dag loading duration metric name
Change DAG file loading duration metric from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you avoid making white-space only changes to the rest of this file? (This will make this change harder to cherry-pick back to the release branch)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood -- will do in a new commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes still are here - can you remove them please.

scripts/ci/in_container/entrypoint_ci.sh Outdated Show resolved Hide resolved
mrshu and others added 3 commits September 17, 2019 13:53
* Fix whitespace in `UPDATING.md` which happend before.

Signed-off-by: mr.Shu <[email protected]>
…m:mrshu/airflow into mrshu/remove_aws_default_conn_region_name
@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from de16321 to 8763ac6 Compare September 17, 2019 12:23
…/remove_aws_default_conn_region_name

Signed-off-by: mr.Shu <[email protected]>
@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from 8763ac6 to 8d9a5d9 Compare September 17, 2019 12:27
@mrshu
Copy link
Contributor Author

mrshu commented Sep 17, 2019

Thanks for all the help here @ashb!

@codecov-io
Copy link

codecov-io commented Sep 17, 2019

Codecov Report

Merging #5879 into master will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5879      +/-   ##
==========================================
+ Coverage   79.99%   80.01%   +0.01%     
==========================================
  Files         610      610              
  Lines       35176    35176              
==========================================
+ Hits        28140    28146       +6     
+ Misses       7036     7030       -6
Impacted Files Coverage Δ
airflow/utils/db.py 90.09% <ø> (ø) ⬆️
airflow/contrib/operators/ssh_operator.py 83.75% <0%> (+1.25%) ⬆️
airflow/jobs/backfill_job.py 91.43% <0%> (+1.52%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93e856e...4748f8e. Read the comment docs.

…/remove_aws_default_conn_region_name

Signed-off-by: mr.Shu <[email protected]>
@mrshu
Copy link
Contributor Author

mrshu commented Sep 20, 2019

@ashb I've just noticed there was a conflict in the UPDATING.md file -- I've just resolved it.

Please do let me know if I can do anything else here.

Thanks!

@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from ea93332 to 3cf92f2 Compare September 24, 2019 20:54
* Add a note on change in aws_default's region

Signed-off-by: mr.Shu <[email protected]>
@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from 3cf92f2 to 0076b70 Compare September 24, 2019 20:55
…/remove_aws_default_conn_region_name

Signed-off-by: mr.Shu <[email protected]>
@mrshu
Copy link
Contributor Author

mrshu commented Sep 24, 2019

@ashb Sorry for the delay here, both the conflict and the whitespace issues should be resolved here. The PR should be otherwise (hopefully) ready.

Thank you for your patience again!

…/remove_aws_default_conn_region_name

Signed-off-by: mr.Shu <[email protected]>
@mrshu mrshu force-pushed the mrshu/remove_aws_default_conn_region_name branch from 7dea464 to 4748f8e Compare September 26, 2019 19:32
@mrshu
Copy link
Contributor Author

mrshu commented Sep 26, 2019

@ashb UPDATING.md ran into a conflicting situation again in the meantime but I have resolved it with the last commit on the list.

@ashb
Copy link
Member

ashb commented Sep 27, 2019

Hmmmm - thinking about this a bit more, does this mean that on a "bare" install the aws_default connection won't work?

@mrshu
Copy link
Contributor Author

mrshu commented Sep 27, 2019

@ashb It will not work in the sense that you will additionally need to specify either the region_name in the extra parameters (which the docs mention) or set the AWS_DEFAULT_REGION environment variable.

I would argue that the aws_default connection with "bare" install does not work for anyone who operates outside of the us-east-1. I realize it is special for historical reasons but with 22 regions, I would wager that many will be hit by the unexpected behavior I described in the PR description: that you won't realize Airflow is silently using us-east-1 as the default region without digging through the code.

I firmly believe that explicit is better than implicit, especially in cases of default configuration. Hence, I see it more as fixing a small bug than changing some already-established behavior.

@ashb ashb merged commit 781d001 into apache:master Sep 30, 2019
ashb pushed a commit that referenced this pull request Sep 30, 2019
The `aws_default` by default specifies the `region_name` to be
`us-east-1` in its `extra` field. This causes trouble when the desired
AWS account uses a different region as this default value has priority
over the $AWS_REGION and $AWS_DEFAULT_REGION environment variables,
gets passed directly to `botocore` and does not seem to be documented.

This commit removes the default region name from the `aws_default`'s
extra field. This means that it will have to be set manually, which
would follow the "explicit is better than implicit" philosophy.

(cherry picked from commit 781d001)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants