
[AIRFLOW-5104] Set default schedule for GCP Transfer operators #5726

Merged 2 commits, Aug 13, 2019

Conversation

TV4Fun (Contributor) commented Aug 5, 2019

Make sure you have checked all steps below.

Jira

  • [x] My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-5104
    • In case you are fixing a typo in the documentation, you can prepend your commit with [AIRFLOW-XXX]; code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • [x] Here are some details about my PR, including screenshots of any UI changes:

The GCS Transfer Service REST API requires that a schedule be set, even for
one-time immediate runs. This adds code to
`S3ToGoogleCloudStorageTransferOperator` and
`GoogleCloudStorageToGoogleCloudStorageTransferOperator` to set a default
one-time immediate run schedule when no `schedule` argument is passed.

Tests

  • [x] My PR adds the following unit tests OR does not need testing for this extremely good reason:
    This fixes existing behavior and is hard to unit-test, because the problem only surfaces as an error when a request is actually sent to the GCP API. I have tested it by running it on a Cloud Composer cluster and using it to transfer files from S3 to GCS with S3ToGoogleCloudStorageTransferOperator. I have not tested GoogleCloudStorageToGoogleCloudStorageTransferOperator in a similar way, so maybe someone should do that, as I don't think anyone ever actually tested it before releasing it. Combined with [AIRFLOW-5114] Fix gcp_transfer_hook behavior with default operator arguments #5727, this allows S3ToGoogleCloudStorageTransferOperator to run correctly with default arguments for scheduling and timeout.

Commits

  • [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [x] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what they do
    • If you implement backwards-incompatible changes, please leave a note in Updating.md so we can assign it to an appropriate release

Code Quality

  • [x] Passes flake8

TV4Fun (Contributor, Author) commented Aug 5, 2019

A potential problem with this that I haven't investigated in depth is what happens when the Airflow instance running this operator is in a different timezone from the GCP project. This relies on date.today() to send the current date at operator run time to GCS, but I don't know under what circumstances those might be different and lead to unexpected behavior.

@mik-laj mik-laj self-requested a review August 5, 2019 20:59
@mik-laj mik-laj added the provider:google label Aug 5, 2019
mik-laj (Member) commented Aug 12, 2019

> I don't think anyone ever actually tested it before releasing it.

This operator has system tests:
airflow/tests/contrib/operators/test_gcp_transfer_operator_system.py

These tests were run repeatedly during the creation of the operator as well as during its further development. The system test runs the example DAG airflow/contrib/example_dags/example_gcp_transfer.py against real GCP and AWS servers.

TV4Fun (Contributor, Author) commented Aug 12, 2019

@mik-laj, I have a hard time believing this was tested with default arguments on an actual GCP cluster, as doing so causes an API error for a bad request. Though I suppose it is possible that the API was changed in a breaking way since this was released.

@mik-laj mik-laj merged commit 1cf8bc4 into apache:master Aug 13, 2019
kaxil pushed a commit that referenced this pull request Aug 30, 2019