
[AIRFLOW-5726] Delete table as file name in RedshiftToS3Transfer #6396

Merged: 28 commits, merged into apache:master on Nov 26, 2019

Conversation

@JavierLopezT (Contributor) commented Oct 23, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-5726
    • In case you are fixing a typo in the documentation, you can prepend your commit with [AIRFLOW-XXX]; code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

It is now possible to use any file name for the file that will be stored in S3; the table name no longer limits the S3 key. The previous behavior (appending the table name to the key) is still available as an option.
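
For illustration, a minimal usage sketch (task ids, bucket, and key values are hypothetical; the operator and its table_as_file_name parameter are the ones this PR changes):

from airflow.operators.redshift_to_s3_operator import RedshiftToS3Transfer

# Default behavior: the table name is appended to the key, so files
# land under s3://my-bucket/exports/my_table_*
RedshiftToS3Transfer(
    task_id='unload_with_table_name',
    schema='my_schema',
    table='my_table',
    s3_bucket='my-bucket',
    s3_key='exports',
    table_as_file_name=True,
)

# New option: the S3 key is used as-is, so any file name is possible
RedshiftToS3Transfer(
    task_id='unload_custom_file_name',
    schema='my_schema',
    table='my_table',
    s3_bucket='my-bucket',
    s3_key='exports/my_custom_file',
    table_as_file_name=False,
)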

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what they do
    • If you implement backwards-incompatible changes, please leave a note in UPDATING.md so we can assign it to an appropriate release

@JavierLopezT (Contributor, Author) commented Oct 24, 2019

I have a problem here with a part of the documentation: the definition of the param s3_key.

If I leave its definition as a single line:
:param s3_key: reference to a specific S3 key. If table_as_file_name is set to False, this param must include the desired file name
it doesn't pass the pylint check due to line length.

If I split it into two lines:
1st line: :param s3_key: reference to a specific S3 key. If table_as_file_name is set
2nd line: to False, this param must include the desired file name
it doesn't pass the documentation test.

Any idea on how to handle this? Thank you very much.

@mik-laj added the provider:amazon-aws (AWS/Amazon - related issues) label Oct 25, 2019
@OmerJog (Contributor) commented Oct 27, 2019

@JavierLopezT I'm not really sure I'm following.
The error is:
airflow/operators/redshift_to_s3_operator.py:40:0: C0301: Line too long (135/110) (line-too-long)

You can have more than one row, for example:
https://github.com/apache/airflow/blob/master/airflow/gcp/operators/bigquery.py#L72

@codecov-io commented Oct 28, 2019

Codecov Report

Merging #6396 into master will increase coverage by 2.91%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #6396      +/-   ##
==========================================
+ Coverage   80.59%   83.51%   +2.91%     
==========================================
  Files         626      672      +46     
  Lines       36243    37594    +1351     
==========================================
+ Hits        29211    31396    +2185     
+ Misses       7032     6198     -834
Impacted Files Coverage Δ
airflow/operators/redshift_to_s3_operator.py 97.29% <100%> (+0.15%) ⬆️
airflow/contrib/sensors/cassandra_record_sensor.py 0% <0%> (-100%) ⬇️
airflow/contrib/hooks/aws_firehose_hook.py 0% <0%> (-100%) ⬇️
airflow/contrib/sensors/cassandra_table_sensor.py 0% <0%> (-100%) ⬇️
...low/contrib/sensors/aws_redshift_cluster_sensor.py 0% <0%> (-100%) ⬇️
airflow/contrib/hooks/redshift_hook.py 0% <0%> (-75%) ⬇️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/example_dags/example_python_operator.py 63.33% <0%> (-31.12%) ⬇️
... and 243 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74d2a0d...297f06f.

JavierLTAtr added commits starting October 28, 2019:
  • Solving errors
  • More errors
  • Trying to pass documentation test
  • indendation comment
  • Delete table as file name in RedshiftToS3Transfer
@JavierLopezT (Contributor, Author) replied, quoting @OmerJog:
@JavierLopezT I'm not really sure I'm following.
The error is:
airflow/operators/redshift_to_s3_operator.py:40:0: C0301: Line too long (135/110) (line-too-long)

You can have more than one row example:
https://github.com/apache/airflow/blob/master/airflow/gcp/operators/bigquery.py#L72

I was missing the indentation on the 2nd row, thanks. Everything seems fine now.
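
For reference, a minimal sketch of what the accepted fix looks like (the class body is abbreviated; only the s3_key field is shown):

from airflow.models import BaseOperator

class RedshiftToS3Transfer(BaseOperator):
    """
    ...
    :param s3_key: reference to a specific S3 key. If table_as_file_name is set
        to False, this param must include the desired file name
    ...
    """

The continuation line is indented relative to the :param line, which keeps each physical line under the pylint limit while Sphinx still parses the two lines as a single field.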

@kaxil (Member) commented Oct 28, 2019

It looks good, but can you please add tests or update the current tests?

@JavierLopezT (Contributor, Author) replied, quoting @kaxil:
It looks good but can you please add tests or update the current tests !

How can I do that? I thought passing the Travis CI checks was enough, and I can't really find any documentation in the Airflow contributing guide on the process for adding tests. Could you help me, please?

@kaxil (Member) commented Oct 30, 2019

[screenshot: the Tests section of the PR guidelines checklist]
We have got this in the PR guidelines that you ticked :D


@JavierLopezT (Contributor, Author) commented:
Thanks for the clarification @kaxil

I have already added the test (I think)

@kaxil (Member) left a review:
LGTM. But I would like other committers who have AWS experience to review it too.

@potiuk @ashb any comments?

Two review threads on airflow/operators/redshift_to_s3_operator.py (outdated, resolved)
@ashb (Member) left a review:
Also, you need to add a test covering the table_as_file_name=False case.

Review thread on tests/operators/test_redshift_to_s3_operator.py (outdated, resolved)
@OmerJog (Contributor) commented Nov 15, 2019

@JavierLopezT
The added test is failing:

======================================================================
62) ERROR: test_execute_false (tests.operators.test_redshift_to_s3_operator.TestRedshiftToS3Transfer)
----------------------------------------------------------------------
   Traceback (most recent call last):
   TypeError: test_execute_false() missing 2 required positional arguments: 'mock_run' and 'mock_session'

(The error means test_execute_false was collected without the @mock.patch decorators that inject mock_run and mock_session.)

assert mock_run.call_count == 1
assertEqualIgnoreMultipleSpaces(self, mock_run.call_args[0][0], unload_query)

def test_execute_false(self, mock_run, mock_session):
A member commented:
You can use parameterized in def test_execute instead of creating a new test.

A member added:
Example:

import math

from nose.tools import assert_equal
from parameterized import parameterized

@parameterized([
    (2, 2, 4),
    (2, 3, 8),
    (1, 9, 1),
    (0, 9, 0),
])
def test_pow(base, exponent, expected):
    assert_equal(math.pow(base, exponent), expected)

@@ -31,7 +32,8 @@ class TestRedshiftToS3Transfer(unittest.TestCase):

@mock.patch("boto3.session.Session")
@mock.patch("airflow.hooks.postgres_hook.PostgresHook.run")
def test_execute(self, mock_run, mock_session):
@parameterized([(True), (False)])
A member commented:
Suggested change:
-@parameterized([(True), (False)])
+@parameterized([(True, ), (False, )])
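
For context: (True) is just the boolean True, because parentheses alone do not create a tuple; the trailing comma does, and parameterized expects each entry to be a tuple of arguments. A quick check in a Python shell:

>>> type((True))
<class 'bool'>
>>> type((True,))
<class 'tuple'>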

@@ -31,7 +32,8 @@ class TestRedshiftToS3Transfer(unittest.TestCase):

@mock.patch("boto3.session.Session")
@mock.patch("airflow.hooks.postgres_hook.PostgresHook.run")
def test_execute(self, mock_run, mock_session):
@parameterized([(True, ), (False, )])
A member commented:
Suggested change:
-@parameterized([(True, ), (False, )])
+@parameterized.expand([(True, ), (False, )])
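
For context: plain @parameterized works on standalone test functions, but methods inside a unittest.TestCase subclass need @parameterized.expand, which generates one separate test method per parameter set (e.g. test_execute_0, test_execute_1) so the standard unittest runner can collect them.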

A member added:
This should hopefully fix the failing test

@JavierLopezT (Contributor, Author) replied:
Unfortunately, it is still failing =(

@@ -31,7 +32,8 @@ class TestRedshiftToS3Transfer(unittest.TestCase):

@mock.patch("boto3.session.Session")
@mock.patch("airflow.hooks.postgres_hook.PostgresHook.run")
def test_execute(self, mock_run, mock_session):
@parameterized.expand([(True, ), (False, )])
def test_execute(self, mock_run, mock_session, boolean_value):
@kaxil (Member) commented Nov 21, 2019:
I am going to try and run this test on my machine and let you know.

@kaxil (Member) commented Nov 21, 2019

@JavierLopezT Apply the following code and the test will pass. Passes on my local machine:

diff --git a/tests/operators/test_redshift_to_s3_operator.py b/tests/operators/test_redshift_to_s3_operator.py
index 5fd8d46e3..baa4aad32 100644
--- a/tests/operators/test_redshift_to_s3_operator.py
+++ b/tests/operators/test_redshift_to_s3_operator.py
@@ -30,10 +30,13 @@ from airflow.utils.tests import assertEqualIgnoreMultipleSpaces

 class TestRedshiftToS3Transfer(unittest.TestCase):

+    @parameterized.expand([
+        [True, "key/table_"],
+        [False, "key"],
+    ])
     @mock.patch("boto3.session.Session")
     @mock.patch("airflow.hooks.postgres_hook.PostgresHook.run")
-    @parameterized.expand([(True, ), (False, )])
-    def test_execute(self, mock_run, mock_session, boolean_value):
+    def test_execute(self, table_as_file_name, expected_s3_key, mock_run, mock_session,):
         access_key = "aws_access_key_id"
         secret_key = "aws_secret_access_key"
         mock_session.return_value = Session(access_key, secret_key)
@@ -42,7 +45,6 @@ class TestRedshiftToS3Transfer(unittest.TestCase):
         s3_bucket = "bucket"
         s3_key = "key"
         unload_options = ['HEADER', ]
-        table_as_file_name = boolean_value

         RedshiftToS3Transfer(
             schema=schema,
@@ -62,14 +64,14 @@ class TestRedshiftToS3Transfer(unittest.TestCase):
         select_query = "SELECT * FROM {schema}.{table}".format(schema=schema, table=table)
         unload_query = """
                     UNLOAD ('{select_query}')
-                    TO 's3://{s3_bucket}/{s3_key}/{table}_'
+                    TO 's3://{s3_bucket}/{s3_key}'
                     with credentials
                     'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
                     {unload_options};
                     """.format(select_query=select_query,
-                               table=table,
                                s3_bucket=s3_bucket,
-                               s3_key=s3_key,
+                               s3_key=expected_s3_key,
                                access_key=access_key,
                                secret_key=secret_key,
                                unload_options=unload_options)
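
A note on why this shape works: with the parameterized library, @parameterized.expand has to sit above the @mock.patch decorators. Each expanded test then receives the expanded parameters first (table_as_file_name, expected_s3_key), followed by the mocks in bottom-up @mock.patch order (PostgresHook.run becomes mock_run, then boto3's Session becomes mock_session), which matches the signature in the diff.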

@@ -31,7 +32,11 @@ class TestRedshiftToS3Transfer(unittest.TestCase):

@mock.patch("boto3.session.Session")
@mock.patch("airflow.hooks.postgres_hook.PostgresHook.run")
def test_execute(self, mock_run, mock_session):
@parameterized.expand([
@kaxil (Member) commented Nov 25, 2019:
You have applied this in the wrong place; please check the diff (#6396 (comment)) I suggested 4 days back.

@kaxil (Member) added:
It should be the following:

@parameterized.expand([
    [True, "key/table_"],
    [False, "key"],
])
@mock.patch("boto3.session.Session")
@mock.patch("airflow.hooks.postgres_hook.PostgresHook.run")
def test_execute(self, table_as_file_name, expected_s3_key, mock_run, mock_session):

A member commented:
Check the position of @parameterized.expand([

@kaxil merged commit 4a17bca into apache:master Nov 26, 2019
kaxil pushed a commit that referenced this pull request Dec 18, 2019
ashb pushed a commit that referenced this pull request Dec 19, 2019
@JavierLopezT mentioned this pull request Jan 7, 2020
Labels
provider:amazon-aws AWS/Amazon - related issues