Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update copy command for s3 to redshift #16241

Merged
merged 5 commits into from
Jun 13, 2021

Conversation

sunki-hong
Copy link
Contributor

Update copy command for s3_to_redshift to be able to select column names while copying files from s3 to redshift.

COPY tablename (column1 [,column2, ...]) 

Related aws documentation (Link),


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Jun 3, 2021
@sunki-hong sunki-hong force-pushed the update-copy-command-for-s3-to-redshift branch from eb81630 to 98e102d Compare June 3, 2021 14:13
op.execute(None)

credentials_block = build_credentials_block(mock_session.return_value)
copy_query = op._build_copy_query(credentials_block, copy_options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you copy the result of this function to the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mik-laj :) Thanks for the review.

Could you explain it in more detail?

Do you want the result of op._build_copy_query to be asserted with expected plain copy text?
I thought that line 105 assert_equal_ignore_multiple_spaces(self, mock_run.call_args[0][0], copy_query) checks the result

FYI expected copy_query looks like this

COPY schema.table (column_1, column_2)
FROM 's3://bucket/key'
with credentials
'aws_access_key_id=aws_access_key_id;aws_secret_access_key=aws_secret_access_key'
;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI expected copy_query looks like this

Can you add it to tests?

Copy link
Contributor Author

@sunki-hong sunki-hong Jun 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh then like this?

Suggested change
copy_query = op._build_copy_query(credentials_block, copy_options)
copy_query = op._build_copy_query(credentials_block, copy_options)
expected_copy_query = '''
COPY schema.table (column_1, column_2)
FROM 's3://bucket/key'
with credentials
'aws_access_key_id=aws_access_key_id;aws_secret_access_key=aws_secret_access_key'
;
'''
assert_equal_ignore_multiple_spaces(self, expected_copy_query, copy_query)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Building this SQL is crucial in integrating with a thirty-patty service, so it should show up in tests.

Copy link
Member

@mik-laj mik-laj Jun 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for SQL without columns also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert mock_run.call_count == 1
assert access_key in copy_query
assert secret_key in copy_query
assert_equal_ignore_multiple_spaces(self, mock_run.call_args[0][0], copy_query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we have access to the query, so we can drop of invoking private methods and compare this value. Thus, this test will still check implementations, but will depend less on the internal details.

Copy link
Contributor Author

@sunki-hong sunki-hong Jun 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment @mik-laj ;)
So would it be nicer to check execution with plain text rather than invoking private method op._build_copy_query like this?

From checking execution by invoking private method

        copy_query = op._build_copy_query(credentials_block, copy_options)
        expected_copy_query = '''
                                COPY schema.table
                                FROM 's3://bucket/key'
                                with credentials
                                'aws_access_key_id=aws_access_key_id;aws_secret_access_key=aws_secret_access_key'
                                ;
                             '''
        assert_equal_ignore_multiple_spaces(self, expected_copy_query, copy_query)

        assert mock_run.call_count == 1
        assert access_key in copy_query
        assert secret_key in copy_query
        assert_equal_ignore_multiple_spaces(self, mock_run.call_args[0][0], copy_query)

To checking execution by plain text

        copy_query = '''
                        COPY schema.table
                        FROM 's3://bucket/key'
                        with credentials
                        'aws_access_key_id=aws_access_key_id;aws_secret_access_key=aws_secret_access_key'
                        ;
                     '''
        assert mock_run.call_count == 1
        assert_equal_ignore_multiple_spaces(self, mock_run.call_args[0][0], copy_query)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Exactly. We should only check the places where the class is connecting to another class, not its internal details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)
Changed all copy_query to plain text
Thanks for all the comment @mik-laj

@sunki-hong sunki-hong force-pushed the update-copy-command-for-s3-to-redshift branch from d1dc18d to 0ee158d Compare June 7, 2021 16:48
@sunki-hong sunki-hong requested a review from mik-laj June 8, 2021 14:49
@sunki-hong sunki-hong force-pushed the update-copy-command-for-s3-to-redshift branch from 0ee158d to cab554e Compare June 9, 2021 15:42
@sunki-hong sunki-hong force-pushed the update-copy-command-for-s3-to-redshift branch from cab554e to c7a3cb0 Compare June 12, 2021 12:14
@sunki-hong
Copy link
Contributor Author

@mik-laj Hi, can you take a look and check if it is okay to merge?

@potiuk potiuk merged commit 9cd7930 into apache:main Jun 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants