-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update copy command for s3 to redshift #16241
Update copy command for s3 to redshift #16241
Conversation
eb81630
to
98e102d
Compare
op.execute(None) | ||
|
||
credentials_block = build_credentials_block(mock_session.return_value) | ||
copy_query = op._build_copy_query(credentials_block, copy_options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you copy the result of this function to the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mik-laj :) Thanks for the review.
Could you explain it in more detail?
Do you want the result of op._build_copy_query
to be asserted with expected plain copy text?
I thought that line 105 assert_equal_ignore_multiple_spaces(self, mock_run.call_args[0][0], copy_query)
checks the result
FYI expected copy_query
looks like this
COPY schema.table (column_1, column_2)
FROM 's3://bucket/key'
with credentials
'aws_access_key_id=aws_access_key_id;aws_secret_access_key=aws_secret_access_key'
;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI expected copy_query looks like this
Can you add it to tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh then like this?
copy_query = op._build_copy_query(credentials_block, copy_options) | |
copy_query = op._build_copy_query(credentials_block, copy_options) | |
expected_copy_query = ''' | |
COPY schema.table (column_1, column_2) | |
FROM 's3://bucket/key' | |
with credentials | |
'aws_access_key_id=aws_access_key_id;aws_secret_access_key=aws_secret_access_key' | |
; | |
''' | |
assert_equal_ignore_multiple_spaces(self, expected_copy_query, copy_query) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. Building this SQL is crucial in integrating with a thirty-patty service, so it should show up in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for SQL without columns also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert mock_run.call_count == 1 | ||
assert access_key in copy_query | ||
assert secret_key in copy_query | ||
assert_equal_ignore_multiple_spaces(self, mock_run.call_args[0][0], copy_query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we have access to the query, so we can drop of invoking private methods and compare this value. Thus, this test will still check implementations, but will depend less on the internal details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment @mik-laj ;)
So would it be nicer to check execution with plain text rather than invoking private method op._build_copy_query
like this?
From checking execution by invoking private method
copy_query = op._build_copy_query(credentials_block, copy_options)
expected_copy_query = '''
COPY schema.table
FROM 's3://bucket/key'
with credentials
'aws_access_key_id=aws_access_key_id;aws_secret_access_key=aws_secret_access_key'
;
'''
assert_equal_ignore_multiple_spaces(self, expected_copy_query, copy_query)
assert mock_run.call_count == 1
assert access_key in copy_query
assert secret_key in copy_query
assert_equal_ignore_multiple_spaces(self, mock_run.call_args[0][0], copy_query)
To checking execution by plain text
copy_query = '''
COPY schema.table
FROM 's3://bucket/key'
with credentials
'aws_access_key_id=aws_access_key_id;aws_secret_access_key=aws_secret_access_key'
;
'''
assert mock_run.call_count == 1
assert_equal_ignore_multiple_spaces(self, mock_run.call_args[0][0], copy_query)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Exactly. We should only check the places where the class is connecting to another class, not its internal details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
Changed all copy_query to plain text
Thanks for all the comment @mik-laj
d1dc18d
to
0ee158d
Compare
0ee158d
to
cab554e
Compare
Co-authored-by: Kamil Breguła <[email protected]>
cab554e
to
c7a3cb0
Compare
@mik-laj Hi, can you take a look and check if it is okay to merge? |
Update copy command for s3_to_redshift to be able to select column names while copying files from s3 to redshift.
Related aws documentation (Link),
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.