-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REDSHIFT: standardize credential usage #2068
Conversation
@zsalzbank, thanks for your PR! By analyzing the history of the files in this pull request, we identified @dlstadther, @ddaniels888 and @rantav to be potential reviewers. |
bdeff7b
to
e8cfb63
Compare
Obtaining the credentials for a redshift connection is now uniform accross all of the redshift tasks. This has the added benefit of allowing role-based credentials in all of the tasks as well.
e8cfb63
to
2f2ef70
Compare
luigi/contrib/redshift.py
Outdated
""" | ||
Override to return the account id. | ||
""" | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If aws_access_key_id
and aws_secret_access_key
auto fill from config, why doesn't aws_account_id
and aws_arn_role_name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was really trying to mimic the existing functionality for those fields, but would be happy to add that in if you think it is the right thing to do.
Right now, I'm not to sure that the implementation (current and previous) was even really so correct, because it is using the information from the s3
section as opposed to a redshift
section in the configuration file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely think both role-based and key-based creds should be capable of pulling from config.
I agree that there is likely a better way to approach this implementation. I'll keep thinking on it.
luigi/contrib/redshift.py
Outdated
config = luigi.configuration.get_config() | ||
|
||
try: | ||
value = config.get('s3', attribute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are general AWS creds, I'm not sure it makes sense to restrict their config location to s3
, particularly if they're being used here for Redshift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha - just mentioned that above, but kept the current implementation. Happy to change to a redshift
section, which I think makes more sense, but it wouldn't be backwards compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S3 gets its config here. So if we were to change this value (in the cred mixin), someone who uses both s3 and redshift (which is most redshift users) would need to include their creds two places in their config. This isn't ideal.
What are your thoughts on having a named section
parameter somewhere with a default value of 'redshift'
(where the user can override) which gets passed to _get_s3_configuration_attribute()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds doable.
luigi/contrib/redshift.py
Outdated
except NoSectionError: | ||
value = None | ||
|
||
if value is None or value == '': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be if not value:
because None
and ''
equate to False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I learned something new!
luigi/contrib/redshift.py
Outdated
|
||
def _credentials(self): | ||
""" | ||
Return a credentials string for the provided task. If no valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return a credential string
(not credentials)
I neglected to write a summary review, but in general this is very much something I've wanted to implement myself (just never had the time). All my comments are minor. Thanks for contributing! |
Made a couple of changes. Let me know what you'd like to do regarding |
luigi/contrib/redshift.py
Outdated
""" | ||
return None | ||
|
||
def _get_s3_configuration_attribute(self, attribute): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also just noticed that the name of this method includes s3
. Seem odd given explicit redshift usage.
Made some more changes. |
@zsalzbank Can you confirm that this code works for you? |
I tested getting the values from the environment variables. I don't use the luigi configuration files in my setup, so I didn't really test that part. But I can if you point me in the right direction for how to use them. |
You can create a It would look like
|
Works for me. I made one minor change to take advantage of the configuration object supporting a default that I noticed when debugging. If this all looks good, I'll squash all my fixups. |
615d8ff
to
2cbb60a
Compare
I also updated the error message to say where the variables can be set when there are no valid pairs of credentials. |
Let's get at least one more review before worrying about squashing or merging just yet. If no one else reviews, @Tarrasch are you familiar enough with S3's use of accessing config variables to be able to provide a review here? |
Also, @zsalzbank thanks for such quick turn around edits! |
No problem at all. I like giving back. |
I like this a lot! |
@property
def aws_access_key_id(self):
return '...'
|
@@ -52,7 +138,7 @@ class RedshiftTarget(postgres.PostgresTarget): | |||
use_db_timestamps = False | |||
|
|||
|
|||
class S3CopyToTable(rdbms.CopyToTable): | |||
class S3CopyToTable(rdbms.CopyToTable, _CredentialsMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the correct order of inheritance? Based on this http://stackoverflow.com/questions/825945/abstract-class-mixin-multiple-inheritance-in-python it should be ordered the other way around. I couldn't test it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me as it is, but I'm happy to change it if you think that is the right way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I don't have the time to test it. I was doing something similar some time back and remember it failed because the @abc.abstractproperty
s from the base class were still abstract on instantiation.
As this is just based on my memory your code might be correct already. Maybe someone else can help out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue identified in that SO question seems to be around the Mixin implementing an abstractmethod which is supplied by an earlier inherited class.
Perhaps it's worth swapping the order of inheritance for a future case where _CredentialsMixin
overrides an abstractmethod which is added to rdbms.CopyToTable
.
So long as the overall functionality still works, it doesn't hurt anything.
THanks,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me re-phrase my concern: From my understanding the current implementation should not work unless you override the abstract properties in sub class. (Which is exactly what you not want to do, when you try to read credentials from luigi.cfg
.)
Currently this is not covered by any unittest, as all the test subclass and override. So, my pragmatic suggestion would be: Can we create a test for reading credentials from luigi.cfg
(or even ENV) ? If this test goes through successfully my comment is decrepit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds doable. Might take me a little bit though, as I have to focus on some other stuff right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did we land on this thread? Do your tests cover this (potential) issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what my most recent commit is for. I test loading the credentials from the different available sources. Let me know if you want to see other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Just to clarify: What I missed earlier is the fact that within this PR the aws
credentials are no longer abstract. Potentially missing credentials are now handled explicitly in the mixin. Therefore order of inheritance is indeed irrelevant here. 👍
@zsalzbank Any update here? Really looking forward to merging and using this PR! |
Sorry, been busy, but I'll try and get to it next week. |
Added test cases - pass locally with the existing mixin order, but will confirm that is the case with the full CI suite. |
@dmohns Could you also review again? If approved, i'll merge. |
Thanks! |
Description
Obtaining the credentials for a redshift connection is now uniform
accross all of the redshift tasks. This has the added benefit of
allowing role-based credentials in all of the tasks as well.
Motivation and Context
I need to use role-based credentials for all of my redshift tasks, not just some of them.
Have you tested this? If so, how?
I ran this against the current unit tests for the redshift files and everything passed. If anything else is needed, please let me know.