
REDSHIFT: standardize credential usage #2068

Merged
7 commits merged on Aug 28, 2017


zsalzbank
Contributor

Description

Obtaining the credentials for a Redshift connection is now uniform
across all of the Redshift tasks. This has the added benefit of
allowing role-based credentials in all of the tasks as well.
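A minimal sketch of the uniform path described above, assuming the Redshift COPY credential-string formats (aws_iam_role=arn:aws:iam::<account-id>:role/<role-name> for a role, aws_access_key_id=...;aws_secret_access_key=... for a key pair). The function name build_credential_string and the choice to prefer the role when both are configured are assumptions for illustration, not necessarily what this PR does.

```python
from typing import Optional


def build_credential_string(access_key_id: Optional[str] = None,
                            secret_access_key: Optional[str] = None,
                            account_id: Optional[str] = None,
                            role_name: Optional[str] = None) -> str:
    """Hypothetical helper: one code path shared by every Redshift task."""
    # Role-based credentials need both the account id and the role name.
    # Preferring them over keys is an assumption made for this sketch.
    if account_id and role_name:
        return "aws_iam_role=arn:aws:iam::{}:role/{}".format(account_id, role_name)
    # Key-based credentials need both halves of the key pair.
    if access_key_id and secret_access_key:
        return "aws_access_key_id={};aws_secret_access_key={}".format(
            access_key_id, secret_access_key)
    raise ValueError("No valid credentials: provide either aws_account_id and "
                     "aws_arn_role_name, or aws_access_key_id and "
                     "aws_secret_access_key.")
```

Because every task goes through one helper, role support added there becomes available to all of them at once.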

Motivation and Context

I need to use role-based credentials for all of my redshift tasks, not just some of them.

Have you tested this? If so, how?

I ran this against the current unit tests for the redshift files and everything passed. If anything else is needed, please let me know.

@mention-bot

@zsalzbank, thanks for your PR! By analyzing the history of the files in this pull request, we identified @dlstadther, @ddaniels888 and @rantav to be potential reviewers.

"""
Override to return the account id.
"""
return None
Collaborator

If aws_access_key_id and aws_secret_access_key auto-fill from config, why don't aws_account_id and aws_arn_role_name?

Contributor Author

I was really trying to mimic the existing functionality for those fields, but would be happy to add that in if you think it is the right thing to do.

Right now, I'm not too sure that the implementation (current and previous) is even correct, because it uses the information from the s3 section rather than a redshift section in the configuration file.

Collaborator

I definitely think both role-based and key-based creds should be capable of pulling from config.

I agree that there is likely a better way to approach this implementation. I'll keep thinking on it.
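A sketch of pulling role-based and key-based values from config the same way. It uses Python's standard configparser as a stand-in for luigi's configuration object; the attribute names come from the comment above, while read_credentials is a hypothetical helper.

```python
import configparser

# Attribute names taken from the review discussion above.
CREDENTIAL_ATTRIBUTES = (
    "aws_access_key_id",
    "aws_secret_access_key",
    "aws_account_id",
    "aws_arn_role_name",
)


def read_credentials(config: configparser.ConfigParser, section: str) -> dict:
    """Hypothetical helper: key-based and role-based creds come from one place."""
    # fallback=None returns None for a missing section or a missing option
    # instead of raising NoSectionError / NoOptionError.
    return {name: config.get(section, name, fallback=None)
            for name in CREDENTIAL_ATTRIBUTES}
```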

config = luigi.configuration.get_config()

try:
    value = config.get('s3', attribute)
Collaborator

Since these are general AWS creds, I'm not sure it makes sense to restrict their config location to s3, particularly if they're being used here for Redshift.

Contributor Author

Haha - just mentioned that above, but kept the current implementation. Happy to change to a redshift section, which I think makes more sense, but it wouldn't be backwards compatible.

Collaborator

S3 gets its config here. So if we were to change this value (in the cred mixin), someone who uses both S3 and Redshift (which is most Redshift users) would need to include their creds in two places in their config. This isn't ideal.

What are your thoughts on having a named section parameter somewhere with a default value of 'redshift' (where the user can override) which gets passed to _get_s3_configuration_attribute()?

Contributor Author

Sounds doable.
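One way the named-section idea could look, as a sketch: a class-level default of 'redshift' that a task overrides to reuse its existing [s3] section. configuration_section and _get_configuration_attribute are hypothetical names here, and configparser stands in for luigi's config.

```python
import configparser


class CredentialsMixin:
    """Hypothetical mixin: the config section is an overridable class default."""

    # Default section; a subclass can override it, e.g. with "s3",
    # so one set of credentials serves both S3 and Redshift tasks.
    configuration_section = "redshift"

    def _get_configuration_attribute(self, config, attribute):
        return config.get(self.configuration_section, attribute, fallback=None)


class ReusesS3Credentials(CredentialsMixin):
    configuration_section = "s3"
```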

except NoSectionError:
    value = None

if value is None or value == '':
Collaborator

This can just be "if not value:", because None and '' are both falsy.

Contributor Author

I learned something new!
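The suggested simplification in context, as a sketch. get_configuration_attribute is hypothetical, configparser again stands in for luigi's config, and the environment-variable fallback with upper-cased names is an assumption based on the later mention of environment variables. Note that "not value" also treats other falsy values such as 0 as missing, which is harmless for string credentials.

```python
import configparser
import os


def get_configuration_attribute(config, section, attribute):
    """Hypothetical helper: config file first, then the environment."""
    value = config.get(section, attribute, fallback=None)
    # None (option missing) and '' (option present but empty) are both
    # falsy, so "if not value" replaces "if value is None or value == ''".
    if not value:
        # Assumed env-var naming: aws_access_key_id -> AWS_ACCESS_KEY_ID.
        value = os.environ.get(attribute.upper())
    return value
```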


def _credentials(self):
"""
Return a credentials string for the provided task. If no valid
Collaborator

Return a credential string (not credentials)

@dlstadther
Collaborator

I neglected to write a summary review, but in general this is very much something I've wanted to implement myself (just never had the time). All my comments are minor.

Thanks for contributing!

@zsalzbank
Contributor Author

Made a couple of changes. Let me know what you'd like to do regarding s3/redshift section and auto-loading the credentials, and I'll be happy to change it as well.

"""
return None

def _get_s3_configuration_attribute(self, attribute):
Collaborator

Also just noticed that the name of this method includes s3. That seems odd given its explicit Redshift usage.

@zsalzbank
Contributor Author

Made some more changes.

@dlstadther
Collaborator

@zsalzbank Can you confirm that this code works for you?

@zsalzbank
Contributor Author

I tested getting the values from the environment variables. I don't use the luigi configuration files in my setup, so I didn't really test that part. But I can if you point me in the right direction for how to use them.

@dlstadther
Collaborator

You can create a luigi.cfg file in the same directory as your .py luigi files.

It would look like

[redshift]
aws_access_key_id: <your id here>
aws_secret_access_key: <your secret here>

[s3]
<same as redshift above just to check that you can change the section to something other than the default>

@zsalzbank
Contributor Author

Works for me. I made one minor change to take advantage of the configuration object supporting a default that I noticed when debugging.

If this all looks good, I'll squash all my fixups.

@zsalzbank
Contributor Author

I also updated the error message to say where the variables can be set when there are no valid pairs of credentials.

@dlstadther
Collaborator

Let's get at least one more review before worrying about squashing or merging just yet.

If no one else reviews, @Tarrasch are you familiar enough with S3's use of accessing config variables to be able to provide a review here?

@dlstadther
Collaborator

Also, @zsalzbank, thanks for such quick turnaround on the edits!

@zsalzbank
Contributor Author

No problem at all. I like giving back.

@dmohns
Contributor

dmohns commented Mar 24, 2017

I like this a lot!
Two questions:
1.) Just double-checking, as I am not 100% familiar with mixins: will backwards compatibility be preserved? E.g., will we still be able to provide Redshift credentials by overriding the attributes directly, without using the mixin or the cfg?
2.) Isn't the comment "You must also override the attributes provided by the CredentialsMixin." a bit misleading? As far as I understand, the main benefit of this PR is that we do not have to override attributes but can simply provide entries in the cfg.

@zsalzbank
Contributor Author

  1. Yes, at least as I understand your question. You can still do aws_access_key_id = '...' in the class definition, or:

         @property
         def aws_access_key_id(self):
             return '...'

  2. Clarified.
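A sketch of why both override styles keep working, assuming the mixin exposes each credential as a property with a None default (CredentialsMixin and the task classes are hypothetical). Python finds the subclass's attribute first in the MRO, so a plain class attribute or a new property both shadow the mixin's default.

```python
class CredentialsMixin:
    """Hypothetical mixin with a default property that a task may shadow."""

    @property
    def aws_access_key_id(self):
        # Default when the task sets nothing; a fuller mixin would
        # fall back to the config file or environment here.
        return None


class ClassAttributeTask(CredentialsMixin):
    # A plain class attribute found earlier in the MRO shadows the property.
    aws_access_key_id = "AKID_FROM_CLASS_ATTRIBUTE"


class PropertyTask(CredentialsMixin):
    @property
    def aws_access_key_id(self):
        return "AKID_FROM_PROPERTY"
```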

@@ -52,7 +138,7 @@ class RedshiftTarget(postgres.PostgresTarget):
     use_db_timestamps = False
 
 
-class S3CopyToTable(rdbms.CopyToTable):
+class S3CopyToTable(rdbms.CopyToTable, _CredentialsMixin):
Contributor

Is this the correct order of inheritance? Based on this http://stackoverflow.com/questions/825945/abstract-class-mixin-multiple-inheritance-in-python it should be ordered the other way around. I couldn't test it yet.

Contributor Author

Works for me as it is, but I'm happy to change it if you think that is the right way to go.

Contributor Author

Should I?

Contributor

dmohns commented Mar 30, 2017

Unfortunately I don't have the time to test it. I was doing something similar some time back and remember it failed because the @abc.abstractproperty attributes from the base class were still abstract at instantiation.
As this is just from memory, your code might already be correct. Maybe someone else can help out?

Collaborator

So the issue identified in that SO question seems to arise when the mixin implements an abstract method that is declared by a base class listed earlier in the inheritance order.

Perhaps it's worth swapping the order of inheritance to guard against a future case where _CredentialsMixin implements an abstract method that is added to rdbms.CopyToTable.

So long as the overall functionality still works, it doesn't hurt anything.

Thanks,

Contributor

dmohns commented Apr 5, 2017

Let me rephrase my concern: from my understanding, the current implementation should not work unless you override the abstract properties in a subclass (which is exactly what you do not want to do when you are trying to read credentials from luigi.cfg).

Currently this is not covered by any unit test, as all the tests subclass and override. So my pragmatic suggestion would be: can we create a test for reading credentials from luigi.cfg (or even from environment variables)? If that test passes, my comment is moot.

Contributor Author

Sounds doable. Might take me a little bit though, as I have to focus on some other stuff right now.

Collaborator

Where did we land on this thread? Do your tests cover this (potential) issue?

Contributor Author

This is what my most recent commit is for. I test loading the credentials from the different available sources. Let me know if you want to see other tests.

Contributor

This looks good. Just to clarify: What I missed earlier is the fact that within this PR the aws credentials are no longer abstract. Potentially missing credentials are now handled explicitly in the mixin. Therefore order of inheritance is indeed irrelevant here. 👍
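A small illustration of the concern raised in this thread, using hypothetical classes: when a base class declares an abstract property, a mixin's concrete property only satisfies it if the mixin comes earlier in the MRO. It stacks @property on @abc.abstractmethod, the current spelling of the deprecated @abc.abstractproperty. Once the credentials are no longer abstract in the base class, as noted above, the order stops mattering.

```python
import abc


class Base(abc.ABC):
    # Modern spelling of @abc.abstractproperty (deprecated since Python 3.3).
    @property
    @abc.abstractmethod
    def aws_access_key_id(self):
        """Subclasses must provide this."""


class CredentialsMixin:
    @property
    def aws_access_key_id(self):
        return None  # concrete default


class MixinFirst(CredentialsMixin, Base):
    """MRO reaches the mixin's concrete property before Base's abstract one."""


class BaseFirst(Base, CredentialsMixin):
    """MRO reaches Base's abstract property first, so the class stays abstract."""
```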

@dlstadther
Collaborator

@zsalzbank Any update here? Really looking forward to merging and using this PR!

@zsalzbank
Contributor Author

Sorry, been busy, but I'll try and get to it next week.

@zsalzbank
Contributor Author

Added test cases. They pass locally with the existing mixin order, but I will confirm that is also the case with the full CI suite.
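A sketch of the kind of test requested in this thread: a task that overrides nothing still resolves its credentials from config or from the environment. It redeclares a hypothetical mixin so it runs standalone, uses configparser in place of luigi's config, and assumes config takes precedence over environment variables; the tests actually added in this PR may be structured differently.

```python
import configparser
import os
import unittest
from unittest import mock


class CredentialsMixin:
    """Hypothetical mixin: config section first, then environment variables."""

    configuration_section = "redshift"
    # Stand-in for luigi's configuration object; empty by default.
    config = configparser.ConfigParser()

    def _get_configuration_attribute(self, attribute):
        value = self.config.get(self.configuration_section, attribute, fallback=None)
        if not value:
            value = os.environ.get(attribute.upper())
        return value

    @property
    def aws_access_key_id(self):
        return self._get_configuration_attribute("aws_access_key_id")


class PlainTask(CredentialsMixin):
    """A task that overrides no credential attribute at all."""


class CredentialsSourcesTest(unittest.TestCase):
    def test_reads_from_environment(self):
        with mock.patch.dict(os.environ, {"AWS_ACCESS_KEY_ID": "ENV_KEY"}):
            self.assertEqual(PlainTask().aws_access_key_id, "ENV_KEY")

    def test_config_takes_precedence(self):
        cfg = configparser.ConfigParser()
        cfg.read_string("[redshift]\naws_access_key_id = CFG_KEY\n")
        # Patch the inherited class attribute only for the duration of the test.
        with mock.patch.object(PlainTask, "config", cfg):
            with mock.patch.dict(os.environ, {"AWS_ACCESS_KEY_ID": "ENV_KEY"}):
                self.assertEqual(PlainTask().aws_access_key_id, "CFG_KEY")
```

Run it with python -m unittest against the module that contains it.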

@dlstadther
Collaborator

@dmohns Could you also review again? If approved, I'll merge.

dlstadther merged commit b47b267 into spotify:master on Aug 28, 2017
@zsalzbank
Contributor Author

Thanks!

This was referenced Jun 29, 2022

4 participants