Need documentation on S3 use #26
Comments
Thanks for the input. You are certainly right about the documentation. As for boto vs. tinys3, what extra would boto allow that still matches how you would interact with Spark? I've not used profiles in either Spark or boto, so I'm not sure what that would look like.
The nice thing about boto is that you don't have to specify your credentials in code at all. It reads them from a file on disk (~/.aws/credentials). You can also set up multiple sets of credentials in a single file, each in its own profile section, and then specify the profile name to use in the code. This is helpful if you have multiple accounts or IAM roles and need to switch between them quickly. When we use Spark, we usually configure it with IAM roles that control which S3 files it has access to, i.e. we are not embedding credentials in config files. I think the biggest hurdle was the documentation, though, more than the profiles.
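To make the profile idea concrete, here is a minimal sketch of what "pick a profile by name" amounts to, using only the standard library. The helper name `load_aws_profile` is hypothetical and is not part of this project, tinys3, or boto; it only assumes the usual INI layout of ~/.aws/credentials with `aws_access_key_id` and `aws_secret_access_key` keys in each section. In practice boto does this lookup for you (in boto3, for example, via `boto3.Session(profile_name=...)`).

```python
import configparser
import os


def load_aws_profile(profile="default", path=None):
    """Return (access_key_id, secret_access_key) for one profile section.

    Hypothetical helper for illustration only; assumes the standard
    INI layout of the shared AWS credentials file.
    """
    path = path or os.path.expanduser("~/.aws/credentials")
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError(f"no credentials file at {path}")
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found in {path}")
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```

Switching accounts is then just a matter of passing a different profile name, e.g. `load_aws_profile("analytics")`, with no keys appearing in the code itself.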
Ah, I see what you mean, that does make sense. Let's tackle the documentation issue first, though. I'd like to basically copy the pyspark docs for the implemented methods, because the idea is for this to work the same way. Separately, I think some examples for things like pulling files from S3, or accessing RDD data directly for debugging, would be helpful. Do you think that would have been enough in your situation?
Yeah, that would have been perfect. On Nov 23, 2016 11:57 AM, "Will McGinnis" [email protected] wrote:
It's not clear that TinyS3 is needed, and it's not obvious how to set the AWS keys without digging through the code. It would also be nice if it supported profiles like boto does, but that seems to be a limitation of TinyS3.