Need documentation on S3 use #26

Open
codebynumbers opened this issue Nov 23, 2016 · 4 comments
Comments

@codebynumbers
Contributor

It's not clear that TinyS3 is needed, and it's not obvious how to set the AWS keys without digging through the code. It would also be nice if it supported profiles like boto does, but that seems to be a limitation of TinyS3.
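For reference, a minimal sketch of the kind of usage the docs could show, based on tinys3's connection API (the key values, bucket, and paths here are placeholders):

```python
import tinys3

# tinys3 takes the AWS keys explicitly when the connection is created;
# it has no support for ~/.aws/credentials files or named profiles.
conn = tinys3.Connection(
    'YOUR_AWS_ACCESS_KEY',   # placeholder
    'YOUR_AWS_SECRET_KEY',   # placeholder
    tls=True,
)

# Download an object; tinys3 is built on requests, so get() returns a
# requests.Response whose .content holds the object bytes.
response = conn.get('path/to/input.csv', 'my-bucket')
with open('input.csv', 'wb') as f:
    f.write(response.content)
```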

@wdm0006
Owner

wdm0006 commented Nov 23, 2016

Thanks for the input; you're certainly right about the documentation. As for boto vs. tinys3, what extra would that allow that still matches how you would interact with Spark? I've not used profiles in either Spark or boto, so I'm not sure what that would look like.

@codebynumbers
Contributor Author

The nice thing boto allows you to do is avoid specifying your credentials in code at all. It reads them from files on disk (~/.aws/credentials). You can also set up multiple credential sets in a single file as profile sections, so you can specify the profile name to use in the code. This is helpful if you have multiple accounts/IAM roles and need to switch between them quickly.
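For example (a sketch only; this shows boto3 specifically, and the profile and bucket names are made up):

```python
# ~/.aws/credentials can hold several profiles, e.g.:
#
#   [default]
#   aws_access_key_id = ...
#   aws_secret_access_key = ...
#
#   [staging]
#   aws_access_key_id = ...
#   aws_secret_access_key = ...
#
# The code then only names the profile; no keys appear in it.
import boto3

session = boto3.Session(profile_name='staging')  # 'staging' is a made-up profile name
s3 = session.resource('s3')
bucket = s3.Bucket('my-bucket')  # placeholder bucket name
```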

When we use Spark, we usually configure it with IAM roles that control which S3 files it has access to, i.e. we are not embedding credentials in config files. I think the biggest hurdle was the documentation more than the profiles, though.

@wdm0006
Owner

wdm0006 commented Nov 23, 2016

Ah, I see what you mean; that does make sense. First let's tackle the documentation issue, though: I'd like to basically copy the pyspark docs for the implemented methods, because the idea is to work the same way. Separately, some examples for things like pulling files from S3 or accessing RDD data directly for debugging would be helpful, I think. Do you think that would have been enough in your situation?
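Roughly the shape such an example might take (a sketch only: it uses tinys3 for the S3 pull and pyspark's own SparkContext, since the idea is to match that interface; all keys, buckets, and paths are placeholders):

```python
import tinys3
from pyspark import SparkContext

# Pull a text file down from S3 first.
conn = tinys3.Connection('YOUR_AWS_ACCESS_KEY', 'YOUR_AWS_SECRET_KEY', tls=True)
with open('events.log', 'wb') as f:
    f.write(conn.get('logs/events.log', 'my-bucket').content)

# ...then load it through the pyspark-style API.
sc = SparkContext('local', 's3-debug-example')
rdd = sc.textFile('events.log')

# Accessing RDD data directly for debugging:
print(rdd.take(5))   # first five records
print(rdd.count())   # total record count
```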

@codebynumbers
Contributor Author

Yeah that would have been perfect.

