Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/download directly to s3 #2584

Merged
merged 3 commits into from
Aug 4, 2017
Merged

Conversation

vrajmohan
Copy link
Contributor

No description provided.

Vraj Mohan added 3 commits August 3, 2017 14:22
This avoids using the local filesystem, which is good because we have
occasionally run into "No space left on device" errors.

We also avoid creating a zip file and a manifest and instead upload a
CSV file directly. This has 2 consequences:
- most users would prefer this because it avoids the step of having to
unzip the downloaded zip file.
- as there is no manifest, there is no way to show the query criteria
that were used to create the download.

This uses the package `smart_open` which unfortunately uses `boto` and
not `boto3`. That is not too bad, as `boto` and `boto3` can co-exist.
Using the partial function `use_kwargs` was preventing the parsing of
the POSTed JSON. We can restore the parsing by using the original
'use_kwargs_original`.
@codecov-io
Copy link

codecov-io commented Aug 4, 2017

Codecov Report

Merging #2584 into develop will decrease coverage by 0.43%.
The diff coverage is 52.94%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2584      +/-   ##
===========================================
- Coverage    90.09%   89.66%   -0.44%     
===========================================
  Files           73       73              
  Lines         5645     5612      -33     
===========================================
- Hits          5086     5032      -54     
- Misses         559      580      +21
Impacted Files Coverage Δ
webservices/resources/download.py 100% <100%> (ø) ⬆️
webservices/tasks/download.py 66.66% <33.33%> (-25.78%) ⬇️
webservices/tasks/utils.py 80.95% <50%> (-19.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d582b0...bba27f4. Read the comment docs.

Copy link
Contributor

@ccostino ccostino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, @vrajmohan! I just had a couple of questions, but I think this is good to go. :-)


client = boto3.client('s3')

MAX_RECORDS = 100000
MAX_RECORDS = 500000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow we're able to bump the max records with this too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have called this out. My plan was to bump this up and test in dev first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's give it a shot! :-)

db.session.connection().engine,
format='csv',
header=True
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is a lot simpler in addition to streaming directly! :-)

@@ -19,3 +22,13 @@ def get_bucket():

def get_object(key):
return get_bucket().Object(key=key)

def get_s3_key(name):
connection = boto.s3.connect_to_region(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you had mentioned this before somewhere but can't remember - boto3 didn't provide what was needed? I know both can co-exist, just wanted to make sure I had a clear understanding!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is summarized in the commit message for 18F@028361c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh, shame on me for not reading more closely, thanks!

@ccostino ccostino merged commit 17d9b73 into develop Aug 4, 2017
@ccostino ccostino deleted the feature/download-directly-to-S3 branch August 4, 2017 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants