Feature request: s3_client.download_file* multipart download #131

Closed
acordiner opened this issue Jan 4, 2019 · 10 comments

@acordiner

Thanks for your excellent library! The docs mention the following patches to s3transfer:

s3_client.download_file* This is performed by the s3transfer module. – Patched with get_object
s3_client.upload_file* This is performed by the s3transfer module. – Patched with custom multipart upload

For performance reasons, it would be fantastic if the download_file* methods also did a custom multipart download, the same way the upload_file* methods do. Is this in the roadmap?

@terricain
Owner

download_file and download_fileobj pretty much read 4096 bytes, write them to the file, then read another 4096 bytes. I'm pretty sure S3 doesn't support multipart downloads; correct me if I'm wrong.

Does this do what you want?
https://github.com/terrycain/aioboto3/blob/master/aioboto3/s3/inject.py#L83-L96
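For illustration, the read-then-write loop described above might look roughly like the sketch below. This is not the linked inject.py code; `FakeStream` and `copy_stream` are hypothetical names, and the stream is assumed to expose an awaitable `read(n)`.

```python
import asyncio
import io

CHUNK_SIZE = 4096  # chunk size mentioned in the comment above


class FakeStream:
    """Hypothetical stand-in for an async response body with a read(n) coroutine."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    async def read(self, n: int) -> bytes:
        return self._buf.read(n)


async def copy_stream(stream, fileobj, chunk_size: int = CHUNK_SIZE) -> int:
    """Read chunk_size bytes at a time, writing each chunk before reading the next."""
    total = 0
    while True:
        chunk = await stream.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        fileobj.write(chunk)
        total += len(chunk)
    return total
```

Because every chunk is awaited in sequence, a loop like this downloads strictly serially, which is the performance concern raised in the request.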

@acordiner
Author

I believe so; the docs list "Uploading/downloading a file in parallel" as a feature:
https://www.pydoc.io/pypi/boto3-1.4.5/autoapi/s3/transfer/index.html

@terricain
Owner

Ok, have a go with download_file/fileobj and see if it does what you need.

@terricain
Owner

Having done some reading, and based on some work done by @thehesiod, it looks like get_object in its current state does indeed download the entire file, as it attempts to verify MD5 sums.

We can improve download_file/download_fileobj by being more like s3transfer and using get_object's Range option to download multiple parts of the file. Will look into this in the coming week or so.
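A minimal sketch of that idea, assuming an async client whose `get_object` accepts a `Range` argument and returns a body with an awaitable `read()`. `byte_ranges`, `download_in_parts`, `FakeBody` and `FakeS3Client` are hypothetical names used only for illustration; in practice the object size would likely come from a `head_object` call.

```python
import asyncio


def byte_ranges(size: int, part_size: int) -> list:
    """Split an object of `size` bytes into HTTP Range header values."""
    return [
        f"bytes={start}-{min(start + part_size, size) - 1}"
        for start in range(0, size, part_size)
    ]


async def download_in_parts(client, bucket, key, size, part_size) -> bytes:
    """Fetch every range concurrently and reassemble the parts in order."""

    async def fetch(range_header: str) -> bytes:
        resp = await client.get_object(Bucket=bucket, Key=key, Range=range_header)
        return await resp["Body"].read()

    # gather() returns results in the order the coroutines were passed in
    parts = await asyncio.gather(*(fetch(r) for r in byte_ranges(size, part_size)))
    return b"".join(parts)


class FakeBody:
    """Hypothetical stand-in for a streaming body."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    async def read(self) -> bytes:
        return self._data


class FakeS3Client:
    """Hypothetical in-memory client that honours inclusive byte ranges."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    async def get_object(self, Bucket, Key, Range):
        start, end = Range[len("bytes="):].split("-")
        return {"Body": FakeBody(self._data[int(start): int(end) + 1])}
```

This version holds the whole object in memory; a real implementation would probably write each part at its offset in the target file and cap the number of concurrent requests.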

@thehesiod

thehesiod commented Jan 5, 2019

Also because aws API calls are signed, I believe the only way to upload in parts would be using multipart upload.

@terricain
Owner

> Also because aws API calls are signed, I believe the only way to upload in parts would be using multipart upload.

Yup, that's what I came to as well. I need to dig around in s3transfer, as the s3 upload_file/upload_fileobj and download_file/download_fileobj methods have some logic in them to choose between put_object and multipart upload, as well as between a single get_object and multiple get_object calls with Range.
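The kind of size-based choice mentioned above could be sketched as follows. The function names are hypothetical, and the 8 MiB threshold is an assumption modelled on what I understand to be the s3transfer default, not something confirmed in this thread.

```python
MULTIPART_THRESHOLD = 8 * 1024 * 1024  # assumed threshold in bytes


def choose_upload_strategy(size: int, threshold: int = MULTIPART_THRESHOLD) -> str:
    """Small objects go through a single put_object; large ones use multipart upload."""
    return "multipart_upload" if size >= threshold else "put_object"


def choose_download_strategy(size: int, threshold: int = MULTIPART_THRESHOLD) -> str:
    """Small objects use one get_object; large ones use several ranged get_object calls."""
    return "ranged_get_object" if size >= threshold else "get_object"
```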

terricain self-assigned this Jan 6, 2019

@thehesiod

btw a big issue we have with multipart uploads is that the ETag becomes rather useless unless you put in the metadata what the chunk sizes were.
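To show why the chunk size matters: the ETag of a multipart upload is commonly observed to be the MD5 of the concatenated per-part MD5 digests, followed by a dash and the part count. That behaviour is an assumption here rather than a documented guarantee, and it may not hold for every encryption setting. `multipart_etag` is a hypothetical helper.

```python
import hashlib


def multipart_etag(data: bytes, chunk_size: int) -> str:
    """Recompute a multipart-style ETag locally (assumed scheme, see note above)."""
    # MD5 digest of each part, in upload order
    digests = [
        hashlib.md5(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    # MD5 of the concatenated digests, suffixed with the number of parts
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

The same bytes uploaded with different chunk sizes give different values, so the ETag can only be checked locally if the chunk size is known, for example by storing it in the object metadata.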

@terricain
Owner

@thehesiod I'll add that in when I come to redoing this part

@rajeshwar-nu

Any updates or ETA for multipart downloads? Eagerly waiting for this one.

@terricain
Owner

Sorry, I've been away for quite a while. No ETA on multipart downloads as of yet.

kyboi pushed a commit to Kreoh/aioboto3 that referenced this issue Apr 29, 2024
kyboi added a commit to Kreoh/aioboto3 that referenced this issue Apr 29, 2024

4 participants