Reduce memory usage in S3Hook
#35449
Comments
I'd love to see someone take/make the time to do a full rewrite on the S3Hook. It's so convoluted and nothing like any of the other hooks. Knowing what we know now about how to create a hook, I bet we could greatly simplify it.
@ferruzzi I'd like to take a stab at this one if it's available.
@ellisms - That would be great. You can look at the newer hooks and see lots of examples of how this could be reworked to use boto3.client instead of boto3.resource, and they should really help clean up a LOT of this spaghetti code. Let me know if you need any help.
@ferruzzi Can you assign this one to me? Finally have some time to start looking at it.
I started digging into this, and converting boto3.resource to boto3.client could introduce a breaking change. For example,
Body
Original stacktrace from the Slack
The reason for this error is simple: for some operations S3Hook creates a resource (high-level client) in addition to S3.Client, and this resource is created every time such a method of S3Hook is called, so additional memory is required. For example, running S3Hook.download_file in a loop might be the reason for this error.
As usual, there are at least two solutions:
Option 1: use caching in the internal methods of S3Hook
Option 2: get rid of resource usage in S3Hook and replace it with S3.Client methods. It might be the better solution: boto3 S3.Client
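To make the difference concrete, here is a minimal sketch of Option 1, not the actual S3Hook code. FakeS3Resource, UncachedHook, CachedHook and count_resources are hypothetical names invented for illustration; FakeS3Resource stands in for an expensive object such as the one returned by boto3.resource("s3"), so the example runs without AWS access.

```python
from functools import cached_property


class FakeS3Resource:
    """Hypothetical stand-in for an expensive object like boto3.resource("s3")."""

    instances_created = 0

    def __init__(self):
        # Count every construction so the cost of each pattern is visible.
        FakeS3Resource.instances_created += 1

    def download(self, bucket, key):
        return f"s3://{bucket}/{key}"


class UncachedHook:
    """Creates a new resource on every call (the pattern described in this issue)."""

    def download_file(self, bucket, key):
        resource = FakeS3Resource()
        return resource.download(bucket, key)


class CachedHook:
    """Option 1: create the resource once per hook instance and reuse it."""

    @cached_property
    def resource(self):
        return FakeS3Resource()

    def download_file(self, bucket, key):
        return self.resource.download(bucket, key)


def count_resources(hook, calls=100):
    """Return how many resources are created by `calls` downloads on `hook`."""
    FakeS3Resource.instances_created = 0
    for i in range(calls):
        hook.download_file("my-bucket", f"key-{i}")
    return FakeS3Resource.instances_created
```

With these stand-ins, count_resources(UncachedHook()) creates one resource per call, while count_resources(CachedHook()) creates a single resource for the whole loop. Option 2 would remove the resource entirely and call the client's own methods (for example, the boto3 S3 client also exposes download_file), which is a larger change and, as noted in the comments, may not be fully backwards compatible.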
Committer