Working with S3
This is a modified version of an article originally published here
NOTE: We can use this python script to create a bucket with proper IAM user https://gist.github.com/theskumar/e349eea100f73c7f1dc5dfba324429de
Storing your Django site's static and media files on Amazon S3, instead of serving them yourself, can make your site perform better.
This post is about how to do that. We'll describe how to set up an S3 bucket with the proper permissions and configuration, how to upload static and media files from Django to S3, and how to serve the files from S3 when people visit your site.
We'll assume that you've got access to an S3 account, and a user with the permissions you'll need.
The first thing to consider is that, while I might be using my dpoirier userid to set this up, I probably don't want our web site using my dpoirier userid permanently. If someone was able to break into the site and get the credentials, I wouldn't want them to have access to everything I own. Or if I left Caktus (unthinkable though that is), someone else might need to be able to manage the resources on S3.
What we'll do is set up a separate AWS user, with the necessary permissions to run the site, but no more, and then have the web site use that user instead of your own.
- Create a new user:
  - Go to AWS IAM.
  - Click "Create new users" and follow the prompts. Leave "Generate an access key for each User" selected.
- Get the credentials:
  - Go to the new user's Security Credentials tab.
  - Click "Manage access keys".
  - Download the credentials for the access key that was created.
  - Save them somewhere, because no one will ever be able to download them again. (Though it's easy enough to create a new access key if you lose the old one's secret key.)
- Get the new user's ARN (Amazon Resource Name) by going to the user's Summary tab. It'll look like this: "arn:aws:iam::123456789012:user/someusername"
- Go to the bucket properties in the S3 management console.
- Add a bucket policy that looks like this, but change "BUCKET-NAME" to the bucket name, and "USER-ARN" to your new user's ARN. The first statement makes the contents publicly readable (so you can serve the files on the web), and the second grants full access to the bucket and its contents to the specified user:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "PublicReadForGetBucketObjects",
          "Effect": "Allow",
          "Principal": { "AWS": "*" },
          "Action": ["s3:GetObject"],
          "Resource": ["arn:aws:s3:::BUCKET-NAME/*"]
        },
        {
          "Action": "s3:*",
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::BUCKET-NAME",
            "arn:aws:s3:::BUCKET-NAME/*"
          ],
          "Principal": { "AWS": ["USER-ARN"] }
        }
      ]
    }
- If you need to add limited permissions for another user to do things with this bucket, you can add more statements. For example, if you want another user to be able to copy all the content from this bucket to another bucket:

    {
      "Action": "s3:ListBucket",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::BUCKET-NAME",
      "Principal": { "AWS": ["USER-ARN"] }
    }
That will let the user list the objects in the bucket. The bucket was already publicly readable, but not listable, so adding this permission will let the user sync from this bucket to another one where the user has full permissions.
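If you set up more than one bucket, filling in the policy by hand gets error-prone. Here is a minimal sketch that builds the same two-statement policy from a bucket name and a user ARN. The function name `build_bucket_policy` and the example bucket name are made up for illustration; the output is just JSON that you would still paste into the S3 console yourself.

```python
import json


def build_bucket_policy(bucket_name, user_arn):
    """Return the two-statement bucket policy described above as a dict.

    bucket_name and user_arn are the values you would otherwise
    substitute for BUCKET-NAME and USER-ARN by hand.
    """
    bucket_arn = "arn:aws:s3:::%s" % bucket_name
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Anyone may read objects, so the files can be served on the web.
                "Sid": "PublicReadForGetBucketObjects",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": ["s3:GetObject"],
                "Resource": [bucket_arn + "/*"],
            },
            {
                # The site's own user gets full access to the bucket and its contents.
                "Action": "s3:*",
                "Effect": "Allow",
                "Resource": [bucket_arn, bucket_arn + "/*"],
                "Principal": {"AWS": [user_arn]},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name; the ARN is the sample one shown earlier.
    policy = build_bucket_policy(
        "my-example-bucket", "arn:aws:iam::123456789012:user/someusername"
    )
    print(json.dumps(policy, indent=2))
```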
Expected results:
- The site can use the access key ID and secret key associated with the user's access key to access the bucket
- The site will be able to do anything with that bucket
- The site will not be able to do anything outside that bucket
The simplest case is just using S3 to serve your static files. In Django, we say "static files" to refer to the fixed files that we provide and serve as part of our site - typically images, css, and javascript, and maybe some static HTML files. Static files do not include any files that might be uploaded by users of the site. We call those "media files".
Before continuing, you should be familiar with managing static files, the staticfiles app, and deploying static files in Django.
Also, your templates should never hard-code the URL path of your static files. Use the static tag instead:
{% load static from staticfiles %}
<img src="{% static 'images/rooster.png' %}">
That will use whatever the appropriate method is to figure out the right URL for your static files.
Django provides two template tags named static.
The first static is in the static templatetags library, and accessed using {% load static %}. It just puts the value of STATIC_URL in front of the path.
The one from staticfiles ({% load static from staticfiles %}) is smarter - it uses whatever storage class you've configured for static files to come up with the URL.
By using the one from staticfiles from the start, you'll be prepared for any storage class you might decide to use in the future.
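To make the difference concrete, here is a plain-Python imitation of the two behaviours as described above. It is not Django's actual code: `PrefixedStorage`, `simple_static`, and `storage_static` are hypothetical names, and the bucket domain is an example value.

```python
class PrefixedStorage:
    """Hypothetical stand-in for a storage class that knows its own base URL."""

    def __init__(self, base_url):
        self.base_url = base_url

    def url(self, name):
        # A real storage class may do much more here (signing, hashing names, ...).
        return self.base_url + name


def simple_static(path, static_url):
    """Imitates the first tag: just put STATIC_URL in front of the path."""
    return static_url + path


def storage_static(path, storage):
    """Imitates the staticfiles tag: ask the configured storage for the URL."""
    return storage.url(path)


if __name__ == "__main__":
    s3_like = PrefixedStorage("https://example-bucket.s3.amazonaws.com/")
    print(simple_static("images/rooster.png", "/static/"))
    print(storage_static("images/rooster.png", s3_like))
```

The point of the second form is that swapping the storage object changes every generated URL without touching any template.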
In order for your static files to be served from S3 instead of your own server, you need to arrange for two things to happen:
- When you serve pages, any links in the pages to your static files should point at their location on S3 instead of your own server.
- Your static files are on S3 and accessible to the web site's users.
The first part is easy if you've been careful not to hardcode static file paths in your templates. Just change STATICFILES_STORAGE in your settings.
But you still need to get your files onto S3, and keep them up to date. You could do that by running collectstatic locally, and using some standalone tool to sync the collected static files to S3, at each deploy. But we won't be able to get away with such a simple solution for media files, so we might as well go ahead and set up the custom Django storage we'll need now, and then our collectstatic will copy the files up to S3 for us.
To start, install two Python packages: django-storages (yes, that's "storages" with an "S" on the end), and boto:
$ pip install django-storages boto
Add 'storages' to INSTALLED_APPS:
INSTALLED_APPS = (
    ...,
    'storages',
)
If you want (optional), add this to your common settings:
AWS_HEADERS = {  # see http://developer.yahoo.com/performance/rules.html#expires
    'Expires': 'Thu, 31 Dec 2099 20:00:00 GMT',
    'Cache-Control': 'max-age=94608000',
}
That will tell boto that when it uploads files to S3, it should set properties on them so that when S3 serves them, it'll include those HTTP headers in the response. Those HTTP headers in turn will tell browsers that they can cache these files for a very long time.
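The numbers in that dictionary are easier to trust if you can see where they come from: 94608000 seconds is three 365-day years. A small sketch, assuming you would rather compute the values than hard-code them (the helper name `far_future_headers` is invented here, not part of boto or django-storages):

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Same far-future date as in the hard-coded setting above.
FAR_FUTURE = datetime(2099, 12, 31, 20, 0, 0, tzinfo=timezone.utc)


def far_future_headers(years=3, expires=FAR_FUTURE):
    """Build the caching headers from a cache lifetime in 365-day years."""
    max_age = years * 365 * 24 * 60 * 60  # seconds
    return {
        # format_datetime(..., usegmt=True) produces the HTTP date format.
        'Expires': format_datetime(expires, usegmt=True),
        'Cache-Control': 'max-age=%d' % max_age,
    }


AWS_HEADERS = far_future_headers()
```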
Now, add this to your settings, changing the first three values as appropriate:
AWS_STORAGE_BUCKET_NAME = 'BUCKET_NAME'
AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxxxxxxxxxx'
AWS_SECRET_ACCESS_KEY = 'yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy'
# Tell django-storages that when coming up with the URL for an item in S3 storage, keep
# it simple - just use this domain plus the path. (If this isn't set, things get complicated).
# This controls how the `static` template tag from `staticfiles` gets expanded, if you're using it.
# We also use it in the next setting.
AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
# This is used by the `static` template tag from `static`, if you're using that. Or if anything else
# refers directly to STATIC_URL. So it's safest to always set it.
STATIC_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN
# Tell the staticfiles app to use S3Boto storage when writing the collected static files (when
# you run `collectstatic`).
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
Only the first three lines should need to be changed for now.
One more thing you need to set up is CORS. CORS defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. Since we're going to be serving our static files and media from a different domain, if you don't take CORS into account, you'll run into mysterious problems, like Firefox not using your custom fonts for no apparent reason.
Go to your S3 bucket properties, and under "Permissions", click on "Add CORS Configuration". Paste this in:
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin>https://*.fueled.engineering</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <AllowedHeader>Authorization</AllowedHeader>
    </CORSRule>
</CORSConfiguration>
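MaxAgeSeconds is the time, in seconds, that the browser may cache the response to a preflight request for this rule. AllowedOrigin should be changed to match your own site's domain.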
I won't bother to explain this, since there are plenty of explanations on the web that you can Google for. The tricky part is knowing you need to add CORS in the first place.
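If you manage several sites, you may prefer to generate that configuration rather than edit the origin by hand each time. The following is only a sketch using the standard library; `build_cors_xml` is a made-up helper, the example origin is a placeholder, and the result is meant to be pasted into the S3 console as described above.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def build_cors_xml(allowed_origin, max_age_seconds=3000):
    """Return a one-rule CORS configuration allowing GET from allowed_origin."""
    # Writing xmlns as a plain attribute keeps the output free of ns0: prefixes.
    root = ET.Element("CORSConfiguration", xmlns=S3_NS)
    rule = ET.SubElement(root, "CORSRule")
    ET.SubElement(rule, "AllowedOrigin").text = allowed_origin
    ET.SubElement(rule, "AllowedMethod").text = "GET"
    ET.SubElement(rule, "MaxAgeSeconds").text = str(max_age_seconds)
    ET.SubElement(rule, "AllowedHeader").text = "Authorization"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder origin; use your own site's domain.
    print(build_cors_xml("https://www.example.com"))
```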
With this all set up, you should be able to upload your static files to S3 using collectstatic:
python manage.py collectstatic
If you see any errors, double-check all the steps above.
Once that's successful, you should be able to start your test site and view some pages. Look at the page source and you should see that the images, css, and javascript are being loaded from S3 instead of your own server. Any media files should still be served as before.
Don't put this into production quite yet, though. We still have some changes to make to how we're doing this.
Reminder: Django "media" files are files that have been uploaded by web site users, that then need to be served from your site. One example is a user avatar (an image the user uploads and the site displays with the user's information).
Media files are typically managed using FileField and ImageField fields on models. In a template, you use the url attribute on the file field to get the URL of the underlying file.
For example, if user.avatar is an ImageField on your user model, then
<img src="{{ user.avatar.url }}">
would embed the user's avatar image in the web page.
By default, when a file is uploaded using a FileField or ImageField, it is saved to a file on a path inside the local directory named by MEDIA_ROOT, under a subdirectory named by the field's upload_to value. When the file's url attribute is accessed, it returns the value of MEDIA_URL, prepended to the file's path inside MEDIA_ROOT.
An example might help. Suppose we have these settings:
MEDIA_ROOT = '/var/media/'
MEDIA_URL = 'http://media.example.com/'
and this is part of our user model:
avatar = models.ImageField(upload_to='avatars')
When a user uploads an avatar image, it might be saved as /var/media/avatars/12345.png. Then <img src="{{ user.avatar.url }}"> would expand to <img src="http://media.example.com/avatars/12345.png">.
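The avatar example can be mimicked in a few lines of plain Python. This is only an illustration of how the pieces combine, not what Django runs internally; `local_path` and `public_url` are names invented for this sketch.

```python
import posixpath
from urllib.parse import urljoin

MEDIA_ROOT = '/var/media/'
MEDIA_URL = 'http://media.example.com/'


def local_path(upload_to, filename):
    """Where the uploaded file lands on disk: MEDIA_ROOT/upload_to/filename."""
    return posixpath.join(MEDIA_ROOT, upload_to, filename)


def public_url(upload_to, filename):
    """What the field's url attribute returns: MEDIA_URL + path inside MEDIA_ROOT."""
    return urljoin(MEDIA_URL, posixpath.join(upload_to, filename))


if __name__ == "__main__":
    print(local_path('avatars', '12345.png'))
    print(public_url('avatars', '12345.png'))
```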
Our goal is to send those files to S3 instead of saving them to a local directory. Then instead of having to serve them somehow locally, we can let Amazon serve them for us.
Another advantage of using S3 for media files is that if you scale up by adding more servers, uploaded images are available on all servers at once.
Ideally, we'd be able to start putting new media files on S3 just by adding this to our settings:
# DO NOT DO THIS!
MEDIA_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
Adding those settings would indeed tell Django to save uploaded files to our S3 bucket, and use our S3 URL to link to them.
Unfortunately, this would store our media files on top of our static files, which we're already keeping in our S3 bucket. If we were careful to always set upload_to on our FileFields to directory names that would never occur in our static files, we might get away with it (though I'm not sure Django would even let us). But we can do better.
What we want to do is either enforce storing our static files and media files in different subdirectories of our bucket, or use two different buckets. I'll show how to use the different paths first.
In order for our STATICFILES_STORAGE to have different settings from our DEFAULT_FILE_STORAGE, they need to use two different storage classes; there's no way to configure anything more fine-grained. So, we'll start by creating a custom storage class for our static file storage, by subclassing S3BotoStorage. We'll also define a new setting, so we don't have to hard-code the path in our Python code:
# custom_storages.py
from django.conf import settings
from storages.backends.s3boto import S3BotoStorage

class StaticStorage(S3BotoStorage):
    location = settings.STATICFILES_LOCATION
Then in our settings:
STATICFILES_LOCATION = 'static'
STATICFILES_STORAGE = 'custom_storages.StaticStorage'
STATIC_URL = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, STATICFILES_LOCATION)
Giving our class a location attribute of 'static' will put all our files into paths on S3 starting with 'static/'.
You should be able to run collectstatic again, restart your site, and now all your static files should have '/static/' in their URLs. Now delete from your S3 bucket any files outside of '/static' (using the S3 console, or whatever tool you like).
We can do something very similar now for media files, adding another storage class:
class MediaStorage(S3BotoStorage):
    location = settings.MEDIAFILES_LOCATION
and in settings:
MEDIAFILES_LOCATION = 'media'
MEDIA_URL = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, MEDIAFILES_LOCATION)
DEFAULT_FILE_STORAGE = 'custom_storages.MediaStorage'
Now when a user uploads their avatar, it should go into '/media/' in our S3 bucket. When we display the image on a page, the image URL will include '/media/'.
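Here is a small sketch of why the separate locations matter. It models an S3 key as "location prefix plus file name", which is a simplification of what the storage class does; `s3_key` and `object_url` are hypothetical helpers and the domain is an example value.

```python
def s3_key(location, name):
    """Join a location prefix and a file name into a key, skipping empty parts."""
    parts = [part.strip("/") for part in (location, name)]
    return "/".join(part for part in parts if part)


def object_url(domain, location, name):
    """Build the public URL for a key on the given bucket domain."""
    return "https://%s/%s" % (domain, s3_key(location, name))


if __name__ == "__main__":
    domain = "example-bucket.s3.amazonaws.com"
    name = "images/logo.png"
    # With no location, a static file and an upload with the same name collide.
    print(s3_key("", name), s3_key("", name))
    # With separate locations, they end up under different keys.
    print(object_url(domain, "static", name))
    print(object_url(domain, "media", name))
```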
You can use different buckets for static and media files by adding a bucket_name attribute to your custom storage classes. You can see the whole list of attributes you can set by looking at the source for S3BotoStorage.
If your site already has user-uploaded files in a local directory, you'll need to copy them up to your media directory on S3. There are lots of tools these days for doing this kind of thing. If the command line is your thing, try the AWS CLI tools from Amazon. They worked okay for me.
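Whatever tool you use, it helps to check first which key each existing file should end up under. The sketch below only builds that mapping and does not upload anything; `plan_media_upload` is an invented helper, and it assumes the 'media' location used for MEDIAFILES_LOCATION.

```python
import os


def plan_media_upload(media_root, location="media"):
    """Return sorted (local_path, s3_key) pairs for every file under media_root."""
    plan = []
    for dirpath, _dirnames, filenames in os.walk(media_root):
        for filename in filenames:
            local_path = os.path.join(dirpath, filename)
            # Path relative to MEDIA_ROOT, e.g. avatars/12345.png
            relative = os.path.relpath(local_path, media_root)
            # S3 keys always use forward slashes, whatever the local OS uses.
            key = "/".join([location] + relative.split(os.sep))
            plan.append((local_path, key))
    return sorted(plan)


if __name__ == "__main__":
    for local_path, key in plan_media_upload("/var/media/"):
        print(local_path, "->", key)
```

Comparing this list against what actually lands in the bucket is a cheap way to confirm that old uploads will resolve under the new MEDIA_URL.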
Serving your static and media files from S3 requires getting a lot of different parts working together. But it's worthwhile for a number of reasons:
- S3 can probably serve your files more efficiently than your own server.
- Using S3 saves the resources of your own server for more important work.
- Having media files on S3 allows easier scaling by replicating your servers.
- Once your files are on S3, you're well on the way to using CloudFront to serve them even more efficiently using Amazon's CDN service.