Django and Amazon AWS Elastic Beanstalk with S3

When you deploy your first Django website to Amazon Web Services’ Elastic Beanstalk, you will face a number of problems, such as:

  1. How should I handle static files and user-uploaded media?
  2. How do I send emails with AWS?
  3. How can I refer to the same AWS application and environment from a second development computer?
  4. How can I add gcc – and specifically, bcrypt – to the AWS environment?
  5. How can I access my site’s RDS database remotely (i.e. from my local computer)?

If you’re new to Elastic Beanstalk, check out their tutorial to help you get a Django 1.4 site up on AWS quite quickly. You need to sign up for an account with AWS, but otherwise it just works. It also works for Django 1.5.

This post has some extra notes which I found handy. Also, I found it unnecessary to install MySQL-python on my local machine.

1. Static files and user-uploaded media

If you follow the tutorial above through to the optional step where you set up the admin panel, you will have set up a way to handle static files. This is often the bane of using Django (for me at least). However, you will not yet have a way to handle user-uploaded media.
To test out file uploads, I added a test app which had a model with a single FileField, and registered it with the admin. With this, I could go to the admin panel of the live site and try to upload a file, and test if it worked.

Bad approach – adapt the static files approach to media

My first thought was, if static files are being loaded ok, why not copy the same approach for user-uploaded media? So I added these lines to my config file (to match the existing lines for /static):

  - namespace: aws:elasticbeanstalk:container:python:staticfiles
    option_name: /media/
    value: media/

And to settings.py:

MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(os.path.dirname(os.path.dirname(
                  os.path.abspath(__file__))), 'media')

And it worked! I could click on the link to the uploaded file and see it.

Except … then I tried uploading a new version of the code with git aws.push, and suddenly I couldn’t see the file any more.

So I tried a slight variant of this approach, where I only had the one staticfiles instance in the config file, and used a MEDIA_URL of '/static/media/' and similarly for MEDIA_ROOT. It worked in the same way, which is to say, it didn’t work.

I was missing an important point, explained right at the end of this blog post: “Elastic Beanstalk images are all ephemeral… This means that nothing on an instances filesystem will survive through a deployment, redeployment, or stoppage of the environment/instance.”

Good approach – S3 with django-storages

So I had to understand more about how Django stores its files. The documentation is pretty clear on this, and I was happy to learn that the MEDIA_ROOT and MEDIA_URL settings are just locations to save files used by the default file storage system. So if you use another storage system, those two settings (probably) aren’t relevant.

When you use Elastic Beanstalk you also get an S3 bucket, so the solution is to use that to store the uploaded files. You can get your bucket name from the S3 console. The bucket name is the entire string you see there, e.g. elasticbeanstalk-us-west-2-xxxxxxxxxxxx.

We will use django-storages with boto.

First, you need to install them both (and add them to your requirements file):

pip install django-storages
pip install boto
pip freeze | grep django-storages >> requirements.txt
pip freeze | grep boto >> requirements.txt

In your settings.py file, add 'storages' to your INSTALLED_APPS, and also:

    DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
    AWS_ACCESS_KEY_ID = '---key---'
    AWS_SECRET_ACCESS_KEY = '---secret---'
    AWS_STORAGE_BUCKET_NAME = '---bucket name---'
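
Hard-coding the key and secret means they end up in version control, so one option is to read them from environment variables instead. This is only a sketch: the environment variable names below are my own assumption (the tutorial does not set them up for you), and you would need to define them yourself in the Elastic Beanstalk environment configuration.

```python
import os

# settings.py fragment.
# The environment variable names are assumptions; define them in your
# Elastic Beanstalk environment (and in your local shell for development).
# Each value falls back to an empty string when the variable is not set.
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID', '')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY', '')
AWS_STORAGE_BUCKET_NAME = os.environ.get('AWS_STORAGE_BUCKET_NAME', '')

# Same storage backend as in the settings above.
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
```

With this in place the secret never appears in settings.py, and each developer or environment can supply its own values.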

As I mentioned, you can leave out MEDIA_URL and MEDIA_ROOT now.

And that’s it! I found that this was all I needed to do to be able to upload files through the admin panel, and have them persist. You can also see the uploaded files in your S3 console.

Note this means I am using a different storage system for the static files than for the user-uploaded media files. The former do not persist from one deployment to the next (but are reloaded each time), whereas the latter do.
I’m not sure if there’s a downside to this approach – I have seen Stack Overflow posts (e.g. this one) where both sets of files are put on S3.

I’ll also mention that the links to the user-uploaded files are quite long, e.g. https://elasticbeanstalk-us-west-2-xxxxxxxxxxxx.s3.amazonaws.com/myfolder/samplefile.txt?Signature=XXXXXXXXXXXX&Expires=9999999999&AWSAccessKeyId=XXXXXXXXXXXXX. The query parameters are a signature and an expiry time that the storage backend generates when it builds the link, so they do not stay the same over time.

This seems to be a good way to handle user-uploaded media. In particular, the additional parameters should prevent unauthorised users from accessing the files.
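
To see what the extra parameters in such a link are, you can pull it apart with Python's standard library. This is just an illustration using the placeholder link from above; the variable names are mine.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# The sample link from above, with its placeholder values.
url = (
    "https://elasticbeanstalk-us-west-2-xxxxxxxxxxxx.s3.amazonaws.com"
    "/myfolder/samplefile.txt"
    "?Signature=XXXXXXXXXXXX&Expires=9999999999&AWSAccessKeyId=XXXXXXXXXXXXX"
)

parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values

# The bucket name is the first part of the host; the key is the path.
bucket = parts.netloc.split(".s3.amazonaws.com")[0]
key = parts.path.lstrip("/")

# Expires is a Unix timestamp (seconds since 1970); the link is only
# valid until then.
expires_at = datetime.fromtimestamp(int(params["Expires"][0]), tz=timezone.utc)

print(bucket)
print(key)
print(sorted(params))
print(expires_at.isoformat())
```

The Signature value is what ties the link to your secret key, which is why someone without the key cannot simply make up a working link.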

2. Email

I just added the usual lines to settings.py for my gmail account:

    EMAIL_USE_TLS = True  # needed: Gmail's SMTP server on port 587 expects STARTTLS
    EMAIL_HOST = 'smtp.gmail.com'
    EMAIL_HOST_USER = 'example@gmail.com'
    EMAIL_HOST_PASSWORD = 'PASSWORD'
    EMAIL_PORT = 587

Then I went into Amazon’s SES (Simple Email Service) console and verified the above EMAIL_HOST_USER email address, along with some test recipient email addresses. I also had to log in to Gmail and respond to an email from Gmail to confirm that everything was ok. Then, in the development sandbox, my Django app could send email fine (but only to the test recipients).

3. Referring to the same AWS environment from another computer

[Edit - this is outdated with CLI v3.0; use the 'eb' command instead.] First, you need to download a copy of the Elastic Beanstalk client to your second computer (as you did for the first one).  But this time, instead of typing eb init, you need to type (on a Mac/Linux system):

cd your/Django/project/directory
~/path/to/AWS-ElasticBeanstalk-CLI-2.5.0/AWSDevTools/Linux/AWSDevTools-RepositorySetup.sh
git aws.config

You will then be prompted for your access id, secret, region, etc, and you should be able to use git aws.push to push to the same place as on your other computer.

4. Adding gcc and/or bcrypt

I want to use bcrypt for password hashing. Simply adding bcrypt to your requirements.txt file is not sufficient, because bcrypt needs two more things: it needs gcc, and it needs the libffi package. Your development computer has these, but the AWS server does not. As I am not at all knowledgeable about yum or YAML, it took some trial and error to work out what changes I needed to make to .ebextensions/aws.config. To save you this trouble, here are the extra lines you need (if your config already has a yum section, add just the two package lines to it):

packages:
  yum:
    libffi-devel: []
    gcc: []
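
Once bcrypt installs on the server, Django still has to be told to use it. Here is a sketch of the setting, assuming your Django version ships these two hasher classes (check the password management documentation for your version). This is not from the tutorial, and the choice of fallback hasher is my assumption. The first entry is used to store new passwords; the remaining entries let existing hashes keep working.

```python
# settings.py fragment (assumed, not from the tutorial).
PASSWORD_HASHERS = (
    # Used for all newly stored passwords.
    'django.contrib.auth.hashers.BCryptPasswordHasher',
    # Still accepted when checking passwords that were hashed earlier.
    'django.contrib.auth.hashers.PBKDF2PasswordHasher',
)
```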

5. Accessing your site’s RDS database remotely

This is surprisingly easy.  You first need to tell RDS which IP addresses are allowed to connect; this is described in detail here.  The quick summary is to find the database’s “Security Groups” console in AWS, go to the “Inbound” tab, and set the rule to “MySQL”, with your local IP address (which you can get from whatismyip.com).

You can get a copy of the database dumped onto your local machine with e.g.:

/Applications/MAMP/Library/bin/mysqldump -h abcdefg.cdefg.ap-xxx-1.rds.amazonaws.com -u ebroot -p ebdb > db.sql

The -p option will make it prompt you for your database password, which you entered when you set up the EB environment. (I’m using MAMP, hence the need for the path to mysqldump above; you may not need this.) Do not put the port number (e.g. :3306) at the end of the hostname.

If you want to run your local development version of Django with the AWS RDS database, all you need to do is set the following environment variables before you do ./manage.py runserver:

    export RDS_DB_NAME='ebdb'
    export RDS_USERNAME='ebroot'
    export RDS_PASSWORD=''  # you need to remember this
    export RDS_HOSTNAME='xxxx.xxxx.us-east-1.rds.amazonaws.com'
    # (HOSTNAME is the endpoint from https://console.aws.amazon.com/rds/home )
    export RDS_PORT='3306'  # also from the console

That’s assuming you are using the suggested setup in settings.py:

if 'RDS_DB_NAME' in os.environ:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': os.environ['RDS_DB_NAME'],
            'USER': os.environ['RDS_USERNAME'],
            'PASSWORD': os.environ['RDS_PASSWORD'],
            'HOST': os.environ['RDS_HOSTNAME'],
            'PORT': os.environ['RDS_PORT'],
        }
    }
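
As written, the snippet above leaves DATABASES undefined whenever RDS_DB_NAME is not set. One way around that is a local fallback. This is a minimal sketch, assuming you are happy with SQLite for local development; the function name database_settings and the file name dev.sqlite3 are my own inventions, not part of the tutorial.

```python
import os


def database_settings(environ=os.environ):
    """Return a DATABASES dict: RDS if configured, otherwise local SQLite."""
    if 'RDS_DB_NAME' in environ:
        return {
            'default': {
                'ENGINE': 'django.db.backends.mysql',
                'NAME': environ['RDS_DB_NAME'],
                'USER': environ['RDS_USERNAME'],
                'PASSWORD': environ['RDS_PASSWORD'],
                'HOST': environ['RDS_HOSTNAME'],
                'PORT': environ['RDS_PORT'],
            }
        }
    # Assumed fallback: a SQLite file in the current working directory.
    return {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': os.path.abspath('dev.sqlite3'),
        }
    }


DATABASES = database_settings()
```

Passing the environment in as an argument also makes it easy to check both branches without touching your real shell variables.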

I hope this helps someone out there get over the hurdle to using AWS.

  

4 thoughts on “Django and Amazon AWS Elastic Beanstalk with S3”

  1. For the S3 query parameters problem, you can get rid of them by:
    1. In your Django settings put AWS_QUERYSTRING_AUTH = False. This will get rid of query strings.
    2. Go to your S3 console, make a new folder for your static/media files. Then make it a public folder, since we’ve already disabled S3 authentication in Django.
    3. Back to Django settings, put AWS_LOCATION = 'your_folder_name' without trailing slash.

    You might also need to take a look at this issue of django-storages if you use the admin heavily:
    https://bitbucket.org/david/django-storages/issue/121/s3boto-admin-prefix-issue-with-django-14
    The subclass fix works for me.

  2. Thanks for the post, very informative.

    I tried Amazon EB yesterday (Jul 19) and it works fine with vanilla Django 1.5 and admin activated. I’ve included '127.0.0.1', 'localhost', and 'mysite.elasticbeanstalk.com' in ALLOWED_HOSTS.

    As for the choice of storing STATIC files, the advantage of storing them in S3 along with user-uploaded MEDIA is that the S3 infrastructure is optimized for delivering contents globally, with CDN in every data center around the world- so our users can retrieve the static file fast anywhere (you can notice in S3 console that it is GLOBAL) . Whereas EC2 instances are bound regionally, thus higher latency.
