Category Archives: Websites

Using redis-queue for asynchronous calls with Django

I recently posted about using Redis and Celery with Django to handle asynchronous calls from your web pages. Given that I have memory constraints on the server, I have been wondering if I might get more bang for my buck with redis-queue (rq) instead of Celery.  In fact, I have found them comparable: rq uses about 12Mb per worker, and Celery uses about 10-12Mb per process.  However, Celery workers use (1+concurrency) processes, so if concurrency=1, Celery appears to use double the memory.

Using RQ

Here are the changes I’ve made to the code I posted earlier to replace Celery with redis-queue.  Note jobs.py is exactly the same as celery’s tasks.py, but without the @task decorator. (I did not use rq’s @job decorator.)
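For concreteness, here is the shape of my jobs.py – a minimal sketch, where solve stands in for your own long-running function:

# jobs.py - the same function Celery would run, minus the @task decorator
def solve(myarg, timeout=None):
    # run the optimisation here; rq stores the return value as the job result
    return {'status': 'solved', 'arg': myarg}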

import json

from django.core.exceptions import SuspiciousOperation
from django.http import HttpResponse

def status_view(request):
    """
    Called by the opt page via ajax to check if the optimisation is finished.
    If it is, return the results in JSON format.
    """
    if not request.is_ajax():
        raise SuspiciousOperation("No access.")
    if QUEUE_BACKEND == 'celery':
        pass  # as before - the main part was a call to Celery's AsyncResult
    elif QUEUE_BACKEND == 'rq':
        from django.conf import settings
        from redis import Redis
        from rq.job import Job
        from rq.exceptions import NoSuchJobError
        try:
            connection = Redis(settings.RQ_REDIS_URL, settings.RQ_REDIS_PORT)
            # not quite sure if better to use Job(...) or Job.fetch(...) here
            # the difference is fetch also calls refresh
            # but I see it does not rerun the job
            job = Job.fetch(request.session['job_id'], connection=connection)
        except (KeyError, NoSuchJobError):
            ret = {'error': 'No optimisation is underway (or you may have disabled cookies).'}
            return HttpResponse(json.dumps(ret))
        if job.is_finished:
            ret = get_solution(job)
        elif job.is_queued:
            ret = {'status': 'in-queue'}  # note extra status available with rq
        elif job.is_started:
            ret = {'status': 'waiting'}
        elif job.is_failed:
            ret = {'status': 'failed'}    # note extra status available with rq
        return HttpResponse(json.dumps(ret))

def get_context_data(self, **kwargs):
    ...
    if QUEUE_BACKEND == 'celery':
        from . import tasks
        result = tasks.solve.delay(myarg, timeout=timeout)
    elif QUEUE_BACKEND == 'rq':
        from django.conf import settings
        from . import jobs
        from redis import Redis
        from rq import Queue
        connection = Redis(settings.RQ_REDIS_URL, settings.RQ_REDIS_PORT)
        q = Queue(connection=connection)
        job = q.enqueue_call(func=jobs.solve, args=[myarg],
                             kwargs={'timeout': timeout}, timeout=timeout + 10)
        # the solve call itself has a timeout argument,
        # so rq's own (slightly longer) timeout shouldn't normally trigger

In settings.py I added:

     RQ_REDIS_URL = 'localhost'
     RQ_REDIS_PORT = 6379

But I did not use django-rq at all.

One nice thing I see immediately is the additional status info – you can easily query if a job is still in the queue or has failed.  I’m sure these are possible to see in Celery too, but they are obvious in rq.

Run RQ workers

Running an rq worker is nice and simple – there are no daemonization scripts or setup files to deal with. On either your dev or production server, just type (and repeat for as many workers as you want):

rqworker --port 6379

Remaining issues

One initial problem was finding out how to get an existing job from its id.  I solved this with:

Job.fetch(job_id, connection=connection)

However, I cannot find documentation about Job.fetch, and I see that Job(...) by itself also works.  Please let me know if you know which of these I should be using.
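For reference, here is how the two variants compare – a minimal sketch, assuming the connection and job id from the view above:

from rq.job import Job

# Job(...) only builds a local handle; it does not load anything from Redis
job = Job(request.session['job_id'], connection=connection)
job.refresh()  # explicitly pull the stored state from Redis

# Job.fetch(...) constructs the handle and calls refresh in one step
job = Job.fetch(request.session['job_id'], connection=connection)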

The main problem I have with redis-queue now is terminating a task.  I have a “cancel” button on the optimisation screen, which I can implement with Celery via:

revoke(task_id, terminate=True)  # celery

I cannot find an equivalent in rq.  This is unfortunately a deal-breaker for me, so I am sticking with celery for now.  Can you help?
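The closest thing I can find is Job.cancel(), but as far as I can tell it only removes a queued job – it does not terminate a job that a worker has already started, which is exactly the case I need:

job = Job.fetch(request.session['job_id'], connection=connection)
job.cancel()  # dequeues the job, but a running job keeps running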

Asynchronous calls from Django

I have an optimisation I would like to run when the user presses a button on a Django page. For small cases, it is fine to run it synchronously.  However, when it takes more than a second or so, it is not great to have the web server held back by a process of unknown length.

The solution I have settled on is Celery, with Redis as the message broker.  I am using Redis over the alternatives, since it seems to have much lower memory requirements (I find it uses under 2 Mb, vs. 10-30 Mb per Celery process). And the equivalent commands if you want to use redis-queue (which uses about 10 Mb per worker) instead of Celery are given in this post.

There is a bit of a learning curve to get started with this, so I am making a guide for the next person by listing all the steps I have taken to get set up on both my development platform (running MacOS X) and a unix server (hosted by Webfaction).  Along the way I hope to answer questions about security and what the right settings are to put in the redis.conf file, the celery config file, and the usual Django settings.py file.

Install Redis

Redis is the message broker. You will need to have this running at all times for Celery’s tasks to be executed.

Installing Redis on Mac OS X is described in this blog. Basically, just download the latest version from redis.io, and in the resulting untarred directory:

make test
make
sudo mv src/redis-server /usr/bin
sudo mv src/redis-cli /usr/bin
mkdir ~/.redis
touch ~/.redis/redis.conf

Installing Redis on your server is similar, though you may need to know how to download the code from the command line first (e.g. see this post):

wget http://redis.googlecode.com/files/redis-2.6.14.tar.gz
tar xzf redis-2.6.14.tar.gz
cd redis-2.6.14
make test
make

On the production server we don’t need to relocate the redis-server or redis-cli executables, as we’ll see in the next section.

Run Redis

To run Redis on your Mac, just type one of:

redis-server  # if no config required, or:
redis-server ~/Python/redis-2.6.14/redis.conf

To run it on your Webfaction server, first add a custom app listening on a port, and note the port number you are assigned.

Now we need to daemonize it (see this post from the Webfaction community). In summary, in your redis directory, edit the redis.conf file like so (feel free to change the location of the pid file):

daemonize yes
...
pidfile /home/username/webapps/mywebapp/redis.pid
...
port xxxxx   # set to the port of the custom app you created

To test this works, type the commands below. If all is well, the pid file will now contain a process id which you can check by providing it to the ps command.

src/redis-server redis.conf
cat /home/username/webapps/mywebapp/redis.pid
ps xxxxx # use the number in the pid file

Note – when I did this without assigning the port number of the custom app, I got the following error:

# Warning: no config file specified, using the default config. In order to specify a config file use src/redis-server /path/to/redis.conf
# Unable to set the max number of files limit to 10032 (Operation not permitted), setting the max clients configuration to 4064.
# Opening port 6379: bind: Address already in use

It turns out someone else was already using port 6379, the default Redis port.

Now in practice you will want Redis to be managed with cron, so that it restarts if there is a problem. Webfaction has some docs on how to do this here; I used:

crontab -e
# and add this line to the file, changing the path as necessary:
0,10,20,30,40,50 * * * * ~/webapps/redis/redis-2.6.14/src/redis-server ~/webapps/redis/redis-2.6.14/redis.conf

FYI, for me the running Redis process uses 1.7 Mb (i.e. nothing compared to each celery process, as we’ll see).

Install Celery

The Celery docs cover this.  Installation is simple, on both development and production machines (except that I install it in the web app’s environment with Webfaction, as explained here):

pip install django-celery-with-redis

I have added the following to settings.py, replacing the port number for production:

BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'

import djcelery
djcelery.setup_loader()

INSTALLED_APPS = (
    ...
    'djcelery',
    ...
    )

And added the suggested lines to the top of wsgi.py:

import djcelery
djcelery.setup_loader()

I found lots more detail here, but I haven’t yet established how much of this is required.

Run a Celery worker

Now you need to start a Celery worker.

On your development server, you can enter your Django project directory and type:

python manage.py celery worker --loglevel=info

On your production server, I started by trying the same command above, to test out whether Celery could find the Redis process and run jobs – and it worked fine.  But in practice, the Celery docs say: “you will want to run the worker in the background as a daemon”.  (Note this link also talks about Celery beat, which “is a scheduler. It kicks off tasks at regular intervals, which are then executed by the worker nodes available in the cluster.” In my case, I do not need this.)

To do this, I copied the CentOS celeryd shell script file from the link at the end of the daemonization doc (since the server I am using runs CentOS), and placed it in a new celerydaemon directory in my Django project directory, along with the Django celeryd config file (I renamed the config file from celeryd, which was confusing as it is the same name as the shell script, to celeryd.sysconfig). I also created a new directory in my home directory called celery to hold the pid and log output files.

One more change is required, at least if you are using Webfaction to host your site: the call to celeryd_multi does not have a preceding python command by default.  While this works in an ssh shell, it does not work with cron – I believe because $PATH is not set up the same way under cron.  So I explicitly add the python command (including the path to python) to the front.

The config file looks like this:

# Names of nodes to start (space-separated)
CELERYD_NODES="myapp-node_1"

# Where to chdir at start. This could be the root of a virtualenv.
CELERYD_CHDIR="/home/username/webapps/webappname/projectname"

# How to call celeryd-multi (for Django)
# note python (incl path) added to front
CELERYD_MULTI="/home/user/bin/python $CELERYD_CHDIR/manage.py celeryd_multi" 

# Extra arguments
#CELERYD_OPTS="--app=my_application.path.to.worker --time-limit=300 --concurrency=8 --loglevel=DEBUG"
CELERYD_OPTS="--time-limit=180 --concurrency=2 --loglevel=DEBUG"
#  If you want to restart the worker after every 3 tasks, can use eg:
#  (I mention it here because I couldn't work out how to use 
#  CELERYD_MAX_TASKS_PER_CHILD)
#CELERYD_OPTS="--time-limit=180 --concurrency=2 --loglevel=DEBUG --maxtasksperchild=3" 

# Create log/pid dirs, if they don't already exist
CELERY_CREATE_DIRS=1

# %n will be replaced with the nodename
CELERYD_LOG_FILE="/home/username/celery/%n.log"
CELERYD_PID_FILE="/home/username/celery/%n.pid"

# Workers run as an unprivileged user
CELERYD_USER=celery
CELERYD_GROUP=celery

# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="myproject.settings"

In the shell script, I changed the two references to /var (DEFAULT_PID_FILE and DEFAULT_LOG_FILE) and the reference to /etc (CELERY_DEFAULTS) in the shell script to directories I can write to, e.g.:

DEFAULT_PID_FILE="/home/username/celery/%n.pid"
DEFAULT_LOG_FILE="/home/username/celery/%n.log"
...
CELERY_DEFAULTS=${CELERY_DEFAULTS:-"/home/username/webapps/webappname/projectname/celerydaemon/celeryd.sysconfig"}

I found a problem in the CentOS script – it calls /etc/init.d/functions, which resets the $PATH variable globally, so that the rest of the script cannot find python any more. I have raised this as an issue, where you can also see my workaround.

To test things out on the production server, you can type (use sh rather than source here because the script ends with an exit, and you don’t want to be logged out of your ssh session each time):

sh celerydaemon/celeryd start

and you should see a new .pid file in ~/celery showing the process id of the new worker(s).

Type the following line to stop all the celery processes:

sh celerydaemon/celeryd stop

Restart celery with cron if needed

As with Redis, you can ensure the celery workers are restarted by cron if they fail. Unlike with Redis, there are a lot of tricks here for the unwary (i.e. me).

  1. Write a script to check if a celery process is running. Webfaction provides an example here, which I have changed the last line of to read:
    sh /home/username/webapps/webappname/projectname/celerydaemon/celeryd restart
  2. This is the script we will ask cron to run. Note that I use restart here, not start; I am doing this because I have found in a real case that if the server dies suddenly, celery continues to think it is still running even when it isn’t, and so start does nothing. So add to your crontab (assuming the above script is called celery_check.sh):
    crontab -e
    1,11,21,31,41,51 * * * * ~/webapps/webappname/projectname/celerydaemon/celery_check.sh
  3. One last thing, pointed out to me in correspondence with Webfaction: the celeryd script file implements restart with:
    stop && start

    So if stop fails for any reason, the script will not restart celery.  For our purposes, we want start to occur regardless, so change this line to:

    stop; start;

Your celery workers should now restart if there is a problem.

Controlling the number of processes

If you’re like me you are now confused about the difference between a node, a worker, a process and a thread. When I run the celeryd start command, it kicks off three processes, one of which has the pid in the node’s pid file. This despite my request for one node, and “--concurrency=2” in the config file.

When I change the concurrency setting to 1, then I get two processes. When I also add another node, I get four processes.

So what I assume is happening is: workers are the same things as nodes, and each worker needs one process for overhead and “concurrency” additional processes.

For me, at first I found each celery process required about 30-35Mb (regardless of the number of nodes or concurrency). So three use about 100Mb.  When I looked again a week later, the processes were using only 10 Mb each, even when solving tasks.  I’m not sure what explains the discrepancy.

Use it

With this much, you can adapt the Celery demo (adding two numbers) to your own site, and it should work.
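For orientation, a tasks.py along the lines of that demo looks like this – a minimal sketch, with solve standing in for your own long-running function:

# myapp/tasks.py
from celery import task

@task()
def solve(params):
    # the long-running optimisation goes here;
    # the return value becomes the AsyncResult's result
    return {'status': 'solved', 'params': params}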

On my site I use ajax and javascript to regularly poll whether the optimisation is finished. The following files hopefully give the basic idea.

urls.py

# urls.py
from myapp.views import OptView, status_view
...
    url(r'^opt/', OptView.as_view(), name="opt"),
    url(r'^status/', status_view, name="status"), # for ajax
...

views.py

# views.py
import json
from django.http import HttpResponse
from django.views.generic import TemplateView
from django.core.exceptions import SuspiciousOperation
from celery.result import AsyncResult
from . import tasks

class OptView(TemplateView):
    template_name = 'opt.html'

    def get_context_data(self, **kwargs):
        """
        Kick off the optimization.
        """
        # replace the next line with a call to your task
        result = tasks.solve.delay(params)
        # save the task id so we can query its status via ajax
        self.request.session['task_id'] = result.task_id
        # if you need to cancel the task, use:
        # revoke(self.request.session['task_id'], terminate=True)
        context = super(OptView, self).get_context_data(**kwargs)
        return context

def status_view(request):
    """
    Called by the opt page via ajax to check if the optimisation is finished.
    If it is, return the results in JSON format.
    """
    if not request.is_ajax():
        raise SuspiciousOperation("No access.")
    try:
        result = AsyncResult(request.session['task_id'])
    except KeyError:
        ret = {'error':'No optimisation (or you may have disabled cookies).'}
        return HttpResponse(json.dumps(ret))
    try:
        if result.ready():
            # to do - check if it is really solved, or if it timed out or failed
            ret = {'status':'solved'}
            # replace this with the relevant part of the result
            ret.update({'result':result})
        else:
            ret = {'status':'waiting'}
    except AttributeError:
        ret = {'error':'Cannot find an optimisation task.'}
        return HttpResponse(json.dumps(ret))
    return HttpResponse(json.dumps(ret))

javascript

// include this javascript in your template (needs jQuery)
// also include the {% csrf_token %} tag (it need not be inside a form)
$(function() {
	function handle_error(xhr, textStatus, errorThrown) {
		clearInterval(interval_id);
		alert("Please report this error: "+errorThrown+xhr.status+xhr.responseText);
	}

	function show_status(data) {
		var obj = JSON.parse(data);
		if (obj.error) {
			clearInterval(interval_id);
			alert(obj.error);
		}
		if (obj.status == "waiting"){
			// do nothing
		}
		else if (obj.status == "solved"){
			clearInterval(interval_id);
			// show the solution
		}
		else {
			clearInterval(interval_id);
			alert(data);
		}
	}

	function check_status() {
		$.ajax({
			type: "POST",
			url: "/status/",
			data: {csrfmiddlewaretoken:
				document.getElementsByName('csrfmiddlewaretoken')[0].value},
			success: show_status,
			error: handle_error
		});
	}

	// run the first check almost immediately
	setTimeout(check_status, 50);
	// check every second
	var interval_id = setInterval(check_status, 1000);
});

As mentioned in the comments to the code above, if you need to cancel an optimisation, you can use:

from celery.task.control import revoke  # this import location is for Celery 3.x
revoke(task_id, terminate=True)

Monitoring

You can monitor what’s happening in celery with celery flower, at least on dev:

pip install flower
celery flower --broker=redis://localhost:PORTNUM/0

And then go to localhost:5555 in your web browser.

When you use djcelery, you will also find a djcelery app in the admin panel, where you can view workers and tasks.  There is a little bit of set up required to populate these tables.  More info about this is provided in the celery docs.
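If I understand the docs correctly, the missing piece is the snapshot camera, which records worker events into those tables:

python manage.py celerycam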

Security

Some links on this topic:

  • http://redis.io/topics/security
  • http://docs.celeryproject.org/en/latest/userguide/security.html

I’ll add to this section as I learn more about it.

I hope that’s helpful – please let me know what you think.

9 Lessons from PyConAU 2013

A summary of what I learned at PyCon AU in Hobart in 2013. (Click here for 2014.)

1. In 2005, Django helped make it possible for a team of ONE to make a commercial web app

Building web apps with Django is not just possible, it’s fun. I hadn’t realised the key role that Django played, along with Ruby on Rails, in making this happen.

2. But in 2013 the goal posts are higher – can it still be done?

Django was revolutionary when it was released, but it doesn’t take care of everything a modern (i.e. 2013) web app needs to be cutting-edge. On the back-end, once you get your head around Django itself, you need to get your head around South (for database migrations), virtualenv (so you don’t go crazy when new versions come out), the Python Image Library and django-filer or easy-thumbnails so you can upload images and files more nicely, Fabric to help you deploy your site, git (to version control your code, if you haven’t used it already), selenium (for functional testing), factory_boy (for any testing), django-reversion (so you can roll back data), staticfiles, a way to actually deploy static files on your system, e.g. a file system backend like Boto, tastypie or django-rest-framework (for an API), and perhaps a CMS like Django-CMS, Mezzanine or FeinCMS (which are the tips of other icebergs). That’s sort of where I’m up to at the moment. And there are lots more I will probably need soon - haystack (for faster searching), celery and a message broker (e.g. for non-web-page related tasks), memcache, maybe non-relational databases like MongoDB.

And that’s just the back-end. On the front-end you probably want to use javascript, ajax, jQuery, and probably another javascript library, e.g. I have been using kineticjs. But during the talks I learned I will need to consider meteor (heaps of cool stuff, but a starting point is that it drops a lot of the distinction between server and client, so that with very little code, a user can update the database and other users’ pages update to view it automatically), backbone.js (“models with key-value binding and custom events, collections with a rich API of enumerable functions, views with declarative event handling, and connects it all to your existing API over a RESTful JSON interface.”), angular.js (“lets you extend HTML vocabulary for your application”), D3.js (“data driven documents”), node.js, compass and SASS (to make css easier), ember.js (“a framework for creating ambitious web applications”), yeoman (“modern workflows for modern webapps” using Ruby and node.js)…

The keynote of DjangoCon AU by Alex Gaynor explained this in a historical context and sowed the idea in my mind that the time is ripe for a new framework (possibly an enhanced Django) that will make all these things easy as well (roughly speaking). Jacob Kaplan-Moss said to check out the Meteor screencast for what is possible.

3. Web security is never far from our thoughts

Jacob gave a great talk on web security.  As I mentioned above, Django takes care of the essential security features – CSRF tokens, SQL injections, password hashing and HTML cross-site scripting. Some immediately useful tips I picked up from Jacob are – always use https everywhere if you have user logins; django-secure makes this easy (“Helping you remember to do the stupid little things to improve your Django site’s security.”); use bcrypt for password hashing; use Django’s forms whenever there is user input, even if it’s not a form; turn off unused protocols (e.g. XML and yaml) in your API; and to emphasise how easy it is for others to intercept your unencrypted data, look up Firesheep.

4. Python packages for maths and science are making “big data” much more accessible to everyone

Lots of talks on this. Check out especially the scikit-learn documentation, which is incredibly thorough. But then there’s Pandas, scipy, and scikit-image, and for networks networkx.

For parallelization, the classic algorithm is mapreduce, and mrjob provides a python interface to it.  The easiest way to get started on parallelization is to use IPython.parallel. For an example, check out how to process a million songs in 20 minutes. For queuing jobs and running them in the background, redis-queue has a low barrier to entry. (One caveat – you may need to manually delete .pid files.)

An interesting quote – “Most of the world’s supercomputers are running Monte Carlo simulations.”

5. There are lots more packages and tools to try out

To improve my style, I want to check out django-model-utils (especially for “PassThroughManager”); and more generally, django-pipeline (for “CSS and JavaScript concatenation and compression, built-in JavaScript template support, and optional data-URI image and font embedding” – in preference to django-compressor), django-allauth (an “integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (social) account authentication.”), django-taggit (to add tags to your project), Raven (the python client for Sentry, “notifies you when your users experience errors”), django-discover-runner (which will be part of Django 1.6 – it allows “you to specify which tests to run and organize your test code outside the reach of the Django test runner”), and django-sitetree (“introducing site tree, menu and breadcrumbs navigation”).

There’s more… Mock for testing (“allows you to replace parts of your system under test with mock objects and make assertions about how they have been used”), a way to separate selenium tests into tests and page controllers, Gerrit (for online code reviews), Jenkins (“monitors executions of repeated jobs”), django-formrenderingtools (“customize layout of Django forms in templates, not in Python code.”). There’s a way to resize images in html5 before uploading them. And Fanstatic serves js and css files (e.g. specify you need jQuery through a python statement rather than in the template), though I’m not sure why I would need this yet.

If you need to kill off a process that’s taking too long you can use interrupting cow and django-timelimit.

There’s a way to compile clojure to javascript.  Since I don’t know clojure yet, this is a very speculative project for me, but I like the idea of avoiding javascript. :-)

And if you’re writing tests in iOS, there’s a way to run selenium on the iOS simulator using appium.

6. I still have a lot to learn about Python

I won’t embarrass myself by listing all the things I learnt about Python here, though we were encouraged not to be afraid of the CPython source code, and even less so of the PyPy source code (which has the advantage that it is in python!).

I was convinced I should be trying to use Python 3.3 whenever possible, if only to save time later with unicode errors – Python 2.x doesn’t handle these well. Django 1.5 actually supports Python 3, using a package called six to make the same codebase work with Python 2.x too.  Incidentally, it also seems the consensus is to use PostgreSQL over MySQL. Though admittedly that doesn’t really fit under this heading.

7. The Python community is friendly, humble and welcoming

Good news! This keeps it fun to program in Python as much as anything.

8. PyCon was a great conference

Of all the scientific and industry conferences I have been to, this one had the best-presented talks I have seen – and not just the scheduled presenters, but also the lightning (5 minute) talks. They were very engaging and intelligible.  Speakers used their slideshows in inventive ways (e.g. using memegenerator, prezi.com and the odd xkcd cartoon).  And the conference itself was well organised by Chris Neugebauer.

9. Next time I’ll stay for the sprints!

Install lpsolve for Python

Today I wanted to try out using lpsolve with the python API on my Mac (OS X 10.7) and on my linux server.

Installing it on the Mac is tricky, but the essence is described in this blog post (here I clarify a few things that stumped me for a while, leave out some of the changes mentioned there that I didn’t need to do, and update a path or two):

  • Search for and download both of:
    • lp_solve_5.5.0.15_source.tar.gz
    • lp_solve_5.5.0.15_Python_source.tar.gz

    e.g. from sourceforge (I originally got a different version of one of these, and it did something quite different).

  • The first will extract to a folder lp_solve_5.5. The second will extract to a folder with the same name, however it will only contain an extra/Python directory. Copy this extra directory into the first download’s lp_solve_5.5 folder.
  • cd into this lp_solve_5.5 directory.
  • cd lpsolve55
  • sh ccc.osx (you will get a lot of warnings, but that’s ok). This will create a bin/ directory. On my Mac it has a subdirectory osx64/, containing liblpsolve55.a and liblpsolve55.dylib.
  • sudo cp bin/osx64/liblpsolve55.a bin/osx64/liblpsolve55.dylib /usr/local/lib (this step courtesy of this blog)
  • cd ../extra/Python
  • You now need to edit setup.py, as suggested in the blog post above (here I have updated the included directories to reflect current Xcode practice):
    ...
    LPSOLVE55 = '../../lpsolve55/bin/osx64' # not ux32
    ...
        ext_modules = ...
            ...
            include_dirs = ['../..', '/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk/usr/include', '/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk/usr/include/malloc'],
            ...
    

    You may also want to update the version number (5.5.0.8) in setup.py, which does not match the originally downloaded files (5.5.0.15); I’m not sure which is correct.

  • python setup.py build
  • python setup.py install. This writes out …/lib/python2.7/site-packages/lpsolve55-5.5.0.8-py2.7.egg-info (in my case into my virtualenv)
  • >>> from lpsolve55 import * should now work in python.
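To double-check the install, a small example should solve – this is a sketch from memory of the lpsolve55 driver API, so treat the exact calls and constants with caution:

from lpsolve55 import lpsolve, LE

lp = lpsolve('make_lp', 0, 2)                 # 0 constraints, 2 variables
lpsolve('set_obj_fn', lp, [1, 1])             # objective: x + y
lpsolve('add_constraint', lp, [1, 2], LE, 4)  # x + 2y <= 4
lpsolve('set_maxim', lp)                      # maximise rather than minimise
lpsolve('solve', lp)                          # returns 0 on success
print(lpsolve('get_objective', lp))           # expect 4.0 (x=4, y=0)
lpsolve('delete_lp', lp)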

Once this was working, I also installed lpsolve on my linux (CentOS) server. I followed the same steps there, except I executed the straight ccc file rather than ccc.osx (I changed it to refer to ~/tmp instead of /tmp, since I’m using shared hosting and cannot execute from /tmp). On linux there is no need to change the setup.py file. And I did not find it necessary to copy the .a or .dylib files anywhere.

If you’re using WebFaction, you will also want to change the final install command (as described here), to:

python setup.py install --install-lib=$HOME/webapps/web_app/lib/python2.7 \
  --install-scripts=$HOME/webapps/web_app/bin \
  --install-data=$HOME/webapps/web_app/lib/python2.7

This also worked.

Hope that helps someone out there – let me know if you have any comments.


Django and Amazon AWS Elastic Beanstalk with S3

When you deploy your first Django website to Amazon Web Service’s Elastic Beanstalk, you will face a number of problems, such as:

  1. How should I handle static files and user-uploaded media?
  2. How do I send emails with AWS?
  3. How can I refer to the same AWS application and environment from a second development computer?
  4. How can I add gcc – and specifically, bcrypt – to the AWS environment?
  5. How can I access my site’s RDS database remotely (ie. from my local computer)?

If you’re new to Elastic Beanstalk, check out their tutorial to help you get a Django 1.4 site up on AWS quite quickly. You need to sign up for an account with AWS, but otherwise it just works. It also works for Django 1.5.

This post has some extra notes which I found handy. Also, I found it unnecessary to install MySQL-python on my local machine.

1. Static files and user-uploaded media

If you follow the tutorial above through to the optional step where you set up the admin panel, you will have set up a way to handle static files. This is often the bane of using Django (for me at least). However, you will not have a way yet to handle user-uploaded media.

To test out file uploads, I added a test app which had a model with a single FileField, and registered it with the admin. With this, I could go to the admin panel of the live site and try to upload a file, and test if it worked.

Bad approach – adapt the static files approach to media

My first thought was, if static files are being loaded ok, why not copy the same approach for user-uploaded media? So I added these lines to my config file (to match the existing lines for /static):

  - namespace: aws:elasticbeanstalk:container:python:staticfiles
    option_name: /media/
    value: media/

And to settings.py:

MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(os.path.dirname(os.path.dirname(
                  os.path.abspath(__file__))), 'media')

And it worked! I could click on the link to the uploaded file and see it.

Except … then I tried uploading a new version of the code with git aws.push, and suddenly I couldn’t see the file any more.

So I tried a slight variant of this approach, where I only had the one staticfiles instance in the config file, and used a MEDIA_URL of '/static/media/' and similarly for MEDIA_ROOT. It worked in the same way, which is to say, it didn’t work.

I was missing an important point, explained right at the end of this blog post: “Elastic Beanstalk images are all ephemeral… This means that nothing on an instances filesystem will survive through a deployment, redeployment, or stoppage of the environment/instance.”

Good approach – S3 with django-storages

So I had to understand more about how Django stores its files. The documentation is pretty clear on this, and I was happy to learn that the MEDIA_ROOT and MEDIA_URL settings are just locations to save files used by the default file storage system. So if you use another storage system, those two settings (probably) aren’t relevant.

When you use Elastic Beanstalk you also get an S3 bucket, so the solution is to use that to store the uploaded files. You can get your bucket name from the S3 console. The bucket name is the entire string you see there, e.g. elasticbeanstalk-us-west-2-xxxxxxxxxxxx.

We will use django-storages with boto.

First, you need to install them both (and add them to your requirements file):

pip install django-storages
pip install boto
pip freeze | grep django-storages >> requirements.txt
pip freeze | grep boto >> requirements.txt

In your settings.py file, add 'storages' to your INSTALLED_APPS, and also:

    DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
    AWS_ACCESS_KEY_ID = '---key---'
    AWS_SECRET_ACCESS_KEY = '---secret---'
    AWS_STORAGE_BUCKET_NAME = '---bucket name---'

As I mentioned, you can leave out MEDIA_URL and MEDIA_ROOT now.

And that’s it! I found that this was all I needed to do to be able to upload files through the admin panel, and have them persist. You can also see the uploaded files in your S3 console.

Note this means I am using different storage systems for the static files and the user-uploaded media files. The former do not persist from one deployment to the next (but are reloaded each time), whereas the latter do.

I’m not sure if there’s a downside to this approach – I have seen Stack Overflow posts (e.g. this one) where both sets of files are put on S3.

I’ll also mention that the links to the user-uploaded files are quite long, e.g. https://elasticbeanstalk-us-west-2-xxxxxxxxxxxx.s3.amazonaws.com/myfolder/samplefile.txt?Signature=XXXXXXXXXXXX&Expires=9999999999&AWSAccessKeyId=XXXXXXXXXXXXX. These parameters change between deployments.

This seems to be a good way to handle user-uploaded media. In particular, the additional parameters should limit access to unauthorised users.
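For what it’s worth, django-storages has settings to control those long signed links – e.g. you can turn the signing off entirely if you are happy for the files to be publicly readable (note this removes the access control just mentioned):

# settings.py - optional django-storages tweaks
AWS_QUERYSTRING_AUTH = False    # serve plain, unsigned S3 URLs
# or keep signing, but control how long each link stays valid (in seconds):
# AWS_QUERYSTRING_EXPIRE = 3600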

2. Email

I just added the usual lines to settings.py for my gmail account:

    EMAIL_USE_TLS = True  # not sure if this is needed
    EMAIL_HOST = 'smtp.gmail.com'
    EMAIL_HOST_USER = 'example@gmail.com'
    EMAIL_HOST_PASSWORD = 'PASSWORD'
    EMAIL_PORT = 587

Then I went into Amazon’s SES (Simple Email Service) console and verified the above EMAIL_HOST_USER email address, and some test recipient email addresses. I had to log in to gmail and respond to an email from gmail that everything was ok too.  Then, in the development sandbox, my Django app could send email fine (but only to the test recipients).
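A quick way to test this is from the Django shell, using Django’s standard send_mail (remember the recipient must be one of your verified test addresses):

python manage.py shell
>>> from django.core.mail import send_mail
>>> send_mail('Test subject', 'Test body.', 'example@gmail.com',
...           ['a-verified-recipient@example.com'])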

3. Referring to the same AWS environment from another computer

[Edit - this is outdated with CLI v3.0; use the 'eb' command instead.] First, you need to download a copy of the Elastic Beanstalk client to your second computer (as you did for the first one).  But this time, instead of typing eb init, you need to type (on a Mac/Linux system):

cd your/Django/project/directory
~/path/to/AWS-ElasticBeanstalk-CLI-2.5.0/AWSDevTools/Linux/AWSDevTools-RepositorySetup.sh
git aws.config

You will then be prompted for your access id, secret, region, etc, and you should be able to use git aws.push to push to the same place as on your other computer.

4. Adding gcc and/or bcrypt

I want to use bcrypt for password hashing. Simply adding bcrypt to your requirements.txt file is not sufficient, because bcrypt needs two more things: it needs gcc, and it needs the libffi package. Your development computer has these, but the AWS server does not.  Not being at all knowledgeable about yum or yaml, it took some trial and error to work out what changes I needed to make to .ebextensions/aws.config - so to save you this trouble, here are the extra lines you need to add to the yum section:

packages:
  yum:
    libffi-devel: []
    gcc: []

5. Accessing your site’s RDS database remotely

This is surprisingly easy.  You first need to tell RDS which IP addresses are allowed to connect; this is described in detail here.  The quick summary is to find the database’s “Security Groups” console in AWS, go to the “Inbound” tab, and set the rule to “MySQL”, with your local IP address (which you can get from whatismyip.com).

You can get a copy of the database dumped onto your local machine with eg.:

/Applications/MAMP/Library/bin/mysqldump -h abcdefg.cdefg.ap-xxx-1.rds.amazonaws.com -u ebroot -p ebdb > db.sql

The -p option will make it prompt you for your database password, which you entered when you set up the EB environment.  (I’m using MAMP, hence the need for the path to mysqldump above – you may not need this.) Do not put the port number (eg. :3306) at the end of the URL.

If you want to run your local development version of Django with the AWS RDS database, all you need to do is set the following environment variables before you do ./manage.py runserver:

    export RDS_DB_NAME='ebdb'
    export RDS_USERNAME='ebroot'
    export RDS_PASSWORD=''  # you need to remember this
    export RDS_HOSTNAME='xxxx.xxxx.us-east-1.rds.amazonaws.com'
    # HOSTNAME is the endpoint from https://console.aws.amazon.com/rds/home
    export RDS_PORT='3306'  # also from the console

That’s assuming you are using the suggested setup in settings.py:

if 'RDS_DB_NAME' in os.environ:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': os.environ['RDS_DB_NAME'],
            'USER': os.environ['RDS_USERNAME'],
            'PASSWORD': os.environ['RDS_PASSWORD'],
            'HOST': os.environ['RDS_HOSTNAME'],
            'PORT': os.environ['RDS_PORT'],
        }
    }

I hope this helps someone out there get over the hurdle to using AWS.


Use Django fixtures to export and import a database

I have some Django sites, and I’ve often wondered about how to make my dev database faithfully mirror the production database (at a point in time), i.e. copy all the info from my prod database back to my dev environment.

Approach 1: Export and import the data

You would think it is simple: just export the data from one and import it into the other. If you use the same type of database (e.g. MySQL) in both environments, this should work fine.  But I use sqlite in dev and MySQL in prod. Using sqlite on the development server is quick and easy, with no need to set up MySQL (see my earlier post on some of the hoops you may need to jump through for that to work). But then there are subtle differences between the import and export formats, and you can waste a lot of time mucking around with the files trying to get it to work.

Approach 2: Fixtures

So, the second idea is: use fixtures, e.g. on the production machine:

./manage.py dumpdata --indent 2 > all.json

and load it in again on the dev server (I use git to transfer it but there’s probably a cleaner approach):

./manage.py loaddata all.json

Ah – another problem. The dumpdata command, with no apps supplied, dumps everything including Users, User profiles, Sites, etc. Your dev database, even if you only just set it up a second ago, will have some content already. So you will get error messages like this:

IntegrityError: Could not load myapp.UserProfile(pk=18): column user_id is not unique

You could go back and only list the apps you want to transfer the data of after the dumpdata command, but this is painful and you may not know them all if you’re using lots of third party apps (South, Django-CMS, etc).

The solution: Fixtures + some delete statements

But I can give you good news: there is a way to do it by deleting all that initial content from your dev database, and potentially making one small deletion from the above json file.

Start with a newly created database, with the tables installed (e.g. using ./manage.py syncdb). These are the tables you need to wipe clean:

  • django_site
  • django_content_type
  • auth_permission
  • auth_user
  • south_migrationhistory if you’re using South.  Note: I am using South but did not delete the contents of this table, and it all worked. But I suspect you should delete the contents.
  • any user profiles you define (at least in Django 1.4, I’m not sure how they work in Django 1.5 yet)

In sqlite you’d do this by typing (assuming your database file is named sqlite.db):

sqlite3 sqlite.db
  delete from django_site;
  delete from django_content_type;
  delete from auth_permission;
  delete from auth_user;
  delete from south_migrationhistory;
  delete from ...;

Looks scary, but this is a brand new database anyway, right? You could easily recreate this content by starting over with ./manage.py syncdb.

The last line is where you delete your user profiles. If you do define a user profile, the act of creating a user will also create a profile. So when the users get loaded from the fixture file, user profiles will be automatically created; when the user profiles’ turn comes to be loaded from the fixture, there can be a potential clash. If you are OK to lose all your user’s profile data on your dev machine, there is a simple solution: just delete these entries from the json file.
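If you’d rather not edit the file by hand, a few lines of python will do it – a sketch, assuming your profile model appears in the fixture as myapp.userprofile:

import json

with open('all.json') as f:
    data = json.load(f)

# drop the profile entries; they are recreated automatically when the users load
data = [obj for obj in data if obj['model'] != 'myapp.userprofile']

with open('all_cleaned.json', 'w') as f:
    json.dump(data, f, indent=2)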

That’s it! I hope that helps someone out there.


Installing PIL on Mac OS

Installing PIL (the python image library) on my Mac is non-trivial. I get this error when I try to read or write a jpeg image:

IOError: encoder jpeg not available

and this error when I try to use any fonts:

ImportError: The _imagingft C module is not installed

I understand that Pillow is a much friendlier version of PIL, but even it does not help here, as its documentation simply states “Once you have installed the prerequisites” – with no further explanation of how to do that. Also, I have never used homebrew, and am not sure how it works with virtualenv, so I’d prefer not to use it.

Here are the steps I have followed, which solve both of these problems. They took a long time to discover!

  1. Uninstall any existing PIL you may have.  I’m afraid this is easier said than done. Fortunately I had installed PIL in a virtualenv, so I could just change to a new one and go from there. If you installed it using Pillow, you should be able to just pip uninstall Pillow, but I have not tried it.
  2. Install the jpeg library.  The way to do this is partly given by this stackoverflow post, though it unfortunately misses the last step, which you can find here. To summarise:
    1. Download libjpeg from http://www.ijg.org/files/jpegsrc.v8c.tar.gz
    2. Unpack it (either using the Finder or something like  tar zxvf jpegsrc.v8c.tar.gz)
    3. ./configure
    4. make
    5. sudo make install
    6. cp -r ~/Downloads/jpeg-XX/ /usr/local/jpeg
  3. Install the freetype library.  Get this from http://www.freetype.org/download.html, and follow the same procedure as for jpeg above (I didn’t do step 5 and it worked fine).  Copy it into /usr/local/freetype.  This time it will work.
  4. Install any other libraries you may want (e.g. see this list).
  5. pip install Pillow

To show this works, I get this output from the final command:

    --------------------------------------------------------------------
    SETUP SUMMARY (Pillow 2.0.0 fork, originally based on PIL 1.1.7)
    --------------------------------------------------------------------
    version      2.0.0 (Pillow)
    platform     darwin 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
                 [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
    --------------------------------------------------------------------
    --- TKINTER support available
    *** JPEG support available
    --- ZLIB (PNG/ZIP) support available
    *** TIFF G3/G4 (experimental) support not available
    --- FREETYPE2 support available
    *** LITTLECMS support not available
    *** WEBP support not available
    --------------------------------------------------------------------
    To add a missing option, make sure you have the required
    library, and set the corresponding ROOT variable in the
    setup.py script.

    To check the build, run the selftest.py script.

    changing mode of build/scripts-2.7/pilconvert.py from 644 to 755
    changing mode of build/scripts-2.7/pildriver.py from 644 to 755
    changing mode of build/scripts-2.7/pilfile.py from 644 to 755
    changing mode of build/scripts-2.7/pilfont.py from 644 to 755
    changing mode of build/scripts-2.7/pilprint.py from 644 to 755

    warning: no previously-included files found matching '.hgignore'
    warning: no previously-included files found matching '.hgtags'
    warning: no previously-included files found matching 'BUILDME.bat'
    warning: no previously-included files found matching 'make-manifest.py'
    warning: no previously-included files found matching 'SHIP'
    warning: no previously-included files found matching 'SHIP.bat'
    warning: no files found matching '*.html' under directory 'docs'
    warning: no files found matching 'README' under directory 'docs'
    warning: no files found matching 'CHANGES' under directory 'docs'
    warning: no files found matching 'CONTENTS' under directory 'docs'
    changing mode of .../ENV-PIL/bin/pilconvert.py to 755
    changing mode of .../ENV-PIL/bin/pildriver.py to 755
    changing mode of .../ENV-PIL/bin/pilfile.py to 755
    changing mode of .../ENV-PIL/bin/pilfont.py to 755
    changing mode of .../ENV-PIL/bin/pilprint.py to 755
Successfully installed Pillow
Cleaning up...

That helps me with running Django on my dev server, so that I can upload JPEGs properly. But if I use PIL to draw directly, I find the output looks very grainy (e.g. from the code below), and the colours are feathered.  I have tried saving as both PNG and GIF.

from PIL import Image, ImageDraw, ImageFont

# assumes x, y (centre), r (radius), msg, filename and fmt are already defined
img = Image.new('RGB', (500, 500), (255, 255, 255))
draw = ImageDraw.Draw(img)
draw.ellipse((x-r, y-r, x+r, y+r))
font = ImageFont.truetype("examplefont.ttf", 25)
w, h = draw.textsize(msg, font=font)  # measure with the same font used to draw
draw.text((x-w/2, y-h/2), msg, (0, 0, 0), font=font)
img.save(filename, fmt)

So I am happy to have Django working with images nicely, but I will stick to client-side drawing as much as possible.


Dynamic web pages with Django, Ajax and jQuery

How do you update a page’s content on the fly?

Of course, the simple answer is to use ajax.

But there are many ways to implement ajax, and sometimes it isn’t necessary. My aim here is to describe some solutions, and when each is appropriate.

Specifically, I’ll deal with sites using jQuery, and written with Django (and Django-CMS in particular, but that’s not critical to this discussion). I will also show how I hooked up Twitter Bootstrap‘s modal windows to do what I needed.

These are the approaches I have found for dynamically updating your webpage. They all have their uses:

  • Toggle hidden content
  • Generate content on the client side
  • jQuery’s $.ajax()
  • jQuery’s $.load()
  • Dajax/Dajaxice
  • a javascript MVC framework like Angular

Researching this took me a few days and quite a bit of trial and error, so with any luck by posting I will save someone else that time.

Toggle hidden content

First, you can often do without ajax, by toggling hidden content (with either explicit javascript or a suitable Twitter Bootstrap component). Here’s a simple example:

html:

<input type="submit" class="btn-link" name="comment"
    value="Add a comment" id="toggle-content" />
<div id="extra-content">{% include "myform.html" %}</div>

javascript:

$(document).ready(function() {
    $("#extra-content").hide();
    $("#toggle-content").click(function(){
        $("#extra-content").show();
        $("#toggle-content").hide();
    });
});

This is fine if you know the potential extra content in advance, the server doesn’t need to know about it, and there’s not too much of it. It has the very strong advantage of being very fast (once the original page has loaded).

Generate content on the client side

To be honest, after implementing what seemed to be a wonderful solution with ajax, using the .load() method below, I am now rewriting my webpage to generate the content as needed in javascript. The ajax solution was just too slow, and as luck would have it, the content I need to show can be based on existing content. In fact, I hope it can be as simple as copying an element if I change the css to depend on the enclosing class.

jQuery’s $.ajax()

For some things, like telling the server that the user has finished playing a game, I am using jQuery’s $.ajax() javascript command. This combines easily with a Django view to send and receive data.

Your template will look something like this (thanks to this stack overflow post for explaining a simple way to pass the CSRF token):

html:

{% csrf_token %}
<div id="game-board" data-board-id="{{ board.pk }}" data-done-ref="{% page_url 'game-handler' %}game-over/">...</div>

javascript:

function gameOver() {
    var board = $('#game-board').attr('data-board-id');
    $.ajax({
        type: "POST",
        url: $('#game-board').attr('data-done-ref'),  // or just url: "/my-url/path/"
        data: {
            csrfmiddlewaretoken: document.getElementsByName('csrfmiddlewaretoken')[0].value,
            board: board,
            move_list: move_list.join(','),
        },
        success: function(data) {
            alert("Congratulations! You scored: "+data);
        },
        error: function(xhr, textStatus, errorThrown) {
            alert("Please report this error: "+errorThrown+xhr.status+xhr.responseText);
        }
    });
}

And your view will look something like this (you need to set up urls.py to point to this*):

views.py:

from django.http import HttpResponseRedirect, HttpResponse
from django.http import Http404

def game_over(request):
    if request.is_ajax():
        try:
            board_pk = int(request.POST['board'])
            moves = list(map(int, request.POST['move_list'].split(',')))
        except KeyError:
            return HttpResponse('Error') # incorrect post
        # do stuff, e.g. calculate a score
        return HttpResponse(str(score))
    else:
        raise Http404

(*since I am using Django-CMS, I have actually set this up as an apphook. To do this, the admin needs to manually connect a particular page to the apphook; I also require this page to have a nominated id in the advanced settings, e.g. “game-handler”, so that I can use {% page_url "game-handler" %} in the template. I have passed this from the template to the javascript so that the javascript can reside in a static js file.)

Here I am just passing a few POST parameters to the Django view, and the Django view processes them and returns the score – very simple handling of data. Json encoding of data is also fairly easy, I believe (e.g. see the Dajaxice example below).

The downside of this and the other ajax approaches is speed: any time you interact with the server, there is the potential for it to be slow. You may want to display a loading indicator along the lines of this stack overflow post.

jQuery’s $.load()

jQuery’s $.load() javascript command is magic! It makes ajax so easy you don’t even realise you’re doing it. I am using this to replace the contents of a div with new data from the server when a button is clicked.

The html and javascript is straightforward (this example is slightly more complex than it needs to be, but is my real-world situation; it is based on this stack overflow post for reloading the content of a Twitter Bootstrap modal popup window):

html:

<a href="my-ajax-page/load/{{ board.pk }}/" data-target="#my-modal">
   Click here
</a>

javascript:

$("a[data-target=#my-modal]").click(function(event) {
    event.preventDefault();
    var target = $(this).attr("href");
    $("#my-modal .modal-body").load(target, function() {
         $("#my-modal").modal("show");
    });
});

With the setup above, the contents of the view referenced by my-ajax-page/load/### will be loaded into the modal dialog and displayed.

In fact using the href, data-target and data-toggle tags in a Bootstrap modal window automatically calls jQuery’s .load command, but it only seems to do it the first time; in my case, I need to load new content every time the modal is clicked, hence the explicit call above.

A few caveats I have discovered about changing contents (but double-check me on this please!):

  • Sekizai – I use sekizai blocks for my js and css (as done by Django-CMS). If you are inserting new css and js inside the html, you may want to do it without putting them inside sekizai blocks.
  • jQuery’s $(function() {…}) – javascript inserted this way (or with the equivalent .ready() method) would normally run automatically once the page is ready, but when it arrives via an ajax load it will not be run.

Dajax/Dajaxice

Dajax turns the paradigm on its head, allowing you to seemingly dynamically change your page’s content from python, rather than in the javascript. I haven’t looked into it, but I presume that under the hood the python code is sending an object to the javascript, and the Dajax js library is decoding it there.

This is slightly more involved to set up, requiring several changes to your Django setup (settings.py, templates etc), and I found the documentation a little sparse.  It also took me some time to get the basic examples running; I wasn’t clear on whether I needed Dajaxice or Dajax, or which was the one to get started with; I installed Dajax first not realising I had to also follow the Dajaxice instructions (which was my bad); then I had to play with the ordering of the INSTALLED_APPS to make it work (in between django.contrib.sites and django.contrib.messages).

Once set up though, your javascript is confined only to what happens when the ajax call returns; you can just use html to call the ajax script.  And your Django code is called without needing to set up your own urls; Dajax autodiscovers them so long as you put your ajax views into files named ajax.py instead of views.py.

From the examples on the Dajax website, it looks like with Dajaxice you need to write your own javascript to handle the returned data, whereas with Dajax you can change page elements directly from python without writing any javascript at all. The Dajax API also allows you to assign attributes to elements, add or remove css classes, and other related functions. This stuff is not too hard to do in javascript or jQuery either though.

The key advantage of Dajax that I see is it helps maintain the separation of models and views, since the code which changes the page is in python rather than in the template.

Dajaxice

A Dajaxice example from their website is:

html:

<input type="text" id="text"/>
<input type="button"
    onclick="Dajaxice.examples.args_example(
    callback, {'text':$('#text').val()})"
    value="Send!"/>

javascript:

function callback(data){
    alert(data.message);
}

With the ajax.py file (in a Django app named “examples”, as referenced by the “onclick” function above):

ajax.py:

from django.utils import simplejson
from dajaxice.decorators import dajaxice_register

@dajaxice_register
def args_example(request, text):
    return simplejson.dumps({'message':'Message is %s!' % text})

Dajax

And a Dajax example from the website, with one minor modification to make it clear that you do not need to write any javascript:

html:

<input type="text" value="5" id="a"> x
<input type="text" value="6" id="b"> =
<input type="text" value="" id="result">
<input type="button" value="Multiply!"
    onclick="Dajaxice.examples.multiply(
    Dajax.process,{'a':$('#a').val(),'b':$('#b').val()});">

With the ajax.py file (in a Django app named examples, as referenced by the onclick method above):

ajax.py:

from dajax.core import Dajax
from dajaxice.decorators import dajaxice_register

@dajaxice_register
def multiply(request, a, b):
    dajax = Dajax()
    result = int(a) * int(b)
    dajax.assign('#result','value',str(result))
    return dajax.json()

A javascript MVC framework like Angular

For web apps of any complexity, it is worth going with an MVC framework so you don’t have to manage the DOM updates yourself after the ajax call.  See this post for details of how I am getting Django to work with Angular.

Summary

All these approaches have their place. If you know the potential extra content in advance and the server doesn’t need to know about it, you can toggle hidden content. If you can generate it on the client side, that will probably be more responsive than using ajax. If you want to tell the server something but don’t need to change the content, you can use .ajax(). If you need to load in a whole new section, you can use .load() (but be aware it could be slow communicating with the server).

I think Dajaxice’s functionality is covered by .ajax(), but Dajaxice nicely handles the CSRF token for you and looks clearer. Dajax would be handy if several elements need to be changed and lets you write the page-changing code in python instead of javascript. It should thereby help maintain a cleaner separation of models and views.

I’m just learning this myself, so if I’ve missed another approach or said something misleading, please let me know!

Subclassing in Django to preserve reusability

I am developing a Django app called discuss, which allows users to post comments.

Now my particular application is to allow users to comment on a game, and I would like them to be able to load in their own game boards. However, I want to write the app as reusably as possible, so I do not want the gameboards to be part of the discuss app.

I have two models:

from django.db import models
from django.contrib.auth.models import User

class Discussion(models.Model):
    name = models.CharField(max_length=60)
    def __unicode__(self):
        return self.name

class Post(models.Model):
    discussion = models.ForeignKey(Discussion)
    forerunner = models.ForeignKey("self", blank=True, null=True)
    author = models.ForeignKey(User)
    title = models.CharField(max_length=60)  # referenced by __unicode__ below
    body = models.TextField()
    created = models.DateTimeField(auto_now_add=True)
    def __unicode__(self):
        return self.title

I have been toying with a few ways of going about this:

  • Add the (content_type, object_id, content_object) trio of fields to my Post class (see the sketch after this list).  This would allow the user to associate any model with their post, but only one of them; both features are undesirable in my case, and it looks messy to me besides.
  • Add ManyToManyField(Post) to my game’s Board class, i.e. point back the other way, so that the reusable app’s Post class remains pure.  This could work except that it pollutes the Board class instead; not all boards appear in posts.
  • Add a new joining model like this:
        class PostedBoard(models.Model):
            board = models.ForeignKey(game.models.Board)
            posts = models.ManyToManyField(discuss.models.Post)

    This would probably work but feels very wrong.

  • Subclass Post for the game, e.g.:
        class GamePost(discuss.models.Post):
            boards = models.ManyToManyField(game.models.Board)
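
For reference, here is a sketch of the generic-relation trio from the first option, using the contenttypes framework of this era’s Django; the fields sit on Post itself, which is exactly the pollution I want to avoid:

from django.db import models
from django.contrib.contenttypes.models import ContentType
from django.contrib.contenttypes import generic

class Post(models.Model):
    # ... existing fields as above, plus the generic-relation trio:
    content_type = models.ForeignKey(ContentType, blank=True, null=True)
    object_id = models.PositiveIntegerField(blank=True, null=True)
    content_object = generic.GenericForeignKey('content_type', 'object_id')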
    

The last feels like the right object-oriented approach, but I wasn’t sure how well it would actually work in Django. The purpose of this post is simple: subclassing Django models does work, with one caveat: the default manager does not know about the new post’s subclass. This means that if you use discussion.post_set.all(), you will not know which of the returned objects belong to a subclass.

d = Discussion.objects.get()
# <Discussion: MyDiscussion>
d.post_set.all()
# [<Post: First challenge>, <Post: Second challenge>, 
#  <Post: A comment on challenge 2>]
g = GamePost(discussion = d, title = "Test subclassing", ...)
g.save()
g.pk  # Note that the subclass's primary key is 4, not 1
# 4
d.post_set.all() # success!
# [<Post: First challenge>, <Post: Second challenge>, 
#  <Post: A comment on challenge 2>, <Post: Test subclassing>]
# Ah - but here's the rub: 
#      this command does not know the new post's subclass
gg = d.post_set.all()[3]
isinstance(gg, GamePost)
# False

There are a number of solutions out there to deal with this problem – this one seems well-regarded.

I have decided to go with a simple approach which takes advantage of the fact that the object’s primary key is the same whatever class it shows up as. If I need to use the instance as a member of its subclass, just use:

    def as_subclass(instance, subclass):
        # Return the same row as the given subclass if one exists there,
        # otherwise fall back to the instance unchanged.
        try:
            return subclass.objects.get(pk=instance.pk)
        except subclass.DoesNotExist:
            return instance

or, if you have lots of subclassing going on, here is a more automated method which searches through all the possible subclasses and checks each one in turn (assuming only one level of inheritance):

    def as_leaf_class(instance):
        # Check each direct subclass for a row with the same primary key;
        # return the first match, or the instance itself if none exists.
        subclasses = instance.__class__.__subclasses__()
        for subclass in subclasses:
            t = subclass.objects.filter(pk=instance.pk)
            if len(t) > 0:
                return t[0]
        return instance
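
Continuing the shell session above, here is a quick sketch of how either helper is used (output hand-written):

gg = d.post_set.all()[3]
isinstance(as_subclass(gg, GamePost), GamePost)
# True
isinstance(as_leaf_class(gg), GamePost)
# True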

I would love to hear if you’ve had a similar problem before and how you solved it!

  

Quick guide to creating reusable Django apps

I have developed several Django-CMS sites, e.g. Racing Tadpole, U R The Event Manager, Aquizzical and School Circle, and find myself frequently wanting to reuse pieces of the code.  Of course, this is why Django has the concept of reusable apps, and has a good intro tutorial here.

I am going to embellish on that tutorial and show the actual steps I followed to go from no app, to a single package (possibly containing multiple apps) on the Python Package Index (PyPI) that anyone can install with pip, that others can find on djangopackages, and with source code on github.  I have already followed this process for cmsplugin-rt, which (in my humble opinion) contains lots of helpful Django-CMS plugins (particularly for Twitter Bootstrap), like buttons and a navbar, as well as more generic plugins like Google fonts, Google analytics, and Facebook and Twitter buttons.  You can install this easily by typing:

pip install cmsplugin-rt

I now want to repeat that process for a new set of apps, which will include some global settings for Bootstrap.

  1. Create a dummy Django project with all the apps you are going to need (e.g. Django-CMS in my case). 
  2. Write your app, starting with python manage.py startapp myapp 
  3. I have chosen to place this app one directory level down, i.e. cmsapp_rt/bsglobals/, so that I can put more than one app in my package (cmsapp_rt) and have them all installed by pip in one go. Note you need to put an __init__.py file in the cmsapp_rt directory.
  4. Check your app works by including it in the INSTALLED_APPS in your settings.py file, and typing python manage.py runserver.
  5. My projects are in ~/Python/Projects.  My approach is to create a directory like ~/Python/MyPackages, and in it create directories for each pip package. Note that, as far as I can tell, PyPI package names prefer hyphens, whereas Python module names must use underscores.  So I have a directory ~/Python/MyPackages/cmsapp-rt/.
  6. Following the Django tutorial, type: mv ~/Python/Projects/dummy/cmsapp_rt ~/Python/MyPackages/cmsapp-rt/.
  7. Create the README.txt (or README.rst), LICENSE and MANIFEST.in files in the ~/Python/MyPackages/cmsapp-rt/ directory, as per the tutorial. Add the docs directory. I also put in a CHANGES file (and added it to the MANIFEST.in file too). Note if you have any fixtures, you need to add them to the MANIFEST.in file in the same way as the templates. Feel free to base yours off mine if it helps.
  8. Create the setup.py file; a fuller sketch of the whole file appears after this list.  Note you can add a URL to github here, e.g.
        url = 'https://github.com/RacingTadpole/cmsapp-rt'

    I also used

        from setuptools import setup, find_packages
        setup(...
            packages = find_packages(),
        )

    and

            install_requires = [
                'django-singleton-admin',
            ],

    and most importantly, add to the arguments to setup:

            zip_safe = False,

    which forces the package to be installed as real files, not a zipped up egg. Django seems to struggle with the zipped up eggs.

  9. Commit your work to git – add a .gitignore file which contains at least:
        dist/
        *.egg-info
        build/

    Make yourself a new repository on Github. Then type:

        git init
        git remote add origin https://github.com/user/reponame.git
        git add --all
        git commit -a -m "Initial commit"
        git push -u origin master
  10. You can build the package with python setup.py sdist, as per the tutorial.
  11. Now it’s time to follow this guide from pypi.  The only two things you need from this are:
        python setup.py register
        python setup.py sdist bdist_wininst upload

    The guide says you need to register on the site before that first command, but you can just run the command and it will do that for you.  Also, I get the warning message below, but it doesn’t seem to matter (I have been using a Mac and a linux machine; maybe it is a problem for Windows?):

        Warning: Can't read registry to find the necessary compiler setting
        Make sure that Python modules _winreg, win32api or win32con are installed.
  12. Whenever you make changes to the code, you only need to update README, CHANGES, the version number in setup.py, re-commit it to git (and push it to github), and type:
    python setup.py sdist bdist_wininst upload
  13. At this point, you should be able to install the package using
        pip install projectname

    (or, if you are updating an existing install, add --upgrade on the end).

  14. With any luck, your dummy project will start to work again once you’ve done that, but this time, it will be drawing the app from your system’s packages, and any other project can do so too.  (In practice, you will want to use virtualenv for this.)
  15. Finally, let Django Packages know about your app.  This is very easy if you have already put your project on github – use the form here.
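
Pulling steps 5 to 8 together, the package directory ends up looking something like this:

    ~/Python/MyPackages/cmsapp-rt/
        setup.py
        README.rst
        LICENSE
        MANIFEST.in
        CHANGES
        docs/
        cmsapp_rt/
            __init__.py
            bsglobals/

And here is a minimal sketch of the whole setup.py (the version, description and long_description values are placeholders to adapt):

    from setuptools import setup, find_packages

    setup(
        name='cmsapp-rt',
        version='0.1',  # placeholder - bump this for each upload
        packages=find_packages(),  # picks up cmsapp_rt and the apps inside it
        include_package_data=True,
        url='https://github.com/RacingTadpole/cmsapp-rt',
        description='Apps and global settings for Django-CMS sites',  # placeholder
        long_description=open('README.rst').read(),
        install_requires=[
            'django-singleton-admin',
        ],
        zip_safe=False,  # install as real files; Django struggles with zipped eggs
    )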

All done. I have followed this through and just published cmsapp-rt in this way, which you can now install with

    pip install cmsapp-rt

Please let me know if you have any suggestions or improvements!