All posts by Arthur Street

Serve datatables with ajax from Django

Datatables is an amazing library which lets you quickly display lots of data in tables, with sorting, searching and pagination all built in.

The simplest way to use it is to populate the table when you load the page.  Then the sorting, searching and pagination all just happen by themselves.

If you have a lot of data, you can improve page load times by just serving the data you need to, using ajax. On first sight, this is made easy too.  However, be warned: if the server is sending only the data needed, then the server needs to take care of sorting, searching and pagination. You will also need to control the table column sizes more carefully.

There’s quite a lot required to get this right, so I thought I’d share what I’ve learned from doing this in Django.
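Before getting into the Django specifics, it helps to see the work the server takes on, stripped of any framework. This is an illustrative, framework-free sketch (not Django code) using the legacy DataTables parameter names that appear throughout this post (iSortCol_0, sSearch, iDisplayStart, iDisplayLength, sEcho):

```python
def serve_datatable(rows, params, columns):
    """Sort, filter and paginate rows (a list of dicts) the way DataTables
    expects when bServerSide is true. columns is the ordered list of keys,
    e.g. ['name', 'price']. Illustrative only, not production code."""
    total = len(rows)

    # searching: keep rows where the search text appears in any column
    search = params.get('sSearch', '').lower()
    if search:
        rows = [r for r in rows
                if any(search in str(r[c]).lower() for c in columns)]

    # sorting: DataTables sends the index of the column to sort on
    sort_col = columns[int(params.get('iSortCol_0', 0))]
    reverse = params.get('sSortDir_0', 'asc') == 'desc'
    rows = sorted(rows, key=lambda r: r[sort_col], reverse=reverse)

    # pagination: slice out just the page being displayed
    start = int(params.get('iDisplayStart', 0))
    length = int(params.get('iDisplayLength', 10))

    return {'iTotalRecords': total,
            'iTotalDisplayRecords': len(rows),
            'sEcho': params.get('sEcho', 1),
            'aaData': rows[start:start + length]}

rows = [{'name': 'zinc', 'price': 3},
        {'name': 'copper', 'price': 1},
        {'name': 'gold', 'price': 9}]
page = serve_datatable(rows, {'iSortCol_0': '1', 'iDisplayLength': '2'},
                       ['name', 'price'])
# page['aaData'] is now the two cheapest rows, copper then zinc
```

The Django version below does exactly this, but with querysets in place of the list of dicts.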

Start with the following html. This example demonstrates using the render function to insert a link into the table.

<div class="row">
<table class="table table-striped table-bordered" id="example" style="clear: both;">
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
</table>
</div>

and javascript:

$(document).ready(function() {
    var exampleTable = $('#example').dataTable( {
        "aaSorting": [[ 2, "asc" ]],
        "aoColumns": [
            { "mData": "name", "sWidth": "150px" },
            { "mData": "supplier", "sWidth": "150px",
              "mRender": function (supplier, type, full) {
                             return '<a href="' + supplier.slug + '">' + supplier.name + '</a>';
                         }
            },
            { "sType": "numeric", "sClass": "right", "mData": "price", "sWidth": "70px" }
        ],
        "bServerSide": true,
        "sAjaxSource": "{% url 'api' 'MyClass' %}",
        "bStateSave": true, // optional
        "fnStateSave": function (settings, data) {
            localStorage.setItem("exampleState", JSON.stringify(data));
        },
        "fnStateLoad": function (settings) {
            return JSON.parse(localStorage.getItem("exampleState"));
        },
        "fnInitComplete": function () { // use this if you don't hardcode column widths
            this.fnAdjustColumnSizing();
        }
    });
    $('#example').click(function() { // only if you don't hardcode column widths
        exampleTable.fnAdjustColumnSizing();
    });
});

Next you need to write an API for the data. I’ve put my api in its own file, apis.py, and made it a generic class-based view, so I’ve added to urls.py:

from django.conf.urls import patterns, url
from myapp import views, apis

urlpatterns = patterns('',
   ...
   url(r'^api/v1/(?P<cls_name>[\w-]+)/$',apis.MyAPI.as_view(),name='api'),
)

Then in apis.py, I put the following. You could use Django REST framework or TastyPie for a fuller solution, but this is often sufficient. I’ve written it in a way that can work across many classes; just pass the class name in the URL (with the right capitalization). One missing feature here is an ability to sort on multiple columns.

import sys
import json

from django.http import HttpResponse
from django.views.generic import View
from django.core.serializers.json import DjangoJSONEncoder

import myapp.models

class JSONResponse(HttpResponse):
    """
    Return a JSON serialized HTTP response
    """
    def __init__(self, request, data, status=200):
        # pass DjangoJSONEncoder to handle Decimal fields
        json_data = json.dumps(data, cls=DjangoJSONEncoder)
        super(JSONResponse, self).__init__(
            content=json_data,
            content_type='application/json',
            status=status,
        )

class JSONViewMixin(object):
    """
    Return JSON data. Add to a class-based view.
    """
    def json_response(self, data, status=200):
        return JSONResponse(self.request, data, status=status)

# API

# define a map from json column name to model field name
# this would be better placed in the model
col_name_map = {'name': 'name',
                'supplier': 'supplier__name', # can do foreign key look ups
                'price': 'price',
               }
class MyAPI(JSONViewMixin, View):
    "Return the JSON representation of the objects"
    def get(self, request, *args, **kwargs):
        class_name = kwargs.get('cls_name')
        params = request.GET
        # make this api general enough to handle different classes
        klass = getattr(sys.modules['myapp.models'], class_name)

        # TODO: this only pays attention to the first sorting column
        sort_col_num = params.get('iSortCol_0', 0)
        # default to value column
        sort_col_name = params.get('mDataProp_{0}'.format(sort_col_num), 'value')
        search_text = params.get('sSearch', '').lower()
        sort_dir = params.get('sSortDir_0', 'asc')
        start_num = int(params.get('iDisplayStart', 0))
        num = int(params.get('iDisplayLength', 25))
        obj_list = klass.objects.all()
        sort_dir_prefix = '-' if sort_dir == 'desc' else ''
        if sort_col_name in col_name_map:
            sort_col = col_name_map[sort_col_name]
            obj_list = obj_list.order_by('{0}{1}'.format(sort_dir_prefix, sort_col))

        filtered_obj_list = obj_list
        if search_text:
            filtered_obj_list = obj_list.filter_on_search(search_text)

        d = {"iTotalRecords": obj_list.count(),                 # num records before applying any filters
             "iTotalDisplayRecords": filtered_obj_list.count(), # num records after applying filters
             "sEcho": params.get('sEcho', 1),                   # unaltered from query
             "aaData": [obj.as_dict() for obj in filtered_obj_list[start_num:(start_num + num)]],  # the data
             }

        return self.json_response(d)

This API depends on the model for two extra things:

  • the object manager needs a filter_on_search method, and
  • the model needs an as_dict method.

The filter_on_search method is tricky to get right. You need to search with OR on the different fields of the model, and AND on different words in the search text. Here is an example which subclasses the QuerySet and object Manager classes to allow chaining of methods (along the lines of this StackOverflow answer).

from django.db import models
from django.db.models import Q
from django.db.models.query import QuerySet

class MyClassMixin(object):
    """
    This will be subclassed by both the Object Manager and the QuerySet.
    By doing it this way, you can chain these functions, along with filter().
    (A simpler approach would define these in MyClassManager(models.Manager),
        but won't let you chain them, as the result of each is a QuerySet, not a Manager.)
    """
    def q_for_search_word(self, word):
        """
        Given a word from the search text, return the Q object which you can filter on,
        to show only objects containing this word.
        Extend this in subclasses to include class-specific fields, if needed.
        """
        return Q(name__icontains=word) | Q(supplier__name__icontains=word)

    def q_for_search(self, search):
        """
        Given the text from the search box, search on each word in this text.
        Return a Q object which you can filter on, to show only those objects with _all_ the words present.
        Do not expect to override/extend this in subclasses.
        """
        q = Q()
        if search:
            searches = search.split()
            for word in searches:
                q = q & self.q_for_search_word(word)
        return q

    def filter_on_search(self, search):
        """
        Return the objects containing the search terms.
        Do not expect to override/extend this in subclasses.
        """
        return self.filter(self.q_for_search(search))

class MyClassQuerySet(QuerySet, MyClassMixin):
    pass

class MyClassManager(models.Manager, MyClassMixin):
    def get_query_set(self):
        return MyClassQuerySet(self.model, using=self._db)

class Supplier(models.Model):
    name = models.CharField(max_length=60)
    slug = models.SlugField(max_length=200)

class MyClass(models.Model):
    name = models.CharField(max_length=60)
    supplier = models.ForeignKey(Supplier)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    objects = MyClassManager()  # note MyClassManager must already be defined at this point

    def as_dict(self):
        """
        Create data for datatables ajax call.
        """
        return {'name': self.name,
                'supplier': {'name': self.supplier.name, 'slug': self.supplier.slug},
                'price': self.price,
                }
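The search semantics are easier to see outside the ORM. This plain-Python predicate (illustrative only, no Django required) applies the same rule the Q objects build up: OR across fields, AND across words:

```python
def matches_search(field_values, search_text):
    """True if every word of search_text appears in at least one field.
    OR across fields, AND across words -- the same rule as q_for_search,
    but on plain strings instead of a queryset."""
    words = search_text.lower().split()
    return all(any(word in str(value).lower() for value in field_values)
               for word in words)
```

So `matches_search(['Copper pipe', 'Acme'], 'copper acme')` is true, while `'copper zinc'` is not, because every word must match somewhere.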

This is a stripped down version of my production code. I haven’t fully tested this stripped down version, so please let me know if you find any problems with it.

Hope it helps!

Solve this puzzle – @memo and dynamic programming

Here’s the situation.  You have a puzzle with 15 round coloured plates arranged in an equilateral triangle, and 14 coloured pebbles on top of them.  One plate does not have a pebble – it is called the hole.  Your goal is to rearrange the pebbles so that they are on the matching coloured plates, in the minimum number of moves possible.  For each move, you can only move one pebble in a straight line to the hole, possibly leaping over other pebbles on the way.

The question is – can you design an algorithm to calculate, for any starting board, the minimum number of moves to solve it?

In fact this describes the game Panguru, recently produced by Second Nature Games and available as a board game and online.  In Panguru, there are two pebbles and plates of each colour, and one additional legal move down the centreline of the triangle.  If my quick description is too wordy, the online game will give you a much better feel for it; there are rules available too.

Panguru Online

A dynamic programming solution, with memoization

Here’s the approach I came up with, which I implemented in python. This was informed by the excellent book Python Algorithms, Mastering Basic Algorithms in the Python Language, by Magnus Lie Hetland, which I highly recommend.

Think of all the possible moves used to solve the puzzle as a tree. Number the positions 0 to 14.  The root node of the tree is the hole position, and child nodes give the position of the pebble that is being moved. So with the hole starting at the top of the triangle (position 0), one path through the tree might be 0-3-5.  (If you think about it, the last move always tells you where the hole finishes up.)

It is easy to measure how close we are to the solved board: we just count how many pebbles have the same colour as their plates.  When we hit 14, we are done!

In fact we can turn it around.  Let’s assume we have a solution (14 matching pebbles) after m moves. How did we get here?  Clearly, we must have had 13 matching pebbles the move before. (Because you only move one pebble at a time, each move can only increase or decrease the number of matching pebbles by one, or leave the number unchanged.)

And the move before that, we must have had either 12 or 13 matching pebbles. And so on.
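This per-move invariant is easy to sanity-check in plain Python. Here is a quick stdlib check, using the example board strings that appear later in the post ('-' marks the hole):

```python
# Check the invariant: moving one pebble into the hole changes the
# number of matching positions by -1, 0 or +1.
plates  = "WCBRBOGYGPCORPY"
pebbles = "-PCBRYOGYGBCORP"

def count_matching(plates, pebbles):
    return sum(a == b for a, b in zip(plates, pebbles))

def apply_move(pebbles, src):
    """Move the pebble at position src into the hole; return the new board."""
    board = list(pebbles)
    hole = pebbles.index('-')
    board[hole], board[src] = board[src], '-'
    return ''.join(board)

before = count_matching(plates, pebbles)
for src in range(len(pebbles)):
    if pebbles[src] == '-':
        continue
    after = count_matching(plates, apply_move(pebbles, src))
    assert after - before in (-1, 0, 1)
```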

This sounds like induction and lends itself to a recursive solution, like this. First define the tree structure via the Node class. The core of this is the __init__ method which keeps track of the node’s parent. I’ve also added original_pos to help us later, and two extra methods to display nodes nicely on the command line (__repr__) and to make it easier to access the node’s ancestry (__getitem__).

class Node():
    def __init__(self, parent, move, num_matching):
        self.parent = parent
        self.move = move
        self.num_matching = num_matching

    def original_pos(self, pos):
        """
        Return the original position of the pebble now at position pos.
        """
        if not self.parent:
            return pos
        if pos==self.move:
            prev_pos = self.parent.move
        elif pos==self.parent.move:
            prev_pos = self.move
        else:
            prev_pos = pos
        return self.parent.original_pos(prev_pos)

    def __getitem__(self, key):
        """
        If m is a node which you think of as the last move in a sequence,
        m[-3] is the third last move as a Node.
        m[-3].move is the third last move as an integer (which position was moved)
        """
        if key>=0:
            raise IndexError("You must index moves from the end, e.g. m[-1] for the last move.")
        if key==-1:
            return self
        else:
            try:
                return self.parent[key+1]
            except TypeError:
                raise IndexError("Out of range.")

    def __repr__(self):
        return "%s%d" % ((self.parent and ("%s-" % str(self.parent)) or ""),
                         self.move)

Then the puzzle-solving approach could be implemented like this:

class Puzzle():
    def __init__(self, plates, pebbles, allowed_moves):
        """
        Set up a puzzle instance for solving.
        Args:
            plates is a string of 15 chars representing colours
                e.g. "WCBRBOGYGPCORPY"
            pebbles is the same, with "-" for the hole
                e.g. "-PCBRYOGYGBCORP"
            allowed_moves is a list s.t.
                allowed_moves[i] = list of positions that a pebble at pos i can move to
                    = list of positions that can move to this position, if it is the hole
                e.g. [[1, 2, 3, 4, 5, 6, 9, 10, 12, 14],
                      [0, 2, 3, 4, 6, 8, 10, 13],
                      [0, 1, 4, 5, 7, 9, 11, 14],
                      [0, 1, 4, 5, 6, 7, 10, 12],
                      [0, 1, 2, 3, 5, 7, 8, 11, 12, 13],
                      [0, 2, 3, 4, 8, 9, 12, 14],
                      [0, 1, 3, 7, 8, 9, 10, 11],
                      [2, 3, 4, 6, 8, 9, 11, 12],
                      [1, 4, 5, 6, 7, 9, 12, 13],
                      [0, 2, 5, 6, 7, 8, 13, 14],
                      [0, 1, 3, 6, 11, 12, 13, 14],
                      [2, 4, 6, 7, 10, 12, 13, 14],
                      [0, 3, 4, 5, 7, 8, 10, 11, 13, 14],
                      [1, 4, 8, 9, 10, 11, 12, 14],
                      [0, 2, 5, 9, 10, 11, 12, 13]]
        """
        self.plates = plates
        self.pebbles = pebbles
        self.allowed_moves = allowed_moves
        self.num_pebbles = len(filter(lambda x: x not in ["-"], self.pebbles))
        self.num_matching = sum(plates[i] == pebbles[i] for i in range(len(pebbles)))
        hole_pos = self.pebbles.find('-')
        self.root = Node(None, hole_pos, self.num_matching)

    def matching_nodes(self, turn, num_matching):
        """
        Return all the series of moves (as Nodes with parents)
        that have 'num_matching' matching spots after 'turn' turns.
        """
        if turn==0:
            if num_matching==self.num_matching:
                return [self.root]
            else:
                return []
        result = []
        for change in (-1,0,1):
            for prev_node in self.matching_nodes(turn-1, num_matching+change):
                for move in self.allowed_moves[prev_node.move]:
                    pebble_colour = self.pebbles[prev_node.original_pos(move)]
                    # was the moved pebble on a matching plate already?
                    old_pos_match = (self.plates[move]==pebble_colour)
                    # does the prev board's hole plate match the moved pebble?
                    new_pos_match = (self.plates[prev_node.move]==pebble_colour)
                    # did the move change how many positions were matching,
                    # by exactly the number we're looking at?
                    if (old_pos_match-new_pos_match)==change:
                        result += [Node(prev_node, move, num_matching)]
        return result

The interesting recursion here is going on in the matching_nodes method.  It just implements the idea that the solutions that have, say, 10 matching positions at turn 10, must have had either 9,10 or 11 matching positions at turn 9. It then works back to turn 0, at which point we know how many matching positions there were.

On top of this, we need a further method which finds the right question to ask. It could start by saying – give me all the solutions after 0 moves. If there are none, find all the solutions after 1 move. And keep trying until you find some solutions, e.g.:

    def optimal_moves(self, stop_at=20):
        """
        Return a tuple (fewest possible number of moves, [optimal moves for this board]).
        """
        num = 0
        result = []
        while not result and num <= stop_at:
            result = self.matching_nodes(num, self.num_pebbles)
            if not result:
                num += 1
        return (num, result)

The code above will work, but it will be slow and inefficient, because matching_nodes will often recurse through territory it has already covered. And that’s where Hetland’s memo decorator comes to the rescue. (This is only a few lines of code which you can find by searching at Google books.) This will cache the results of the decorated method, so that it does not need to be recalculated.  To use it, simply apply @memo to the def matching_nodes line, like so:

@memo
def matching_nodes(self, turn, num_matching):

And that’s it!
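If the memo decorator itself is unfamiliar, it follows a standard pattern. Here is my own sketch of that pattern (not Hetland's exact code); note that on a method, self becomes part of the cache key:

```python
from functools import wraps

def memo(func):
    """Cache results keyed by the call's arguments, so repeated calls
    with the same arguments are looked up rather than recomputed.
    (A sketch of the standard pattern, not Hetland's exact code.)"""
    cache = {}
    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memo
def fib(n):
    # exponential time without the cache, linear with it
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

With the decorator, fib(80) returns instantly; without it, the naive recursion would take longer than the age of the universe.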

You can see this code in action when you ask for a hint in the online game. Each time you press the hint button, the javascript sends off an ajax query which triggers the server to run the above code on the player’s board, and return which moves would come next if the player plays optimally.

In fact, to get around annoying ever-expanding memory problems on the server, I’m running this asynchronously using Celery and Redis (as covered in an earlier blog post), and restarting the celery worker after every request.  But that’s another story…

I hope that has been of interest, and please let me know if you have any comments or improvements.

  

Better than jQuery.ajax() – Django with Angular

I have built a website for my games company Second Nature Games using Django.  Django brings lots of benefits like a nice admin panel where my partner can upload new game boards and add new content to the site.

However, to write games on the web you can’t rely too much on a server-side framework like Django. You are going to have to write some javascript as well.  I was keen for the challenge, having used some jQuery before.

In my first attempt, the boards were rendered by Django on the server, and then the javascript on the client-side would manipulate the document object model (DOM) as the game was played. As I described in an earlier post, I used jQuery’s $.ajax() command to alert the server when the game was finished, and then the javascript would update scores and stats as required.

But this is not easily scalable:  as I added a leaderboard and more stats, more and more things needed to be updated, and I got into a tangle.

The solution is to use a javascript MVC framework like Backbone, Angular or Ember (or Meteor even more so).  I chose to use Angular. There is a great website, To Do MVC, which shows the same To Do application written in many different frameworks, so you can make an informed choice if you’re facing the same decision.

Using one of these frameworks changes how you view the server: the server just sends data and the client renders it. This is known as “data on the wire”.  The effect is to turn the server into an API. (Unfortunately for Django fans, I suspect Django is really overkill for this. But having developed the rest of the site with Django, I stuck with it.)  Meanwhile, whenever the data updates, the javascript framework will update the DOM for you.

Here is a sample template for the Panguru leaderboard, using Angular:

<div class="info">
  <div class="leaderboard">
    <h3 class="title">Leaderboard</h3>
    <ul>
      <li ng-repeat="entry in leaderboard.entries">
        <span class="name">{{ entry.name }}</span>
        <span class="score">{{ entry.score }}</span>
      </li>
    </ul>
  </div>
</div>

You can see it’s pretty straightforward and elegant. The code to load it from the API is:

$scope.updateLeaderboard = function() {
  var that = this;
  $http.get('api/leaderboard/')
    .success(function(data, status, headers, config) {
      that.leaderboard.entries = data;
    })
    .error(function(data, status, headers, config) {
      console.log('error updating leaderboard');
    });
  }

As you may have noticed, Angular and Django templates both use the double-brace notation, {{ }}. You don’t want to have them both try to interpret the same file.

One approach is to use Django’s {% verbatim %} tag. I think it’s nicer to separate the Django-rendered templates from the Angular-rendered templates completely, into two separate directories. Django templates stay in their app’s templates directory. I put Angular’s html files into the app’s static/partials directory. If the server needs to provide data to the javascript, I let Django render it in its template, and pick it up in the static html file. One example is to pass the csrf token. Another example arises because I’m using the staticfiles app, so Django renames all the static files. Unfortunately this includes the locations of all the partial html files and images you need. E.g. in my Django templates/game_page.html:

<html lang="en" ng-app="panguru">
<head>
  <title>Panguru Online</title>
  ...
  {% addtoblock "css" %}{# if you are using sekizai #}
  <style type="text/css">
    .load {background-image: url('{% static "img/loader.gif" %}');}
  </style>
  {% endaddtoblock %}
  <script type="text/javascript">
    csrf_token = '{{ csrf_token }}';
    container_url = '{% static "partials/container.html" %}';
    ...
  </script>

</head>
<body>
  <div panguru-container></div>
</body>
</html>

This part I’m not especially proud of, and would love to hear if you have a better solution.

You can then either use a Django API framework like TastyPie to serve the api/leaderboard/ URL, or you can write one yourself. I did this starting from a JSON response mixin like the one in the Django docs, or this one by Oz Katz, developed for use with Backbone. Then, in Django views.py, I put:

class LeaderboardApi(JSONViewMixin, View):
    "Return the JSON representation of the leaderboard"
    def get(self, request, *args, **kwargs):
        return self.json_response(Score.as_list(limit=20))

That just requires a method on your model which returns the object in an appropriate format, e.g.

class Score(models.Model):
    user = models.ForeignKey(User)
    score = models.PositiveIntegerField(default=0)
    class Meta:
        ordering = ['-score']
    @classmethod
    def as_list(cls, limit=None):
        qs = cls.objects.all()
        if limit is not None:
            qs = qs[:limit]
        return [score.as_dict() for score in qs]
    def as_dict(self):
        return {'name': self.user.username, 'score': self.score }

There’s just one more thing, which is an inconsistency in how Django and Angular apply trailing slashes to URLs. Unfortunately if you use Angular’s $resource approach to interacting with the API, it will strip off any final slash, and then Django will redirect the URL to one with a slash, losing any POST parameters along the way. There’s a fair bit of discussion about this online. My approach has been to just turn off Django’s default behavior in settings.py:

APPEND_SLASH = False

Get ready to thoroughly test your app when you make this change, especially if you are also using Django-CMS, as I am. I uncovered a few places where I had been a bit sloppy in hardwiring URLs, and was inconsistent in my use of slashes in them.

The full result is the logic puzzle game Panguru, which you can play online or buy as a physical boardgame.  Please check it out and let me know what you think!

Get started with Python on a Mac

Here are the steps I recommend for getting started with python on a Mac (OS X), particularly if you are fairly new to the language.

  1. The Mac comes with python installed, and you can run this directly from the terminal application with the command “python”.  However there is a very nice python interpreter called the “ipython notebook” which I strongly recommend using too.  Installation instructions are available here - specifically, I downloaded Enthought Canopy and typed two commands into the terminal.  You should now be able to get an ipython notebook running in your internet browser with the terminal command:
    cd somewhere/sensible/
    ipython notebook --pylab=inline

    The last part of this means that plots get drawn inside the notebook, which is handy. If you forget to add that, you can type into the notebook itself: %pylab inline.
    Note you use shift-Enter to run all the python commands in a cell.

  2. Install pip.  This lets you easily download python packages and will be very handy later.  This is easy – from the terminal type:
    sudo easy_install pip
  3. More advanced users may want to install virtualenv, so that different projects you work on can use different versions of packages. If you’re planning on putting any of your code in production, this is a must. But if you’re just getting started, ignore it for now.
  4. Advanced users will also want to install git so you can incrementally save your own versions of code. If you’re just starting out, leave this for later too.

OK, now let’s dive in the deep end by loading some financial data from Quandl, then manipulating it with Pandas and plotting it with matplotlib. You’ll need Pandas and the Quandl package, which you can get by typing into the terminal:

pip install pandas
pip install Quandl

Now in your ipython notebook type (I recommend doing each group of statements in its own cell, so that you can run them separately; remember it’s shift-enter to run the statements):

import numpy
import pandas
import Quandl

assets = ['OFDP.ALUMINIUM_21.3',
          'OFDP.COPPER_6.3',
          'OFDP.LEAD_31.3',
          'OFDP.GOLD_2.3',
          'OFDP.ZINC_26.3']

data = Quandl.get(assets)
data.head()

data['2012':].plot()

This link gives ways you can develop the plot further.

To give an example of how to manipulate the data, you could try:

# show recent daily returns, normalised by asset volatility (ie. z-scores)
data = data.sort_index() # ensure oldest comes first
returns = (data/data.shift(1))
log_returns = numpy.log(returns)
vols = pandas.ewmstd(log_returns,span=180)  # daily vol, exp weighted moving avg
z = (returns-1)/vols # calc z-scores
z[z>50] = numpy.nan  # remove very bad data
z[z<-50] = numpy.nan
z.tail() # take a look at the most recent z-scores
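To see what the z-score calculation is really doing, here is the same idea in plain standard-library Python on a toy price series (a simple equal-weighted volatility rather than the exponentially weighted one above; the numbers are made up for illustration):

```python
import math

# Toy price series (illustrative numbers only)
prices = [100.0, 101.0, 99.5, 100.5, 102.0, 101.0, 103.0]

# Daily returns and log returns
returns = [b / a for a, b in zip(prices, prices[1:])]
log_returns = [math.log(r) for r in returns]

# Volatility = standard deviation of the daily log returns
mean = sum(log_returns) / len(log_returns)
vol = math.sqrt(sum((r - mean) ** 2 for r in log_returns) / len(log_returns))

# z-score: each day's return, normalised by the volatility
z_scores = [(r - 1) / vol for r in returns]
```

The pandas version does this per asset, with an exponentially weighted rolling volatility instead of a single fixed one.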

That’s a very quick introduction to installing and using python on the Mac. I hope it’s helpful!
If you want more info about writing in Python itself, this link looks like a comprehensive site.

Let me know if you have any comments or improvements.

The Rubik’s Cube

And now for something completely different.  In my spare time I’ve been learning how to solve the Rubik’s Cube, almost entirely by following the beginner’s solution on Ryan Heise’s website. He gives a way to do it with only four different sequences you need to memorise, and the site has cool animations of the moves.  He also shows how to do it without memorisation and has a good page on the maths of the cube.

Here are my notes on his approach. They are not a sophisticated analysis, just an effort to record it so I can come back in a few years’ time and save some time relearning it.

I’ll use Singmaster notation for moves, where a clockwise turn of the frontwise face is labelled F, an anticlockwise turn F', and the other faces are back B, left L, right R, up U and down D. (We won’t need D.)

Also, to take the “mirror image” of a sequence, just substitute:
R ↔ L'
L ↔ R' 
F ↔ F' 
U ↔ U'.
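These substitutions are mechanical enough to automate. Here is a tiny helper (my own sketch, not cube-solving code) that mirrors a sequence written without spaces or parentheses; double turns mirror to double turns, since L'2 is the same as L2:

```python
def mirror(sequence):
    """Mirror-image a move sequence using the substitutions
    R <-> L', L <-> R', F <-> F', U <-> U'.
    Expects tokens like R, R', R2 with no spaces or parentheses."""
    swap = {'R': 'L', 'L': 'R', 'F': 'F', 'U': 'U', 'B': 'B', 'D': 'D'}
    result = []
    i = 0
    while i < len(sequence):
        face = swap[sequence[i]]
        suffix = sequence[i + 1] if i + 1 < len(sequence) else ''
        if suffix == '2':
            result.append(face + '2')   # double turn: mirrors to a double turn
            i += 2
        elif suffix == "'":
            result.append(face)         # X' mirrors to the swapped face, unprimed
            i += 2
        else:
            result.append(face + "'")   # X mirrors to the swapped face, primed
            i += 1
    return ''.join(result)
```

For instance, mirror("U'F'UF") gives "UFU'F'", and mirroring the whole-face move RUR'URU2R' gives L'U'LU'L'U2L, matching the mirror-image sequences used below.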

The first layer

You build up the completed cube by layers. The first layer is fun to work out without any memorised moves, but you can refer to earlier link for details. One useful strategy I learnt there is to focus first on the edge piece and then on the corner pieces.

The second layer

The second layer is where I have always got stuck in the past, as any moves you make cannot affect the first layer any more, making it much harder. I did work out my own way of moving an edge piece from the top layer down into the second layer; unfortunately it does affect one other piece in the second layer, so it can only ever be part of the solution.  For the record, to move the piece from the top-right edge to the front-right edge, in the same orientation, the sequence is:

U2R2U2RU2R2

But the better sequence in this situation (because it does not affect any other pieces in layer 2) is:

(U'F'UF)(URU'R')

Moves like U'F'UF are called commutators in group theory.  If U and F had no edges in common, then the sequence would have no effect. But because they do share an edge, this sequence winds up affecting three edges: the upper-left, the upper-front, and the front-right edges.

Then URU'R' affects the upper-back, upper-right and front-right edges.

For this move, we don’t really care what happens to the upper face, so it is only the impact on the front-right edge that matters, since it contains one corner of the first layer, and the edge piece we want to replace.  If you think through where that corner piece of the first layer goes, it gets caught by the moves: (.F'UF)(U.U'R') = F'UFR' . That’s interesting – we know this piece ends up where it started, but that is not obvious from this analysis. The mysteries of the cube!

If you can’t get any pieces on the right side in the correct orientation, you will need to do it on the left side using the mirror image of the above sequence.

The third layer

That brings us to the third layer, which will take four separate steps.

The cross

First, you want to form a cross on the top.  At this point, you are only aiming to get the upper colours correct, not the side colours. If you can make a little J on the top layer (i.e. the back and left edge pieces have the right colour on top), then use:

R'(U'F'UF)R

This flips the orientation of the front and right edges and rotates the left, front and back edge pieces anti-clockwise.  (It also plays havoc with the corners.)

You can see that it can only affect the top layer from my earlier reasoning: U'F'UF only affects the upper-left, upper-front, and front-right edges.  By enclosing it in R'-R, that front-right edge gets transformed into the upper-right edge – so the sequence affects only the upper-left, upper-front and upper-right edges (it leaves the single upper-back edge piece alone).

If you have a straight line instead of a J to start, put the straight line across from left to right and you can use this sequence to turn it into a J.

The whole face

I find this the trickiest step to remember, not because the sequence is hard, but because you need to orient the cube properly before you use it.

If you have just one corner with the right colour on top, then put it in the front-left position. Do you now have the top colour on the front face on the right, and on the right face at the back? If so, you can now do the sequence:

(RUR')(U)(RU2R')

This flips the three corners other than the front-left, and rotates all the corners 180 degrees around the upper face.  I know saying it “flips” the corners is not precise – there are three sides to the corners, so there are two ways they can flip.  That’s partly why I find it hard to remember how to use this move.

(The move also rotates the edge pieces, without flipping, like so: the front is unchanged while the others are rotated anti-clockwise around the upper face.  Ryan uses this fact to re-use this move for another purpose later.)

If you have just one corner with the right colour on top, but the sequence didn’t work, you will need to put that corner in the front-right and use the mirror image sequence.

If you have zero or two correct top colours on the corners, then you need to know what orientation to hold the cube before using it. This is harder to remember.  See Ryan Heise’s website for the correct orientations. I tend to just keep trying the move over and over, sometimes mirror-imaging it, until something works…

Position the corners

Now that the top face is all the right colour, it is just a matter of rotating the corners and edges around the top face.  Start with the corners.  This sequence will leave the edges untouched and the front-left corner untouched, and rotate the other three corners clockwise around the top face:

R'(FR')(B2)(RF')R'(B2)R2

This is pretty tricky, and the only way I can get my head around it is to think through how it affects each piece.

It’s also interesting that with this info, you can think of two ways to rotate the corners anti-clockwise: you could perform the mirror image of this move (starting with the correct corner in the front-right), or you could apply the inverse of this move (i.e. work backwards to undo the move you just did).  These are two very different sequences of moves that have the exact same effect on the cube. More mysteries of the cube!
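A small sketch can make the "work backwards" option concrete. Treating moves as permutations (the two 4-element "moves" below are made up for illustration, not real cube moves), the inverse of a sequence is the inverses of its moves applied in reverse order:

```python
def compose(p, q):
    """Apply permutation p, then q: the piece at position i ends up at q[p[i]]."""
    return [q[p[i]] for i in range(len(p))]

def inverse(p):
    """The permutation that undoes p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return inv

a = [1, 2, 0, 3]   # a made-up 'move' cycling three positions
b = [0, 2, 3, 1]   # another made-up 'move'
seq = compose(a, b)                        # do a, then b
undo = compose(inverse(b), inverse(a))     # undo b, then undo a
assert compose(seq, undo) == [0, 1, 2, 3]  # back to the identity
```

The mirror-image sequence, by contrast, is a genuinely different permutation that happens to have the same effect on this cube position.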

Position the edges

You can rotate the three edge pieces, other than the back one, clockwise around the top face using the “whole-face” move above, then rotating the entire cube clockwise as viewed from above, and finally performing the mirror-image of the “whole-face” move, i.e.:

(RUR')(U)(RU2R') [whole cube clockwise] (L'U'L)(U')(L'U2L)

You have now solved the cube!

Conclusion

Writing this has helped me to understand a little better how these moves work, and certainly to remember them.  I hope it’s helped you too!  I’d love to hear if you’ve seen it explained better somewhere else – I’m just learning this as I go.  Also please let me know if you spot any errors in what I’ve written.

  

Drag and drop with AngularJS and ng-repeat

I am using AngularJS and jQuery UI and have an app with drag-and-drop behaviour. I want to update a statistic on the screen every time an item is dropped.  The items are displayed in an  ng-repeat.

This turns out to be a bit tricky for a newcomer to AngularJS, e.g. you need to know:

  • how to attach jQuery UI’s draggable and droppable behaviour from inside a directive’s link function;
  • that changes made inside a jQuery event handler must be wrapped in scope.$apply before Angular will notice them;
  • that inside an ng-repeat you must update a property of a parent-scope object (like obj.outer), not a bare value, or the child scope will simply shadow it.

I haven’t seen any examples that put all this together for my particular case, and though the solution is straightforward it took me quite a while to piece it together.

To save you the time, here is my solution.  To simplify it, it just updates a text string obj.outer. When you load up the page, the n and inner are replaced with the numbers 1,2,3 and “Inner linked”, as expected. outer is replaced with “Updated on link”, which is also what you would expect. You can drag and drop the pieces of text on each other. And when you do, the inner and outer fields are updated.

index.html:

    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
    <script src="//cdnjs.cloudflare.com/ajax/libs/angular.js/1.1.5/angular.min.js"></script>
    <script src="//cdnjs.cloudflare.com/ajax/libs/underscore.js/1.5.1/underscore-min.js"></script>
    <script src="//ajax.googleapis.com/ajax/libs/jqueryui/1.10.2/jquery-ui.min.js"></script>

    <div ng-app="test">
    <p ng-init="obj={outer:'not yet'}">outer: {{obj.outer}}</p>
      <ul>
          <li ng-repeat="n in [1,2,3]"><span my-dnd>{{n}} {{inner}}</span></li>
      </ul>
    </div>

app.js:

    angular.module('test', [])
      .directive('myDnd', function() {
        return {
            link: function(scope, elt, attrs) {
                scope.inner = "Inner";
                scope.obj.outer = "Updated on link";
                elt.draggable({revert:"invalid"});
                elt.droppable({
                    accept: function( d ) { return true; },
                    drop: function( event, ui ) {
                        return scope.$apply(function() {
                            scope.inner = 'Dropped';
                            scope.obj.outer = 'Updated on drop';
                        });
                    }
                });
            }
        };
      });

Some comments:

  • Note that you can use scope.obj.outer; you do not need to use scope.$parent.obj.outer. I believe this is down to prototypal JavaScript inheritance. I’m not exactly sure why you need the $parent sometimes but not others.
  • You definitely need scope.obj.outer rather than just scope.outer.
  • You can leave out the return in the statement return scope.$apply(...). I haven’t looked into what the difference is. You cannot leave the call to $apply out.
  • I would like to know why the documentation shows the link function using scope rather than $scope, as used by the controllers.
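The scope.obj.outer point is the classic prototype-chain trap. Here is a loose analogy in Python (class attribute lookup behaves like a prototype chain; this only illustrates the "reads go up the chain, writes shadow" rule, it is not Angular itself):

```python
# Parent/Child classes stand in for Angular's parent and child scopes.
class Parent(object):
    outer = 'not yet'            # a bare value on the 'parent scope'
    obj = {'outer': 'not yet'}   # an object on the 'parent scope'

class Child(Parent):
    pass

Child.outer = 'updated'            # assignment creates a shadowing copy...
assert Parent.outer == 'not yet'   # ...so the parent never sees it

Child.obj['outer'] = 'updated'     # mutating a parent object works...
assert Parent.obj['outer'] == 'updated'  # ...the parent sees the change
```

This is why updating scope.obj.outer propagates, while assigning to a bare scope.outer would only create a copy on the child scope.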

Please let me know if you have any thoughts or improvements!

  

Private media with Django

I have often wanted user-uploaded files and images to be “private” or “secure”, i.e. require some authentication and authorisation to view, but haven’t known how to start. Now that I have a solution that works (e.g. for sites hosted by Webfaction), I’d like to share it with you.

The basic principles are:

  1. Serve your public “static” files (e.g. css and javascript) and any public user/admin-uploaded “media” files from your existing static webapp. This static webapp has no capability to selectively hide some files from view, so we will not use it for the private media.
  2. Serve your private media from a regular Django view:
    • This view will be accessed through a regular Django URL in a urls.py file.
    • The first step in this view is to authenticate and check for authorisation. Be aware that you will only have the request parameters (including the URL path) to do this with. One solution might be to use a directory structure with permissions varying by directory.
    • The second step is to serve the file. You could do this in Django directly, but serving large files through Python is slow and ties up a worker process. Webfaction uses Apache, which has a nice module called mod_xsendfile which lets Apache serve the contents (Django merely specifies the path in the response header). There is a solution if you use nginx too which I have not explored (see django-filer’s secure downloads feature for more). You need to explicitly install and activate this module however – see below for details.
  3. Store your private media somewhere different to your public media. You can do this by providing a custom storage for your private FileFields and ImageFields. If you need instance-specific permissions, you can do this by passing a method as the upload_to directory.

Note – before going down this path, check if the file system you are using already has a protection mechanism (e.g. Amazon’s S3 service), in which case you probably won’t need this.

An implementation

I packaged up an implementation of these principles in django-private-media. You can install it from PyPI with:

pip install django-private-media

This package draws significantly from the secure file download code of Stephan Foulis’s django-filer.  One key difference between the two is that django-filer replaces all file and image fields with a foreign key; in contrast, because my focus is solely on the permissioning, django-private-media just uses the standard Django file and image fields.  As a result, it should be fairly straightforward to convert an existing project to use it.  See the readme for more details.

Code snippets

You’ll find below some code snippets to point you in the right direction. They assume you add a PRIVATE_MEDIA_URL and PRIVATE_MEDIA_ROOT, analogous to Django’s MEDIA_URL and MEDIA_ROOT, to your settings.py file.

urls.py

# urls.py
from django.conf import settings
...
urlpatterns += patterns('appname.views',
    url(r'^{0}(?P<path>.*)$'.format(settings.PRIVATE_MEDIA_URL.lstrip('/')), 'serve_private_file',),
)
...

views.py

# views.py
import os

from django.conf import settings
from django.http import HttpResponse, HttpResponseForbidden

def has_read_permission(request, path):
    "Only show to authenticated users - extend this as desired"
    # Note this could allow access to paths including ../..
    # Don't use this simple check in production!
    return request.user.is_authenticated()

def serve_private_file(request, path):
    "Simple example of a view to serve private files with xsendfile"
    if not has_read_permission(request, path):
        return HttpResponseForbidden()
    fullpath = os.path.join(settings.PRIVATE_MEDIA_ROOT, path)
    response = HttpResponse()
    response['X-Sendfile'] = fullpath
    return response
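As the comment in has_read_permission warns, a naive os.path.join lets a crafted URL escape the media root via ../ segments. One way to guard against that is to normalise the path and check it still lies under the root (safe_fullpath is a hypothetical helper, not part of django-private-media):

```python
import os

def safe_fullpath(root, path):
    """Resolve path under root; return None if it escapes root."""
    root = os.path.abspath(root)
    # normpath collapses any '..' segments before we compare prefixes
    fullpath = os.path.normpath(os.path.join(root, path))
    if not fullpath.startswith(root + os.sep):
        return None
    return fullpath

assert safe_fullpath('/srv/private', 'cars/photo.jpg') == '/srv/private/cars/photo.jpg'
assert safe_fullpath('/srv/private', '../etc/passwd') is None
```

In the view you would then return a 404 or 403 whenever safe_fullpath comes back None, before setting the X-Sendfile header.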

models.py

# models.py

from django.db import models
from django.conf import settings
from django.core.files.storage import FileSystemStorage

private_media = FileSystemStorage(location=settings.PRIVATE_MEDIA_ROOT,
                                  base_url=settings.PRIVATE_MEDIA_URL,
                                  )
# or you could define a custom subclass of FileSystemStorage

class Car(models.Model):
    photo = models.ImageField(storage=private_media)
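Point 3 of the principles above mentioned passing a method as upload_to for instance-specific permissions. A minimal sketch of such a callable (the per-owner directory layout here is just an example; owner_id is a hypothetical field):

```python
def car_photo_path(instance, filename):
    # Directory varies by owner, so the serving view can vary
    # permissions by directory. Pass this to the field as:
    #   models.ImageField(upload_to=car_photo_path, storage=private_media)
    return 'cars/owner_{0}/{1}'.format(instance.owner_id, filename)
```

Django calls this with the model instance and the original filename whenever a file is saved, and stores the returned relative path under the storage's location.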

settings.py

# settings.py
PRIVATE_MEDIA_ROOT = '/home/username/private-media' # for example
PRIVATE_MEDIA_URL = '/private/' # for example

Installing xsendfile

To install and activate xsendfile on Webfaction, follow the advice given by this post.

That’s all!

What have I missed?  Please let me know if you’ve done something similar and have another or better solution.

Otherwise, I hope this helps!

Using redis-queue for asynchronous calls with Django

I recently posted about using Redis and Celery with Django to handle asynchronous calls from your web pages. Given that I have memory constraints on the server, I have been wondering if I might get more bang for my buck with redis-queue (rq) instead of Celery.  In fact, I have found them comparable: rq uses about 12Mb per worker, and Celery uses about 10-12Mb per process.  However, Celery workers use (1+concurrency) processes, so if concurrency=1, Celery appears to use double the memory.

Using RQ

Here are the changes I’ve made to the code I posted earlier to replace Celery with redis-queue.  Note jobs.py is exactly the same as celery’s tasks.py, but without the @task decorator. (I did not use rq’s @job decorator.)

def status_view(request):
    """
    Called by the opt page via ajax to check if the optimisation is finished.
    If it is, return the results in JSON format.
    """
    if not request.is_ajax():
        raise SuspiciousOperation("No access.")
    if QUEUE_BACKEND=='celery':
        pass  # as before - the main part was a call to Celery's AsyncResult
    elif QUEUE_BACKEND=='rq':
        from django.conf import settings
        from redis import Redis
        from rq.job import Job
        from rq.exceptions import NoSuchJobError
        try:
            connection = Redis(settings.RQ_REDIS_URL, settings.RQ_REDIS_PORT)
            # not quite sure if better to use Job(...) or Job.fetch(...) here
            # the difference is fetch also calls refresh
            # but I see it does not rerun the job
            job = Job.fetch(request.session['job_id'], connection=connection)
        except (KeyError, NoSuchJobError):
            ret = {'error':'No optimisation is underway (or you may have disabled cookies).'}
            return HttpResponse(json.dumps(ret))
        if job.is_finished:
            ret = get_solution(job)
        elif job.is_queued:
            ret = {'status':'in-queue'}  # a status not exposed in my celery branch
        elif job.is_started:
            ret = {'status':'waiting'}
        elif job.is_failed:
            ret = {'status':'failed'}    # also not exposed in my celery branch
        return HttpResponse(json.dumps(ret))

def get_context_data(self, **kwargs):
    ...
    if QUEUE_BACKEND=='celery':
        from . import tasks
        result = tasks.solve.delay(myarg, timeout=timeout)
    elif QUEUE_BACKEND=='rq':
        from django.conf import settings
        from . import jobs
        from redis import Redis
        from rq import Queue
        connection = Redis(settings.RQ_REDIS_URL, settings.RQ_REDIS_PORT)
        q = Queue(connection=connection)
        job = q.enqueue_call(func=jobs.solve, args=[myarg],
                             kwargs={'timeout':timeout}, timeout=timeout+10)
        # the solve call itself has a timeout argument; timeout with rq shouldn't occur
        # remember the job id so status_view can look the job up
        self.request.session['job_id'] = job.id

In settings.py I added:

     RQ_REDIS_URL = 'localhost'
     RQ_REDIS_PORT = 6379

But I did not use django-rq at all.

One nice thing I see immediately is the additional status info – you can easily query if a job is still in the queue or has failed.  I’m sure these are possible to see in Celery too, but they are obvious in rq.

Run RQ workers

Running an rq worker is nice and simple – there is no daemonization or even setup files. On either your dev or production server, just type (and repeat for as many workers as you want):

rqworker --port 6379

Remaining issues

One initial problem was finding out how to get an existing job from its id.  I solved this with:

Job.fetch(job_id, connection=connection)

However, I cannot find documentation about Job.fetch, and I see that Job(...) by itself also works.  Please let me know if you know which of these I should be using.

The main problem I have with redis-queue now is terminating a task.  I have a “cancel” button on the optimisation screen, which I can implement with Celery via:

revoke(task_id, terminate=True)  # celery

I cannot find an equivalent in rq.  This is unfortunately a deal-breaker for me, so I am sticking with celery for now.  Can you help?

Asynchronous calls from Django

I have an optimisation I would like to run when the user presses a button on a Django page. For small cases, it is fine to run it synchronously.  However, when it takes more than a second or so, it is not great to have the web server held back by a process of unknown length.

The solution I have settled on is Celery, with Redis as the message broker.  I am using Redis over the alternatives, since it seems to have much lower memory requirements (I find it uses under 2 Mb, vs. 10-30 Mb per Celery process). And the equivalent commands if you want to use redis-queue (which uses about 10 Mb per worker) instead of Celery are given in this post.

There is a bit of a learning curve to get started with this, so I am making a guide for the next person by listing all the steps I have taken to get set up on both my development platform (running MacOS X) and a unix server (hosted by Webfaction).  Along the way I hope to answer questions about security and what the right settings are to put in the redis.conf file, the celery config file, and the usual Django settings.py file.

Install Redis

Redis is the message broker. You will need to have this running at all times for Celery’s tasks to be executed.

Installing Redis on Mac OS X is described in this blog. Basically, just download the latest version from redis.io, and in the resulting untarred directory:

make test
make
sudo mv src/redis-server /usr/bin
sudo mv src/redis-cli /usr/bin
mkdir ~/.redis
touch ~/.redis/redis.conf

Installing Redis on your server is similar, though you may need to know how to download the code from the command line first (e.g. see this post):

wget http://redis.googlecode.com/files/redis-2.6.14.tar.gz
tar xzf redis-2.6.14.tar.gz
cd redis-2.6.14
make test
make

On the production server we don’t need to relocate the redis-server or redis-cli executables, as we’ll see in the next section.

Run Redis

To run Redis on your Mac, just type one of:

redis-server  # if no config required, or:
redis-server ~/Python/redis-2.6.14/redis.conf

To run it on your Webfaction server, first add a custom app listening on a port, and note the port number you are assigned.

Now we need to daemonize it (see this post from the Webfaction community). In summary, in your redis directory, edit the redis.conf file like so (feel free to change the location of the pid file):

daemonize yes
...
pidfile /home/username/webapps/mywebapp/redis.pid
...
port xxxxx   # set to the port of the custom app you created

To test this works, type the commands below. If all is well, the pid file will now contain a process id which you can check by providing it to the ps command.

src/redis-server redis.conf
cat /home/username/webapps/mywebapp/redis.pid
ps xxxxx # use the number in the pid file

Note – when I did this without assigning the port number of the custom app, I got the following error:

# Warning: no config file specified, using the default config. In order to specify a config file use src/redis-server /path/to/redis.conf
# Unable to set the max number of files limit to 10032 (Operation not permitted), setting the max clients configuration to 4064.
# Opening port 6379: bind: Address already in use

It turns out someone else was already using port 6379, the default Redis port.
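Incidentally, a quick way to check whether a port is already taken before you pick one is to try connecting to it. This is a hypothetical helper using only the standard library, not something from Redis or Webfaction:

```python
import socket

def port_in_use(port, host='127.0.0.1'):
    """Return True if something is accepting connections on host:port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.settimeout(0.5)
        # connect_ex returns 0 on a successful connection
        return s.connect_ex((host, port)) == 0
    finally:
        s.close()
```

Running port_in_use(6379) on that server would have returned True, flagging the clash before Redis failed to bind.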

Now in practice you will want Redis to be managed with cron, so that it restarts if there is a problem. Webfaction has some docs on how to do this here; I used:

crontab -e
# and add this line to the file, changing the path as necessary:
0,10,20,30,40,50 * * * * ~/webapps/redis/redis-2.6.14/src/redis-server ~/webapps/redis/redis-2.6.14/redis.conf

FYI, for me the running Redis process uses 1.7 Mb (i.e. nothing compared to each celery process, as we’ll see).

Install Celery

The Celery docs cover this.  Installation is simple, on both development and production machines (except that I install it in the web app’s environment with Webfaction, as explained here):

pip install django-celery-with-redis

I have added the following to settings.py, replacing the port number for production:

BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'

import djcelery
djcelery.setup_loader()

INSTALLED_APPS = (
    ...
    'djcelery',
    ...
    )

And added the suggested lines to the top of wsgi.py:

import djcelery
djcelery.setup_loader()

I found lots more detail here, but I haven’t yet established how much of this is required.

Run a Celery worker

Now you need to start a Celery worker.

On your development server, you can enter your Django project directory and type:

python manage.py celery worker --loglevel=info

On your production server, I started by trying the same command above, to test out whether Celery could find the Redis process and run jobs – and it worked fine.  But in practice, the Celery docs say: “you will want to run the worker in the background as a daemon”.  (Note this link also talks about Celery beat, which “is a scheduler. It kicks off tasks at regular intervals, which are then executed by the worker nodes available in the cluster.” In my case, I do not need this.)

To do this, I copied the CentOS celeryd shell script file from the link at the end of the daemonization doc (since the server I am using runs CentOS), and placed it in a new celerydaemon directory in my Django project directory, along with the Django celeryd config file. Since the config file is also called celeryd by default, the same name as the shell script, I renamed it to celeryd.sysconfig to avoid confusion. I also created a new directory in my home directory called celery to hold the pid and log output files.

One more change is required, at least if you are using Webfaction to host your site: the call to celery_multi does not have a preceding python command by default.  While this works in an ssh shell, it does not work with cron - I believe because the $PATH is not set up the same way in cron.  So I explicitly add the python command in the front, including the path to python.

The config file looks like this:

# Names of nodes to start (space-separated)
CELERYD_NODES="myapp-node_1"

# Where to chdir at start. This could be the root of a virtualenv.
CELERYD_CHDIR="/home/username/webapps/webappname/projectname"

# How to call celeryd-multi (for Django)
# note python (incl path) added to front
CELERYD_MULTI="/home/username/bin/python $CELERYD_CHDIR/manage.py celeryd_multi"

# Extra arguments
#CELERYD_OPTS="--app=my_application.path.to.worker --time-limit=300 --concurrency=8 --loglevel=DEBUG"
CELERYD_OPTS="--time-limit=180 --concurrency=2 --loglevel=DEBUG"
#  If you want to restart the worker after every 3 tasks, can use eg:
#  (I mention it here because I couldn't work out how to use 
#  CELERYD_MAX_TASKS_PER_CHILD)
#CELERYD_OPTS="--time-limit=180 --concurrency=2 --loglevel=DEBUG --maxtasksperchild=3" 

# Create log/pid dirs, if they don't already exist
CELERY_CREATE_DIRS=1

# %n will be replaced with the nodename
CELERYD_LOG_FILE="/home/username/celery/%n.log"
CELERYD_PID_FILE="/home/username/celery/%n.pid"

# Workers run as an unprivileged user
CELERYD_USER=celery
CELERYD_GROUP=celery

# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="myproject.settings"

In the shell script, I changed the two references to /var (DEFAULT_PID_FILE and DEFAULT_LOG_FILE) and the reference to /etc (CELERY_DEFAULTS) to directories I can write to, e.g.:

DEFAULT_PID_FILE="/home/username/celery/%n.pid"
DEFAULT_LOG_FILE="/home/username/celery/%n.log"
...
CELERY_DEFAULTS=${CELERY_DEFAULTS:-"/home/username/webapps/webappname/projectname/celerydaemon/celeryd.sysconfig"}

I found a problem in the CentOS script – it calls /etc/init.d/functions, which resets the $PATH variable globally, so that the rest of the script cannot find python any more. I have raised this as an issue, where you can also see my workaround.

To test things out on the production server, you can type (use sh rather than source here because the script ends with an exit, and you don’t want to be logged out of your ssh session each time):

sh celerydaemon/celeryd start

and you should see a new .pid file in ~/celery showing the process id of the new worker(s).

Type the following line to stop all the celery processes:

sh celerydaemon/celeryd stop

Restart celery with cron if needed

As with Redis, you can ensure the celery workers are restarted by cron if they fail. Unlike with Redis, there are a lot of tricks here for the unwary (i.e. me).

  1. Write a script to check if a celery process is running. Webfaction provides an example here, which I have changed the last line of to read:
    sh /home/username/webapps/webappname/projectname/celerydaemon/celeryd restart
  2. This is the script we will ask cron to run. Note that I use restart here, not start; I am doing this because I have found in a real case that if the server dies suddenly, celery continues to think it is still running even when it isn’t, and so start does nothing. So add to your crontab (assuming the above script is called celery_check.sh):
    crontab -e
    1,11,21,31,41,51 * * * * ~/webapps/webappname/projectname/celerydaemon/celery_check.sh
  3. One last thing, pointed out to me in correspondence with Webfaction: the celeryd script file implements restart with:
    stop && start

    So if stop fails for any reason, the script will not restart celery.  For our purposes, we want start to occur regardless, so change this line to:

    stop; start;

Your celery workers should now restart if there is a problem.

Controlling the number of processes

If you’re like me you are now confused about the difference between a node, a worker, a process and a thread. When I run the celeryd start command, it kicks off three processes, one of which has the pid in the node’s pid file. This despite my request for one node, and “--concurrency=2” in the config file.

When I change the concurrency setting to 1, then I get two processes. When I also add another node, I get four processes.

So what I assume is happening is: workers are the same things as nodes, and each worker needs one process for overhead and “concurrency” additional processes.
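The observed process counts fit a simple formula (my inference from the experiments above, not something stated in the Celery docs):

```python
def celery_process_count(nodes, concurrency):
    """One parent process per node, plus 'concurrency' child processes each."""
    return nodes * (1 + concurrency)

# The three observations above:
assert celery_process_count(1, 2) == 3  # one node, --concurrency=2
assert celery_process_count(1, 1) == 2  # one node, --concurrency=1
assert celery_process_count(2, 1) == 4  # two nodes, --concurrency=1
```

So to budget memory, multiply this count by the per-process footprint you measure.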

For me, at first I found each celery process required about 30-35Mb (regardless of the number of nodes or concurrency). So three use about 100Mb.  When I looked again a week later, the processes were using only 10 Mb each, even when solving tasks.  I’m not sure what explains the discrepancy.

Use it

With this much, you can adapt the Celery demo (adding two numbers) to your own site, and it should work.

On my site I use ajax and javascript to regularly poll whether the optimisation is finished. The following files hopefully give the basic idea.

urls.py

# urls.py
from myapp.views import OptView, status_view
...
    url(r'^opt/', OptView.as_view(), name="opt"),
    url(r'^status/', status_view, name="status"), # for ajax
...

views.py

# views.py
import json
from django.views.generic import TemplateView
from django.core.exceptions import SuspiciousOperation
from celery.result import AsyncResult
from . import tasks

class OptView(TemplateView):
    template_name = 'opt.html'

    def get_context_data(self, **kwargs):
        """
        Kick off the optimization.
        """
        # replace the next line with a call to your task
        result = tasks.solve.delay(params)
        # save the task id so we can query its status via ajax
        self.request.session['task_id'] = result.task_id
        # if you need to cancel the task, use:
        # revoke(self.request.session['task_id'], terminate=True)
        context = super(OptView, self).get_context_data(**kwargs)
        return context

def status_view(request):
    """
    Called by the opt page via ajax to check if the optimisation is finished.
    If it is, return the results in JSON format.
    """
    if not request.is_ajax():
        raise SuspiciousOperation("No access.")
    try:
        result = AsyncResult(request.session['task_id'])
    except KeyError:
        ret = {'error':'No optimisation (or you may have disabled cookies).'}
        return HttpResponse(json.dumps(ret))
    try:
        if result.ready():
            # to do - check if it is really solved, or if it timed out or failed
            ret = {'status':'solved'}
            # replace this with the relevant part of the result
            ret.update({'result':result})
        else:
            ret = {'status':'waiting'}
    except AttributeError:
        ret = {'error':'Cannot find an optimisation task.'}
        return HttpResponse(json.dumps(ret))
    return HttpResponse(json.dumps(ret))

javascript

// include this javascript in your template (needs jQuery)
// also include the {% csrf_token %} tag (it need not be inside a form)
$(function() {
	function handle_error(xhr, textStatus, errorThrown) {
		clearInterval(interval_id);
		alert("Please report this error: "+errorThrown+xhr.status+xhr.responseText);
	}

	function show_status(data) {
		var obj = JSON.parse(data);
		if (obj.error) {
			clearInterval(interval_id);
			alert(obj.error);
		}
		if (obj.status == "waiting"){
			// do nothing
		}
		else if (obj.status == "solved"){
			clearInterval(interval_id);
			// show the solution
		}
		else {
			clearInterval(interval_id);
			alert(data);
		}
	}

	function check_status() {
		$.ajax({
			type: "POST",
			url: "/status/",
			data: {csrfmiddlewaretoken:
				document.getElementsByName('csrfmiddlewaretoken')[0].value},
			success: show_status,
			error: handle_error
		});
	}

	setTimeout(check_status, 0.05);  // initial check, almost immediately
	// then check every second
	var interval_id = setInterval(check_status, 1000);
});

As mentioned in the comments to the code above, if you need to cancel an optimisation, you can use:

revoke(task_id, terminate=True)

Monitoring

You can monitor what’s happening in celery with celery flower, at least on dev:

pip install flower
celery flower --broker=redis://localhost:PORTNUM/0

And then go to localhost:5555 in your web browser.

When you use djcelery, you will also find a djcelery app in the admin panel, where you can view workers and tasks.  There is a little bit of setup required to populate these tables.  More info about this is provided in the celery docs.

Security

Some links on this topic:

  • http://redis.io/topics/security
  • http://docs.celeryproject.org/en/latest/userguide/security.html

I’ll add to this section as I learn more about it.

I hope that’s helpful – please let me know what you think.

9 Lessons from PyConAU 2013

A summary of what I learned at PyCon AU in Hobart in 2013. (Click here for 2014.)

1. In 2005, Django helped make it possible for a team of ONE to make a commercial web app

Building web apps with Django is not just possible, it’s fun. I hadn’t realised the key role that Django played, along with Ruby on Rails, in making this happen.

2. But in 2013 the goal posts are higher – can it still be done?

Django was revolutionary when it was released, but it doesn’t take care of everything a modern (i.e. 2013) web app needs to be cutting-edge. On the back-end, once you get your head around Django itself, you need to get your head around South (for database migrations), virtualenv (so you don’t go crazy when new versions come out), the Python Image Library and django-filer or easy-thumbnails so you can upload images and files more nicely, Fabric to help you deploy your site, git (to version control your code, if you haven’t used it already), selenium (for functional testing), factory_boy (for any testing), django-reversion (so you can roll back data), staticfiles, a way to actually deploy static files on your system, e.g. a file system backend like Boto, tastypie or django-rest-framework (for an API), and perhaps a CMS like Django-CMS, Mezzanine or FeinCMS (which are the tips of other icebergs). That’s sort of where I’m up to at the moment. And there are lots more I will probably need soon - haystack (for faster searching), celery and a message broker (e.g. for non-web-page related tasks), memcache, maybe non-relational databases like MongoDB.

And that’s just the back-end. On the front-end you probably want to use javascript, ajax, jQuery, and probably another javascript library, e.g. I have been using kineticjs. But during the talks I learned I will need to consider meteor (heaps of cool stuff, but a starting point is that it drops a lot of the distinction between server and client, so that with very little code, a user can update the database and other users’ pages update to view it automatically), backbone.js (“models with key-value binding and custom events, collections with a rich API of enumerable functions, views with declarative event handling, and connects it all to your existing API over a RESTful JSON interface.”), angular.js (“lets you extend HTML vocabulary for your application”), D3.js (“data-driven documents”), node.js, compass and SASS (to make css easier), ember.js (“a framework for creating ambitious web applications”), yeoman (“modern workflows for modern webapps” using Ruby and node.js)…

The keynote of DjangoCon AU by Alex Gaynor explained this in a historical context and sowed the idea in my mind that the time is ripe for a new framework (possibly an enhanced Django) that will make all these things easy as well (roughly speaking). Jacob Kaplan-Moss said to check out the Meteor screencast for what is possible.

3. Web security is never far from our thoughts

Jacob gave a great talk on web security.  As I mentioned above, Django takes care of the essential security features – CSRF tokens, SQL injections, password hashing and HTML cross-site scripting. Some immediately useful tips I picked up from Jacob are – always use https everywhere if you have user logins; django-secure makes this easy (“Helping you remember to do the stupid little things to improve your Django site’s security.”); use bcrypt for password hashing; use Django’s forms whenever there is user input, even if it’s not a form; turn off unused protocols (e.g. XML and yaml) in your API; and to emphasise how easy it is for others to intercept your unencrypted data, look up Firesheep.

4. Python packages for maths and science are making “big data” much more accessible to everyone

Lots of talks on this. Check out especially the scikit-learn documentation, which is incredibly thorough. But then there’s Pandas, scipy, and scikit-image, and for networks networkx.

For parallelization, the classic algorithm is mapreduce, and mrjob provides a Python interface to this.  The easiest way to get started on parallelization is to use IPython.parallel. For an example, check out how to process a million songs in 20 minutes. For queuing jobs and running them in the background, redis-queue has a low barrier to entry. (One caveat – you may need to manually delete .pid files.)

An interesting quote – “Most of the world’s supercomputers are running Monte Carlo simulations.”

5. There are lots more packages and tools to try out

To improve my style, I want to check out django-model-utils (especially for “PassThroughManager”); and more generally, django-pipeline (for “CSS and JavaScript concatenation and compression, built-in JavaScript template support, and optional data-URI image and font embedding” – in preference to django-compressor), django-allauth (an “integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (social) account authentication.”), django-taggit (to add tags to your project), Raven (the python client for Sentry, “notifies you when your users experience errors”), django-discover-runner (which will be part of Django 1.6 – it allows “you to specify which tests to run and organize your test code outside the reach of the Django test runner”), and django-sitetree (“introducing site tree, menu and breadcrumbs navigation”).

There’s more… Mock for testing (“allows you to replace parts of your system under test with mock objects and make assertions about how they have been used”); a way to separate selenium tests into tests and page controllers; Gerrit (for online code reviews); Jenkins (“monitors executions of repeated jobs”); and django-formrenderingtools (“customize layout of Django forms in templates, not in Python code.”). There’s a way to resize images in html5 before uploading them. And Fanstatic serves js and css files (e.g. specify that you need jQuery through a python statement rather than in the template), though I’m not sure when I would need this yet.
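To give a flavour of Mock (which became unittest.mock in the standard library from Python 3.3): you hand your code a Mock object in place of a real collaborator, then assert on how it was used. The fetch_price helper and fake session below are my own illustration, not from the talk:

```python
from unittest import mock

def fetch_price(session, url):
    # Imagine this hits a real web service in production.
    return float(session.get(url).text)

# Replace the session with a Mock so the test never touches the network.
fake_session = mock.Mock()
fake_session.get.return_value.text = "9.99"

assert fetch_price(fake_session, "http://example.com/price") == 9.99

# And, as the quote says, make assertions about how the mock was used.
fake_session.get.assert_called_once_with("http://example.com/price")
```

Attribute access on a Mock auto-creates child mocks, which is why `fake_session.get.return_value.text` can be set without defining anything first.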

If you need to kill off a process that’s taking too long you can use interrupting cow and django-timelimit.
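Interrupting cow works by arranging for a signal to fire after a deadline. The same idea can be sketched with just the stdlib signal module (Unix only, main thread only – the helper below is my own simplification, not interrupting cow’s API):

```python
import signal
import time

class Timeout(Exception):
    pass

def run_with_timeout(func, seconds):
    """Run func(), raising Timeout if it takes longer than `seconds`."""
    def handler(signum, frame):
        raise Timeout
    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)  # the OS delivers SIGALRM after `seconds`
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)

try:
    run_with_timeout(lambda: time.sleep(3), 1)
except Timeout:
    print("timed out")  # the sleep was interrupted after 1 second
```

Note the limitation this inherits: SIGALRM only interrupts the main thread, and only one alarm can be pending at a time, which is why the dedicated packages exist.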

There’s a way to compile Clojure to javascript.  Since I don’t know Clojure yet, this is a very speculative project for me, but I like the idea of avoiding javascript. :-)

And if you’re writing tests for iOS, there’s a way to run selenium on the iOS simulator using appium.

6. I still have a lot to learn about Python

I won’t embarrass myself by listing all the things I learnt about Python here, though we were encouraged not to be afraid of the CPython source code, and even less so of the PyPy source code (which has the advantage that it is in python!).

I was convinced I should be trying to use Python 3.3 whenever possible, if only to save time later with unicode errors – Python 2.x doesn’t handle these well. Django 1.5 now supports Python 3, using a package called six to keep a single codebase working with Python 2.x too.  Incidentally, the consensus also seems to be to use PostgreSQL over MySQL – though admittedly that doesn’t really fit under this heading.
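The core trick behind six is small: test the interpreter version once, and alias the names that differ between Python 2 and 3. A toy illustration of the idea (six itself provides these as six.PY2, six.text_type and six.iteritems):

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa -- this name only exists on Python 2
else:
    text_type = str

def iteritems(d):
    # Dispatch to whichever spelling this interpreter understands.
    return d.iteritems() if PY2 else iter(d.items())

# Code written against these aliases runs unchanged on 2.x and 3.x.
assert isinstance(text_type("hello"), text_type)
assert dict(iteritems({"a": 1})) == {"a": 1}
```

Because the version check happens once at import time, there is no per-call overhead – the aliases are just ordinary names by the time your code uses them.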

7. The Python community is friendly, humble and welcoming

Good news! This keeps it fun to program in Python as much as anything.

8. PyCon was a great conference

Of all the scientific and industry conferences I have been to, this one had the best-presented talks – and not just from the scheduled presenters, but also in the lightning (5 minute) talks. They were very engaging and intelligible.  Speakers used their slideshows in inventive ways (e.g. using memegenerator, prezi.com and the odd xkcd cartoon).  And the conference itself was well organised by Chris Neugebauer.

9. Next time I’ll stay for the sprints!