
Inside Class Cruncher

You’ll find here the concepts used to create Class Cruncher, a handy webapp for school administrators.

The problem

How should you deploy a certain number of teachers across your school’s classes, given rules about class sizes? You may need to use composite classes.

More precisely, given:

  • The total number of teachers
  • Which grades are being taught (eg. kindergarten, 1st grade, 2nd grade etc)
  • How many children are in each grade
  • Min & max class sizes in each grade
  • Which composite classes are allowed (eg. grades 1 & 2 but not grades 1 & 3 together)
  • The smallest number of children from a grade allowed in a composite class (eg. so you don’t have a single lonely first grader in a class with grade 2).

Then list out the classes that each teacher should take (i.e. how many children from each grade are in each class).

A solution

Solve this as a mixed-integer linear programming problem, using lpsolve. This uses the simplex and branch-and-bound algorithms to maximise an objective function subject to some constraints. Here we have lots of constraints, and just want a feasible solution, rather than an “optimal” solution, so the objective function is not so important.

Step 1: Choose your variables

For every potential class j, we’ll have a binary variable y(j) (ie. it must be either 0 or 1), to tell us if it is being used or not.

Because classes can include students from different grades, we will invent the term “subclass” to refer to the number of children from one particular grade in a class. Then our key variables are x(i), the number of children in each potential subclass i.

Let’s relate the classes and subclasses via a matrix C(j,i), with 1 if subclass i is in class j, and 0 otherwise. Since subclasses are only in a single class, for any subclass i, C(j,i) is 1 for only one class j. So C is mostly 0s, with each row having at least one 1, and no columns having more than one 1.

Then the number of children in class j = the sum over i of C(j,i).x(i).

(In matrix notation, Cx = number of children in each class, where x and the right hand side are column vectors.)

C(j,i) also tells us if subclass i is being used: it is given by the sum over j of C(j,i).y(j).

Recall that y(j) tells us if class j is being used. Also, for a given subclass i, C(j,i) is only 1 for a single class j, so the sum is just a single term.

(In matrix notation, C’y = binaries telling if each subclass is used, where C’ is the transpose of C.)

For example: Suppose you have 3 grades (kindergarten, 1 and 2), and can allow for 1 pure kindy class, 2 composite K/1 classes, 2 pure 1st grade and 1 pure 2nd grade class. That’s a total of 6 classes and 8 subclasses. The matrix C is:

            ---- 8 subclasses ----
          | 1 . . . . . . .
          | . 1 1 . . . . .
    6     | . . . 1 1 . . .
 classes  | . . . . . 1 . .
          | . . . . . . 1 .
          | . . . . . . . 1
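
As a rough sketch of this example, C can be built from a list of potential classes, each described by the grades it contains. The names `POTENTIAL_CLASSES`, `build_class_matrix`, `class_sizes` and `subclass_used` are illustrative rather than taken from Class Cruncher, and the x and y values are made up just to exercise the two products Cx and C'y.

```python
# Each potential class lists the grades it draws from: one subclass per entry.
POTENTIAL_CLASSES = [
    ["K"],        # pure kindergarten
    ["K", "1"],   # composite K/1
    ["K", "1"],   # composite K/1
    ["1"],        # pure 1st grade
    ["1"],        # pure 1st grade
    ["2"],        # pure 2nd grade
]


def build_class_matrix(potential_classes):
    """Return C as a list of rows, with C[j][i] = 1 if subclass i is in class j."""
    n_sub = sum(len(grades) for grades in potential_classes)
    matrix, i = [], 0
    for grades in potential_classes:
        row = [0] * n_sub
        for _ in grades:
            row[i] = 1
            i += 1
        matrix.append(row)
    return matrix


def class_sizes(C, x):
    """Cx: the number of children in each class."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in C]


def subclass_used(C, y):
    """C'y: for each subclass, 1 if its (single) class is in use, else 0."""
    return [sum(C[j][i] * y[j] for j in range(len(C))) for i in range(len(C[0]))]


C = build_class_matrix(POTENTIAL_CLASSES)
x = [20, 12, 10, 0, 0, 25, 22, 21]   # children per subclass (made-up values)
y = [1, 1, 0, 1, 1, 1]               # which classes are used (made-up values)
print(class_sizes(C, x))    # [20, 22, 0, 25, 22, 21]
print(subclass_used(C, y))  # [1, 1, 1, 0, 0, 1, 1, 1]
```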

To simplify the model, we’ll assume potential composite classes cannot optionally only use some of their subclasses – it’s all or nothing.

So in fact, we also need a preliminary step (before solving) to decide on the number of potential classes and subclasses from the input data. We can make a guess at an upper limit, eg. by dividing the number of children in each grade by the maximum number allowed in a class, and rounding up.
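
The estimate described above is a one-line calculation per grade. In this sketch the grade sizes and maximum class sizes are hypothetical inputs, and `classes_estimate` is an invented name.

```python
def classes_estimate(children_in_grade, max_class_size):
    """Children divided by the maximum class size, rounded up (integer arithmetic)."""
    return -(-children_in_grade // max_class_size)


# Hypothetical inputs
grade_sizes = {"K": 55, "1": 35, "2": 35}
max_class_size = {"K": 25, "1": 30, "2": 30}

estimate = {g: classes_estimate(n, max_class_size[g]) for g, n in grade_sizes.items()}
print(estimate)  # {'K': 3, '1': 2, '2': 2}
```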

Step 2: Choose your constraints

Our constraints need to achieve these things:

  1. Ensure all the children in each grade are in a class (or equivalently, a subclass)
  2. Class sizes must be between the minimum and maximum class size – or zero (since not all potential classes need to be used!)
  3. Subclass sizes must be above the minimum subclass size, if the class is being used
  4. The number of classes must match (or be less than) the number of available teachers

We also need to relate the x and y variables. In a linear program, all the constraints need to be linear in the variables z. This means you can express them as matrices A, so that A.z is greater than, equal to, or less than, another vector c. Here z is just a column vector containing both x and y.

Take each of the above in turn.

Ensure all the children in each grade are in a class (or equivalently, a subclass)

We have to express this in terms of the number of children in the subclasses x, because the y’s don’t tell us how many children there are in each class.

In our earlier example, the first point involves 3 constraints, one for each grade. Multiply the below matrix with the column vector x (which has 8 rows).

            ---- 8 subclasses ----     sums to
          | 1 1 . 1 . . . .   = number of kindergarteners
 3 grades | . . 1 . 1 1 1 .   = number of first graders
          | . . . . . . . 1   = number of 2nd graders
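
A small sketch of how this grade matrix could be generated, assuming we keep a list of which grade each subclass belongs to (`SUBCLASS_GRADES`, an invented name). Multiplying it by a made-up x gives the number of children placed from each grade, which the constraint requires to equal the grade sizes.

```python
GRADES = ["K", "1", "2"]
# The grade of each of the 8 subclasses in the example, in order
SUBCLASS_GRADES = ["K", "K", "1", "K", "1", "1", "1", "2"]


def build_grade_matrix(subclass_grades, grades):
    """One row per grade, with a 1 wherever a subclass belongs to that grade."""
    return [[1 if sg == g else 0 for sg in subclass_grades] for g in grades]


G = build_grade_matrix(SUBCLASS_GRADES, GRADES)
x = [20, 12, 10, 0, 0, 25, 22, 21]   # children per subclass (made-up values)
placed = [sum(g * xi for g, xi in zip(row, x)) for row in G]
print(placed)  # [32, 57, 21]
```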

Class sizes must be between the minimum and maximum class size, or zero

If we had to use every potential class (ie. all the y variables were known to be 1), then this is just the constraint Cx >= min_class_size, Cx <= max_class_size (where I’m applying the inequality to each row of each side, so these are 6+6=12 constraints in our example.)

However, Cx >= min_class_size should only apply if y is 1. That’s easy to write: Cx – min_class_size.y >= 0. So the constraint matrix has C (N_classes × N_subclasses or 6×8) on the left and -min_class_size (N_classes × N_classes or 6×6, a diagonal matrix of the minimum class sizes) on the right.

I make the same modification for the max class sizes, although it’s not strictly necessary; this has the effect of forcing y >= 0.

Subclass sizes must be above the minimum subclass size, if the class is being used

If we had to use every potential class (ie. all the y variables were known to be 1), and therefore every subclass, then this is a very simple set of constraints: x >= min_subclass_size. However, we only want this constraint to apply if the corresponding binary value is 1. That corresponding y is actually the corresponding row of C’y (as we discussed earlier).

There’s a classic trick to only apply constraints based on the value of a binary variable, called the “big M” method.

Write x >= min_subclass_size as (x – min_subclass_size) >= 0, and then replace the right hand side with M(C’y – 1), where M is a big number (eg. 10000). Now when C’y is 1, the constraint applies; when it is 0, it does not. Cool!
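
To see the trick at work on a single subclass, here is a tiny check of the inequality (x – min_subclass_size) >= M(u – 1), where u stands for the corresponding entry of C'y. The function name and the sample numbers are invented for illustration.

```python
BIG_M = 10000


def big_m_satisfied(x_i, min_subclass_size, used, big_m=BIG_M):
    """Check (x_i - min_subclass_size) >= M * (used - 1) for one subclass."""
    return (x_i - min_subclass_size) >= big_m * (used - 1)


# used = 1: the right-hand side is 0, so the minimum size is enforced
print(big_m_satisfied(3, 5, used=1))   # False (3 children is below the minimum of 5)
print(big_m_satisfied(7, 5, used=1))   # True
# used = 0: the right-hand side is -M, so any non-negative size passes
print(big_m_satisfied(0, 5, used=0))   # True
```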

The number of classes must match (or be less than) the number of available teachers

The last constraint is easy – the sum of the y’s must be less than or equal to the number of teachers; the x’s don’t come into it.

Do we need to explicitly constrain y to be binary?

Certainly all our variables need to be integers, not floating point. But y already can’t be bigger than 1, because our big M constraint would fail. And it can’t be less than 0 because of the way we wrote our max class size constraint.

So y is already forced to be binary.

Matrix representation of the constraints

A:  (integer vars)      (binary vars)                     b:
 -- num_subclasses -- -- num_classes --
(--------------------+-----------------)                  (--------------)
(                    |                 )      |           (       |      )
( grade_size_matrix  |        0        )  num_grades  =   (  grade_sizes )
(                    |                 )      |           (       |      )
(--------------------+-----------------)                  (--------------)
(                    |-l1              )      |           (       |      )
( class_size_matrix  |     (diag.)     )  num_classes >=  (       0      )
(                    |             -lnc)      |           (       |      )
(--------------------+-----------------)                  (--------------)
(                    |-u1              )      |           (       |      )
( class_size_matrix  |     (diag.)     )  num_classes <=  (       0      )
(                    |             -unc)      |           (       |      )
(--------------------+-----------------)                  (--------------)
( 1                  |                 )      |           (       |      )
(     1              |                 )      |           (       |      )
(        ...         |     -M * C'     ) num_sub_classes>=(    ls - M    )
(              1     |                 )      |           (       |      )
(                  1 |                 )      |           (       |      )
(--------------------+-----------------)                  (--------------)
(         0          | 1 1   ...   1 1 ) num_classes =,<= ( max_classes  )
(--------------------+-----------------)                  (--------------)

where l is the min_class_size
      u is the max_class_size
      ls is the min_subclass_size
      C' is the transpose of the class_size_matrix
      M is a large number
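
One way to sanity-check the block structure is to test a candidate solution against each block in turn, without a solver. The sketch below does that in plain Python for the running example. The class-size limits, minimum subclass size, number of teachers and candidate values are all made up, and `is_feasible` is an invented helper, not part of lpsolve.

```python
def dot(row, vec):
    return sum(a * b for a, b in zip(row, vec))


def is_feasible(x, y, G, C, grade_sizes, l, u, ls, n_teachers, M=10000):
    """Check every block of the constraint matrix for z = x followed by y."""
    # Block 1: all the children in each grade are placed
    if any(dot(G[g], x) != grade_sizes[g] for g in range(len(G))):
        return False
    for j in range(len(C)):
        size = dot(C[j], x)
        # Blocks 2 and 3: Cx - l.y >= 0 and Cx - u.y <= 0
        if size - l[j] * y[j] < 0 or size - u[j] * y[j] > 0:
            return False
    for i in range(len(x)):
        used = sum(C[j][i] * y[j] for j in range(len(C)))  # row i of C'y
        # Block 4 (big M): x - M.C'y >= ls - M
        if x[i] - M * used < ls - M:
            return False
    # Block 5: no more classes than teachers
    return sum(y) <= n_teachers


# The 6-class, 8-subclass example
C = [[1, 0, 0, 0, 0, 0, 0, 0],
     [0, 1, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 1, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 0, 0, 1]]
G = [[1, 1, 0, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 0, 0, 0, 1]]

x = [20, 12, 10, 0, 0, 25, 22, 21]   # made-up candidate
y = [1, 1, 0, 1, 1, 1]
args = dict(G=G, C=C, grade_sizes=[32, 57, 21],
            l=[15] * 6, u=[30] * 6, ls=5)   # hypothetical limits

print(is_feasible(x, y, n_teachers=5, **args))   # True
print(is_feasible(x, y, n_teachers=4, **args))   # False: five classes are in use
```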

Step 3: Choose the objective function

For a single answer, I just minimise the sum of all the x(i).

Ideally you could get second-best and other feasible solutions from the solver, but I had trouble doing this. As a work-around, I produce different solutions by changing the weights on each subclass. I raise the weight from 1 to 2 for the subclasses in a given grade or composite class, and repeat.
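
A minimal sketch of how those alternative weightings could be generated. It covers only the per-grade case (not the per-composite-class case), and `SUBCLASS_GRADES` and `weight_vectors` are invented names; each resulting vector would be used as the objective coefficients on x for one extra solver run.

```python
# The grade of each subclass in the running example
SUBCLASS_GRADES = ["K", "K", "1", "K", "1", "1", "1", "2"]


def weight_vectors(subclass_grades):
    """For each grade, weight that grade's subclasses 2 and all others 1."""
    grades = list(dict.fromkeys(subclass_grades))  # unique grades, in order
    return {g: [2 if sg == g else 1 for sg in subclass_grades] for g in grades}


for grade, weights in weight_vectors(SUBCLASS_GRADES).items():
    print(grade, weights)
# K [2, 2, 1, 2, 1, 1, 1, 1]
# 1 [1, 1, 2, 1, 2, 2, 2, 1]
# 2 [1, 1, 1, 1, 1, 1, 1, 2]
```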

Step 4: Express the solution in an understandable way

We recast the resulting x and y values so that people can understand them, eg. in our example with 8 subclasses we might get an answer like

--------------- x ----------- ------- y -------
[20, 12, 10, 0, 0, 25, 22, 21, 1, 1, 0, 1, 1, 1]

This is more easily understood as being 5 classes with the following compositions:

           K  1st 2nd
class #1: 20,  0,  0
class #2: 12, 10,  0
class #3:  0, 25,  0
class #4:  0, 22,  0
class #5:  0,  0, 21
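
The recasting can be sketched as follows, assuming the solver returns the x values followed by the y values in one flat list. `decode` and the two constants are illustrative names, not the app's actual code.

```python
GRADES = ["K", "1", "2"]
# Grades contained in each of the 6 potential classes, in subclass order
POTENTIAL_CLASSES = [["K"], ["K", "1"], ["K", "1"], ["1"], ["1"], ["2"]]


def decode(z, potential_classes, grades):
    """Turn the flat solution vector into per-class counts for each grade."""
    n_sub = sum(len(c) for c in potential_classes)
    x, y = z[:n_sub], z[n_sub:]
    result, i = [], 0
    for j, class_grades in enumerate(potential_classes):
        counts = [0] * len(grades)
        for g in class_grades:
            counts[grades.index(g)] = x[i]
            i += 1
        if y[j] == 1:          # skip potential classes that are not used
            result.append(counts)
    return result


z = [20, 12, 10, 0, 0, 25, 22, 21, 1, 1, 0, 1, 1, 1]
for number, counts in enumerate(decode(z, POTENTIAL_CLASSES, GRADES), start=1):
    print("class #{0}: {1}".format(number, counts))
# class #1: [20, 0, 0]
# class #2: [12, 10, 0]
# class #3: [0, 25, 0]
# class #4: [0, 22, 0]
# class #5: [0, 0, 21]
```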

Step 5: Post-process to humanise the answers

The optimisation often produces solutions with very different numbers of students in each class, eg. two kindergarten classes, one with 10 students, and one with 20. Clearly, a better solution is two classes of 15.

As is often the case, the constraints we thought we needed don't quite fully give us the solutions we expected.

We could tackle this by adding more constraints or changing the objective function, but in this case it is simpler to do some "shuffling" after the optimisation, according to some prescribed rules.

Composite classes make this a bit harder. For another example, given the formatted output (note the composite classes are listed at the end):

[[10,0,0], [20,0,0], [0,25,0], [0,0,10], [25,5,0], [0,5,25]]

We could return:

[[21,0,0], [21,0,0], [0,22,0], [0,0,20], [13,8,0], [0,5,15]]

The rules I came up with took some discovering:

  1. For each grade, find the average class size of classes with subclasses from that grade. (In the above example, it's 20, which includes 5 grade 1s in the comp class)
  2. For each grade, move that grade's students around to make them as close to this average as possible, eg:
    • kindergarten: avg = 20;

      [[20,0,0], [20,0,0], [0,25,0], [0,0,10], [15,5,0], [0,5,25]]

    • grade 1: avg = (25+20+30)/3 = 25; in the example, this cannot be done, since the last class is the problem.
    • grade 2: avg = (10+30)/2 = 20,

      [[20,0,0], [20,0,0], [0,25,0], [0,0,20], [15,5,0], [0,5,15]]

  3. Repeat until class sizes only change by 1, eg:
    • kindergarten: avg = 20, no shuffling required
    • grade 1: avg = (25+20+20)/3 = 21.7;

      [[20,0,0], [20,0,0], [0,22,0], [0,0,20], [15,7,0], [0,6,15]]

    • grade 2: avg = (20+21)/2 = 20.5, no change required
    • kindergarten: avg = (20+20+22)/3 = 20.7,

      [[21,0,0], [20,0,0], [0,22,0], [0,0,20], [14,7,0], [0,6,15]]

    • etc
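
The averages in rule 1 are easy to reproduce. This sketch covers only that rule; the shuffling itself depends on details (such as minimum subclass sizes) that the rules above leave open, so it is not attempted here. `grade_average` is an invented name.

```python
def grade_average(classes, grade_index):
    """Average total size of the classes containing children from one grade."""
    sizes = [sum(c) for c in classes if c[grade_index] > 0]
    return sum(sizes) / len(sizes) if sizes else 0.0


# The formatted output from the example (columns: K, 1st, 2nd)
start = [[10, 0, 0], [20, 0, 0], [0, 25, 0], [0, 0, 10], [25, 5, 0], [0, 5, 25]]
print(grade_average(start, 0))      # 20.0 (kindergarten)

# After the kindergarten shuffle
after_k = [[20, 0, 0], [20, 0, 0], [0, 25, 0], [0, 0, 10], [15, 5, 0], [0, 5, 25]]
print(grade_average(after_k, 1))    # 25.0 (grade 1)
print(grade_average(after_k, 2))    # 20.0 (grade 2)
```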

Step 6: Build it!

Build a web app to take the inputs, perform the optimisation and serve the results!

  

Django & Angular overview

“Angular is what HTML would have been if it had been designed for building web applications”

What problem does Angular solve?

It separates your javascript models, views and controllers – just like Django does for your server-side code.

It does so using “two-way data-binding”: whenever the model changes, the view changes as well – and vice versa.

Pros and Cons of Angular

Angular has a rich ecosystem of modules, eg. Ionic for mobile app development.

However, Angular 2 (to be released in 2015) will not be easily backwards compatible. Angular 1 may not be supported for much longer (18 months?).

Plenty of alternatives exist – check them out at ToDo MVC.

One that is gaining popularity is React – “a javascript library for building user interfaces”. Mark Finger has written a helpful package called django-react to make this easy to use in Django.

A quick Angular demo

Eg. see the code snippets on the Angular home page.

What tools make it easier to use with Django?

Server-side:

  • Django-angular – lots of useful utilities to help the two work together, especially around forms and template sharing; there is also support for ‘three-way’ data-binding (ie. the server detects when the client’s model changes – and the server can modify values on the client side without the client needing to poll).
  • Django REST framework or TastyPie – since your Django app’s API is now its main feature
  • Django-compressor or django-pipeline – because you will have dozens of little js files defining your Angular components

Client-side:

  • Grunt or gulp – to automate javascript necessities like minification, compilation, unit testing, linting, etc
  • Npm or bower – like pip install for your javascript packages
  • Angular has lots of modules you can add, eg. ngDialog and AngularUI
  • Don’t use the default angular router; ui-router is better.

And Yeoman – a “generator ecosystem” – although there is no django + angular generator yet.

What practices make it easier to use with Django?

This section is derived from the excellent Thinkster tutorial Build Web Applications with Django and AngularJS.

Angular directory structure (in the project directory root):

  • /static/javascripts/<ng_app_name>.config.js
  • /static/javascripts/<ng_app_name>.js
  • /static/javascripts/<ng_app_name>.routes.js
  • /static/javascripts/<ng_module_name>/<ng_module_name>.module.js
  • /static/javascripts/<ng_module_name>/controllers/<controller_name>.controller.js, …
  • /static/javascripts/<ng_module_name>/directives/<directive_name>.directive.js, …
  • /static/javascripts/<ng_module_name>/services/<service_name>.service.js, …
  • /static/templates/<ng_module_name>/<ng_template_name>.html, …
  • /templates/<django_template_name>.html, …
  • /templates/javascripts.html

urls.py

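from django.conf.urls import include, patterns, url
from django.contrib import admin

# `router` (eg. a REST framework router) and IndexView are assumed
# to be defined or imported earlier in this file
urlpatterns = patterns(
    '',
    url(r'^admin/', include(admin.site.urls)),
    url(r'^api/v1/', include(router.urls)),
    # pass everything else through to Angular
    url('^.*$', IndexView.as_view(), name='index'),
)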
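
The order of the patterns matters: the catch-all must come last, because the first matching pattern wins. The sketch below imitates that behaviour with the standard `re` module. It is a simplified stand-in, not Django's actual resolver, and `ROUTES` and `resolve` are invented names.

```python
import re

# (pattern, name) pairs in the same order as the urlpatterns above
ROUTES = [
    (r'^admin/', 'admin'),
    (r'^api/v1/', 'api'),
    (r'^.*$', 'index'),   # catch-all: everything else goes to Angular
]


def resolve(path):
    """Return the name of the first route whose pattern matches the path."""
    for pattern, name in ROUTES:
        if re.search(pattern, path):
            return name
    return None


print(resolve('admin/users/'))      # admin
print(resolve('api/v1/accounts/'))  # api
print(resolve('login'))             # index
print(resolve(''))                  # index
```

If the catch-all were listed first, every request, including those for the admin and the API, would be handed to the index view.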

views.py

from django.views.decorators.csrf import ensure_csrf_cookie
from django.views.generic.base import TemplateView
from django.utils.decorators import method_decorator

class IndexView(TemplateView):
    template_name = 'index.html'

    @method_decorator(ensure_csrf_cookie)
    def dispatch(self, *args, **kwargs):
        return super(IndexView, self).dispatch(*args, **kwargs)

Testing frameworks

There are many javascript testing frameworks available, eg. mocha and jasmine.

What problems have people had?

Please let me know!

Resources – Tutorials

What is this post anyway?

These are some questions for and notes from the SyDjango meetup on Angular in January 2015.

  

Harness your Python style to write good Javascript

Python encourages you to write good code. Javascript does not. How can you harness your Python style to write good Javascript?

Classes

Let’s try to write this Python code in Javascript.

class Vehicle(object):
    def __init__(self, size):
        self.size = size
    def __str__(self):
        return "Vehicle of size {0}".format(self.size)

class Hovercraft(Vehicle):
    def __init__(self, *args, **kwargs):
        # initialize just like a vehicle
        super(Hovercraft, self).__init__(*args, **kwargs)
        # add extra custom init if needed
    def hover(self, height):
        print("Hovering at height {0}".format(height))

# eg. make a new size 8 hovercraft
h = Hovercraft(8)
# eg. hover at height 12
h.hover(12)
# eg. change size and print
h.size = 9
print(h)

Not obvious? The problem is that Javascript doesn’t have an out-of-the-box analogue to classes. Somehow we need to adapt objects and functions to the task.

Use prototypes

Your first thought might be to define new object types by defining object constructors, which are called with the new keyword, and adding methods to the object’s prototype chain. These objects look a lot like classes, don’t they? So you might write something like this:

// Not the best approach - see below
function Vehicle(size) {
    this.size = size;
    // or use this and arguments pseudo-arguments
}
Vehicle.prototype.toString = function () {
    return "Vehicle of size " + this.size;
};
function Hovercraft(size) {
    this.size = size;
}
Hovercraft.prototype = new Vehicle();
Hovercraft.prototype.hover = function (height) {
    alert("hovering at height "+height);
}

// eg. make a new size 8 hovercraft
h = new Hovercraft(8);
// eg. hover at height 12
h.hover(12);
// eg. change size and print
h.size = 9
h.toString()

This has a number of problems:

  • Vehicle and Hovercraft have the same constructor, but you can’t reuse it.
  • It feels strange to define the class’s methods outside the constructor, with the prototype lines.
  • There’s no room for private variables or methods.

Use closures

For a better solution, we need to think laterally – and use Javascript’s closures. I was hesitant to make use of these at first, thinking it would only lead to perverse and impenetrable code. But in fact they are a force for good, as we’ll see.

Put simply, a closure is a function which has variables bound to it. You write the closure as a function inside another function. The trick is that the inner function can refer to any of the outer function’s local variables. (Thanks for this succinct summary StackExchange!)

Here’s a nice example of a closure from a talk on functional programming given by Douglas Crockford, JavaScript architect at PayPal and formerly Yahoo (at 57mins). (In fact the previous example is from this talk too.) In this example, a closure is used to produce an object called singleton, which has a private variable, a private function, and two methods. You call the two methods as singleton.firstMethod(a,b) and singleton.secondMethod(c):

var singleton = (function () {
    var privateVariable;
    function privateFunction(x) {
        ...privateVariable...
    }

    return {
        firstMethod: function (a, b) {
            ...privateVariable...
        },
        secondMethod: function (c) {
            ...privateFunction()...
        }
    };
}() );
// note the function is called immediately,
// so the var singleton is its returned value
// the surrounding brackets are just to help the reader

So let’s adapt that to our problem:

function vehicle(size) {
    var that = {
        size: size,
        toString: function () { 
            return "Vehicle of size " + this.size; 
        },
    };
    return that;
}
function hovercraft(size) {
    var that = vehicle(size);  // inherit from vehicle
    that.hover = function(height) { 
        alert("hovering at height "+height);
        return that; // optional - allows chaining
    };
    return that;
}

// eg. make a new size 8 hovercraft
h = hovercraft(8);
// eg. hover at height 12
h.hover(12);
// eg. both at once using chaining
g = hovercraft(8).hover(12);
// eg. change size and print
h.size = 9
h.toString()

In the above, size is accessible to the world, just as it is in Python. But Javascript also lets us make it private:

function vehicle(size) {
    // can define private variables here
    // instead, here we use the fact that parameters are private
    return {
        toString: function () { 
            return "Vehicle of size " + size; 
        },
    };
}

In general, to make your own constructor function (eg. vehicle, hovercraft), you follow this recipe from the talk:

  1. Make an object
  2. Define (private) variables and functions
  3. Augment the object with methods (which have access to the privates above)
  4. Return the object
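
Since the comparison here is with Python, it may help to see the same four-step recipe written with Python closures. This is only an illustrative sketch: `SimpleNamespace` stands in for the Javascript object literal, and the method names `describe` and `set_size` are invented here, not taken from the talk.

```python
from types import SimpleNamespace


def vehicle(start_size):
    that = SimpleNamespace()           # 1. make an object
    size = start_size                  # 2. a private variable

    def describe():                    # 3. methods that can see the private variable
        return "Vehicle of size {0}".format(size)

    def set_size(new_size):
        nonlocal size
        size = new_size

    that.describe = describe
    that.set_size = set_size
    return that                        # 4. return the object


def hovercraft(start_size):
    that = vehicle(start_size)         # "inherit" from vehicle

    def hover(height):
        return "Hovering at height {0}".format(height)

    that.hover = hover
    return that


h = hovercraft(8)
h.set_size(9)
print(h.describe())         # Vehicle of size 9
print(h.hover(12))          # Hovering at height 12
print(hasattr(h, "size"))   # False: size is not reachable from outside
```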

This pattern has a name: the module pattern. You’ll find a great writeup of it, and some ways to use it across multiple js files, in this post by Ben Cherry. He also points out the best way to handle dependencies, and to update existing variables.

Use chaining

In the example above, I have added the extra feature of “chaining”. In Python you have to put each effect on a separate line, eg.:

h.size = 9
h.hover(18)
print(h)

But in Javascript you can potentially chain it all together into one, like so:

h.size(9).hover(18).toString()

I first discovered the joy of chaining while using Mike Bostock’s super-powerful D3 library. He describes how to do it here – in fact, his description of how to write a reusable chart arrives at the same closure-based solution as we have, with the addition of getters and setters as below.

We made hover chain, but size doesn’t do it yet, as it’s a variable, not a function. Mike solves this by adding a getter/setter function for each public variable, which we could add like this:

function vehicle(startSize) {
    var size = startSize;
    var that = {
        toString: function () { 
            return "Vehicle of size " + size;
        },
    };
    // getter/setter functions
    that.size = function(_) {
        if (!arguments.length) return size;
        size = _;
        return that; // Q: would 'return this' work?
    };
    return that;
}
// eg. this works now
h.size(9).hover(18).toString()

Then h.size() returns (gets) the hovercraft h’s size, and h.size(9) sets the size to 9.

Another benefit: the code never refers to this. That’s handy, because I find whenever I refactor code into smaller functions I get tripped up by the meaning of this changing.

You may also recognize such getters and setters from jQuery, where eg. $("body").text() returns the page body’s text, and $("body").text("eels") sets the body’s text to “eels”.

Still, as nice as chaining is, needing to add 5 lines of boilerplate code for every variable, with 3 references to the variable name that must be changed each time, is the sort of thing we became programmers to avoid.

To solve this, I am starting to put all the variables with getters and setters into a single object, eg. ext:

function vehicle(startSize) {
    var ext = {size: startSize};
    if (Object.keys) {
        var extKeys = Object.keys(ext);
    }
    function toString() {
        return "Vehicle of size " + ext.size;
    }
    return {
        toString: toString,
        get: function(name) {
            if (!arguments.length) return ext;
            return ext[name];
        },
        set: function(name, val) {
            if (typeof extKeys!=="undefined" && extKeys.indexOf) {
                if (extKeys.indexOf(name)>=0) {
                    ext[name] = val;
                } else {
                    throw Error("Variable "+name+" not found");
                }
            } else {
                // on browsers without Object.keys or indexOf,
                // don't check the name is valid
                ext[name] = val;
            }
            return this;
        }   
    }
}
// eg. these work
h.get("size");
h.set("size",9).hover(18).toString();

Don’t make it a global

Finally, you probably don’t want to have a new global called vehicle. It’s better to add it to another module, eg. RT (which may or may not already exist), as RT.vehicle. It might also depend on other modules, eg. the underscore (_) library. To do this, wrap the whole function in another closure!

if (typeof _ === 'undefined') { 
    throw new Error('Vehicle requires underscore') 
}
// assign the result to RT, so that RT exists afterwards even if it did not before
var RT = (function(RT, _) {
    function vehicle(startSize) { 
        ... // copy from above
    }

    // attach vehicle to RT
    if (typeof RT==="undefined") {
        RT = {};
    }
    RT.vehicle = vehicle;
    return RT;
}(typeof RT === "undefined" ? {} : RT, _));
// eg. make a new size 9 vehicle
v = RT.vehicle(9);

Too crazy?

In conclusion

Javascript’s closures give you access to some interesting programming patterns. Foremost among them, they let you implement Python-style classes, with the added bonus of private variables and functions. And this is not just an academic gimmick that risks complicating your code in the real world: it is championed by the people who develop javascript, and it is used by jQuery and D3 among others. It helps you to write good, reusable code.

So – please let me know if you’ve used this pattern before, and whether my comparison to Python’s classes stacks up.

A final thought – perhaps it is wrong to compare Javascript to Python after all. Perhaps it is better compared to LISP!

  

9 Lessons from PyConAU 2014

A summary of what I learned at PyCon AU in Brisbane this year. (Videos of the talks are here.)

1. PyCon’s code of conduct

Basically, “Be nice to people. Please.”

I once had a boss who told me he saw his role as maintaining the culture of the group.  At first I thought that seemed a strange goal for someone so senior in the company, but I eventually decided it was enlightened: a place’s culture is key to making it desirable, and making the work sustainable. So I like that PyCon takes the trouble to try to set the tone like this, when it would be so easy for a bunch of programmers to stay focused on the technical.

2. Django was made open-source to give back to the community

Ever wondered why a company like Lawrence Journal-World would want to give away its valuable IP as open source? In a “fireside chat” between Simon Willison (Django co-creator) and Andrew Godwin (South author), it was revealed that the owners knew that much of their CMS framework had been built on open source software, and they wanted to give back to the community. It just goes to show, no matter how conservative the organisation you work for, if you believe some of your work should be made open source, make the case for it.

3. There are still lots more packages and tools to try out

That lesson’s copied from my post last year on PyCon AU. Strangely this list doesn’t seem to be any shorter than last year – but it is at least a different list.

Things to add to your web stack -

  • Varnish – “if your server’s not fast enough, just add another”.  Apparently a scary scripting language is involved, but it can take your server from handling 50 users to 50,000. Fastly is a commercial service that can set this up for you.
  • Solr and elasticsearch are ways to make searches faster; use them with django-haystack.
  • Statsd & graphite for performance monitoring.
  • Docker.io

Some other stuff -

  • mpld3 – convert matplotlib to d3. Wow! I even saw this in action in an ipython notebook.
  • you can use a directed graph (eg using networkx) to determine the order of processes in your code
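
As a small illustration of that last point, here is a sketch using the standard library's `graphlib` (Python 3.9+) rather than networkx, which was the tool mentioned in the talk. The process names are hypothetical; each one maps to the set of processes it depends on, and a topological sort gives a valid order in which to run them.

```python
from graphlib import TopologicalSorter

# Hypothetical processes, each mapped to the processes it depends on
dependencies = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "fit_model": {"clean_data"},
    "make_report": {"fit_model", "clean_data"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['load_data', 'clean_data', 'fit_model', 'make_report']
```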

Here are some wider tools for bioinformaticians (if that’s a word), largely from Clare Sloggett’s talk -

  • rosalind.info – an educational tool for teaching bioinformatics algorithms in python.
  • nectar research cloud – a national cloud for Australian researchers
  • biodalliance – a fast, interactive, genome visualization tool that’s easy to embed in web pages and applications (and ipython notebooks!)
  • ensembl API – an API for genomics – cool!

And some other sciency packages -

  • Natural Language Toolkit NLTK
  • Scikit Learn can count words in docs, and separate data into training and testing sets
  • febrl – to connect user records together when their data may be incorrectly entered

One standout talk for me was Ryan Kelly’s pypy.js, implementing a compliant and fast python in the browser entirely in javascript. The only downside is it’s 15 MB to download, but he’s working on it!

And finally, check out Julia, an alternative to python described as “a high-level, high-performance dynamic programming language for technical computing”, and Scirra’s Construct 2, a game-making program for kids (Windows only).

4. Everyone loves IPython Notebook

I hadn’t thought to embed javascript in notebooks, but you can. You can even use them collaboratively through Google docs using Jupyter’s colaboratory. You can get a table-of-contents extension too.

5. Browser caching doesn’t have to be hard

Remember, your server is not just generating html – it is generating an http response, and that includes some headers like “last modified”, “etag”, and “cache control”. Use them. Django has decorators to make it easy. See Mark Nottingham’s tutorial. (This from a talk by Tom Eastman.)
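
To make the headers concrete, here is a minimal sketch of how an ETag lets a server answer a repeat request with “304 Not Modified” instead of resending the page. It is plain Python rather than Django's decorators, the function names are invented, and the max-age value is arbitrary.

```python
import hashlib


def make_etag(body):
    """An ETag derived from the response body (bytes)."""
    return '"{0}"'.format(hashlib.sha256(body).hexdigest())


def respond(body, if_none_match=None):
    """Return (status, headers, body); send 304 if the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body


page = b"<html>hello</html>"
status, headers, _ = respond(page)
print(status)   # 200 on the first request

# The browser sends the ETag back in its If-None-Match header
status, _, body = respond(page, if_none_match=headers["ETag"])
print(status, body)   # 304 b''
```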

6. Making your own packages is a bit hard

I had not heard of wheels before, but they replace eggs as a “distributable unit of python code” – really just a zip file with some meta-data, possibly including operating-system-dependent binaries. Tools that you’ll want to use include tox (to run tests in lots of different environments); sphinx (to auto-generate your documentation) and then ReadTheDocs to host your docs; check-manifest to make sure your MANIFEST.in file has everything it needs; and bumpversion so you don’t have to change your version number in five different places every time you update the code.

If you want users to install your package with “pip install python-fire”, and then import it in Python with “import fire”, then you should name your enclosing folder python_fire, and inside that you should have another folder named fire. Also, you can install this package while you are testing it by cding to the python-fire directory and typing pip install -e . (note the final full-stop; the -e flag makes it editable).

Once you have added a LICENSE, README, docs, tests, MANIFEST.in, setup.py and optionally a setup.cfg (to the python-fire directory in the above example) and you have pip installed setuptools, wheel and twine, you run both

python setup.py bdist_wheel [--universal]
python setup.py sdist

The bdist version produces a binary distribution that is operating-system-specific if required (the universal flag says it will run on all operating systems in both Python 2 and Python 3). The sdist version is a source distribution.

To upload the result to pypi, run

twine upload dist/*

(This from a talk by Russell Keith-Magee.)  Incidentally, piprot is a handy tool to check how out-of-date your packages are. Also see the Hitchhiker’s Guide to Packaging.

7. Security is never far from our thoughts

This lesson is also copied from last year’s post. If you offer a free service (like Heroku), some people will try to abuse it. Heroku has ways of detecting potentially fraudulent users very quickly, and hopes to open source them soon. And be careful of your APIs which accept data – XML and YAML in particular have scary features which can let people run bad things on your server.

8. Database considerations

Some tidbits from Andrew Godwin’s talk (of South fame)…

  • Virtual machines are slow at I/O, so don’t put your database on one – put your databases on SSDs. And try not to run other things next to the database.
  • Setting default values on a new column takes a long time on a big database. (Postgres can add a NULL field for free, but not MySQL.)
  • Schema-less (aka NoSQL) databases make a lot of sense for CMSes.
  • If only one field in a table is frequently updated, separate it out into its own table.
  • Try to separate read-heavy tables (and databases) from write-heavy ones.
  • The more separate you can keep your tables from the start, the easier it will be to refactor (eg. shard) later to improve your database speed.

9. Go to the lightning talks

I am constantly amazed at the quality of the 5-minute (strictly enforced) lightning talks. Russell Keith-Magee’s toga provides a way to program native iOS, Mac OS, Windows and Linux apps in python (with Android coming). (Russell has also implemented the constraint-based layout engine Cassowary in python, with tests, along the way.) Produce displays of lightning on your screen using the von Mises distribution and amazingly quick typing. Run python2 inside python3 with sux (a play on six).  And much much more…

Finally, the two keynotes were very interesting too. One was by Katie Cunningham on making your websites accessible to all, including people with sight or hearing problems, or dyslexia, or colour-blindness, or who have trouble with using the keyboard or the mouse, or may just need more time to make sense of your site. Oddly enough, doing so tends to improve your site for everyone anyway (as Katie said, has anyone ever asked for more flashing effects on the margins of your page?). Examples include captioning videos, being careful with red and green (use vischeck), using aria, reading the standards, and, ideally, having a text-based description of any graphs on the site, like you might describe to a friend over the phone. Thinking of an automated way to do that last one sounds like an interesting challenge…

The other keynote was by James Curran from the University of Sydney on the way in which programming – or better, “computational thinking” – will be taught in schools. Perhaps massaging our egos at a programming conference, he claimed that computational thinking is “the most challenging thing that people do”, as it requires managing a high level of complexity and abstraction. Nonetheless, requiring kindergarteners to learn programming seemed a bit extreme to me – until he explained that at that age kids would not be in front of a computer, but rather learning “to be exact”. For example, describing how to make a slice of buttered bread is essentially an algorithm, and it’s easy to miss all the steps required (like opening the cupboard door to get the bread). If you’re interested, some learning resources include MIT’s scratch, alice (using 3D animations), grok learning and the National Computer Science School (NCSS).

All in all, another excellent conference – congratulations to the organisers, and I look forward to next year in Brisbane again.

  

Monte Carlo Business Case Analysis in Python with pandas

I recently gave a talk at the Australian Python Convention arguing that for big decisions, it can be risky to rely on business case analysis prepared on spreadsheets, and that one alternative is to use Python with pandas.

The video is available here.

You can see the slides from the talk online here.

I also showed an ipython notebook, which you can find as a pdf here. The slides, the ipython notebook, and the beginnings of a library to make the analysis easier, are all at bitbucket.org/artstr/montepylib.
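
To give a flavour of the idea without opening the notebook, here is a minimal sketch of a Monte Carlo business case. It is not taken from the talk: the cash-flow model, the function name `simulate_npv` and every number in it are invented for illustration, and it uses only the standard library rather than pandas so that it runs anywhere.

```python
import random
import statistics


def simulate_npv(rng, years=5, discount_rate=0.08):
    """Return the net present value of one randomly drawn scenario."""
    npv = -1000.0      # hypothetical upfront investment
    revenue = 250.0    # hypothetical revenue level before year 1
    for year in range(1, years + 1):
        growth = rng.gauss(0.03, 0.10)  # uncertain yearly growth rate
        revenue *= 1 + growth
        npv += revenue / (1 + discount_rate) ** year
    return npv


rng = random.Random(42)  # fixed seed so the run is repeatable
npvs = [simulate_npv(rng) for _ in range(10000)]

mean_npv = statistics.mean(npvs)
prob_loss = sum(npv < 0 for npv in npvs) / len(npvs)
print("mean NPV: %.1f, chance of losing money: %.1f%%" % (mean_npv, 100 * prob_loss))
```

The point of the exercise is the second number: a spreadsheet built on single "best guess" inputs gives one NPV, whereas the simulation also tells you how often the project loses money.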

  

Serve datatables with ajax from Django

Datatables is an amazing resource which lets you quickly display lots of data in tables, with sorting, searching and pagination all built in.

The simplest way to use it is to populate the table when you load the page.  Then the sorting, searching and pagination all just happen by themselves.

If you have a lot of data, you can improve page load times by just serving the data you need to, using ajax. On first sight, this is made easy too.  However, be warned: if the server is sending only the data needed, then the server needs to take care of sorting, searching and pagination. You will also need to control the table column sizes more carefully.

There’s quite a lot required to get this right, so I thought I’d share what I’ve learned from doing this in Django.

Start with the following html. This example demonstrates using the render function to insert a link into the table.

<div class="row">
<table class="table table-striped table-bordered" id="example" style="clear: both;">
<thead>
<tr>
<th>Name</th>
<th>Supplier</th>
<th>Price</th>
</tr>
</thead>
</table>
</div>

and javascript:

$(document).ready(function() {
    var exampleTable = $('#example').dataTable( {
        "aaSorting": [[ 2, "asc" ]],
        "aoColumns": [
            { "mData":"name", "sWidth":"150px" },
            { "mData":"supplier", "sWidth":"150px",
              "mRender": function (supplier, type, full)  {
                             return '<a href="'+supplier.slug+'">' + supplier.name + '</a>';
                         }
            },
            { "sType": 'numeric', "sClass": "right", "mData":"price", "sWidth":"70px" }
        ],
        "bServerSide": true,
        "sAjaxSource": "{% url 'api' 'MyClass' %}",
        "bStateSave" : true, // optional
        "fnStateSave": function(settings, data) {
            localStorage.setItem("exampleState", JSON.stringify(data));
        },
        "fnStateLoad": function(settings) {
            return JSON.parse(localStorage.getItem("exampleState"));
        },
        "fnInitComplete": function() { // use this if you don't hardcode column widths
            this.fnAdjustColumnSizing();
        }
    });
    $('#example').click(function() { // only if you don't hardcode column widths
        exampleTable.fnAdjustColumnSizing();
    });
});

Next you need to write an API for the data. I’ve put my api in its own file, apis.py, and made it a generic class-based view, so I’ve added to urls.py:

from django.conf.urls import patterns, url
from myapp import views, apis

urlpatterns = patterns('',
   ...
   url(r'^api/v1/(?P<cls_name>[\w-]+)/$',apis.MyAPI.as_view(),name='api'),
)

Then in apis.py, I put the following. You could use Django REST framework or TastyPie for a fuller solution, but this is often sufficient. I’ve written it in a way that can work across many classes; just pass the class name in the URL (with the right capitalization). One missing feature here is an ability to sort on multiple columns.

import sys
import json

from django.http import HttpResponse
from django.views.generic import View
from django.core.serializers.json import DjangoJSONEncoder

import myapp.models

class JSONResponse(HttpResponse):
    """
    Return a JSON serialized HTTP response
    """
    def __init__(self, request, data, status=200):
        # pass DjangoJSONEncoder to handle Decimal fields
        json_data = json.dumps(data, cls=DjangoJSONEncoder)
        super(JSONResponse, self).__init__(
            content=json_data,
            content_type='application/json',
            status=status,
        )

class JSONViewMixin(object):
    """
    Return JSON data. Add to a class-based view.
    """
    def json_response(self, data, status=200):
        return JSONResponse(self.request, data, status=status)

# API

# define a map from json column name to model field name
# this would be better placed in the model
col_name_map = {'name': 'name',
                'supplier': 'supplier__name', # can do foreign key look ups
                'price': 'price',
               }
class MyAPI(JSONViewMixin, View):
    "Return the JSON representation of the objects"
    def get(self, request, *args, **kwargs):
        class_name = kwargs.get('cls_name')
        params = request.GET
        # make this api general enough to handle different classes
        klass = getattr(sys.modules['myapp.models'], class_name)

        # TODO: this only pays attention to the first sorting column
        sort_col_num = params.get('iSortCol_0', 0)
        # default to the name column
        sort_col_name = params.get('mDataProp_{0}'.format(sort_col_num), 'name')
        search_text = params.get('sSearch', '').lower()
        sort_dir = params.get('sSortDir_0', 'asc')
        start_num = int(params.get('iDisplayStart', 0))
        num = int(params.get('iDisplayLength', 25))
        obj_list = klass.objects.all()
        sort_dir_prefix = (sort_dir=='desc' and '-' or '')
        if sort_col_name in col_name_map:
            sort_col = col_name_map[sort_col_name]
            obj_list = obj_list.order_by('{0}{1}'.format(sort_dir_prefix, sort_col))

        filtered_obj_list = obj_list
        if search_text:
            filtered_obj_list = obj_list.filter_on_search(search_text)

        d = {"iTotalRecords": obj_list.count(),                # num records before applying any filters
            "iTotalDisplayRecords": filtered_obj_list.count(), # num records after applying filters
            "sEcho":params.get('sEcho',1),                     # unaltered from query
            "aaData": [obj.as_dict() for obj in filtered_obj_list[start_num:(start_num+num)]] # the data
        }

        return self.json_response(d)
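
The missing multi-column sort could be added along the following lines. This is only a sketch: `order_by_args` is a hypothetical helper, not part of my production code, and it assumes the legacy DataTables request parameters `iSortingCols`, `iSortCol_N`, `sSortDir_N` and `mDataProp_N`. It works on any dict-like object, so it can be tried without Django.

```python
def order_by_args(params, col_name_map):
    """Build a list of order_by() arguments from legacy DataTables sort params."""
    num_sort_cols = int(params.get('iSortingCols', 0))
    args = []
    for i in range(num_sort_cols):
        col_num = params.get('iSortCol_{0}'.format(i))
        col_name = params.get('mDataProp_{0}'.format(col_num))
        if col_name not in col_name_map:
            continue  # skip unknown columns rather than failing
        prefix = '-' if params.get('sSortDir_{0}'.format(i)) == 'desc' else ''
        args.append(prefix + col_name_map[col_name])
    return args


col_name_map = {'name': 'name',
                'supplier': 'supplier__name',
                'price': 'price'}

# a request asking for supplier ascending, then price descending
params = {'iSortingCols': '2',
          'iSortCol_0': '1', 'sSortDir_0': 'asc',
          'iSortCol_1': '2', 'sSortDir_1': 'desc',
          'mDataProp_0': 'name', 'mDataProp_1': 'supplier', 'mDataProp_2': 'price'}

sort_args = order_by_args(params, col_name_map)
print(sort_args)
```

In the view you would then call obj_list.order_by(*sort_args) whenever sort_args is non-empty.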

This API depends on the model for two extra things:

  • the object manager needs a filter_on_search method, and
  • the model needs an as_dict method.

The filter_on_search method is tricky to get right. You need to search with OR on the different fields of the model, and AND on different words in the search text. Here is an example which subclasses the QuerySet and object Manager classes to allow chaining of methods (along the lines of this StackOverflow answer).

from django.db import models
from django.db.models import Q
from django.db.models.query import QuerySet

class MyClassMixin(object):
    """
    This will be subclassed by both the Object Manager and the QuerySet.
    By doing it this way, you can chain these functions, along with filter().
    (A simpler approach would define these in MyClassManager(models.Manager),
        but won't let you chain them, as the result of each is a QuerySet, not a Manager.)
    """
    def q_for_search_word(self, word):
        """
        Given a word from the search text, return the Q object which you can filter on,
        to show only objects containing this word.
        Extend this in subclasses to include class-specific fields, if needed.
        """
        return Q(name__icontains=word) | Q(supplier__name__icontains=word)

    def q_for_search(self, search):
        """
        Given the text from the search box, search on each word in this text.
        Return a Q object which you can filter on, to show only those objects with _all_ the words present.
        Do not expect to override/extend this in subclasses.
        """
        q = Q()
        if search:
            searches = search.split()
            for word in searches:
                q = q & self.q_for_search_word(word)
        return q

    def filter_on_search(self, search):
        """
        Return the objects containing the search terms.
        Do not expect to override/extend this in subclasses.
        """
        return self.filter(self.q_for_search(search))

class MyClassQuerySet(QuerySet, MyClassMixin):
    pass

class MyClassManager(models.Manager, MyClassMixin):
    def get_query_set(self):  # renamed get_queryset from Django 1.6
        return MyClassQuerySet(self.model, using=self._db)

# The models come last, because MyClass uses MyClassManager in its class body.
class Supplier(models.Model):
    name = models.CharField(max_length=60)
    slug = models.SlugField(max_length=200)

class MyClass(models.Model):
    name = models.CharField(max_length=60)
    supplier = models.ForeignKey(Supplier)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    objects = MyClassManager()

    def as_dict(self):
        """
        Create data for datatables ajax call.
        """
        return {'name': self.name,
                'supplier': {'name': self.supplier.name, 'slug': self.supplier.slug},
                'price': self.price,
                }
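
If the AND/OR logic in filter_on_search feels slippery, it can be checked without a database. The sketch below mirrors the same rule (every word must appear in at least one field) on plain dictionaries; `matches_search` and the sample records are made up for illustration and are not part of the Django code.

```python
def matches_search(record, search, fields=('name', 'supplier')):
    """True if every word in `search` appears in at least one of the fields."""
    words = search.lower().split()
    return all(                       # AND across the words...
        any(word in record[field].lower() for field in fields)  # ...OR across the fields
        for word in words
    )


records = [
    {'name': 'Red widget', 'supplier': 'Acme'},
    {'name': 'Blue widget', 'supplier': 'Acme'},
    {'name': 'Red gadget', 'supplier': 'Bolt'},
]

hits = [r['name'] for r in records if matches_search(r, 'red acme')]
print(hits)
```

As with the empty Q() object, an empty search string matches every record.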

This is a stripped down version of my production code. I haven’t fully tested this stripped down version, so please let me know if you find any problems with it.

Hope it helps!

  

Solve this puzzle – @memo and dynamic programming

Here’s the situation.  You have a puzzle with 15 round coloured plates arranged in an equilateral triangle, and 14 coloured pebbles on top of them.  One plate does not have a pebble – it is called the hole.  Your goal is to rearrange the pebbles so that they are on the matching coloured plates, in the minimum number of moves possible.  For each move, you can only move one pebble in a straight line to the hole, possibly leaping over other pebbles on the way.

The question is – can you design an algorithm to calculate, for any starting board, the minimum number of moves to solve it?

In fact this describes the game Panguru, recently produced by Second Nature Games and available as a board game and online.  In Panguru, there are two pebbles and plates of each colour, and one additional legal move down the centreline of the triangle.  If my quick description is too wordy, the online game will give you a much better feel for it; there are rules available too.

Panguru Online

A dynamic programming solution, with memoization

Here’s the approach I came up with, which I implemented in python. This was informed by the excellent book Python Algorithms, Mastering Basic Algorithms in the Python Language, by Magnus Lie Hetland, which I highly recommend.

Think of all the possible moves used to solve the puzzle as a tree. Number the positions 0 to 14.  The root node of the tree is the hole position, and child nodes give the position of the pebble that is being moved. So with the hole starting at the top of the triangle (position 0), one path through the tree might be 0-3-5.  (If you think about it, the last move always tells you where the hole finishes up.)

It is easy to measure how close we are to the solved board: we just count how many pebbles have the same colour as their plates.  When we hit 14, we are done!

In fact we can turn it around.  Let’s assume we have a solution (14 matching pebbles) after m moves. How did we get here?  Clearly, we must have had 13 matching pebbles the move before. (Because you only move one pebble at a time, each move can only increase or decrease the number of matching pebbles by one, or leave the number unchanged.)

And the move before that, we must have had either 12 or 13 matching pebbles. And so on.

This sounds like induction and lends itself to a recursive solution, like this. First define the tree structure via the Node class. The core of this is the __init__ method which keeps track of the node’s parent. I’ve also added original_pos to help us later, and two extra methods to display nodes nicely on the command line (__repr__) and to make it easier to access the node’s ancestry (__getitem__).

class Node():
    def __init__(self, parent, move, num_matching):
        self.parent = parent
        self.move = move
        self.num_matching = num_matching

    def original_pos(self, pos):
        """
        Return the original position of the pebble now at position pos.
        """
        if not self.parent:
            return pos
        if pos==self.move:
            prev_pos = self.parent.move
        elif pos==self.parent.move:
            prev_pos = self.move
        else:
            prev_pos = pos
        return self.parent.original_pos(prev_pos)

    def __getitem__(self, key):
        """
        If m is a node which you think of as the last move in a sequence,
        m[-3] is the third last move as a Node.
        m[-3].move is the third last move as an integer (which position was moved)
        """
        if key>=0:
            raise IndexError("You must index moves from the end, e.g. m[-1] for the last move.")
        if key==-1:
            return self
        else:
            try:
                return self.parent[key+1]
            except TypeError:
                # self.parent is None, i.e. we have run past the root
                raise IndexError("Out of range.")

    def __repr__(self):
        return "%s%d" % ((self.parent and ("%s-" % str(self.parent)) or ""),
                         self.move)

Then the puzzle-solving approach could be implemented like this:

class Puzzle():
    def __init__(self, plates, pebbles, allowed_moves):
        """
        Set up a puzzle instance for solving.
        Args:
            plates is a string of 15 chars representing colours
                e.g. "WCBRBOGYGPCORPY"
            pebbles is the same, with "-" for the hole
                e.g. "-PCBRYOGYGBCORP"
            allowed_moves is a list s.t.
                allowed_moves[i] = list of positions that a pebble at pos i can move to
                    = list of positions that can move to this position, if it is the hole
                e.g. [[1, 2, 3, 4, 5, 6, 9, 10, 12, 14],
                      [0, 2, 3, 4, 6, 8, 10, 13],
                      [0, 1, 4, 5, 7, 9, 11, 14],
                      [0, 1, 4, 5, 6, 7, 10, 12],
                      [0, 1, 2, 3, 5, 7, 8, 11, 12, 13],
                      [0, 2, 3, 4, 8, 9, 12, 14],
                      [0, 1, 3, 7, 8, 9, 10, 11],
                      [2, 3, 4, 6, 8, 9, 11, 12],
                      [1, 4, 5, 6, 7, 9, 12, 13],
                      [0, 2, 5, 6, 7, 8, 13, 14],
                      [0, 1, 3, 6, 11, 12, 13, 14],
                      [2, 4, 6, 7, 10, 12, 13, 14],
                      [0, 3, 4, 5, 7, 8, 10, 11, 13, 14],
                      [1, 4, 8, 9, 10, 11, 12, 14],
                      [0, 2, 5, 9, 10, 11, 12, 13]]
        """
        self.plates = plates
        self.pebbles = pebbles
        self.allowed_moves = allowed_moves
        # count the pebbles, i.e. every position except the hole
        self.num_pebbles = sum(1 for p in self.pebbles if p != "-")
        self.num_matching = sum(plates[i] == pebbles[i] for i in range(len(pebbles)))
        hole_pos = self.pebbles.find('-')
        self.root = Node(None, hole_pos, self.num_matching)

    def matching_nodes(self, turn, num_matching):
        """
        Return all the series of moves (as Nodes with parents)
        that have 'num_matching' matching spots after 'turn' turns.
        """
        if turn==0:
            if num_matching==self.num_matching:
                return [self.root]
            else:
                return []
        result = []
        for change in (-1,0,1):
            for prev_node in self.matching_nodes(turn-1, num_matching+change):
                for move in self.allowed_moves[prev_node.move]:
                    pebble_colour = self.pebbles[prev_node.original_pos(move)]
                    # was the moved pebble on a matching plate already?
                    old_pos_match = (self.plates[move]==pebble_colour)
                    # does the prev board's hole plate match the moved pebble?
                    new_pos_match = (self.plates[prev_node.move]==pebble_colour)
                    # did the move change how many positions were matching,
                    # by exactly the number we're looking at?
                    if (old_pos_match-new_pos_match)==change:
                        result += [Node(prev_node, move, num_matching)]
        return result

The interesting recursion here is going on in the matching_nodes method.  It just implements the idea that the solutions that have, say, 10 matching positions at turn 10, must have had either 9,10 or 11 matching positions at turn 9. It then works back to turn 0, at which point we know how many matching positions there were.
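
The claim underneath all this, that one move changes the number of matching pebbles by at most one, is easy to check directly. Here is a small sketch: the plate colours are the ones from the docstring above, but the pebble layout is a hypothetical board I made up (one move away from solved), and for simplicity it tries every pebble rather than consulting allowed_moves.

```python
def count_matching(plates, pebbles):
    """Number of pebbles sitting on a plate of their own colour."""
    return sum(plate == pebble for plate, pebble in zip(plates, pebbles))


def slide_to_hole(pebbles, move):
    """Return the board after the pebble at index `move` slides into the hole."""
    board = list(pebbles)
    hole = board.index("-")
    board[hole], board[move] = board[move], board[hole]
    return "".join(board)


plates = "WCBRBOGYGPCORPY"   # plate colours from the docstring above
pebbles = "RCB-BOGYGPCORPY"  # hypothetical board, hole at position 3

before = count_matching(plates, pebbles)
changes = set()
for move in range(len(pebbles)):
    if pebbles[move] == "-":
        continue  # the hole itself cannot be moved
    after = count_matching(plates, slide_to_hole(pebbles, move))
    changes.add(after - before)

print(before, sorted(changes))
```

Only the moved pebble's old and new positions can change their matching status, which is why the loop in matching_nodes only needs to look at the three cases -1, 0 and 1.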

On top of this, we need a further method which finds the right question to ask. It could start by saying – give me all the solutions after 0 moves. If there are none, find all the solutions after 1 move. And keep trying until you find some solutions, e.g.:

    def optimal_moves(self, stop_at=20):
        """
        Return a tuple (fewest possible number of moves, [optimal moves for this board]).
        """
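        num = 0
        result = self.matching_nodes(num, self.num_pebbles)
        # only move on to num+1 moves if there is no solution in num moves,
        # so that the num returned is the number of moves actually needed
        while not result and num<stop_at:
            num += 1
            result = self.matching_nodes(num, self.num_pebbles)
        return (num, result)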

The code above will work, but it will be slow and inefficient, because matching_nodes will often recurse through territory it has already covered. And that’s where Hetland’s memo decorator comes to the rescue. (This is only a few lines of code which you can find by searching at Google books.) This will cache the results of the decorated method, so that it does not need to be recalculated.  To use it, simply apply @memo to the def matching_nodes line, like so:

@memo
def matching_nodes(self, turn, num_matching):
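
If you don't have the book to hand, a memo decorator can be sketched in a few lines. This is my own minimal version, not copied from Hetland, and the `Counter` class is a hypothetical stand-in used only to show the cache working on a method.

```python
from functools import wraps


def memo(func):
    """Cache func's results, keyed by its (hashable) positional arguments."""
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper


class Counter(object):
    calls = 0  # how many times the undecorated body has actually run

    @memo
    def fib(self, n):
        Counter.calls += 1
        return n if n < 2 else self.fib(n - 1) + self.fib(n - 2)


counter = Counter()
fib_30 = counter.fib(30)
print(fib_30, Counter.calls)
```

Note that for a method, self is part of the cache key, so each Puzzle instance gets its own cached results. The cache also holds on to every result for the life of the process.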

And that’s it!

You can see this code in action when you ask for a hint in the online game. Each time you press the hint button, the javascript sends off an ajax query which triggers the server to run the above code on the player’s board, and return which moves would come next if the player plays optimally.

In fact, to get around annoying ever-expanding memory problems on the server, I’m running this asynchronously using Celery and Redis (as covered in an earlier blog post), and restarting the celery worker after every request.  But that’s another story…

I hope that has been of interest, and please let me know if you have any comments or improvements.

  

Get started with Python on a Mac

Here are the steps I recommend for getting started with python on a Mac (OS X), particularly if you are fairly new to the language.

  1. The Mac comes with python installed, and you can run this directly from the terminal application with the command “python”.  However there is a very nice python interpreter called the “ipython notebook” which I strongly recommend using too.  Installation instructions are available here - specifically, I downloaded Enthought Canopy and typed two commands into the terminal.  You should now be able to get an ipython notebook running in your internet browser with the terminal command:
    cd somewhere/sensible/
    ipython notebook --pylab=inline

    The last part of this means that plots get drawn inside the notebook, which is handy. If you forget to add that, you can type into the notebook itself: %pylab inline.
    Note you use shift-Enter to run all the python commands in a cell.

  2. Install pip.  This lets you easily download python packages and will be very handy later.  This is easy – from the terminal type:
    sudo easy_install pip
  3. More advanced users may want to install virtualenv, so that different projects you work on can use different versions of packages. If you’re planning on putting any of your code in production, this is a must. But if you’re just getting started, ignore it for now.
  4. Advanced users will also want to install git so you can incrementally save your own versions of code. If you’re just starting out, leave this for later too.

OK, now let’s dive in the deep end by loading some financial data from Quandl, then manipulating it with Pandas and plotting it with matplotlib. You’ll need Pandas and the Quandl package, which you can get by typing into the terminal:

pip install pandas
pip install Quandl

Now in your ipython notebook type (I recommend doing each group of statements in its own cell, so that you can run them separately; remember it’s shift-enter to run the statements):

import numpy
import pandas
import Quandl

assets = ['OFDP.ALUMINIUM_21.3',
          'OFDP.COPPER_6.3',
          'OFDP.LEAD_31.3',
          'OFDP.GOLD_2.3',
          'OFDP.ZINC_26.3']

data = Quandl.get(assets)
data.head()

data['2012':].plot()

This link gives ways you can develop the plot further.

To give an example of how to manipulate the data, you could try:

# show recent daily returns, normalised by asset volatility (ie. z-scores)
data = data.sort_index() # ensure oldest comes first
returns = (data/data.shift(1))
log_returns = numpy.log(returns)
vols = log_returns.ewm(span=180).std()  # daily vol, exp weighted moving std (pandas.ewmstd in older pandas versions)
z = (returns-1)/vols # calc z-scores
z[z>50] = numpy.nan  # remove very bad data
z[z<-50] = numpy.nan
z.tail() # take a look at the most recent z-scores

That’s a very quick introduction to installing and using python on the Mac. I hope it’s helpful!
If you want more info about writing in Python itself, this link looks like a comprehensive site.

Let me know if you have any comments or improvements.

  

9 Lessons from PyConAU 2013

A summary of what I learned at PyCon AU in Hobart in 2013. (Click here for 2014.)

1. In 2005, Django helped make it possible for a team of ONE to make a commercial web app

Building web apps with Django is not just possible, it’s fun. I hadn’t realised the key role that Django played, along with Ruby on Rails, in making this happen.

2. But in 2013 the goal posts are higher – can it still be done?

Django was revolutionary when it was released, but it doesn’t take care of everything a modern (i.e. 2013) web app needs to be cutting-edge. On the back-end, once you get your head around Django itself, you need to get your head around South (for database migrations), virtualenv (so you don’t go crazy when new versions come out), the Python Image Library and django-filer or easy-thumbnails so you can upload images and files more nicely, Fabric to help you deploy your site, git (to version control your code, if you haven’t used it already), selenium (for functional testing), factory_boy (for any testing), django-reversion (so you can roll back data), staticfiles, a way to actually deploy static files on your system, e.g. a file system backend like Boto, tastypie or django-rest-framework (for an API), and perhaps a CMS like Django-CMS, Mezzanine or FeinCMS (which are the tips of other icebergs). That’s sort of where I’m up to at the moment. And there are lots more I will probably need soon - haystack (for faster searching), celery and a message broker (e.g. for non-web-page related tasks), memcache, maybe non-relational databases like MongoDB.

And that’s just the back-end. On the front-end you probably want to use javascript, ajax, jQuery, and probably another javascript library, e.g. I have been using kineticjs. But during the talks I learned I will need to consider meteor (heaps of cool stuff, but a starting point is that it drops a lot of the distinction between server and client, so that with very little code, a user can update the database and other users’ pages update to view it automatically), backbone.js (“models with key-value binding and custom events, collections with a rich API of enumerable functions, views with declarative event handling, and connects it all to your existing API over a RESTful JSON interface.”), angular.js (“lets you extend HTML vocabulary for your application”), D3.js (“data driven documents”), node.js, compass and SASS (to make css easier), ember.js (“a framework for creating ambitious web applications”), yeoman (“modern workflows for modern webapps” using Ruby and node.js)…

The keynote of DjangoCon AU by Alex Gaynor explained this in a historical context and sowed the idea in my mind that the time is ripe for a new framework (possibly an enhanced Django) that will make all these things easy as well (roughly speaking). Jacob Kaplan-Moss said to check out the Meteor screencast for what is possible.

3. Web security is never far from our thoughts

Jacob gave a great talk on web security.  As I mentioned above, Django takes care of the essential security features – CSRF tokens, SQL injections, password hashing and HTML cross-site scripting. Some immediately useful tips I picked up from Jacob are – always use https everywhere if you have user logins; django-secure makes this easy (“Helping you remember to do the stupid little things to improve your Django site’s security.”); use bcrypt for password hashing; use Django’s forms whenever there is user input, even if it’s not a form; turn off unused protocols (e.g. XML and yaml) in your API; and to emphasise how easy it is for others to intercept your unencrypted data, look up Firesheep.

4. Python packages for maths and science are making “big data” much more accessible to everyone

Lots of talks on this. Check out especially the scikit-learn documentation, which is incredibly thorough. But then there’s Pandas, scipy, and scikit-image, and for networks networkx.

For parallelization, the classic algorithm is mapreduce, and mrjob provides a python interface to this.  The easiest way to get started on parallelization is to use IPython.parallel. For an example, check out how to process a million songs in 20 minutes. For queuing jobs and running them in the background, redis-queue has a low barrier to entry. (One caveat – you may need to manually delete .pid files.)
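
For anyone who hasn't met mapreduce, the idea can be sketched in plain python. This is not mrjob and nothing here runs in parallel; the documents are made up, and `mapper` and `reducer` are just illustrative names. Each document is mapped to its own word counts independently (the part a cluster would spread across machines), and the partial counts are then reduced into one total.

```python
from collections import Counter
from functools import reduce

documents = ["the cat sat", "the dog sat", "the end"]  # made-up input


def mapper(doc):
    """Map step: count the words in a single document."""
    return Counter(doc.split())


def reducer(total, partial):
    """Reduce step: merge two sets of counts."""
    return total + partial


word_counts = reduce(reducer, map(mapper, documents), Counter())
print(word_counts.most_common(2))
```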

An interesting quote – “Most of the world’s supercomputers are running Monte Carlo simulations.”

5. There are lots more packages and tools to try out

To improve my style, I want to check out django-model-utils (especially for “PassThroughManager”); and more generally, django-pipeline (for “CSS and JavaScript concatenation and compression, built-in JavaScript template support, and optional data-URI image and font embedding” – in preference to django-compressor), django-allauth (an “integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (social) account authentication.”), django-taggit (to add tags to your project), Raven (the python client for Sentry, “notifies you when your users experience errors”), django-discover-runner (which will be part of Django 1.6 – it allows “you to specify which tests to run and organize your test code outside the reach of the Django test runner”), and django-sitetree (“introducing site tree, menu and breadcrumbs navigation”).

There’s more… Mock for testing (“allows you to replace parts of your system under test with mock objects and make assertions about how they have been used”), separate selenium tests into tests and page controllers, Gerrit (for online code reviews), Jenkins (“monitors executions of repeated jobs”), django-formrenderingtools (“customize layout of Django forms in templates, not in Python code.”). There’s a way to resize images in html5 before uploading them. And Fanstatic serves js and css files (e.g. specify you need jQuery through a python statement rather than in the template), though I’m not sure why I would need this yet.

If you need to kill off a process that’s taking too long you can use interrupting cow and django-timelimit.

There’s a way to compile clojure to javascript.  Since I don’t know clojure yet, this is a very speculative project for me, but I like the idea of avoiding javascript. :-)

And if you’re writing tests in iOS, there’s a way to run selenium on the iOS simulator using appium.

6. I still have a lot to learn about Python

I won’t embarrass myself by listing all the things I learnt about Python here, though we were encouraged not to be afraid of the CPython source code, and even less so of the PyPy source code (which has the advantage that it is in python!).

I was convinced I should be trying to use Python 3.3 whenever possible, if only to save time later with unicode errors – Python 2.x doesn’t handle these well. Django 1.5 is actually written in Python 3.3, using a package called six to make it work with Python 2.x too.  Incidentally, it also seems the consensus is to use PostgreSQL over MySQL. Though admittedly that doesn’t really fit under this heading.

7. The Python community is friendly, humble and welcoming

Good news! This keeps it fun to program in Python as much as anything.

8. PyCon was a great conference

Of all the scientific and industry conferences I have been to, this one had the best-presented talks I have seen – and not just the scheduled presenters, but also the lightning (5 minute) talks. They were very engaging and intelligible.  Speakers used their slideshows in inventive ways (e.g. using memegenerator, prezi.com and the odd xkcd cartoon).  And the conference itself was well organised by Chris Neugebauer.

9. Next time I’ll stay for the sprints!