9 Lessons from PyConAU 2014

A summary of what I learned at PyCon AU in Brisbane this year. (Videos of the talks are here.)

1. PyCon’s code of conduct

Basically, “Be nice to people. Please.”

I once had a boss who told me he saw his role as maintaining the culture of the group.  At first I thought that seemed a strange goal for someone so senior in the company, but I eventually decided it was enlightened: a place’s culture is key to making it desirable, and making the work sustainable. So I like that PyCon takes the trouble to try to set the tone like this, when it would be so easy for a bunch of programmers to stay focused on the technical.

2. Django was made open-source to give back to the community

Ever wondered why a company like Lawrence Journal-World would want to give away its valuable IP as open source? In a “fireside chat” between Simon Willison (Django co-creator) and Andrew Godwin (South author), it was revealed that the owners knew that much of their CMS framework had been built on open source software, and they wanted to give back to the community. It just goes to show, no matter how conservative the organisation you work for, if you believe some of your work should be made open source, make the case for it.

3. There are still lots more packages and tools to try out

That lesson’s copied from my post last year on PyCon AU. Strangely this list doesn’t seem to be any shorter than last year – but it is at least a different list.

Things to add to your web stack -

  • Varnish – “if your server’s not fast enough, just add another”.  Apparently a scary scripting language is involved, but it can take your server from handling 50 users to 50,000. Fastly is a commercial service that can set this up for you.
  • Solr and elasticsearch are ways to make searches faster; use them with django-haystack.
  • Statsd & graphite for performance monitoring.
  • Docker.io

Some other stuff -

  • mpld3 – convert matplotlib to d3. Wow! I even saw this in action in an ipython notebook.
  • you can use a directed graph (eg using networkx) to determine the order of processes in your code

Here are some wider tools for bioinformaticians (if that’s a word), largely from Clare Sloggett’s talk -

  • rosalind.info – an educational tool for teaching bioinformatics algorithms in python.
  • nectar research cloud – a national cloud for Australian researchers
  • biodalliance – a fast, interactive, genome visualization tool that’s easy to embed in web pages and applications (and ipython notebooks!)
  • ensembl API – an API for genomics – cool!

And some other sciency packages -

  • Natural Language Toolkit NLTK
  • Scikit Learn can count words in docs, and separate data into training and testing sets
  • febrl – to connect user records together when their data may be incorrectly entered

One standout talk for me was Ryan Kelly’s pypy.js, implementing a compliant and fast python in the browser entirely in javascript. The only downside is it’s 15 Mb to download, but he’s working on it!

And finally, check out this alternative to python: Julia, “a high-level, high-performance dynamic programming language for technical computing”, and Scirra’s Construct 2, a game-making program for kids (Windows only).

4. Everyone loves IPython Notebook

I hadn’t thought to embed javascript in notebooks, but you can. You can even use them collaboratively through Google docs using Jupyter‘s colaboratory. You can get a table-of-contents extension too.

5. Browser caching doesn’t have to be hard

Remember, your server is not just generating html – it is generating an http response, and that includes some headers like “last modified”, “etag”, and “cache control”. Use them. Django has decorators to make it easy. See Mark Nottingham’s tutorial. (This from a talk by Tom Eastman.)

6. Making your own packages is a bit hard

I had not heard of wheels before, but they replace eggs as a “distributable unit of python code” – really just a zip file with some meta-data, possibly including operating-system-dependent binaries. Tools that you’ll want to use include tox (to run tests in lots of different environments); sphinx (to auto-generate your documentation) and then ReadTheDocs to host your docs; check-manifest to make sure your manifest.in file has everything it needs; and bumpversion so you don’t have to change your version number in five different places every time you update the code.

If you want users to install your package with “pip install python-fire“, and then import it in Python with “import fire“, then you should name your enclosing folder python_fire, and inside that you should have another folder named fire. Also, you can install this package while you are testing it by cding to the python-fire directory and typing pip install -e . (note the final full-stop; the -e flag makes it editable).

Once you have added a LICENSE, README, docs, tests, MANIFEST.insetup.py and optionally a setup.cfg (to the python-fire directory in the above example) and you have pip installed setuptoolswheel and twine, you run both

python setup.py bdist_wheel [--universal]
python setup.py sdist

The bdist version produces a binary distribution that is operating-system-specific, if required the universal flag says it will run on all operating systems in both Python 2 and Python 3). The sdist version is a source distribution.

To upload the result to pypi, run

twine upload dist/*

(This from a talk by Russell Keith-Magee.)  Incidentally, piprot is a handy tool to check how out-of-date your packages are. Also see the Hitchhiker’s Guide to Packaging.

7. Security is never far from our thoughts

This lesson is also copied from last year’s post. If you offer a free service (like Heroku), some people will try to abuse it. Heroku has ways of detecting potentially fraudulent users very quickly, and hopes to open source them soon. And be careful of your APIs which accept data – XML and YAML in particular have scary features which can let people run bad things on your server.

8. Database considerations

Some tidbits from Andrew Godwin’s talk (of South fame)…

  • Virtual machines are slow at I/O, so don’t put your database on one – put your databases on SSDs. And try not to run other things next to the database.
  • Setting default values on a new column takes a long time on a big database. (Postgres can add a NULL field for free, but not MySQL.)
  • Schema-less (aka NoSQL) databases make a lot of sense for CMSes.
  • If only one field in a table is frequently updated, separate it out into its own table.
  • Try to separate read-heavy tables (and databases) from write-heavy ones.
  • The more separate you can keep your tables from the start, the easier it will be to refactor (eg. shard) later to improve your database speed.

9. Go to the lightning talks

I am constantly amazed at the quality of the 5-minute (strictly enforced) lightning talks. Russell Keith-Magee’s toga provides a way to program native iOS, Mac OS, Windows and linux apps in python (with Android coming). (Keith has also implemented the constraint-based layout engine Cassowary in python, with tests, along the way.) Produce displays of lightning on your screen using the von mises distribution and amazingly quick typing. Run python2 inside python3 with sux (a play on six).  And much much more…

Finally, the two keynotes were very interesting too. One was by Katie Cunningham on making your websites accessible to all, including people with sight or hearing problems, or dyslexia, or colour-blindness, or who have trouble with using the keyboard or the mouse, or may just need more time to make sense of your site. Oddly enough, doing so tends to improve your site for everyone anyway (as Katie said, has anyone ever asked for more flashing effects on the margins of your page?). Examples include captioning videos, being careful with red and green (use vischeck), using aria, reading the standards, and, ideally, having a text-based description of any graphs on the site, like you might describe to a friend over the phone. Thinking of an automated way to do that last one sounds like an interesting challenge…

The other keynote was by James Curran from the University of Sydney on the way in which programming – or better, “computational thinking” – will be taught in schools. Perhaps massaging our egos at a programming conference, he claimed that computational thinking is “the most challenging thing that people do”, as it requires managing a high level of complexity and abstraction. Nonetheless, requiring kindergarteners to learn programming seemed a bit extreme to me – until he explained at that age kids would not be in front of a computer, but rather learning “to be exact”. For example, describing how to make a slice of buttered bread is essentially an algorithm, and it’s easy to miss all the steps required (like opening the cupboard door to get the bread). If you’re interested, some learning resources include MIT’s scratch, alice (using 3D animations), grok learning and the National Computer Science School (NCSS).

All in all, another excellent conference – congratulations to the organisers, and I look forward to next year in Brisbane again.

  

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>