Category Archives: Code

Non-web related coding, mostly Python.

Bad code lasts the longest…

Browsing through the URC North Western Synod’s website today, I see they now run it on WordPress. And this is a bit of a relief for me – the site’s previous incarnation was a bunch of hand-crafted PHP written by 16-year-old me, and I’d been having sleepless nights wondering if that code was still in use, and how much pain it had caused. (Not that WordPress is perfect, but if it’s good enough for this blog…)

Not all of my early attempts at computer programming have gone the way of all flesh, though – the Python script I wrote to parse the dinner menus for Magdalen JCR (and the subject of my very first blog post) is apparently still in production, and was telling people when to expect Chicken Kiev as recently as last week. The time it’s saved various JCR computer reps has presumably now exceeded the two days I spent writing it in the first place. And though I know the code is pretty horrible by my current standards, it does the thing which got me into computer programming in the first place – it makes real people’s lives easier.

Converting Word documents to PDF with OpenOffice and Python

The problem

A Word document (plain old .doc, not 2007) should be received by e-mail, fed to a script, turned into a PDF and published on a website.

At my disposal

My server running Debian ‘Lenny’, which does not have a display of any kind.

How hard can it be?

Harder than it should have been, as ever. Here are my steps:

# aptitude install python-uno xvfb unoconv

You’ll note the inclusion of Xvfb there, because it turns out that “headless” mode in OpenOffice isn’t really headless at all. Sigh. Also sigh some more at the broken dependencies of the unoconv Debian package.

Now we can write our script to do the actual conversion. Shame it took twice as long as it should have…
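The script itself isn't reproduced in this excerpt, so what follows is only a minimal sketch of the general shape, not the original. It assumes unoconv's `-f pdf` option and the `xvfb-run` wrapper that comes with Debian's xvfb package; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_command(doc_path):
    """Return the command line that converts doc_path to PDF.

    Assumes unoconv and xvfb-run are both on the PATH.
    """
    # xvfb-run -a starts a throwaway X server on a free display number,
    # because OpenOffice wants a display even in "headless" mode.
    return ["xvfb-run", "-a", "unoconv", "-f", "pdf", str(doc_path)]


def convert_to_pdf(doc_path):
    """Convert a .doc file and return the path where the PDF should be."""
    doc_path = Path(doc_path)
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_command(doc_path), check=True)
    # Assumption: unoconv writes the output next to the input file.
    return doc_path.with_suffix(".pdf")
```

Calling `convert_to_pdf("/tmp/incoming.doc")` (a hypothetical path) would then leave the PDF alongside the original, ready to be copied to the website.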

Fun with CurrentCost

Five years after the cool kids first started jumping on the bandwagon, I’ve got myself a CurrentCost CC128 (Southern Electric send them to some customers for free, it seems – e.g. my granddad who didn’t want it).

So, with the addition of an eight quid data cable and the Linux box running in my lounge, may I present my electricity usage graphs. Bear in mind that these are (at the time of writing) for a five-bedroom house in central Oxford.

The parser for the XML output of the device I’m using is this one – just swap “COM20” for “/dev/ttyUSB0” in their testrun script and fix it to ignore empty lines read from the serial port, and you’re in business. I then hackdapted this RRDTool tutorial to plot the graphs.
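For the curious, the "ignore empty lines" fix amounts to something like the sketch below. This is not the parser linked above, just a standard-library illustration; the sample message is my assumption of what one CC128 reading looks like, and the function names are invented.

```python
import xml.etree.ElementTree as ET

# Assumed example of a single reading; real output may vary by firmware.
SAMPLE = (
    "<msg><src>CC128-v0.11</src><dsb>00089</dsb><time>13:02:39</time>"
    "<tmpr>18.7</tmpr><sensor>0</sensor><id>01234</id><type>1</type>"
    "<ch1><watts>00345</watts></ch1></msg>"
)


def parse_reading(line):
    """Return (temperature, watts) for one line, or None if unusable."""
    line = line.strip()
    if not line:
        return None  # the serial port hands back the odd empty line
    try:
        msg = ET.fromstring(line)
    except ET.ParseError:
        return None  # e.g. a line cut short because we joined mid-message
    tmpr = msg.findtext("tmpr")
    watts = msg.findtext("ch1/watts")
    if tmpr is None or watts is None:
        return None  # some other kind of message, not a live reading
    return float(tmpr), int(watts)


def readings(lines):
    """Yield a (temperature, watts) pair for each usable line."""
    for line in lines:
        reading = parse_reading(line)
        if reading is not None:
            yield reading
```

`readings()` takes any iterable of lines, so it can be fed from a file of captured output while testing and from the serial port in real use.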

Are IDEs a problem?

I’ve just read an interesting piece over at The Register on the bloated awkwardness of Visual Studio 2010, and another on the question of whether we need IDEs at all.

The latter is a difficult question for me – on the one hand, there’s a school of thought I have some sympathy with, which says that IDEs are a crutch of the feeble-minded, and allow bad programmers to kid themselves that they’re good, because they can generate lots of code automatically and hit ctrl-space if they run out of ideas.


When I started programming, six years ago (!), the first three languages I used were Turbo Pascal for Windows, PHP and Visual Basic 6. TPW was a good basic language for teaching A-Level Computing, but the built-in editor was scarcely better than Notepad. I can’t remember if it had a compile-and-run button, but I seem to recall not. PHP was slightly better – once I’d worked out how to get Apache onto my Windows machine and sacrificed a chicken to get PHP talking to it – but again, no IDE out of the box, and even the relatively advanced capabilities built into the copy of DreamWeaver (was I the only student in the country honest enough to cough up over £100 for it?) didn’t feel up to much.

VB6, though, felt magic. In retrospect, it was a horrid language, but not only could I drop controls onto a form and double-click to generate the outline of the method they’d activate, I could actually pause and resume the code while it was running! I could see the values of variables at a point in execution, and even go backwards and forwards. The completion facilities of the IDE were basic, but they were there, and they made things much faster. Writing code to automate Microsoft Office was a particular sweet spot – run the macro recorder to generate code containing roughly the API calls you were after, then drop some VB6 control flow round them, and off we go.

Later, reading Computer Science at Oxford, the practicals we did rarely stipulated an IDE, but we nearly always ended up using gedit + the relevant command-line compiler. Certainly, the existence of IDEs for Haskell was never alluded to – either the ultimate example of clever people thinking IDEs are the preserve of the feeble-minded, or the assumption that we’d be clever enough to go and look for one ourselves, depending on how silly I want to consider myself in retrospect.

I’m happy to say that Java hardly featured at all in our courses (I’m with Joel on that one), but when it did, we were told to use BlueJ, because, Mike Spivey explained, “it has only two buttons, and Eclipse has hundreds of others you don’t need”.

Given the short length of the practicals involved, I only paused to think “pish, how many buttons can it have?”, but I didn’t feel the need to find out for myself until I started writing Java for a living. He was right, there are hundreds of them. Despite which, I use Eclipse every day at work, and would never dream of trying to write code without it. It is, irrefutably, a big, bloated beast, but when you’re working on serious real-world Java, with version control, coding standards, complicated dependencies, hundreds of packages making up one program, and spend far more time reading and debugging code than writing it, you really do need the beast on your side (or so I believe).

So that’s it, then – I’ve converted to the world of IDEs? Well, not quite. The other language I use on a day-to-day basis – though mostly for pleasure rather than business – is Python. And I don’t usually use an IDE, simply bashing out code in Notepad++ on Windows, or KDevelop on Linux. OK, so KDevelop is sort-of an IDE, but it’s very lightweight.

Of course, Python being interpreted rather than compiled makes it easier to just fire up your Python program from the command line after editing it. And that, really, gives us a clue as to the only sane conclusion of the IDE debate: it’s the same as the programming language debate. There are tools (languages and IDEs) and there are jobs. Good programmers pick the best tool for the job, and for a compiled language as verbose as Java, an IDE arguably makes things faster. For Python, on the other hand, it’s not essential (IMO), but it depends on the tastes of the individual.

SysAdmin stuff

It’s amazing how many fewer afternoons I seem to spend hacking around on my servers these days. Perhaps I got a life; I certainly got a full-time job. I have, however, sorted a few long-standing bits and pieces out today…

is now available over IPv6

As are its various satellite sites and Sorry, no, there is no bouncing logo to reward those of you viewing them via such.

A backup system that doesn’t Totally Suck

I’ve finally retired my creaking “run a shell script to rsync them onto my laptop when I remember (i.e. every six months)” manual backup system in favour of an encrypted LVM partition on my home server, and rdiff-backup to make nice incremental backups of everything on a nightly basis. The instructions on how to do it are all out there on the interweb, and it’s not too difficult, fortunately. I’m a bit disappointed that backupninja doesn’t support remote rdiff-backup, but I guess I should submit a patch if it bothers me that much… meanwhile, my wrapper script seems to work just fine.
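A wrapper of this kind can be very small. The sketch below is an illustration rather than the script in use here: the hosts and paths are hypothetical, and it assumes the classic `rdiff-backup host::/source /destination` command-line form for pulling a remote directory over SSH.

```python
import subprocess

# Hypothetical (host, remote directory, local destination) triples.
JOBS = [
    ("root@web.example.org", "/etc", "/backup/web/etc"),
    ("root@web.example.org", "/var/www", "/backup/web/www"),
]


def backup_command(host, remote_dir, dest):
    """Build the rdiff-backup command that pulls remote_dir from host."""
    # host::path is rdiff-backup's notation for a directory on another box.
    return ["rdiff-backup", f"{host}::{remote_dir}", dest]


def run_jobs(jobs, runner=subprocess.call):
    """Run every job in turn and return the sources whose backup failed."""
    failed = []
    for host, remote_dir, dest in jobs:
        # A non-zero exit status means that backup did not complete.
        if runner(backup_command(host, remote_dir, dest)) != 0:
            failed.append(f"{host}:{remote_dir}")
    return failed
```

Run nightly from cron, `run_jobs(JOBS)` returns an empty list when everything worked; printing anything else is enough to get cron to send a complaint by e-mail. The `runner` parameter exists only so the logic can be tried out without touching a real server.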

mod_wsgi delivers on the promise

It’s been over a year since I deployed Django in production, and I wasn’t looking forward to it. Last time, I had a lot of trouble with mod_python, sessions and decimal objects refusing to pickle.

Thankfully, all this really seems to have grown up in the last year – mod_wsgi is now the recommended way of deploying Django in production, and following the mod_wsgi Django instructions, I was in business in 20 minutes. No fuss, no mess, no drama, and best of all, using daemon mode, no noticeable performance hit when serving static files and PHP off the same Apache installation. The ability to run the Django project as its own unprivileged user when using daemon mode is also real handy.

Manipulating Maildirs with Python

My e-mail still isn’t as shiny as I’d like. In particular, my use of Exim Filters to sort incoming mail into folders lacks the ability to mark messages as read (although it’s still miles ahead of the dreaded Procmail). This would be handy for high-traffic mailing lists which I don’t have time to read on a daily basis, but whose “unread” icon next to the folder I find hard to ignore.

One day, I should probably move to using the Dovecot LDA and its sieve implementation, which supports the “imap4flags” extension, thus allowing marking messages as read, making them turn purple in Thunderbird, and all sorts of other cool stuff. Sadly, life (or this afternoon) is too short.

In the meantime, I’ve solved the problem in the usual way I deal with life’s imperfections: gratuitous Python run from a cron job every five minutes.

(Disclaimer: letting scratty little bits of Python anywhere near something as important as your e-mail is probably a Very Bad Idea.)
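The script isn't included in this excerpt, but the core of the idea fits in a few lines with the standard library's `mailbox` module. Treat this as a sketch under assumptions rather than the script I run: the function name is invented, and it marks everything in a single folder as read.

```python
import mailbox


def mark_all_read(maildir_path):
    """Flag every unseen message in a Maildir as read; return how many."""
    box = mailbox.Maildir(maildir_path, create=False)
    changed = 0
    for key in box.keys():
        msg = box[key]
        if "S" in msg.get_flags():
            continue  # already marked as seen
        msg.add_flag("S")      # "S" is the Maildir flag for "seen"
        msg.set_subdir("cur")  # flagged messages belong in cur/, not new/
        box[key] = msg         # write the message back under its new name
        changed += 1
    return changed
```

Pointing it at one folder per noisy mailing list, e.g. `mark_all_read("/home/me/Maildir/.lists.noisy")` (a hypothetical path), is all the cron job needs to do.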

Bogroll 0.2

Earlier this year, I hacked together a stateless RSS reader called Bogroll.

It’s been doing sterling service for me at ever since. Today, I’ve sorted out a 0.2 release with the following improvements:

  • Now caches etags/Last-Modified headers to avoid fetching a feed if it hasn’t changed since last time (thank you, Mark Pilgrim, for chapter 14 of Dive Into Python 3, which reminded me to be a good citizen in this regard). I was pleased to discover that the Universal Feed Parser it’s built on top of already supports gzip and deflate compression to save bandwidth.
  • Now supports just one category per feed, because having articles appear in several categories just seems wrong to me
  • Each category now really does contain the most recent X articles from the relevant feeds, because I’ve fixed the severely broken sort-by-date logic
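For anyone wondering what the etag/Last-Modified caching in the first item involves, here is a minimal sketch of a conditional GET using only the standard library. It is not Bogroll's code (that builds on the Universal Feed Parser); the function names and the shape of the cache are assumptions for illustration.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for a conditional GET."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, cache):
    """Fetch url unless the server says it is unchanged.

    cache maps url -> (etag, last_modified) from the previous fetch.
    Returns the body as bytes, or None on a 304 Not Modified.
    """
    etag, last_modified = cache.get(url, (None, None))
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            # Remember the validators for next time round.
            cache[url] = (
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # unchanged since last time, nothing to download
        raise
```

The first fetch of a feed sends no extra headers; every later one sends back whatever validators the server supplied, and a well-behaved server answers with an empty 304 if nothing has changed.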

A fair bit of refactoring has gone on under the hood, and the code now looks a bit more like an app and less like a ten-minute bodge. The next round will involve getting some proper unit tests in place, and possibly AJAX magic to load the articles lazily on the page.

You can download the 0.2 zip, or get the latest version from subversion if you like to live dangerously. The cool kids all seem to be using Git or Mercurial these days, but I haven’t found the need (or overcome the inertia) yet.

Enjoy. Feedback welcome to the usual address.

Model inheritance in Django

a.k.a. “How to query for all objects not in a given subclass”

I ran across this problem today – do please leave a comment if you know of a better way to solve it than what follows.

You have two Django models, one of which inherits from the other, e.g.:

from django.db import models

class Order(models.Model):
    # some fields
    pass

class DiningOrder(Order):
    # some more fields
    pass

So, how do I write a query (using the Django ORM) which returns all the Orders which are not DiningOrders?

Apparently, this does the trick:

>>> Order.objects.all().count()
>>> DiningOrder.objects.all().count()
>>> Order.objects.exclude(id__in=[d.id for d in DiningOrder.objects.all()]).count()