Category Archives: SysAdmin

Linux Capabilities and rdiff-backup

Hazel Smith gave an excellent talk at FLOSS UK’s Unconference in London last weekend about Linux Capabilities and using them to run a backup system with minimal permissions. Several people in the room, myself among them, sat up and went “Nice idea. I’ll be using that”. Here’s what I did:


For many years, I’ve used rdiff-backup to back up a bunch of different Linux systems. It works well, keeps the most recent backup on disk in its original form (including file owners and permissions) and allows access to the previous n days’ worth of backups too, stored efficiently. I keep 30 days and it’s occasionally been handy to have the history available.

My system was “pull” based as you’d expect: the backup server logged into each of the to-be-backed-up systems over SSH and ran rdiff-backup via sudo. You can then configure sudo so that the “backuphelper” user rdiff-backup logs in as can run rdiff-backup as root without a password being prompted for. This then gives rdiff-backup the power to read all the files and do a system-wide backup. On the receiving end, the entire rdiff-backup process and the scripts calling it run as a cron job under the root user so it can preserve owners and permissions on the backed-up data.
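As a sketch of that arrangement (the ‘backuphelper’ name is mine; paths, hostnames and flags here are examples and may need adjusting for your distribution):

```
# /etc/sudoers.d/rdiff-backup on each target machine: let the
# unprivileged login user run only rdiff-backup's server mode as root
backuphelper ALL = NOPASSWD: /usr/bin/rdiff-backup --server

# On the backup server, the root cron job pulls each host with
# something along the lines of:
# rdiff-backup --remote-schema 'ssh -C %s sudo rdiff-backup --server' \
#     backuphelper@target.example.com::/ /backups/target
```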

The problems

This system had served me well for a number of years, but as Hazel’s talk pointed out, it definitely violates the principle of least privilege to run the entire process as root on both the backup and target servers.

Fixing it: the target servers

I followed a similar process to the slides. On Debian, if you haven’t fiddled with the contents of /etc/pam.d, then installing the libpam-cap package automatically adds the necessary line to common-auth, so it works for SSH, cron and su-spawned shells, and you only need to add your grants to /etc/security/capabilities.conf.

Because rdiff-backup is written in Python, you can’t set capabilities on the script itself: the shell spawns a Python process so you need to set the capabilities on /usr/bin/python (or rather /usr/bin/python2.7 at the time of writing, as the former is a symlink). There is much waffle on the internet about how unfortunate it feels to be putting capabilities on the interpreter rather than the intended program. However, since in this case the capabilities only work if Python is run from a user who has already been granted them by pam_cap, it doesn’t seem like too much of a problem. I briefly pondered using a hard link python-with-extra-caps which was owned by ‘backuphelper’, but that felt like a maintenance burden. Overall, I still think doing it this way exposes a much smaller attack surface than running rdiff-backup as root.
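For the record, the moving parts look roughly like this (capability name and interpreter path as in my setup; check your distribution’s file locations):

```
# /etc/security/capabilities.conf: grant the inheritable
# read-everything capability to the backup user only
cap_dac_read_search backuphelper

# Mark the interpreter so the inherited capability can take effect
# (run once as root):
# setcap cap_dac_read_search+ei /usr/bin/python2.7
```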

Taking it further

I see that Hazel’s post-talk additions to the slides noted the further possibility of using capabilities to avoid running the backup process as root on the backup server. I had a go and managed to get it working.

Again, because rdiff-backup is Python, the capabilities needed setting on the Python interpreter. Once more I made a dedicated ‘backuphelper’ or similar user to run the backup process. I moved the SSH keys and known_hosts list across from ‘root’ which used to own them, and found that rdiff-backup at the backup server end needs CAP_DAC_OVERRIDE (the ability to write arbitrary files, not just read them), and also CAP_FOWNER if you’re expecting it to preserve Linux owner/group information and permissions.
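So on the backup server the interpreter ends up marked with something like (paths as before; the caps are the ones I found necessary):

```
# setcap 'cap_dac_override,cap_fowner+ei' /usr/bin/python2.7

# ...plus the matching grant in /etc/security/capabilities.conf:
# cap_dac_override,cap_fowner backuphelper
```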

I don’t think the backup server has gained a great deal of security from this: if you can write and chown/chmod arbitrary files, then you can certainly take total control of a system in a few steps. But at the very least, not running it all as root limits the damage that can be done by accidents (bugs/flaws in rdiff-backup) and adds more steps to get in the way of a poorly crafted or targeted exploit.

A couple of other twiddles were needed: if your backup process sends e-mail, it will now do so as ‘backuphelper’ not root, so make sure that user has a sensible from address. Lastly, my backup scripts run ‘shutdown’ to switch off the backup server when it’s finished work. I had to arrange for this to be done via sudo now that the entire backup process is no longer run by root.
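The shutdown piece is a one-line sudoers grant (command path as on my Debian box):

```
# /etc/sudoers.d/backup-shutdown on the backup server
backuphelper ALL = NOPASSWD: /sbin/shutdown -h now
```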


As in many configuration files, order matters in the PAM configuration. RTFM. If you want to see whether pam_cap is working, you can do this:

grep CapInh /proc/$$/status

And use capsh --decode= on the resulting bit string to understand what you’ve got. If it’s all zeroes, check capabilities.conf and your PAM configuration.
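If you’d rather do that check from a script, parsing the same status file is easy in Python (a sketch; the helper name is mine):

```python
import os

def cap_mask(status_text, which="CapInh"):
    """Extract a capability bitmask (e.g. CapInh) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith(which + ":"):
            return int(line.split()[1], 16)  # the mask is printed in hex
    return None

# Equivalent of `grep CapInh /proc/$$/status` for the current process:
if os.path.exists("/proc/self/status"):  # Linux only
    with open("/proc/self/status") as f:
        mask = cap_mask(f.read())
    if mask is not None:
        # A mask of 0 means pam_cap granted nothing to this session.
        print("inherited caps: %#x" % mask)
```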

Last words

If you’re thinking of giving your backup system an overhaul, don’t forget to test whether the backups the new system takes can actually be restored from.

FLOSS Unconference 2015

I had a good time at this in London yesterday – some interesting talks in the morning including one about Linux Capabilities which I’ll definitely be lifting some ideas from, and a couple of my questions (“Why don’t developers and sysadmins like each other?”, “What did you do with your Raspberry Pi?”) were discussed in the afternoon.

I thought I’d be in a minority as a developer, but in fact it was about two-thirds dev and one-third sysadmin. Some of us considered ourselves both, of course.

Unfortunately I was feeling quite wrung out after a long week and decided to make a dash for the early train home, but I’ll definitely be back at similar events in future.

TalkTalk Business, the good and the bad

Last year, the assimilation of Be into Sky prompted us to have a think about our internet provider at the church. We have a single phone line into the church office, rarely used for calls but often used for internet.

Prompted by the attraction of having one bill to pay, and not paying as much for line rental as we were to BT, we moved both phone and internet over to TalkTalk’s business offering. They did send us a router, but I just plugged their account details into our existing one and left everything as it was. The switch-over was refreshingly simple – because they were providing the landline too and plugging it into their equipment at the exchange, faffing around with migration codes for the ADSL wasn’t necessary.

For six months, all seemed well – our ADSL was fine and running at 9mbps, and the phone line we never used was presumably OK. Then a couple of weeks ago, the real test of the supplier started when the line developed a fault. Our ADSL wouldn’t stay sync’d and was running at a third of its usual speed with massive packet loss.

I was pleasantly surprised to get hold of a human being in support on a bank holiday Monday, and even more so when he was prepared to take my word for it, without arguing, that I’d tried replacing everything my side of the master socket and even used the test socket* to eliminate a possible fault in our equipment. The only annoyance was a classic call-centre screw-up – the automated system picks up, asks you to key in the phone number you’re calling about, then puts you through to a human who … asks you for the phone number you’re calling about. Pretty shaky for a telecoms company…

It all went a bit sideways from there – I explained that the church building isn’t manned continuously, so if he was going to get BT Openreach to send an engineer, he needed to (a) call me back and say when that would be, and (b) make damn sure the engineer had my mobile number. Neither of those things happened, and I had to call back two days later to be told an engineer had been sent and failed to gain access to the premises. I was told an engineer had been re-booked for Thursday between 1 and 6. I duly spent my Thursday afternoon sat in a chilly church with no WiFi, and phoned to tell them nobody had turned up. I was told that I had been misinformed, and they’d failed to re-book the engineer. Suppressing my anger, I asked them to try again and make less of a hash of it. This time, I got the 8am to 1pm slot on Friday morning, and thankfully BT’s man arrived by 8.15.

He swiftly identified a junction box just inside our property (but before the master socket) which was full of water. One replacement later, and everything is fine again.

I’m not sure who to blame for the screw-up over sending an engineer – perhaps such problems are an inevitable side-effect of being TalkTalk’s customer on BT’s piece of copper, much like the loss of accountability between Network Rail and the train companies.

What else? I wasn’t impressed by TalkTalk’s free router – the web interface has a noticeable delay on all operations, and it’s slightly lacking in features (e.g. you can configure the DHCP to always give the same IP address to a given device, but you can’t specify which IP address). The fact that the £20 TP-Link one I got from Argos is better is a bit of a clue as to how much they spent on theirs.

* Unscrew the faceplate from your master socket, and you’ll find the test socket behind it.

Blocking executable files (even buried inside ZIPs) in Exim

One of the handful of family members I host e-mail for had a narrow escape the other day, just about managing to avoid opening an .exe file buried inside a ZIP file attached to an e-mail purporting to be from Amazon.

The quality of some fake e-mails sloshing around these days is very, very good, and it seemed in this case that even the full might of SpamAssassin and ClamAV (with unofficial malware signatures) hadn’t sufficed to stop this one getting to the user’s inbox.

Spurred on by the thought of how long it might have taken me to disinfect their Windows box if they’d opened the .exe, I decided to take more drastic measures and block attachments containing .exes on the server.
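Whatever the Exim recipe, the underlying check is simple: does an attached ZIP contain a file with a banned extension? A sketch of that core test in Python:

```python
import io
import zipfile

BANNED = ('.exe',)  # extensions to treat as executable

def zip_contains_executable(zip_bytes):
    """True if any member of the ZIP archive has a banned extension."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        return any(name.lower().endswith(BANNED) for name in z.namelist())

# Build a ZIP in memory with a single .exe inside, as the spam runs do:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("invoice.exe", b"MZ fake payload")

print(zip_contains_executable(buf.getvalue()))  # True
```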

Plenty of recipes for doing this are to be found on the net. The really nice bit for me, though, was the chance to break out Eximunit and do some test-driven sysadmin:

import os

from eximunit import EximTestCase

from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email.Utils import COMMASPACE, formatdate
from email import Encoders

EXE_REJECT_MSG = """Executable attachments are not accepted. Contact postmaster if you have a
legitimate reason to send such files."""

ZIP_EXE_REJECT_MSG = """Executable attachments are not accepted, even inside ZIP files. Contact
postmaster if you have a legitimate reason to send such files."""

# Addresses, IPs and fixture filenames below are placeholders.
FROM_ADDR = 'sender@example.com'
TO_ADDR = 'david@example.com'

class ExeTests(EximTestCase):
    """Tests for .exe rejection"""

    def setUp(self):
        # Sets the default IP for sessions to be faked from
        # (session-setup helper reconstructed; check your eximunit version)
        self.session = self.newSession('192.0.2.1')

    def testDavidDomainRejectsExe(self):
        self.session.mailFrom(FROM_ADDR).rcptTo(TO_ADDR)\
            .assertDataRejected(self.messageWithAttachment('test.exe'),
                                EXE_REJECT_MSG)

    def testDavidDomainRejectsExeZip(self):
        self.session.mailFrom(FROM_ADDR).rcptTo(TO_ADDR)\
            .assertDataRejected(self.messageWithAttachment('test-exe.zip'),
                                ZIP_EXE_REJECT_MSG)

    def testDavidDomainAcceptsJPG(self):
        self.session.mailFrom(FROM_ADDR).rcptTo(TO_ADDR)\
            .assertDataAccepted(self.messageWithAttachment('test.jpg'))

    def testDavidDomainAcceptsJpgZip(self):
        self.session.mailFrom(FROM_ADDR).rcptTo(TO_ADDR)\
            .assertDataAccepted(self.messageWithAttachment('test-jpg.zip'))

    def messageWithAttachment(self, filename):
        msg = MIMEMultipart()
        msg['From'] = FROM_ADDR
        msg['To'] = COMMASPACE.join([TO_ADDR])
        msg['Date'] = formatdate(localtime=True)
        msg['Subject'] = 'This is a subject about .exe and or .zip'

        msg.attach(MIMEText('Test message body'))

        part = MIMEBase('application', "octet-stream")
        part.set_payload(open(filename, "rb").read())
        Encoders.encode_base64(part)
        part.add_header('Content-Disposition',
                        'attachment; filename="%s"' % os.path.basename(filename))
        msg.attach(part)
        return msg.as_string()

The tests (both positive and negative cases) helped me to hammer out a couple of initial bugs. I really don’t know how anyone runs a live e-mail service without this sort of reassurance when tweaking the settings.

P.S. In the 48 hours since it went live, the new check has rejected over 60 messages, all of them containing a single .exe buried inside a ZIP. Many, but not all, of the messages are purporting to be from Amazon, and a surprising variety of different hosts are sending them, presumably part of a botnet of compromised machines.

Google and IPv6 e-mail

Update: The change described below does not seem to have reliably stopped Google from bouncing my e-mails. Time to ask them what they’re doing…

I obviously spoke too soon. Having complimented Google for finally enabling IPv6 on Google Apps, I was lying in bed this morning firing off a few e-mails from my phone when this bounce came back:

This message was created automatically by mail delivery software. A message that you sent could not be delivered to one or more of its recipients. This is a permanent error. The following address(es) failed:

SMTP error from remote mail server after end of data:
host ASPMX.L.GOOGLE.COM [2a00:1450:400c:c05::1a]:
550-5.7.1 [2001:41c8:10a:400::1 16] Our system has detected that this
550-5.7.1 message does not meet IPv6 sending guidelines regarding PTR records
550-5.7.1 and authentication. Please review
550-5.7.1  for more
550 5.7.1 information. ek7si798308wic.60 - gsmtp

Hmm. The recipient address has been changed, but the rest of the above is verbatim. The page Google link to says:

“The sending IP must have a PTR record (i.e., a reverse DNS of the sending IP) and it should match the IP obtained via the forward DNS resolution of the hostname specified in the PTR record. Otherwise, mail will be marked as spam or possibly rejected.”

All of which is reasonable-ish, but the sending IP does have a PTR record which matches the IP obtained by forward resolution:

david@jade:~$ host 2001:41c8:10a:400::1
domain name pointer

david@jade:~$ host
has address
has IPv6 address 2001:41c8:10a:400::1
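What Google are describing is forward-confirmed reverse DNS. The same check is easy to sketch in Python (using the system resolver, so results depend on your /etc/hosts and DNS):

```python
import socket

def fcrdns_ok(ip):
    """Forward-confirmed reverse DNS: the PTR name must resolve back to ip."""
    try:
        name = socket.gethostbyaddr(ip)[0]  # PTR lookup
        # Forward-resolve the PTR name and collect all its addresses
        addrs = {info[4][0] for info in socket.getaddrinfo(name, None)}
    except socket.error:
        return False
    return ip in addrs

# localhost should pass on any sane host:
print(fcrdns_ok("127.0.0.1"))
```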

So what are they objecting to? Some Googling and some speculation suggests that they might be looking at all hosts in the chain handling the message (!). Further down the bounce in the original message text we find:

Received: from [2a01:348:1af:0:1571:f2fc:1a42:9b38]
	by with esmtpsa (TLS1.0:RSA_ARCFOUR_MD5:128)
	(Exim 4.80)
	id 1Vrm3Q-0002Ay-NH; Sat, 14 Dec 2013 10:02:36 +0000

Now, the IPv6 address given there is the one my phone had at the time. It doesn’t have reverse DNS because you can’t disable IPv6 privacy extensions in Android (also Google’s fault!), and assigning reverse DNS to my entire /64 would require a zone file many gigabytes big.
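To put a number on that: a /64 leaves 64 bits of host address, so enumerating PTR records for every address really is hopeless:

```python
# Addresses in a /64: 2 to the power of (128 - 64) prefix-free bits
hosts_in_slash_64 = 2 ** (128 - 64)
print(hosts_in_slash_64)  # 18446744073709551616

# Even at one byte per record, that's 2**34 gigabytes of zone file:
bytes_per_gib = 2 ** 30
print(hosts_in_slash_64 // bytes_per_gib)  # 17179869184
```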

At this point, it’s probably best to stop speculating about Google’s opaque system and start working around it from my end. Others have resorted to disabling IPv6 for their e-mail server altogether – no thanks – or just for sending to gmail.com. This latter approach doesn’t work for me, as the example above involves a Google Apps hosted domain – and potentially lots of different domains will be using Google Apps for mail, so a simple domain-based white/blacklist isn’t going to cut it.

After spending some time with the excellent Exim manual, I’ve come up with a solution. It involves replacing the dnslookup router with two routers, one for mail to GMail/Google Apps hosted domains, and one for other traffic. Other settings on the routers are omitted for brevity, but you should probably keep the settings you found originally.

# (router names here are illustrative)
dnslookup_not_google:
  driver = dnslookup
  debug_print = "R: dnslookup (non-google) for $local_part@$domain"
  # note this matches the host name of the target MX
  ignore_target_hosts = * : *
  # not no_more, because the google one might take it

dnslookup_google:
  driver = dnslookup
  debug_print = "R: dnslookup (google) for $local_part@$domain"
  # strip received headers to avoid Google's silly IPv6 rules
  headers_remove = Received
  headers_add = X-Received: Authenticated device belonging to me or one of my users

Sysadmin by point and click

I promised an update on Google Apps some time ago. This week, we finally flipped the e-mail for Saint Columba’s over to Google Apps hosted mail. Combined with using Mythic’s DNS servers (free with the domain registration and with a pretty nice web interface), that meant the church’s online presence was fully disentangled from my servers for the first time in years.

While that’s a good thing on a pragmatic, I-don’t-have-time-to-run-this-any-more basis, it’s been a bit of a rough ride. Here are some of the things I’ve learned.

  • Google Apps ‘groups’ are a replacement for what you think of as e-mail aliases or forwarders. Make a group, set it to ‘anyone on the internet can post’, and add the addresses you want it to forward to. In my case, Google Apps e-mail users who were part of the group failed to get any messages sent to it – then it suddenly started working after 24 hours. The frustration of not knowing why (or whether it’ll stop working again tomorrow) was probably the low point of the whole exercise.
  • Forwarding mail from a Google account to somewhere else requires a confirmation code to be sent to, and retrieved from, the destination address before it’ll work. Not entirely unreasonable, but frustrating when both accounts are in the same Apps domain.
  • The out of office auto-responders on external GMail accounts don’t work for mail forwarded via apps groups/aliases. There is no documentation of this, they just fail to send a reply (not really an Apps problem, more a migration from old GMail account to Apps problem).
  • Single-line or empty test messages with just a subject line will often get eaten/filtered to junk by Google Apps as spam. This makes testing rather ‘fun’.
  • Google Apps e-mail supports two-stage authentication with your phone, which is handy, but domain administrators have to flip a setting to let users enable it. Whilst you’re there, enable SSL everywhere (boo for this not being the default).
  • All parts of Google Apps have IPv6 enabled. Nice to know we can remain the only (?) church in Oxford with an IPv6 website and e-mail.

Now it’s done and working, we’ll be sticking with it, but I can’t pretend it’s been very good for my blood pressure getting there.

Twin Nvidia graphics cards and Ubuntu

My home desktop PC, running Ubuntu, has had two monitors for quite some time, but when an obliging friend on IRC offered me a third, I couldn’t pass it up.

This meant installing a second graphics card, and fortunately I had a second Nvidia GeForce 9400GT (PCI Express), identical to the one already fitted. This went in perfectly but wasn’t recognised by anything. When I took the cover off again and looked more closely, it seems there’s a jumper which needs rotating to switch into ‘dual video cards’ mode:





Having done this, everything sprang into life. Tell Nvidia’s settings applet to overwrite the existing Xorg.conf file (not merge with it), set them all up as separate X displays, and Ubuntu spreads cleanly across all three. Shame I don’t have a fourth…

Words to live by

“Read the error message” – Me

“It’s always simpler than you think” – Me

“If a job’s worth doing, it’s worth doing properly” – my grandfather.

Why am I going all philosophical on you? Well, I wasted far too much time last night on a networking problem, and as with 99% of the technical issues I run into, all of the above applied.

The problem was apparently straightforward: I powered down and disconnected my home ‘server’ which had been creaking away in a corner quite happily. It connected to my home network via a powerline ethernet adapter, and had a static IP address.

Later on, after my new internet connection had been installed, with consequent change of router, I plugged it all back in and couldn’t ping the server. I checked the cable. I checked the powerline ethernet adapter. I theorized (wrongly) that Sky’s router wouldn’t talk to statically configured clients it hadn’t handed a DHCP lease to. I drafted in Michael with some switches and tcpdump, and he established that no packets were emerging from my machine. A couple of commands later, we came to the conclusion that the network port didn’t think it had a cable plugged into it.

At this point, I had a lightbulb moment and sheepishly admitted that this machine’s onboard networking had died a year ago and I’d put in a new Ethernet port on a PCI card. Sure enough, moving the cable to the right network port made everything spring into life.

Assumption is, and always will be, the mother of all screw-ups.

I’ve ensured this won’t happen again by taping up the dodgy port, as I ought to have done a year ago:


SQL? Dude, you’re doing it wrong

I’ve increasingly formed the opinion over the past few years that (almost) anyone writing software, certainly in the SME or 90% of open-source space, simply shouldn’t be writing raw SQL.

This is the 21st century, and all the major programming languages have these things called ORMs. Since all you actually wanted from your database was some kind of load/save/search for the objects that make up your software’s state, it turns out encoding that metaphor at the object level is much nicer than writing the code to do it all yourself.

The added bonus of this is that when you want support for a new DBMS, you just need to see if your ORM supports it. Generate a schema, run through all your tests, fix a couple of minor issues, job done. And you’ll make your sysadmins happy by not dictating a choice of DBMS that makes their lives harder.

You may be drawing breath to argue that writing the raw SQL yourself by hand is ‘more efficient’, but come on. Your blog has a couple of hundred posts and maybe a few thousand comments. The daily hit rate of ten and a half people isn’t going to tax even MySQL – so whether you use it, PostgreSQL, or even the free edition of IBM’s DB2 (eight-character limit on database names, anyone?) really doesn’t matter. It’s an implementation detail you shouldn’t worry too much about, and certainly shouldn’t prematurely optimize by getting too familiar with.

Django has led the way in the Python world for years on the ORM question, but all the other languages have them too. Make the jump, and you can always escape to raw SQL if you really need it in one corner of your application.