As regular readers will know, I run a co-located server with a few friends. I’m also responsible for things like the WiFi network at church. Inevitably, these things run just fine for weeks, months or even years at a time – and then fail. The first I hear of it is usually from a would-be user, sometimes days later.
The world of open-source has a solution to this, and it’s called Nagios (actually, I think the cool kids have jumped ship to the more recent Icinga, but I haven’t played with that). It’s a monitoring system – the idea is that you install it somewhere, tell it what computers / devices / websites to monitor, then it can e-mail you or do something else to alert you when things fail or go offline.
The really nice thing about it is that it’s all pluggable – you can write your own checks as scripts, and also provide your own notification commands, for example to send you a text.
The only problem is, where to put it? Hosting it on the co-lo box I’m trying to monitor is a bit self-defeating. Fortunately this is a rather good use for that Raspberry PiÂ I had lying around – I found a convenient internet link to plug it into, removed all the GUI packages, and it’s running Nagios just fine. So far, without really even trying, I’ve got it monitoring 76 services across 29 hosts. Some of the checks are a bit random – e.g. a once-per-day check on whether any of my domain names have expired – but now it’s possible to buy ten-year renewals it’s nice to have an eye on this sort of thing, as who knows if one’s registrar will still have the right contact details to send reminders in a decade?*
So far it’s working a treat, texting me about problems so I can fix them before users notice, although if I add many more checks I suspect I’ll have found an excuse to invest in a Raspberry Pi 2 to keep up with the load.
If you’reÂ doing it right, you’ll configure Nagios to know which routers and computers stand between it and the rest of the internet (and hence what it’s trying to monitor). This means it won’t generateÂ a slew of alerts if its internet connection dies, just one. It also means you get a useful map of what you’re monitoring:
* I’m pretty sure Mythic will do, because they’re great, but it never hurts to have a backup in place.