Server downtime

My server was offline for about 12 hours over the past day. I believe we are back online for regular service now … it appears that my D-Link router crashed last night, and since that’s what routes all the traffic from the Internet to the Global Paradigm and View from the Edge pages, both servers were effectively down for the duration. It took a simple reset on the router, and all was good.

Beyond the impact to my server, the fault raises something of a larger issue. In today’s world, there is a lot of talk about out-sourcing, and of remote maintenance. In my line of work, system administration, more and more of it is done over the web, out-sourced to people who are located in a different place than the physical servers, sometimes even on a different continent. And for many problems, such remote administration is just as effective as hands-on work would be. I do nearly all of my own administration remotely … even when I am working in the same room as my server, I am almost always working on it from a different machine, rather than sitting directly at the console. And often, I am working on the server over the web from a variety of other locations.

But last night shows there are times when there’s no choice but to have a physical body in the server room. Although the fix for this problem was relatively simple, it was one that was only usable from the same physical location as my router box … even if my router had been configured for remote web access, as soon as it crashed I’d have been unable to get to it to effect the reboot. There are ways to mitigate the problems, in some cases. Remote power switches exist that allow you to effect a power cycle from a distance, and many other techniques are available to mitigate the risk. But no matter what you do, there’s always a point of failure, a particular piece of vulnerable equipment, where we need a set of boots on the ground, and a pair of eyes in the room.
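
To make the remote power switch idea a little more concrete, here is a minimal sketch in Python of a watchdog that could run on a machine on the local side of the router. Everything in it is an assumption made for illustration: the names `tcp_probe` and `Watchdog`, the threshold of three failures, and above all the `power_cycle` hook, which stands in for whatever interface a particular power switch actually offers. It is not a description of my own setup.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


class Watchdog:
    """Trigger a power cycle after several consecutive failed probes.

    `probe` and `power_cycle` are injected callables. The power_cycle hook
    is hypothetical: it would wrap whatever a given power switch exposes.
    """

    def __init__(self, probe: Callable[[], bool],
                 power_cycle: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe
        self.power_cycle = power_cycle
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe. Return True if this tick triggered a power cycle."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.power_cycle()
            self.failures = 0  # start counting afresh after the reset
            return True
        return False
```

In use, something like `Watchdog(lambda: tcp_probe("example.com", 80), my_power_cycle)` would have its `tick()` called once a minute from a scheduler, where `my_power_cycle` is a placeholder for the switch-specific call. Requiring several consecutive failures keeps a single dropped connection from rebooting a healthy router. Note that the watchdog and the power switch simply become new points of failure in their own right.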

As far as I can tell, I am back up again. I haven’t worked out why the router crashed yet, though I should probably take this opportunity to look into firmware levels on it. But it’s a good object lesson in the dangers of too much remote administration and out-sourcing … there are some problems you simply can’t solve unless you are sitting with the faulty equipment.
