Whoops.

December 21st, 2007 | by Tom

Had a slightly embarrassing problem yesterday, although I’m not entirely sure it’s my (our) fault…

One of our servers, for historical reasons, is actually hosted in the same rack at UKS as four more, all owned by one of our larger customers. We happen to manage all of these servers, too.

Anyway, sometime around 4pm yesterday (just as I was about to leave - typical) one of the redundant PSU modules in their primary VM host failed and, as a horrible, unexpected side-effect, it blew the fuse in the rack’s power bar! Which really shouldn’t happen, but... Well... It did! :(

UKS inspected the issue as soon as we noticed, and promptly replaced the fuse. Of course, [almost] all of the machines are set to ‘power-on after AC power loss’ in their BIOS settings. As they all powered on at once, the resultant surge in power draw tripped the fuse once again.

We were blamed for not using staggered start-up on our APC PDU. But they don’t all support it, and why should the fuse blow at the sight of four >2A servers starting up? It’s not like the rack was purchased on a fixed amperage - the servers are all paid for on single-server hosting packages!

Anyway, as I’m actually off on holiday today, I’ve had to convince the admin girl to drop a spare PSU down to Redditch. No big deal, as UKS are more than capable of swapping the PSU out.

The bigger problem was the extended downtime of our main web server. It hasn’t got a working LED, so it was “quite difficult” for the on-site engineer to determine if it was up or not. I suppose checking the airflow through the back of the machine, or the multitude of drive lights, wasn’t an obvious option…

After a night of downtime, it’s actually up again. The backup VM didn’t go quite to plan, which I think we’ll need to investigate in the future. And I suppose I’ve got an LED to fix when I’m next in the data centre!
