Archive for the ‘RAID’ Category.

I’m spoilt these days…

I’ve just had to set up Windows on a physical machine (shudder) to control and monitor the IOMeter disk benchmarks that are needed for my final year project. I didn’t try to run it in Wine, but I suppose I should’ve. Needless to say, I do require it to work perfectly in order to maintain the fairness of my testing, so Windows was unfortunately my first choice.

Due to the age of the hardware I had lying around (an old Athlon XP-M system with an Abit NF7-S 2.0 and 512MB of ‘borrowed’ memory, thanks Ian), it was safe to say that it wouldn’t be any good installing Vista on it. Therefore I downloaded and burnt an XP ISO from my MSDN account and set about installing XP to the 200GB SATA drive I had (thanks Neillans, actually!)

The Abit NF7-S range of boards (particularly the V2.0) was highly regarded during its heyday: a testament to Abit’s awesome legacy, not least for the inclusion of SATA ports way back in 2002, when Serial ATA was a relatively new feature on desktop boards. It even included basic RAID functions across the twin ports, courtesy of the Sil3112r chipset, which is still sold today if you look hard enough. When this was my main motherboard I actually ran a pair of 36GB WD Raptors in RAID-0 (scarily the same pair I use as my root drive now! I’m poor, OK?) and everything worked extremely well. I never had a single problem with it.

But fast-forward to installing XP onto a single SATA disk, and I was stumped for a little while. Aside from the faff in convincing my floppy drive to work with the board (I’d previously disabled it via three separate options in the BIOS, which was a nightmare), I then had XP’s installation looping continuously, instead of booting from the HDD to continue with the second phase of the installation. It was almost as if XP was failing to write NTLDR into the MBR, somehow.

Now by convention on modern motherboards, SATA ports can typically be set to three modes: RAID, AHCI, and IDE, the last of which is used purely for compatibility with older operating systems. The ‘RAID’ mode, meanwhile, typically prevents that particular disk from being presented as a possible boot disk by marking it for use within RAID arrays only. It’s all fairly self-explanatory.

However, within the NF7-S’s BIOS, there are no such options. You can either enable/disable the SATA chipset, and optionally enable/disable the ‘SATA RAID ROM’, which you would believe would be only required if creating RAID arrays. I didn’t wish to use the RAID features and therefore I didn’t intend on ear-marking the disk as a RAID disk, as I wanted to boot from it. Sounds sensible, right?

Sadly, unless this ROM option is actually enabled, regardless of whether or not I wish to use any of the RAID features, the disks will not be presented as boot disks. Quite why there is even an option in the first place is beyond me! Because of this, the XP installation CD was failing to find a suitable boot disk and was therefore intent on looping endlessly through the first phase of the installation process. Fun times…

It has since occurred to me just how far SATA adoption and usability have actually come in the last 5-6 years. Most chipsets now natively include anything from two to eight AHCI SATA ports, and are much better integrated into the BIOS menus. Similarly, with natively AHCI-aware operating systems such as Linux, Solaris (and friends), many BSDs, Vista (and Windows 7!) now becoming largely commonplace, there are few reasons for any of the IDE-compatibility options any longer.

That is, unless you’ve only got a single-core processor, 512MB of memory and an old, awkward (but great) motherboard. I just wish the IOMeter devs would consider creating a GTK+/Qt4 front-end for dynamo! :)

It’s alive!

I’ve recently been building servers again. Aside from the usual 2U stuff, I thought I’d show a few pictures of the current project I’m working on. This 4U Supermicro chassis is destined to be used as our backup/storage server at the co-lo facility. VM backups, database backups, general file store, etc. etc.

ZFS Server: 24 hotswap bays

Plenty of drive bays there (24 to be exact).

ZFS Server: a view from above

Here you can see how neat it is. Partly because of the good design of the case, and partly because of the tight integration with Supermicro’s own boards. The shroud that ducts air over the CPU also works wonders.

ZFS Server: very, very loud fans.

As you can imagine, it sounds like a jet taking-off when it’s going at full pelt. I wonder if co-los typically have an ‘upper noise limit’? :D

I’ll put more detail down about what I’m doing with it a bit later… I’m currently testing all manner of Solaris-based distributions (a learning experience in its own right) with some funky zpool configurations. More to come!

I’m going to cry.

If you can remember back to my woes with SATA-II disks and 3Ware 9500S cards, you’ll probably be feeling my pain right about now.

The wonderful set of Western Digital RE2 drives, which saved the day all those months ago, now seem to be dropping out of the RAID5 array. :(

However, the biggest clue as to why was provided by the following 3DM2 alert e-mail:

20080117061603 - Controller 0
ERROR - Drive timeout detected: port=0

Searching the 3Ware knowledge base for ‘WD timeout’ leads us to this KB article, concerning certain WD drives that inadvertently drop out of an array because of a fault in their ‘doze time’ implementation. Whether the firmware patches described rid the drive of any ‘doze time’ altogether, or just tweak it for better compatibility, is beyond me, but the corrected firmwares can be had from this download page on the Western Digital website.

So, I guess I’m going to UKS on Monday to upgrade the firmware on four WD5000YS drives. :(

Update: As it turns out, the disk firmware was already up to date! Now I wish I’d known you could check the drive firmware in 3DM2 before I bothered connecting each drive to the on-board controller, booting up with an FDD and waiting to be told 4 times that the firmware was already current! Whoops ;)

If anyone else comes across this, and they’re sure their drives are from the older firmware set, you would do well to upgrade your 9500S (or other 3Ware controller) firmware first. The latest release (at least for the 9500S) has an update that allows the WD firmware upgrade to traverse the 3Ware controller, so there’s no need to meddle about with connecting disks to on-board controllers.

Anyway. The array has been fine since; no time-outs at all. As above, I upgraded the 3Ware card’s firmware by a few revisions, so hopefully that will sort the issue out. Guess we’ll just have to wait another 6 months to find out! ;)

The woes of incompatible RAID hardware

So in the last few weeks, we’ve been wondering why all these sets of new 500GB hard disks have been degrading in their RAID-5 array, merely hours after being created. Which was also around the time that I’d finished setting up Gentoo, frustratingly.

Eventually it was decided that instead of making frequent trips out to the co-location facility, we’d bring the server in to HQ for some diagnosis. After three completely different sets of brand-new 500GB disks, purchased from two separate manufacturers, had all exhibited the same behaviour, it was almost undoubtedly a sign that something else was causing the RAID arrays to degrade.

The RAID card used in this particular server is an 8-port 3Ware 9500S. It’s been reliable in the past, and never exhibited a single issue up until the point we began replacing disks with larger, newer models. I even took it upon myself to strip all SATA cables from the machine and replace them with un-used items. Of course, this was a long shot and it made no difference either way (but at least I’d ruled it out).

Now I don’t know who forgot to check – but this particular model of RAID card does not specifically support SATA-II disks. Of course the first idea that springs to mind is backwards-compatibility; if a SATA-II disk is plugged into a SATA-I port, one would expect it to automatically run at the slower rate (much like PATA of old). Though as it turns out, that’s just not something you can assume.

So after much head-scratching, Googling and more Googling, I thought it would be worth adding jumpers to the rear of the drives in order to forcibly limit each drive to 1.5Gbit/sec. I’d like to point out that nothing I found on-line, written by either of the two disk manufacturers (Seagate or WD in this instance), mentioned that the jumper limits any other feature of SATA-II – it’s just a speed lock. Kudos to whoever it was that wrote the 3Ware 9500S Wikipedia article (which has since been deleted), as it proved to be a rather good muse.

For the fourth time I recreated the arrays and began installing Gentoo. After two days of installing and configuring Gentoo, followed by roughly 18 hours of I/O stressing by bonnie++, the RAID card hasn’t skipped a beat. I may be tempting fate by writing about this so soon, but I’m fairly happy to say that I’ve cracked the problem of why our server’s root file system was degrading before it was even fully initialised.

So, how do you use SATA-II disks with a SATA-I 3Ware RAID controller? Forcibly limit the speed to 1.5Gbit/sec, and it all works like it should. As I didn’t find a specific answer to this question on-line, I’m hoping that this may be of some use to others out there who may be struggling with upgrading their 9500S arrays. By all means, please let me know if it has! :)

Update, 22-07-07: The array is still working a charm, so in the words of Borat – “Great Success!”. Watch it fail now! :P

You’ll never guess what..

Yep, a second lot of Seagate drives has failed in the server that I mentioned a few days back.

It’s got me wondering if the drives aren’t the problem here. It’s two different series of drives, albeit from the same manufacturer, though the second lot were the more expensive line.

But either way, we’re going to UKS to check out the machine, and make sure it was built properly. Swap wires around, generally just have a nose and a check, and then install the third (and hopefully final) set of drives, which are almost certain to be Western Digitals. We’ve already got some of their RE-series drives working in other machines, and they’ve been fine. It’s a good job this server is only used for DRBD back-ups of VMs!

So another trip to UKS. If these drives fail, we’ll be worrying I think. :(

A fortnight of ups and downs (literally)

It’s been two weeks (roughly) since I started my placement, and there has been quite a bit to learn.

My first week has given me time to get to grips with my new environment, and also gave me a steep introduction into the systems that I’m going to be responsible for in the foreseeable future. That is, a large number of co-located Gentoo servers (some running VMs, which in turn run the services, and some caring for VM back-ups via DRBD) and an Asterisk PBX, which itself is actually hosted on a Gentoo server.

Due to.. Something… Wonderful, there isn’t a network/server map in sight. So all of this was pretty confusing to start with, and just trying to envision the layout of the entire system from one person’s explanation was very difficult indeed. Thankfully now, I’m pretty well-versed, but there are still some aspects I’m yet to grasp. Though it shouldn’t take long now..

What is particularly interesting is the variety of the work. In the early days I was doing rudimentary work: changing passwords on the 3Ware RAID card web interfaces to something new. This was fine until I realised that the passwords had an 8-character limit, and thus the passwords I had picked were far too long. This wasn’t noticed until after I’d been through almost all of the RAID cards… Even the MD was laughing from his office when that hit the e-mails.

On the other hand, last Thursday Jon and I had to visit UK Solutions to replace some hard drives in a server, and remove the aging RedHat install on that machine, along with another, in favour of shiny new Gentoo installs. Unfortunately the phrase ‘Shiny Gentoo install’ is somewhat of an oxymoron; it’s anything but shiny, glossy, or any other descriptive word for ‘pretty’. We’re talking about a purely CLI installation. Any harder and it’d be Linux From Scratch.

But if it wasn’t hard, it’d be boring! And it’s definitely a nice change to get out of the office and into the .. frying pan. Anyone who’s had to sit between the arse-end of two server racks will understand my metaphor. Cool, it most definitely is not. But I had a job to do, and that was to install Gentoo onto the new RAID-5 array of recently-purchased 500GB Seagate drives.

The basic installation of Gentoo is by far the most confusing part – there’s a reason the install manual has probably been re-written 5001 times in order to make it simpler for first-time users, and anyone who remembers their first attempt at a Gentoo install will be nodding violently right about now. Thankfully once that’s done (and we head back to HQ in search of a more comfortable, ssh-supported server configuration environment) things do get easier. You get used to simply typing ‘emerge application-name’, editing the necessary config file (or stealing it with scp from a working server ;) ) and using the necessary init.d script to restart the daemon.

Sounds groovy, but let me take you back to the RAID-5 configuration that I was working on in those two fun-packed days. RAID-5, for the uninitiated, provides a fall-back in server environments in the event that one of your drives fails. For instance, in a three-drive RAID-5 array of 500GB disks you will have a striped array of 1000GB in total, with one third of each drive taken up by parity data. The downside is that you don’t get the full 1500GB to use. The upside is that if any one of the drives in the array fails, you can connect a new drive, and the data on the lost drive can be ‘re-constructed’ from the data and parity contained on the remaining two drives. Our nice 3Ware RAID cards also have support for ‘hot spare’ drives, so if one drive fails – the controller can bring in the fourth drive and re-construct the failed drive’s data without anyone having to physically visit the machine.
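To make the ‘re-constructed from parity’ bit a little less hand-wavy, here’s a minimal Python sketch of the idea. It assumes the parity is a plain XOR of the data blocks in a stripe (the standard RAID-5 scheme, though I can’t say exactly how the 3Ware firmware lays things out), and the block contents and names are made up purely for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR byte blocks together, position by position (blocks must be equal length)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a hypothetical three-drive array: two data blocks plus parity.
data_a = b"GENTOO!!"
data_b = b"RAID-5ok"
parity = xor_blocks(data_a, data_b)

# Pretend the drive holding data_a has died: rebuild it from the two survivors.
rebuilt_a = xor_blocks(data_b, parity)
assert rebuilt_a == data_a
```

The same trick works whichever single block goes missing, which is why any one drive (and only one) can be lost.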

There are two problems with RAID-5:

  1. It’s slow. Having to write parity data with each write is hugely taxing on write speed. Reads are generally similar to that of RAID-0 though.
  2. If two drives fail at once, you’re screwed. There’s not enough parity data contained on the remaining drive to construct two failed drives. RAID-6 was implemented to cover this, but isn’t widely used. There’s two lots of parity to write, and THEN you need more disks (4) for only the same volume size.
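The capacity arithmetic behind both points can be sketched in a few lines of Python. This is only a back-of-the-envelope helper (the function name is my own invention), assuming equal-sized drives with one drive’s worth of parity for RAID-5 and two for RAID-6.

```python
def usable_gb(drives: int, size_gb: int, parity_drives: int) -> int:
    """Usable capacity of an array of equal-sized drives, in GB."""
    if drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drives - parity_drives) * size_gb


raid5 = usable_gb(drives=3, size_gb=500, parity_drives=1)  # three drives, single parity
raid6 = usable_gb(drives=4, size_gb=500, parity_drives=2)  # four drives, double parity
print(raid5, raid6)
```

Both come out at 1000GB, which is the point: RAID-6 costs you an extra disk just to stand still on volume size.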

Anyway, what happens just as I’ve finished configuring my first Gentoo server? Yep, one of the drives drops out of the array! The RAID card attempted to replenish the array from the hot spare, but lo and behold … THAT FAILED TOO! :(

TYPICAL.. So those drives shall be going back under RMA, and we’re spending a bit more on some Seagate enterprise-class drives (which we should’ve had in the first place really, but then even desktop drives shouldn’t have failed that prematurely!) which shall then require another trip to UKSolutions in order to install them! Oh, and ANOTHER Gentoo install. Plus a few more. And we have to re-wire the internal LAN switch, which is going to be a massive job due to the lovely job of cable tidying that UKS has done.

I think I’m having fun! :) </geek>

More another time, I think. :)