Archive for the ‘General’ Category.

Optical drive firmware updating in Linux

I recently needed to burn a copy of Windows 7 Pro but realised that I’d unfortunately run out of blank DVD-Rs long ago. Fear not, for I live near an Aldi supermarket, which sells everything dirt cheap. A DVD-R’s a DVD-R, right?

Wrong. I tried at least three of the twenty I purchased (for a few quid) and none of them would even begin writing. Brasero/K3B both complained about incompatible media types.

Remembering that my DVD drive, a trusty NEC 3500A, was designed, built and purchased somewhere between 2004 and 2005 (4-5 years ago at this point) and that I hadn’t ever updated the firmware, I set about researching ways of doing this.

I came across this website, run by a pair of firmware hackers named Liggy and Dee, who have (between them) released, and continue to host, many firmware releases (both official and unofficial) for a wide variety of NEC optical drives.

What’s more, their binflash (or ‘necflash’) utility is available as a Linux binary, and it can even read the official NEC .exe firmware releases! I was sceptical at first that it would work under Ubuntu 9.10, but much to my delight it worked perfectly. With a little reading, I was able to dump my current firmware (2.16) to a file and subsequently flash two different firmware releases: 2.58 (an OEM firmware release) and the latest official NEC firmware release, 2.1A.

The full output of my escapades for anyone curious:


~$ sudo ./necflash -flash -v -s Desktop/NECND350_v21A.exe /dev/sg2
Binflash - NEC version - (C) by Liggy and Herrie
Visit http://binflash.cdfreaks.com

Identified drive: 4 - 3031
Detected drive from Firmware: 4

You are about to flash your drive with the following firmware:

Vendor: _NEC
Identification: DVD_RW ND-3500AG
Version: 2.1A

Remember no one can be held responsible for any kind of failure!
Are you sure you want to proceed? (y/n) y

Entering safe mode
Sending firmware to drive at 0x006000
Sending firmware to drive at 0x00e000
Sending firmware to drive at 0x016000
Sending firmware to drive at 0x01e000
Sending firmware to drive at 0x026000
Sending firmware to drive at 0x02e000
Sending firmware to drive at 0x036000
Sending firmware to drive at 0x03e000
Sending firmware to drive at 0x046000
Sending firmware to drive at 0x04e000
Sending firmware to drive at 0x056000
Sending firmware to drive at 0x05e000
Sending firmware to drive at 0x066000
Sending firmware to drive at 0x06e000
Sending firmware to drive at 0x076000
Sending firmware to drive at 0x07e000
Sending firmware to drive at 0x086000
Sending firmware to drive at 0x08e000
Sending firmware to drive at 0x096000
Sending firmware to drive at 0x09e000
Sending firmware to drive at 0x0a6000
Sending firmware to drive at 0x0ae000
Sending firmware to drive at 0x0b6000
Sending firmware to drive at 0x0be000
Sending firmware to drive at 0x0c6000
Sending firmware to drive at 0x0ce000
Sending firmware to drive at 0x0d6000
Sending firmware to drive at 0x0de000
Sending firmware to drive at 0x0e6000
Sending firmware to drive at 0x0ee000
Sending firmware to drive at 0x0f6000
Sending firmware to drive at 0x0fe000
Sending checksum to drive
Erasing flash block 2
Erasing flash block 3
Erasing flash block 4
Erasing flash block 5
Erasing flash block 6
Erasing flash block 7
Erasing flash block 8
Erasing flash block 9
Erasing flash block 10
Erasing flash block 11
Erasing flash block 12
Erasing flash block 13
Erasing flash block 14
Erasing flash block 15
Erasing flash block 16
Erasing flash block 17
Erasing flash block 18
Writing flash block 2
Writing flash block 3
Writing flash block 4
Writing flash block 5
Writing flash block 6
Writing flash block 7
Writing flash block 8
Writing flash block 9
Writing flash block 10
Writing flash block 11
Writing flash block 12
Writing flash block 13
Writing flash block 14
Writing flash block 15
Writing flash block 16
Writing flash block 17
Writing flash block 18
Leaving safe mode
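
For anyone wondering how much data that log represents, here is a rough back-of-the-envelope sketch in Python. The first and last addresses and the block numbers are copied from the output above; the assumption that every chunk is exactly as long as the gap between consecutive addresses is mine, not something binflash states.

```python
# Values read straight off the log above.
first_addr = 0x006000
last_addr = 0x0FE000
stride = 0x00E000 - 0x006000  # gap between two consecutive "Sending" lines

# Number of "Sending firmware to drive" lines.
chunks = (last_addr - first_addr) // stride + 1

# ASSUMPTION: each chunk is exactly one stride long.
total_bytes = chunks * stride

# Flash blocks 2 to 18 inclusive are erased, then written.
blocks = len(range(2, 18 + 1))

print(f"{chunks} chunks of {stride // 1024} KiB = {total_bytes // 1024} KiB")
print(f"{blocks} flash blocks erased and rewritten")
```

If the assumption holds, the whole transfer comes to 1 MiB.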

Whilst the 2.58 OEM release didn’t fix my problems, 2.1A did and I now have a freshly-burnt copy of Windows 7 Pro to go and play games with. Nice one, Liggy & Dee. :)

Site Rankings

My domain’s been around quite some time.. Not exactly in its current guise, but I’ve had the domain for a number of years now. As a result, I’ve often become quite frustrated by finding my own ramblings/moans/whinges in Google’s listings when searching for solutions.

So here’s an experiment. A phrase that I know yields no results on Google (as of this posting date):

Rachel Jennison Harper.

I’ll update when Google does.. :)

Edit: Google has now updated! In no more than 19 hours, too.

In addition to this, I appear to have made a mini celebrity out of Miss Harper. Given the continued existence of this blog (and the long-standing back-up of Google’s search listings) she won’t be forgotten any time soon! :D

Classic XKCD

For those of you who don’t follow XKCD, you really are missing out. It’s just genius, and today’s comic really did tickle me.. (Click for a larger image!)

This really is a true story, and she doesn't know I put it in my comic because her wifi hasn't worked for weeks.

Well, I laughed at least. And it reminds me of a famous euphemism, too!

Nick: “Jon, why have you locked your door?”
Jon: “I’m re-compiling my kernel!”

Interesting Statistics

This isn’t quite work-related, but I feel it is relevant to my performance during term-time…

I have happened across some statistics collected “for demographic reasons” by my University’s Students’ Union, as a result of their routine swipe of your student card whenever you show up for any paid night at the Uni bar. You wondered what it was for, right? Well, now I can tell you!

The sample I’ve acquired is an excerpt of the statistics collected for ‘Project Friday’ at Legends, on Friday the 2nd May 2008:

Year of Study Attendees
1: 65 (52.42%)
2: 33 (26.61%)
3: 19 (15.32%)
M: 2 (1.61%)
Data not current: 5 (4.03%)

Gender Attendees
Male: 97 (78.23%)
Female: 22 (17.74%)
Data not current: 5 (4.03%)

Department Attendees
Computing, Engineering & Technology: 111 (89.52%)
Health: 5 (4.03%)
Business School: 3 (2.42%)
Data not current: 5 (4.03%)


Study Site Attendees
STAFFORD: 116 (93.55%)
THOMAS TELFORD SCHOOL: 2 (1.61%)
STOKE: 1 (0.81%)
Data not current: 5 (4.03%)

78.23% male!

I’ve long wondered what the ratio of males to females on the Stafford campus really is, but it appears that we truly are screwed: out-numbering the girls by over 4:1. What’s scarier is that 5 of the ‘patrons’ weren’t even identifiable as male or female. :P
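
For the sceptics, the percentages are easy enough to check. Below is a minimal Python sketch using the counts from the tables above; the function names are made up for illustration.

```python
# Counts copied from the tables above.
year_of_study = {"1": 65, "2": 33, "3": 19, "M": 2, "Data not current": 5}
gender = {"Male": 97, "Female": 22, "Data not current": 5}


def shares(counts):
    """Return each category's share of the total, as a percentage to 2 d.p."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


def ratio(counts, a, b):
    """Return how many of category a there are for every one of category b."""
    return counts[a] / counts[b]


total_attendees = sum(gender.values())
print(f"Total attendees: {total_attendees}")
for name, pct in shares(gender).items():
    print(f"{name}: {pct}%")
print(f"Male to female: {ratio(gender, 'Male', 'Female'):.2f}:1")
```

Both tables add up to the same total, which is a handy sanity check on the figures.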

One could suggest that this would be a blessing in disguise for the female minority of Stafford; however, with an overwhelming majority of the male attendees being first-year Computing and Engineering students, it could quite possibly be the reason why they don’t bother coming in the first place.

Oh, and one last thing: WHO LET THOSE TWO SCHOOL KIDS IN?! :o

(And no, I don’t care if they’re female!)

I’m going to cry.

If you can remember back to my woes with SATA-II disks and 3Ware 9500S cards, you’ll probably be feeling my pain right about now.

The wonderful set of Western Digital RE2 drives, which saved the day all those months ago, now seem to be dropping out of the RAID5 array. :(

However, the biggest clue as to why was provided by the following 3DM2 alert e-mail:

20080117061603 - Controller 0
ERROR - Drive timeout detected: port=0

Searching the 3Ware knowledge base for ‘WD timeout’ leads us to this KB article, concerning certain WD drives that inadvertently drop out of an array because of a fault in their ‘doze time’ implementation. Whether the firmware patches described rid the drive of any ‘doze time’ altogether, or just tweak it for better compatibility, is beyond me, but the corrected firmwares can be had from this download page on the Western Digital website.

So, I guess I’m going to UKS on Monday to upgrade the firmware on four WD5000YS drives. :(

Update: As it turns out, the disk firmware was already up to date! Now I wish I’d known you could check the drive firmware in 3DM2 before I bothered connecting each drive to the on-board controller, booting up with an FDD and waiting to be told 4 times that the firmware was already current! Whoops ;)

If anyone else comes across this, and they’re sure their drives are from the older firmware set, you would do well to upgrade your 9500S (or other 3Ware controller) firmware first. The latest release (at least for the 9500S) has an update that allows the WD firmware upgrade to traverse the 3Ware controller, so there’s no need to meddle about with connecting disks to on-board controllers.

Anyway. The array has been fine since; no time-outs at all. As above, I upgraded the 3Ware card’s firmware by a few revisions, so hopefully that will sort the issue out. Guess we’ll just have to wait another 6 months to find out! ;)

Whoops.

Had a slightly embarrassing problem yesterday, although I’m not entirely sure it’s my (our) fault…

One of our servers, for historical reasons, is actually hosted in the same rack at UKS as four more; all owned by one of our larger customers. We happen to manage all of these servers, too.

Anyway, sometime around 4pm yesterday (just as I was about to leave – typical) one of the redundant PSU modules in their primary VM host failed and, as a horrible, unexpected side-effect, it blew the fuse in the rack’s power bar! Which really shouldn’t happen, but.. Well.. It did! :(

UKS inspected the issue as soon as we noticed, and promptly replaced the fuse. Of course, [almost] all of the machines are set to ‘power-on after AC power loss’ in their BIOS settings. As they all did so at once, the resultant surge in power draw tripped the fuse once again.

We were blamed for not using staggered start-up on our APC PDU. But they don’t all support it? Why should the fuse blow at the sight of four >2A servers starting up? It’s not like the rack was purchased on a fixed amperage – the servers are all paid for on single-server hosting packages!

Anyway, as I’m actually off on holiday today, I’ve had to convince the admin girl to drop a spare PSU down to Redditch.. No big deal, as UKS are more than capable of swapping the PSU out.

The bigger problem was the extended downtime of our main web server. It hasn’t got a working LED, so it was “quite difficult” for the on-site engineer to determine if it was up or not. I suppose checking the airflow through the back of the machine, or the multitude of drive lights, weren’t obvious options…

After a night of down-time, it’s actually up again. The backup VM didn’t go quite to plan, which I think we’ll need to investigate in the future. And I suppose I’ve got an LED to fix when I’m next in the data centre!

A rescue mission!

What a day yesterday!

One of our customers has had a primary/secondary DRBD/VMWare system, very similar to ours, set up for their London HQ operations. This means one server has the VMs loaded, whilst another sits in a warm fail-over mode, courtesy of DRBD mirroring of the VMs.

It’s a bit of a resource drain (power mostly, as space wasn’t much of an issue) but it does mean that the VMs can be resumed on another machine really quite quickly, minimising any downtime in the event of a catastrophic host failure.

Well, two days ago, the system experienced one such failure (we’re not sure what it was; perhaps a kernel panic..) and refused to reboot when power-cycled. A few attempts were made by one of my colleagues to diagnose the issue via IPMI, but this just wasn’t as useful as we’d have hoped (difficulties displaying the AMI BIOS and the grub boot-loader, to name the main two woes) and eventually we resorted to bringing the VMs up on the spare machine.

The problem here is that, at some point, the primary machine (now inactive) was upgraded from 4GiB to 8GiB of RAM. That is a costly upgrade at the best of times (when you consider that all memory fitted to these server boards must be of either Registered ECC or FB-DIMM calibre – age-depending), which meant that the decision, made before my time, to upgrade one host without upgrading the backup was now quite a problem.

VMs can consume a monumental amount of memory, be it through allocation of the host’s memory to the guest, or through the host OS’s caching of frequently-accessed VM disk data (which the kernel sees as ‘just another large file’, for want of a better explanation). This is particularly noticeable when one attempts to run a Windows 2003 Server VM, with both MSSQL Server 2005 and Idiom’s [hideous|inefficient|bloated|pick-one] WorldServer software. Not a great idea…

As a result, the VM in particular hogs the host machine’s resources, and when we attempted to start a virtual machine with 3.6GiB of RAM allocated to it, on a host with only 4GiB in total.. We had a few performance problems. Uh-oh! :)

So yesterday, I had another 4GiB memory delivered to the customer’s office, and made my way from Birmingham to London at around 7:45am. I was on-site for about 9am, and much to my surprise, the C.E.O. had already fitted the memory for me! Well, that was nice of him? :)

However, I did spend the entire day (and quite a bit longer) diagnosing the issues with the primary host (which, incidentally, runs Gentoo Linux). On top of this, I’ve also now catalogued the hardware specifications of the machines (which were somewhat lacking), and even set up an old APC PDU that was lying around, which should give us the ability to power-cycle the machines in future without the prerequisite telephone call to a random member of staff, in which I ask them to ‘hold the power button for 4 seconds’ (whilst praying that they’ve found the right machine). :D

A good, if tiring, day, given that I didn’t get home until 11pm! At least the taxis and food are all on expenses. :)

Note to self

Do not write an extremely long e-mail, highlighting the ups and downs of our current VoIP system, and send it to the boss.

You will receive an even longer reply. :(

Results!

Well, thanks to Ben for pointing out that the University grades are now available from the awful ‘Myportal’ web page – a labyrinth of dreadfully-designed hyperlinks. The long and short of it, though, is that I now have the results from my 2nd year modules!

They’re listed below, highest first (all grades are out of a maximum of 15!):

  • Advanced Routing: 15
  • Remote Access Networks: 15
  • LAN Switching & WAN Networks: 15
  • Introduction to IP Telephony: 14
  • Management in Organisations: 12
  • Fundamentals of Network Security: 10
  • Communications: 10
  • Organisational Systems for Engineers: 8

To put the numbers into perspective: 13-15 is a 1st, 10-12 is a 2:1, 7-9 is a 2:2, and 4-6 is a 3rd.

So for my second year, which contributes 25% of my over-all degree mark, my mean average grade is currently 12.375 – a high 2:1. It can be said that it might’ve been a nice even 1st if it wasn’t for that 2:2 in Organisational Systems, but you could also say that I should’ve done better in Network Security (I really should – I definitely didn’t revise enough for that exam :( )
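
For the curious, the arithmetic can be reproduced with a few lines of Python. This is only a sketch: the marks and bands are the ones listed above, but the way a fractional mean is mapped onto the whole-number bands (by lower bound) is my own assumption, as is the label for anything under 4.

```python
# Module marks as listed above (each out of 15).
grades = {
    "Advanced Routing": 15,
    "Remote Access Networks": 15,
    "LAN Switching & WAN Networks": 15,
    "Introduction to IP Telephony": 14,
    "Management in Organisations": 12,
    "Fundamentals of Network Security": 10,
    "Communications": 10,
    "Organisational Systems for Engineers": 8,
}


def classify(mark):
    """Map a mark onto a degree class.

    ASSUMPTION: a fractional mark falls into the band whose lower bound
    it has reached, so 12.375 counts as a (high) 2:1 rather than a 1st.
    """
    if mark >= 13:
        return "1st"
    if mark >= 10:
        return "2:1"
    if mark >= 7:
        return "2:2"
    if mark >= 4:
        return "3rd"
    return "below a 3rd"  # hypothetical label; the bands above stop at 4


mean_grade = sum(grades.values()) / len(grades)
print(f"Mean: {mean_grade} ({classify(mean_grade)})")
```

The marks sum to 99 across 8 modules, which is where the 12.375 comes from.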

But it’s a nice starting point for my third (and final) year. Come September ’08, if I put in the required amount of work, I might just get that 1st class honours degree. Can only hope, eh? :)

The woes of incompatible RAID hardware

So in the last few weeks, we’ve been wondering why all these sets of new 500GB hard disks have been degrading in their RAID-5 array, merely hours after being created. Which was also around the time that I’d finished setting up Gentoo, frustratingly.

Eventually it was decided that instead of making frequent trips out to the co-location facility, we’d bring the server in to HQ for some diagnosis. After three completely different sets of brand-new 500GB disks, purchased from two separate manufacturers, had all exhibited the same behaviour, it was almost undoubtedly a sign that something else was causing the RAID arrays to degrade.

The RAID card used in this particular server is an 8-port 3Ware 9500S. It’s been reliable in the past, and never exhibited a single issue up until the point we began replacing disks with larger, newer models. I even took it upon myself to strip all SATA cables from the machine and replace them with un-used items. Of course, this was a long shot and it made no difference either way (but at least I’d ruled it out).

Now I don’t know who forgot to check – but this particular model of RAID card does not specifically support SATA-II disks. Of course, the first idea that springs to mind is backwards-compatibility; if a SATA-II disk is plugged into a SATA-I port, one would expect it to automatically run at the slower rate (much like PATA of old). Though as it turns out, that’s just not something you can assume.

So after much head-scratching, Googling and more Googling, I thought it would be worth adding jumpers to the rear of the drives in order to forcibly limit the drives to 1.5Gbit/sec. I’d like to point out that nothing I found on-line, written by either of the two disk manufacturers (Seagate or WD in this instance), mentions that the jumper limits any other feature of SATA-II – it’s just a speed lock. Kudos to whoever it was that wrote the 3Ware 9500S Wikipedia article (which has since been deleted), as it proved to be a rather good muse.

For the fourth time I recreated the arrays and began installing Gentoo. After two days of installing and configuring Gentoo, followed by roughly 18 hours of I/O stressing by bonnie++, the RAID card hasn’t skipped a beat. I may be tempting fate by writing about this so soon, but I’m fairly happy to say that I’ve cracked the problem of why our server’s root file system was degrading before it was even fully initialised.

So, how do you use SATA-II disks with a SATA-I 3Ware RAID controller? Forcibly limit the speed to 1.5Gbit/sec, and it all works like it should. As I didn’t find a specific answer to this question on-line, I’m hoping that this may be of some use to others out there who may be struggling with upgrading their 9500S arrays. By all means, please let me know if it has! :)

Update, 22-07-07: The array is still working a charm, so in the words of Borat – “Great Success!”. Watch it fail now! :P