IPv4 is done, get over it

After seeing a retweet on Twitter of John Graham-Cumming’s blog post about the UK DWP having a full /8 of IPv4 address space that it “isn’t using”, I was pretty annoyed. It’s a very short-sighted view.

Let me be very clear: IPv4 is done, move on. Despite vast swathes of it being assigned in the early ’90s, no amount of it being returned is going to last the world very long. 16M addresses in a /8 is absolutely nothing when you consider the rate at which RIPE alone (never mind the other regions) was dishing out address blocks to LIRs and end users: they allocated 4M addresses in ~10 days. Factor in the ‘land-rush’ effect inflating that rate further and another /8 isn’t going to last any meaningful amount of time at all.
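
To put rough numbers on that, here are my own back-of-the-envelope sums, using the ~4M-in-10-days figure above (i.e. ~400k addresses per day):

$ echo $(( 2 ** 24 ))           # addresses in a single /8
16777216
$ echo $(( 2 ** 24 / 400000 ))  # days of supply at ~400k addresses/day
41

So a ‘reclaimed’ /8 buys RIPE alone about six weeks, before you even factor in the land-rush.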

Furthermore, does anyone think the US DoD will hand back their early-nineties direct IANA assignment, 11/8? Probably not! And there’s about as much chance of our own government handing back 51/8, either.

The entire world needs to let go of this ridiculously limited resource. No-one ever intended for IPv4 to be used to the extent that it has been (including Vint Cerf himself). RIRs, CIDR and that scourge of the modern Internet, NAT, have all been delaying the inevitable: we need a better solution for the Internet in its current (and future) state.

Whether anyone likes it or not, that solution is IPv6. It’s taken the best part of two decades to get to where it is, it’s mature and (get ready for a shocker) it’s already bloody working. Now is not the time to start thinking ‘oh, but there’s some unused space back here that we could use to prolong IPv4’, nor is it the time to reinvent some new means of addressing the Internet. It’s too late, way too late.

This post follows on from a recent spiel on IRC (me = teh) that I can’t be bothered to paraphrase or sugar-coat. I think I’ve spoken my mind, even if it is a bit jagged:

[marcus] http://blog.jgc.org/2012/09/the-uk-has-entire-unused-ipv4-8-that-is.html
* teh explodes in a fit of rage 
[teh] More feet-dragging 
[james] who from?
[teh] 'Oh look, someone has some addresses we can use for the next few months! THEY MUST LET US HAVE THEM!'
[teh] The general public (well, the ones that have only just learnt about legacy allocations)
[james] url?
[teh] As above?
[james] ah
[james] they don't know if it's unused
[james] it's just not publicly routed
[james] it's probably used internally
[james] are they going to suggest we should have 10/8 on the internet too?
[teh] This is what I've said, though I don't think anyone's really listening
[teh] Or 11/8
[teh] If you compare the situation behind 51/8 to 11/8, people might realise that government are the last people that are going to hand-back space to IANA
[teh] And even then, where the hell would a few /8's get the world? No-where for very long.
[marcus] teh: Nailed it ;)

The sooner everyone stops labouring under the false pretence that IPv4 depletion won’t affect them, the sooner we’ll be able to get on and do something sensible with Internet addressing. The quicker we reach the IPv6 ‘flash point’, the better.

(Can’t believe anyone’s had me worked-up enough to actually blog about something. Sheesh. :))

Source routing with OpenVZ & Linux

If, like me, you have to run lots of OpenVZ-based virtual server hosts, you will likely have encountered the fun that is reverse-path filtering, or ‘rp_filter’. This is the kernel function that drops packets arriving on an interface the kernel wouldn’t use to route the reply, logging them as ‘martians’. This is usually a good thing, until you wish to connect your OpenVZ host to two separate networks and route traffic from both subnets to & from your guests via the VENET-style interfaces.

Essentially, only one default gateway exists for traffic destined beyond the connected subnets, so despite the differing source addresses, traffic sourced from any “secondary” subnet is rejected as a martian when it leaves via the interface that carries the host’s default gateway.

Some people would use bridged interfaces, although that is sadly not an option for me right now. Not only is the performance of VENET supposedly better, but we also have a large install base of VENET guests that we don’t wish to disturb. So, for now, I still need a way to make this work with VENET interfaces (and VETH too, if required later).

There are two ways around the reverse-path filtering, the first being a terrible hack that should only be used temporarily, if at all. If you echo ‘1’ to /proc/sys/net/ipv4/conf/all/log_martians, you will be able to see which interface is filtering martian packets. With that information you can then simply disable the rp_filter function by echoing ‘0’ to /proc/sys/net/ipv4/conf/INTERFACE/rp_filter, and martians will no longer be filtered.
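
For clarity, that temporary hack boils down to the following (the interface name br0 is just an example here; use whichever interface the martian logging points at):

echo 1 > /proc/sys/net/ipv4/conf/all/log_martians
# watch the kernel log (dmesg) to see which interface is dropping the packets, then:
echo 0 > /proc/sys/net/ipv4/conf/br0/rp_filter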

However, this isn’t a sensible option. A better solution is to create a routing rule that selects a different default gateway based on the source subnet. It took me a little bit of digging, but I eventually got this working after combining a few sources (including, but not limited to, the iproute2 man page).

For reference, here’s my routing table, showing two networks and two /32 IPs assigned to a guest’s VENET interface (note that the networks are /23s, not /24s!):

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.9.159      0.0.0.0         255.255.255.255 UH    0      0        0 venet0
10.0.125.53     0.0.0.0         255.255.255.255 UH    0      0        0 venet0
10.0.8.0        0.0.0.0         255.255.254.0   U     0      0        0 br0
10.0.124.0      0.0.0.0         255.255.254.0   U     0      0        0 br1
0.0.0.0         10.0.9.1        0.0.0.0         UG    0      0        0 br0

Start by opening /etc/iproute2/rt_tables in your favourite editor. You’ll need to append a line to the bottom to create a new routing table:

# cat /etc/iproute2/rt_tables 
#
# reserved values
#
255    local
254    main
253    default
0    unspec
#
# local
#
#1    inr.ruhep
100    vlan4

As you can see, I’ve appended a new table named ‘vlan4’ (picking a sensible name helps; in my case this is the VLAN name for 10.0.124.0/23) and given it the number 100. Note that this number is simply a unique table identifier, not a priority: any unused value from 1-252 will do for additional tables.

Now you need to use ip to define the new rules & routing behaviour, taking advantage of the new table we’ve defined. First, create a rule matching traffic from your secondary subnet:

ip rule add from 10.0.124.0/23 iif venet0 table vlan4

For reference, the ‘iif’ attribute is not a mistake: it really is ‘iif’, not ‘if’. It was also a key part of the setup, as it classifies only traffic originating from the VENET interfaces and nowhere else.

Now add a route to define the new default gateway for our new table of classified traffic and apply it:

ip route add default via 10.0.125.1 dev br1 table vlan4
ip route flush cache

You should now find that guest traffic from either network is routed correctly without having to change any rp_filter settings. At any time you can use the following two commands to see your configuration:

ip rule show
ip route show table vlan4

Be sure to re-apply the ‘ip rule’ and ‘ip route’ statements on your next reboot; under Scientific Linux 6.0 I’ve used the /etc/rc.local file, but you can just as easily apply them on ifup in Debian’s network configuration.
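
As a sketch of the latter, a Debian-style stanza might look something like this (the host address on br1 is invented for illustration; the rule and route are the ones from above):

# /etc/network/interfaces
iface br1 inet static
    address 10.0.125.2
    netmask 255.255.254.0
    post-up ip rule add from 10.0.124.0/23 iif venet0 table vlan4
    post-up ip route add default via 10.0.125.1 dev br1 table vlan4
    pre-down ip rule del from 10.0.124.0/23 iif venet0 table vlan4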

Fixing Firefox search add-ons in Ubuntu

I booted up my work machine today (a fully-patched Ubuntu 10.10 x86_64 installation) to find that most of my search engine add-ons in Firefox had disappeared.

Anyone who’s had this happen to them will notice that you simply cannot find or retrieve these basic add-ons via the Add-ons store, and most searches online bring back results related to the Google toolbar (not helpful).

Eventually I tracked down this post, and I am eternally grateful to the poster because not only did it allow me to fix the issue myself (copy the missing .xml files from /usr/lib/firefox-addons/searchplugins/en-US/ to the corresponding ‘en-GB’ folder), but I also finally found a way to send my queries to google.co.uk instead of google.com: just edit the appropriate .xml file, changing .com to .co.uk.
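
In shell terms, the copy step boils down to this (paths as above):

sudo cp /usr/lib/firefox-addons/searchplugins/en-US/*.xml \
        /usr/lib/firefox-addons/searchplugins/en-GB/

Then open the appropriate .xml in the en-GB folder (google.xml, or similar) and change .com to .co.uk in the URLs, restarting Firefox afterwards.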

So no more having to deal with American shopping results!

Dell 6224 switch ‘Oversize Packets’ counter

It’s been a while since I’ve written anything on my blog, but given the lack of any Google hits on this subject, I felt this might well be a useful snippet for those in the same boat as me.

I’m currently testing & tweaking an iSCSI setup that utilises a Dell 6224 switch. These are very fast switches for the money (~£900 if you’ve got a good account manager!) and provide a lot of features, including stacking if you have more than one. Their drawbacks, however, are mostly the patchy documentation and the lack of the user-interface ‘polish’ you get from other manufacturers. Most people will say ‘you get what you pay for’, but for the most part they are great switches for the price.

One such documentation gap had me annoyed today. I’ve been using LACP to bond ports and, at the same time, have raised the MTU to the maximum of 9216 across all ports (which can be done per-port, without a reboot or a switchport up/down event, I might add), all in an attempt to glean a little more performance (i.e. lower processing overhead) from my iSCSI sessions.
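
For anyone hunting for the knobs: from memory, the per-port MTU change looks roughly like the below. I’m quoting the syntax loosely, so do check it against your firmware’s CLI reference:

console# configure
console(config)# interface range ethernet all
console(config-if)# mtu 9216
console(config-if)# exit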

And it seemed to work just fine. However, upon inspecting the interface counters, I noticed a stunning number of packets being recorded as ‘Oversize Packets’:

switch#show interfaces counters port-channel 1
Alignment Errors: ............................. 0
FCS Errors: ................................... 0
Single Collision Frames: ...................... 0
Multiple Collision Frames: .................... 0
Late Collisions: .............................. 0
Excessive Collisions: ......................... 0
Oversize Packets: ............................. 15829678
Internal MAC Rx Errors: ....................... 0
Received Pause Frames: ........................ 0
Transmitted Pause Frames: ..................... 0

I wasn’t sure whether to take this as an error or just a simple ‘count’ of packets. “Oversize” would indicate that they’re bigger than the port was expecting, but I was still hitting around 120MB/sec (out of the theoretical 125MB/sec that Gigabit Ethernet can physically provide), which wouldn’t be consistent with a serious string of frame errors.

I couldn’t find anything online, so I contacted Dell ProSupport to raise a ticket. I had to go through the annoying rigmarole of explaining the problem three times over, but eventually a ‘switch expert’ explained that he wasn’t certain of that counter’s purpose, as it depended on the firmware version in use (in my case, 3.2.0.9), and that he needed to check with his colleagues.

He eventually rang back to inform me that this was not a problem with the switch. The “Oversize Packets” counter merely counts frames larger than 1518 bytes, a fixed threshold: it doesn’t matter that the MTU was set to 9216, it just keeps counting them. Utterly useless, then!

As some form of consolation, he also mentioned that it doesn’t update in real time, leading me to believe that some port-statistics process runs periodically over the real-time output. When a counter is this useless, could I please have an option to turn it off? Or better yet, don’t log it by default!

IPv6 on m0n0wall

I finally got around to sending my first ping6 echo requests! Who knew I’d get replies on my first go?!

My ADSL provider, Andrews & Arnold, have provided me with a /48 IPv6 prefix, which seems somewhat wasteful at 2^80 addresses (throw that in your calculator) but is certainly useful for testing nevertheless. Whilst slowly getting my head around variable-length subnetting of IPv6 ranges (painful at best), I decided to just throw in a /64 subnet and set a static gateway address on m0n0wall’s LAN interface to see if it would ‘just work’.
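
The one redeeming feature of IPv6 subnetting is that the maths is all powers of two: a /48 contains 2^(64-48) = 65,536 /64 networks, and a /64 is exactly what autoconfig expects on a LAN, so burning one per subnet is entirely normal:

$ echo $(( 2 ** (64 - 48) ))   # number of /64 subnets in a /48
65536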

The result is a working IPv6 LAN, achieved simply by enabling autoconfig on the m0n0wall box and telling Ubuntu’s Network Manager to use it. Et voilà:

teh@desktop:~$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:01:29:fc:37:1d
inet addr:81.187.xxx.xxx Bcast:81.187.xxx.xxx Mask:255.255.255.240
inet6 addr: 2001:8b0:ff87:1:201:29ff:fefc:371d/64 Scope:Global
inet6 addr: fe80::201:29ff:fefc:371d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1616524 errors:0 dropped:0 overruns:0 frame:0
TX packets:2224946 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:277202062 (277.2 MB) TX bytes:519498762 (519.4 MB)
Interrupt:18

You’ll notice that the last 64 bits of my IPv6 address on this host (the interface identifier) were assigned via autoconfig, derived from the interface’s MAC address using the EUI-64 method: the 48-bit MAC is split in half, ff:fe is inserted in the middle and the universal/local bit is flipped, which is how 00:01:29:fc:37:1d becomes the 201:29ff:fefc:371d seen above. (No random bits are involved; randomised identifiers are the ‘privacy extensions’ feature, which isn’t enabled here.)

And to make my night, ping6 worked straight away, too:

teh@desktop:~$ ping6 2001:08B0:FF88:0001::1
PING 2001:08B0:FF88:0001::1(2001:8b0:ff88:1::1) 56 data bytes
64 bytes from 2001:8b0:ff88:1::1: icmp_seq=1 ttl=64 time=3.81 ms
64 bytes from 2001:8b0:ff88:1::1: icmp_seq=2 ttl=64 time=0.130 ms
64 bytes from 2001:8b0:ff88:1::1: icmp_seq=3 ttl=64 time=0.132 ms

--- 2001:08B0:FF88:0001::1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.130/1.358/3.813/1.735 ms

Now to plan how I’m going to roll this out at work…

Pretending to be a Solaris admin

I’m always, always forgetting how to discover the available disks on a Solaris/OpenSolaris machine.

As I was having another (unsuccessful) crack at getting a disk controller (other than the motherboard’s IDE controller) to work with Nexenta Core v2, I’d again forgotten how to discover the disks as probed by the OpenSolaris kernel.

Of course, Nexenta includes Ubuntu Hardy’s userland tools, but anything kernel/device-related is still very different to what I’m used to.

I finally found a particularly well-written post by Pascal Gienger, who notes that:

First we will try to look up the disks accessible by our system:

# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c0d0
          /pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
       1. c1d0
          /pci@0,0/pci-ide@1f,1/ide@1/cmdk@0,0
Specify disk (enter its number): ^C

Type CTRL-C to quit “format”.

If your disks do not show up, use devfsadm:

# devfsadm
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c0d0
          /pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
       1. c0d1
          /pci@0,0/pci-ide@1f,1/ide@0/cmdk@1,0
       2. c1d0
          /pci@0,0/pci-ide@1f,1/ide@1/cmdk@0,0
       3. c1d1
          /pci@0,0/pci-ide@1f,1/ide@1/cmdk@1,0
Specify disk (enter its number): ^C

You’ll notice that the virtual disks are mapped as IDE/ATA drives, so the disk device names don’t have a target specification “t”.

All of which helped me to finally establish that my second-hand (i.e. ‘borrowed’ from an old work machine) Adaptec RAID card doesn’t work with Nexenta Core v2. Still, Core v3 will be out in a few months; maybe I’ll try again then.

Also worth noting, as it may be useful: iostat -En prints similar information, handy when searching for disks to use with ZFS.

Optical drive firmware updating in Linux

I recently needed to burn a copy of Windows 7 Pro, but realised that I’d unfortunately run out of blank DVD-Rs long ago. Fear not, for I live near an Aldi supermarket, which sells everything dirt cheap. A DVD-R’s a DVD-R, right?

Wrong. I tried at least three of the twenty I’d purchased (for a few quid) and none of them would even begin writing; Brasero and K3b both complained about incompatible media types.

Remembering that my DVD drive, a trusty NEC ND-3500A, was designed, built and purchased somewhere between 2004 and 2005 (4-5 years ago at this point), and that I hadn’t ever updated its firmware, I set about researching ways and means of doing so.

I came across this website, run by a pair of firmware hackers named Liggy and Dee, who have (between them) released, and continue to host, many firmware releases (both official and unofficial) for a wide variety of NEC optical drives.

What’s more, their binflash (or ‘necflash’) utility is even released as a Linux binary, and it provides compatibility for reading the official NEC .exe firmware releases! I was sceptical that it would work under Ubuntu 9.10 at first, but much to my delight it worked perfectly. With a little reading, I was able to dump my current firmware (2.16) to a file and subsequently flash two different firmware releases: 2.58 (an OEM release) and the latest official NEC firmware, 2.1A.
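
If you’re trying this at home, dump and keep your existing firmware before flashing anything. From memory, the invocation is along these lines, though check the utility’s usage text as I may be misremembering the exact option (the device node will also vary per machine):

sudo ./necflash -dump backup_216.bin /dev/sg2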

The full output of my escapades for anyone curious:

~$ sudo ./necflash -flash -v -s Desktop/NECND350_v21A.exe /dev/sg2
Binflash - NEC version - (C) by Liggy and Herrie
Visit http://binflash.cdfreaks.com

Identified drive: 4 - 3031
Detected drive from Firmware: 4

You are about to flash your drive with the following firmware:

Vendor: _NEC
Identification: DVD_RW ND-3500AG
Version: 2.1A

Remember no one can be held responsible for any kind of failure!
Are you sure you want to proceed? (y/n) y

Entering safe mode
Sending firmware to drive at 0x006000
Sending firmware to drive at 0x00e000
Sending firmware to drive at 0x016000
Sending firmware to drive at 0x01e000
Sending firmware to drive at 0x026000
Sending firmware to drive at 0x02e000
Sending firmware to drive at 0x036000
Sending firmware to drive at 0x03e000
Sending firmware to drive at 0x046000
Sending firmware to drive at 0x04e000
Sending firmware to drive at 0x056000
Sending firmware to drive at 0x05e000
Sending firmware to drive at 0x066000
Sending firmware to drive at 0x06e000
Sending firmware to drive at 0x076000
Sending firmware to drive at 0x07e000
Sending firmware to drive at 0x086000
Sending firmware to drive at 0x08e000
Sending firmware to drive at 0x096000
Sending firmware to drive at 0x09e000
Sending firmware to drive at 0x0a6000
Sending firmware to drive at 0x0ae000
Sending firmware to drive at 0x0b6000
Sending firmware to drive at 0x0be000
Sending firmware to drive at 0x0c6000
Sending firmware to drive at 0x0ce000
Sending firmware to drive at 0x0d6000
Sending firmware to drive at 0x0de000
Sending firmware to drive at 0x0e6000
Sending firmware to drive at 0x0ee000
Sending firmware to drive at 0x0f6000
Sending firmware to drive at 0x0fe000
Sending checksum to drive
Erasing flash block 2
Erasing flash block 3
Erasing flash block 4
Erasing flash block 5
Erasing flash block 6
Erasing flash block 7
Erasing flash block 8
Erasing flash block 9
Erasing flash block 10
Erasing flash block 11
Erasing flash block 12
Erasing flash block 13
Erasing flash block 14
Erasing flash block 15
Erasing flash block 16
Erasing flash block 17
Erasing flash block 18
Writing flash block 2
Writing flash block 3
Writing flash block 4
Writing flash block 5
Writing flash block 6
Writing flash block 7
Writing flash block 8
Writing flash block 9
Writing flash block 10
Writing flash block 11
Writing flash block 12
Writing flash block 13
Writing flash block 14
Writing flash block 15
Writing flash block 16
Writing flash block 17
Writing flash block 18
Leaving safe mode

Whilst the 2.58 OEM release didn’t fix my problems, 2.1A did, and I now have a freshly-burnt copy of Windows 7 Pro to go and play games with. Nice one, Liggy & Dee. :)

Testing Google Go on Ubuntu

Yesterday, a few of you will have heard the news that Google has launched a new programming language, named ‘Go‘.

Whilst I’m not a programmer, and am far from ever pretending to be one, I do have some professional interest in playing with this. I’ll probably update this post a little later with more specific information when all can be revealed, but for now here’s a little taster:

root@gotest:~# 6g hello.go
root@gotest:~# 6l hello.6
root@gotest:~# ./6.out
hello, world
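
For completeness, hello.go itself is just the stock hello-world from the Go documentation (reproduced from memory, so treat it as approximate):

package main

import "fmt"

func main() {
	fmt.Printf("hello, world\n")
}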

It works! This machine is an OpenVZ container running Ubuntu 9.04 x86_64 and it works a treat, with the one exception that I couldn’t build Go with the standard ‘all.bash’ script. I had to use ‘make.bash’ instead, as the former also runs the test suite and something about its probing of the network devices didn’t get on with the container. Thanks go to Rob Pike from Google, who seems to have been working pretty darn hard in the #go-nuts IRC channel on Freenode recently!

Update: 34SP.com are now offering Google Go development environments, for those wishing to dabble!

Exchange 2010 to support Firefox and Safari

I’m actually unbelievably shocked. Uncontrollable, crazy laughter gripped my mind when I was faced with the news that Microsoft are planning to support Firefox 3.x and Safari 3 in Exchange 2010’s ‘Outlook Web Access’.

Further still, they’re touting the fact that OWA now has all of the features that the regular desktop Outlook does!

Does this not strike anyone else as a move that would make Windows (and Office, particularly since it and OpenOffice will by then both have full ODF compatibility) completely obsolete? Why would you pay for a Windows 7 site licence when you could upgrade your Exchange server to 2010, replace all of the Windows machines with Ubuntu 10.04 LTS, Firefox 3.1 and OpenOffice, and save your company thousands of pounds?

On top of this, they’ve supposedly tuned 2010 to be ‘less bursty’ in the way it accesses the disk, as well as adding JBOD concatenation support. Does anyone else read that as ‘please virtualise your Exchange servers’? Yep, so did I.

I suppose you could be running Hyper-V, but with Microsoft supporting iterations of Windows Server under Red Hat’s Xen virtualisation, I really don’t see how they’re going to convince people to pay for the majority of their bread-and-butter products once Exchange 2010 débuts.

What’s next? Windows 7 released under the Microsoft Public License? Perhaps they’ll just call it ‘Windows Azure Client’ when they give it away for free…

Pidgin 2.5.5 hogging my CPU time

I’ve had a recurring issue with Pidgin randomly screwing with my CPU usage: actually maxing out a single core for no apparent reason and/or crashing thereafter. In fact, I think I can even attribute a few recent gnome-panel crashes to this behaviour as well.

Today I was informed (by my darling girlfriend) that her buddy icon was out of date: she’d changed it a while back, yet my client appeared to be stubbornly displaying the old icon, even weeks later. As I couldn’t find a way of forcing the buddy icon to update within the program itself, I navigated to ~/.purple/icons (finding approximately 1,660 cached icons!) and deleted the lot.
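
For anyone wanting to do the same, with Pidgin closed it’s just a matter of:

rm ~/.purple/icons/*

The icons are only a cache, so they’ll be re-fetched from your contacts as and when needed.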

Since restarting Pidgin it’s taken a while for the buddy icons to repopulate for some reason, but after a few tests they do appear to update properly when changed by the other party. As a side effect, I believe I’ve found (and fixed) the cause of Pidgin’s leak/loop/error! Hopefully someone else will find my serendipitous bug-squashing useful.

I may even file a bug report, given that I couldn’t find an existing one. Now all I need is the time to do so… :)