Backups Revisited Part 2

Posted: February 21st, 2011 | Filed under: Advice (Unsolicited), Computers

In this post, I’ll focus on the practical side of backups.

Last time, I asserted that in order for a backup to really be a backup, your data has to be automatically replicated on two different drives using two separate filesystems on two different computers that are geographically separated, and one of those backups needs to be able to go back in time by at least 24 hours.
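To make those criteria concrete, here is a minimal Python sketch that encodes them as a checklist. The `BackupCopy` fields and the `meets_criteria` function are hypothetical names for illustration. The sketch assumes one reading of the rule: you need at least two copies that differ pairwise in drive, filesystem, computer and location, every copy is kept up to date automatically, and at least one copy can restore data from 24 or more hours ago.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (a hypothetical model for illustration)."""
    drive: str            # physical drive the copy lives on
    filesystem: str       # filesystem instance holding the copy
    computer: str         # machine the drive is attached to
    location: str         # physical site, e.g. "home" or "office"
    automatic: bool       # kept up to date without manual steps
    history_hours: float  # how far back in time this copy can restore


def meets_criteria(copies: list[BackupCopy]) -> bool:
    """Return True if the copies satisfy the rules from the previous post."""
    if len(copies) < 2:
        return False
    # Every pair of copies must differ in drive, filesystem, computer and site.
    for a, b in combinations(copies, 2):
        if (a.drive == b.drive or a.filesystem == b.filesystem
                or a.computer == b.computer or a.location == b.location):
            return False
    # Replication must be automatic, and at least one copy must reach back 24 hours.
    return all(c.automatic for c in copies) and any(c.history_hours >= 24 for c in copies)
```

Under this model, a backup drive plugged into the same machine fails the check because it shares the computer and the location with the original.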

Satisfying all of these criteria at once usually isn’t free, but it doesn’t have to be hard, and you’re probably closer to a workable solution than you think.

Below, I’ll examine a few possible solutions, including some non-obvious ones. This isn’t meant to be comprehensive; rather, it should give you a general flavor of the state of backup solutions.

Built-In Solutions

OS X’s Time Machine fits some of our criteria for backups, but falls short on others. You can back up to two different drives, the filesystems are distinct, and you can restore files as they existed at earlier points in time. Backing up to other computers with Time Machine is possible, but it’s unsupported and not very reliable (at least in my experience).

Using a network-enabled USB drive or a Time Capsule is essentially equivalent to backing up to another computer (they’re practically little computers themselves), but that costs a good deal of additional money. Unless you really know what you’re doing and are willing to take the time to make it work (and keep it working), making remote backups work with the Time Capsule is not really feasible.

Although I’ve never used it personally, Windows 7’s Backup and Restore feature appears to be feature-for-feature equivalent to Time Machine, minus Apple’s high-gloss glittery front-end. If you have a Professional or higher license, it can back up to network shares, which is an improvement over Time Machine, but it requires you to pay more for the OS itself, which is kind of a drag.

You can use rsync by itself on pretty much any platform or with any one of a plethora of (usually OS-specific) front-ends. rsync can push files to pretty much anywhere and it supports incremental backups, so you could definitely satisfy all your backup demands with rsync, although it would take a little bit of work to get everything set up.
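One common way to get incremental, time-travelling backups out of plain rsync is a directory of dated snapshots built with rsync’s `--link-dest` option. The Python sketch below only builds the command line; the function name `snapshot_command` and all paths are hypothetical, and the dated-directory layout is one convention among several, not the only way to do it.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def snapshot_command(source: str, dest_root: str, day: date,
                     previous: Optional[str] = None) -> list[str]:
    """Build an rsync command that writes a dated snapshot under dest_root.

    With --link-dest, files unchanged since the previous snapshot are hard-linked
    to it instead of copied, so every snapshot looks complete but only changed
    files take up new space.
    """
    target = Path(dest_root) / day.isoformat()
    cmd = ["rsync", "-a"]  # -a: archive mode (recurse, keep permissions, times, links)
    if previous is not None:
        cmd.append(f"--link-dest={Path(dest_root) / previous}")
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", f"{target}/"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; pass the list to subprocess.run(cmd, check=True) to execute it.
    cmd = snapshot_command("/Users/me/Documents", "/Volumes/Backup/snapshots",
                           date(2011, 2, 21), previous="2011-02-20")
    print(" ".join(cmd))
```

Run once a day (from cron, for example), this gives you one browsable directory per day, which covers the 24-hour time travel requirement as long as old snapshots are kept.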

Sneakernet

If you’ve got a USB drive and are willing to lug it back and forth, there’s a relatively inexpensive way to come pretty close to an optimal backup solution. If you leave your USB drive at work, take it home and do backups every Monday night and bring the drive back to work on Tuesday, you’ve got your bases mostly covered. The problem here, of course, is that you have to remember to take the drive home with you, your backup granularity is kind of coarse (if your drive dies, you lose at most a week’s worth of stuff), and there’s a small window of vulnerability when your USB drive is at home. You get geographic distance for free, though.

Enter the Cloud

There are several companies that have recently started to offer so-called “cloud backup” services that provide you with some amount of storage space to which you can back up. Notable companies in this space include Mozy, Backblaze and CrashPlan. With cloud backup services, you easily satisfy all of our desirable backup properties simultaneously (unless you happen to live next to one of their data centers, of course), but it will usually cost you and doing the initial backup over the wide-area Internet may take weeks or months. Most services will ship you an external hard drive to which you can do your initial backup, but you have to eat the cost of a hard drive (~$100-150) for the privilege of writing to the drive and mailing it right back.
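A quick back-of-the-envelope calculation shows why that initial upload takes so long. This Python sketch assumes decimal gigabytes and an upload link measured in megabits per second; the `efficiency` factor and the 500 GB / 1 Mbit/s figures are illustrative assumptions, not numbers from any particular service.

```python
def upload_days(data_gb: float, uplink_mbit_s: float, efficiency: float = 0.8) -> float:
    """Estimate how many days it takes to upload data_gb (decimal GB).

    uplink_mbit_s is the advertised upload speed in megabits per second;
    efficiency is an assumed fraction of that speed you actually sustain.
    """
    bits = data_gb * 1e9 * 8
    seconds = bits / (uplink_mbit_s * 1e6 * efficiency)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Illustrative numbers, not measurements: a 500 GB drive over a 1 Mbit/s uplink.
    print(f"{upload_days(500, 1.0, efficiency=1.0):.1f} days at full speed")
    print(f"{upload_days(500, 1.0):.1f} days at 80% efficiency")
```

Even with the whole uplink devoted to the backup, 500 GB over 1 Mbit/s is 4,000,000 seconds, or about 46 days, which is why the ship-a-drive option exists.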

In my opinion, the standout in this space is CrashPlan, for one simple reason: they allow two computers running CrashPlan to back up an unlimited amount of data to each other for free. So if you and a friend both want to run backups, you can back up to each other.

Unexpected Surprises

If you care about your photos, you’ll want them backed up. If you share your photos on a site like Facebook or Flickr, you’re most of the way to an ideal backup of those photos. The only major drawbacks are that restoring your photos isn’t trivial (you have to re-download them, although there are applications that will help automate that process) and that you might lose some quality when the site scales your images down. If you don’t mind those things, though, these are great inexpensive ways to back up.

“But what about our time travel requirement?” you might ask. If you’re editing photos, you might care about reverting to a previous edit. Most of the time, though, you take pictures, upload them and never modify them again. Static data like pictures or music, where individual items never change but the collection keeps growing, is easier to back up: as long as you never delete anything, the time travel requirement isn’t necessary.
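The same property holds for a local copy of a static collection: if the backup only ever adds files and never deletes or overwrites them, every item you ever backed up is still there. Here is a minimal Python sketch of that append-only idea; `append_only_sync` is a hypothetical helper, not how Facebook, Flickr or any backup product works internally.

```python
import shutil
from pathlib import Path


def append_only_sync(source: Path, backup: Path) -> list[Path]:
    """Copy files present in source but missing from backup; never delete or overwrite.

    Suited to static collections (photos, music) where items are added but not
    edited. Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if dst.exists():
            continue  # already backed up; leave the existing copy untouched
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves modification times
        copied.append(rel)
    return copied
```

Because deleting a photo from the source never removes it from the backup, an accidental deletion is recoverable without any snapshot history. The trade-off is that an edited file is never re-copied, which is exactly the case where you would want time travel.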

My Setup

I have three computers that I care about – my desktop, my laptop and my home theatre PC. My desktop has an OS X partition and a Windows 7 partition and my laptop runs Debian in a VM, so I need to back up five filesystems in total. The HTPC has external storage drives that hold movies and music.

I admit that I break my own rules a bit – the external media drive on the HTPC is a RAID 1 with no other backups. I know, scary right?

Every system runs CrashPlan, even the Linux VM on my laptop. All systems back up to two places. The first is an old external drive attached to the HTPC. The second is a workstation under my desk at UCSD. Since I had an extra drive lying around, my desktop’s OS X partition also runs a Time Machine backup on a second internal drive.

That about covers it. Next week, something not related to backups!


Analyzing The Comic-Con Registration Meltdown

Posted: November 22nd, 2010 | Filed under: Advice (Unsolicited), Computers, Opinions (Uninformed)

Thanks to PanelsOnPages.com for the image

On November 1st at 9 AM, online registration for 4-day passes to San Diego Comic Con began. By 9:05 AM, the massive volume of registering attendees caused the registration system to become inaccessible. By 10:30 AM, Comic Con International closed the registration site down, claiming that it would re-open in three weeks.

At 6 AM PST this morning, registration re-opened. By 6:05, the registration site was once again inaccessible due to overwhelming traffic volume. A lucky few managed to get all the way through the registration process, but most of us were left repeatedly retrying until, at 7:30 AM, the registration site was once again closed.

In the grand scheme of things, this was not a disaster. However, it inconvenienced a large number of people, myself included. In my opinion, both of these problems were entirely preventable.

I do research in the design of scalable, high-performance systems. Many people in my field work on technologies that are designed to prevent exactly the kind of failure to operate at scale that occurred this morning. While I am by no means an expert, I feel that I can speak from an informed position about scalability issues like this one. In this post, I’m going to speculate on what happened, and what might be done to fix the problem inexpensively.

Disclaimer: I don’t work for Event Planning International Corporation (EPIC) or Comic Con International (CCI), and I was only an external observer of their registration meltdown, so I don’t know exactly what occurred. I’ve seen problems like this documented enough times that I think I can guess what really happened from that documentation and personal experience. I also don’t work for Amazon or any of the other companies mentioned favorably in this post, although I have clearly consumed vast quantities of the cloud computing Kool-Aid.

I’m going to assume some things about EPIC’s architecture. In particular, I’ll assume they have a single well-provisioned database system and a handful of web servers, possibly with a load balancer sitting in front of them and distributing requests evenly among them. This is how most small websites usually look.

“What we’ve got here is a failure to communicate.”

In my mind, the timeline of the meltdown looked something like this:

6 AM: CCI posts a link to the registration site on its homepage. Frantically refreshing nerds see the green box and click it, presenting them with step one of five: enter your name and e-mail address.

Over the next 90 seconds or so, several hundred people open connections to the registration site. Since the page is just vanilla HTML and maybe a little PHP or Javascript, the web server(s) cache the first page and serve it straight out of memory. Everything is running smoothly and all is right with the world.

People start clicking the “Next” button to proceed to step two. Each click of the “Next” button causes an HTTP POST request to be sent to a PHP script housed on one of the web servers. This script inserts some data into the database indicating that a person whose name is X and whose e-mail address is Y has reserved an attendee slot for the next few minutes; this is mainly there so that you don’t register more people than you can fit in the convention hall. While this is happening for the first few hundred requests, several thousand more registrants are about to hit the “Next” button and start this process for their registrations. The database server starts fielding thousands of connections and inserting thousands of rows into the registration lock table at once. It starts to run out of memory and starts swapping, or it hits 100% CPU utilization, or its disks are seeking all over the place. It gets slower and slower. Eventually, it effectively stops returning responses to queries.

So now the database server is effectively hosed. The web servers continue to issue database queries anyway, and they have an increasing backlog of PHP scripts waiting for their database queries to return. The web servers’ memories fill up with session state, the PHP scripts’ stacks, TCP connection state, etc. Eventually, the web servers run out of memory, they start swapping and their performance essentially drops to zero. Requests for page two begin to time out. “500: internal server error” begins popping up on the screens of frustrated nerds around the globe. These users furiously hit “Refresh” hoping that the website will come back to life, which creates new requests and only makes the problem worse.
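A toy queueing model shows why the refreshing makes things so much worse. This Python sketch is not a measurement of EPIC’s servers; it assumes a constant service rate (in reality, swapping makes capacity shrink as the backlog grows, which is even worse), and `refresh_fraction` is a made-up knob for how many waiting users hit Refresh each second.

```python
def simulate(seconds: int, new_per_s: float, capacity_per_s: float,
             refresh_fraction: float) -> list[float]:
    """Toy model of the request backlog, sampled at the end of each second.

    Each second, new requests arrive and the servers finish up to capacity_per_s
    of the queued work. A fraction of the requests still waiting are given up on
    by users who hit Refresh; the server can't tell, so the stale request stays
    queued and the refresh adds a duplicate.
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog += new_per_s
        backlog -= min(backlog, capacity_per_s)  # work completed this second
        backlog += refresh_fraction * backlog    # refreshes pile on duplicates
        history.append(backlog)
    return history
```

As long as arrivals stay below capacity, the backlog stays at zero. Once they exceed it, the backlog grows by the difference every second, and any amount of refreshing turns that steady growth into compounding growth.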

At this point, sysadmins are running around like their hair is on fire trying to get the problem under control. They try every trick in the book, but nothing works. Frantic phone calls are made. Servers are powered off in hopes that demand will recede if the server is inaccessible for a period of time. Demand does not recede.

After about an hour, the decrease in volume from users giving up releases sufficient system resources for some lucky individuals to be able to advance to step 2: enter your address and phone number. Hundreds to thousands of users do just that and then click “Next” on step 2’s page within about 30 seconds of one another. This issues a flood of HTTP POSTs to another PHP script that is supposed to insert the information contained in the POST into the database and associate it with the name and e-mail address that the user entered in step 1. The problem returns with a vengeance and the servers fall over again. Few users make it to step 3 successfully.

7:30 AM: CCI orders the site closed, claiming that they will be “investigating their registration options”. EPIC (presumably) loses a high-profile client.

Cloud Computing to the Rescue

So, what happened here? Fundamentally, EPIC’s servers were not provisioned well enough to handle the load presented by SDCC’s registrants. The servers couldn’t handle the strain, and so they ground to a screeching halt.

How can we solve problems like this? One way is to buy more and more well-provisioned computers and spread the load across them until the load on any given machine becomes manageable. Unfortunately, this typically involves a lot of up-front and long-term costs: you need to buy the computers, find some place to put them where they’ll remain cool and dry, and fix them when they break. Additionally, when your servers are talking to a database, the server hosting the database must be doubly well-provisioned, often at significant additional cost. Oracle makes a lot of its money selling ridiculously well-provisioned database servers the size of a refrigerator for hundreds of thousands of dollars apiece.
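The simplest way to spread requests evenly across a pool of web servers is round robin, which is roughly what the load balancer assumed earlier would do. A minimal Python sketch; the class name and server names are hypothetical, and real load balancers offer smarter policies.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Send each incoming request to the next web server in a fixed rotation."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = cycle(servers)

    def route(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._rotation)
```

With four web servers, 1,000 requests land as 250 per server. Note that this only divides up the web tier: every one of those servers still sends its queries to the same database, which is why the database machine has to be so much beefier.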

Getting enough computers to get the job done does not need to be expensive, however.  Pay-as-you-go “Infrastructure as a Service” (IaaS) systems – one of the many classes of systems classified under the blanket term “cloud computing” by the world’s IT pundits – were designed to solve this exact problem.

I’m going to focus on Amazon’s EC2 and RDS here for the sake of example and because they’re the most popular services of their kind, but many other IaaS offerings exist. Joyent and Media Temple are two other great examples of IaaS companies with a strong presence in the marketplace, especially among startups that need affordable, scalable hosting solutions.

Let’s suppose that you need to register 24,000 people (~20% of attendees at last year’s con) and expect peak demand to last around 24 hours. You buy time on 50 “high-memory extra-large” EC2 instances (17.1 GB of RAM, dual-core processors) to use as web servers and ten “high-memory quadruple extra large” RDS database servers (68 GB of RAM, 8-core processors, “high I/O capacity”) to do your query processing. You run them as on-demand instances, which means you pay more per hour than you would with reserved instances, but you only pay for what you use. Let’s suppose, conservatively, that you’ll store 20 GB of data in these database servers (that’s almost 1 MB of data per user, which I’m guessing is more than enough to store basic registration information) and that you’ll read and write every byte of it once in 24 hours. Let’s also assume that the bank’s credit card transaction processing servers will scale to meet the load; after all, they routinely handle millions of transactions a day (and are extremely well-provisioned because of it).

If you use all of it (all 50 web servers and all 10 database servers for 24 hours straight), it’ll cost you about $950 at the end of the month. If you only use the database instances (for 24 hours straight, reading, writing and storing 20 GB), it’ll cost about $320, according to AWS’s cost calculator. That’s far less than the retail cost of the parts in even one of the aforementioned servers.
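The arithmetic behind these estimates is simple enough to sketch in Python. The hourly rates below are placeholders, not real AWS prices (I’m not reproducing the calculator’s numbers here), and the sketch ignores storage, I/O and data-transfer charges, which are billed on top of instance time.

```python
def per_user_mb(total_gb: float, users: int) -> float:
    """Decimal megabytes of storage available per registrant."""
    return total_gb * 1000 / users


def compute_cost(groups: list[tuple[int, float]], hours: float) -> float:
    """Cost of running each (instance_count, hourly_rate) group for `hours` hours.

    Instance time only; storage, I/O and data transfer are not included.
    """
    return sum(count * rate * hours for count, rate in groups)


if __name__ == "__main__":
    print(f"{per_user_mb(20, 24_000):.2f} MB per registrant")
    # Placeholder hourly rates, not real AWS prices: fill in from the current price list.
    web, db = (50, 0.50), (10, 1.00)
    print(f"${compute_cost([web, db], hours=24):,.2f} for one day of instance time")
```

20 GB spread over 24,000 registrants is about 0.83 MB each, which is where the “almost 1 MB per user” figure comes from.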

Need more machines? Buy time on more machines for a few tens of dollars; Amazon’s magical network of services can even balance the load across them for you automatically if you ask it nicely. Don’t need all ten of those database servers you reserved? Only turn on five of them and leave the others alone. Demand starting to die down after 18 hours instead of 24? Start shutting machines down. Magical.
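Deciding when to add or shut down machines boils down to a small calculation you can rerun as demand changes. A minimal Python sketch; `instances_needed` is a hypothetical helper, the per-instance capacity would have to come from load testing, and the 25% headroom is an assumed safety margin.

```python
import math


def instances_needed(requests_per_s: float, per_instance_capacity: float,
                     headroom: float = 0.25) -> int:
    """Smallest number of instances that serves the load with spare capacity.

    With headroom=0.25, each machine runs at no more than about 80% of what
    it can handle (1 / 1.25 = 0.8).
    """
    if requests_per_s <= 0:
        return 0
    return math.ceil(requests_per_s * (1 + headroom) / per_instance_capacity)
```

If load testing showed one web server handles 100 requests per second, 1,000 requests per second calls for 13 servers with the default headroom. When demand tails off after 18 hours, rerunning the calculation with the lower rate tells you how many machines you can shut down.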

This kind of flexibility and “magic” provisioning of hardware is what makes cloud computing such a compelling way to solve website scalability problems like the one plaguing Comic Con, and it’s one of the reasons why the research and industrial communities alike are so excited about it. I hope that situations like this one will encourage more companies to leverage these sorts of technologies to deploy more scalable websites.


Working From a Laptop (or, Virtual Machines are Awesome)

Posted: October 17th, 2010 | Filed under: Computers

When I joined UCSD three years ago, I was issued a Dell Optiplex 320, Dell’s bargain business workstation at the time. My advisor (because he is awesome) also issued me a MacBook Pro. I used both daily until last month, when my patience with the Optiplex and its bizarre issues reached its limit and I switched to using my laptop at work full-time.

I had hesitated to do this because all the testbed machines we use run Linux and it’s just easier to develop on the same platform. I’ve tried dual-booting Linux in the past and it wasn’t any fun. Despite the fact that a lot of people have been working hard to get distros like Ubuntu working well on Mac hardware, there’s always a laundry list of gotchas that I just didn’t want to deal with. I could do all my editing on remote machines through SSH, but that solution has never worked that well for me, especially if I’m working at home or I’m travelling.

In retrospect, the answer was pretty obvious; since the only thing I’m doing in Linux is compiling and editing code, why not just run Linux in a virtual machine? It took a little bit of tweaking and tuning to get the setup working well, but I’m pretty happy with it overall so I thought I’d share.

I run Debian testing in VMware Fusion, giving it one CPU and 512MB of RAM. These specs seem to be sufficient for what I’m doing in the VM, and all of the serious data crunching gets done somewhere else. The VM has a 24GB “disk” (really just a collection of files on my laptop’s hard drive) and runs SSH and NFS servers. I export my home directory on the VM over NFS locally so that I can edit files from the Mac side easily; the new changes in Disk Utility in Snow Leopard really make setting up NFS on the Mac side painless. I was originally using AFS for file access, but AFS kept dropping hidden directories everywhere, which was screwing with my build scripts, so I abandoned it. When I want to build binaries or install packages, I just SSH into the VM.

I run CrashPlan both inside the VM and on the laptop itself. The laptop doesn’t back up the VM image itself, which saves space on the backup volume, and the laptop and VM can be restored independently in the event of a failure.  I suspend the VM about once a month and make a copy of it just in case I need to fall back on a checkpoint if the VM is somehow corrupted.
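The monthly checkpoint is easy to script once the VM is suspended. Fusion stores each VM as a bundle directory (for example `Debian.vmwarevm`), so a checkpoint is just a dated copy of that directory. A minimal Python sketch; the function name and paths are hypothetical, and it assumes you have already suspended the VM in Fusion.

```python
import shutil
from datetime import date
from pathlib import Path


def checkpoint_vm(vm_bundle: Path, archive_dir: Path, today: date) -> Path:
    """Copy a suspended VM bundle to a dated folder and return the copy's path.

    Suspend the VM first: copying a running VM's disk files can capture them
    in the middle of a write.
    """
    dest = archive_dir / f"{vm_bundle.stem}-{today.isoformat()}{vm_bundle.suffix}"
    if dest.exists():
        raise FileExistsError(dest)
    shutil.copytree(vm_bundle, dest)  # the bundle is a directory, so copy the whole tree
    return dest
```

Keeping the date in the name means you can hold several checkpoints side by side and fall back to whichever one predates the corruption.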

Working this way has a number of unexpected benefits:

Managing “disks” is painless: Recently, I needed a disk running ext4 for testing. Before, I would have had to create an ext4 partition on the main disk in my Optiplex or find, install and format another physical disk. In a VM, I just created a new 1GB virtual disk and formatted it and in five minutes I had a disk I could use for testing.

I can work within a single context: The days of “crap, I have that file but it’s on my desktop” are gone, and good riddance. I can move to other people’s desks to collaborate with them without leaving my work setup behind. If I need to go home (it happens sometimes) I just rip out some cables, close the lid and toss my laptop in my backpack and I can pick up exactly where I left off when I get home. Overall it’s noticeably more convenient.

Triple-monitor support!: Yeah, I could probably have made this work on the Optiplex, but I would have had to fight X11, and that’s no fun at all. Instead, I just ponied up the cash for a USB-to-DVI adapter; the resulting video is a little choppy, but it’s more than adequate for looking at text. Now I have the two monitors that used to be connected to the Optiplex, plus the laptop’s built-in display.

I’m pretty pleased with this setup so far. I’m also a fan of multi-monitor setups, and having three kind of makes me feel like a super-villain.


Thoughts on Neutralitygate

Posted: August 10th, 2010 | Filed under: Computers, Opinions (Uninformed)

There has been a lot of sound and fury today (well, technically yesterday) surrounding Verizon and Google’s joint net neutrality policy announcement. I didn’t really know about it until Tim brought it to my attention. He wanted to know my thoughts about this, and blogs are practically designed for uninformed opinions, so why not?

In the interest of full disclosure, I’ve worked at Google in the past. Several friends of mine are either current or former Googlers. Money from Google’s generous intern salary paid for all the furniture in my living room. I’m totally biased, I freely admit. The rest of this post will also probably reveal my profound ignorance, but that’s why there are comments.

Frankly, I’m surprised at the amount of rage that I’m picking up about this announcement. This isn’t a net-neutrality-crushing alliance between Google and Verizon. It’s not binding. There’s nothing in the language that says that it’s not open for discussion and feedback. This is basically these two companies saying, “This is how we think we’re going to deal with the net neutrality thing right now.”

The notion of allowing carriers to “offer new types of services that are not part of the open Internet” is also not all that surprising to me. In my view, net neutrality says “if two computers on the Internet want to talk to one another, the carriers providing Internet access to either of the two shouldn’t be able to stop them”. By “shouldn’t be able to stop them”, I mean that my computer should be able to talk to Google’s computers even if I’m on a “Microsoft-friendly” ISP. What Google and Verizon seem to be proposing here is that companies should be able to create services that don’t operate over the public Internet, as long as those services don’t circumvent net neutrality. It’s not really clear what they mean by “not part of the open Internet” here. Are they saying that the computers making up this service aren’t world-routable? If so, then companies are already doing this internally; large corporate intranets could be seen as an example of a service that operates separately from the open Internet. Do they mean creating a parallel network over which these services run? In that case, data center networks seem to fit that description pretty well, and companies are clearly already providing dedicated networks for data centers, or just buying their own. Again, on the face of it I’m not too terribly distressed.

Most of the rage seems to be directed at the fact that they don’t think that wireless access to the Internet should be protected in the same way that wired access is. Judging by Google’s blog post on the subject (see how biased I am?), their justification seems to be that they don’t know what the mobile Internet really looks like yet. It seems to me that this is basically a political compromise; Google caves somewhat on the wireless issue so that Verizon will agree to come on board with the agreement. Notice that the transparency requirement opens the door for any company that tries to pull any anti-neutrality funny business to be publicly reamed by its customers (you need only look at the Comcast P2P filtering scandal to see how much good press anti-neutrality practices generate for the telcos that implement them). Google promises to not be non-neutral in access to its properties, and it and Verizon get to be great friends; everybody wins.

Are Google and Verizon both huge multinational corporations that are interested primarily in pulling in a lot of cash? Absolutely. Does the language of this announcement open up the possibility of abuse, corruption and rampant mismanagement? Of course, but so does every other agreement between more than one person. Does this announcement violate Google’s “Don’t be evil” motto? Hardly. To me, this just sounds like Verizon and Google getting all buddy-buddy and announcing the sort of long-term, under-defined recommendations that make PR people all warm and fuzzy because they don’t actually require you to promise to do anything. Everything I’m hearing from Google indicates they’re really self-conscious about this whole thing (“it’s not an agreement”, “we are still committed to net neutrality”, “we’re not sure what the mobile market looks like, so we have to keep our options open”), which gives me some comfort; the second you get a “so what?” announcement out of a company Google’s size (see Comcast’s P2P shenanigans, Antennagate), it’s time to be concerned. Yes, I know I’m totally defending Google here. I told you, I’m biased.

It might sound like I don’t care about net neutrality, but nothing could be further from the truth. People are absolutely right when they say that the Internet is the most democratic medium ever conceived, and it’s really important that it stays that way. The Internet is big business, and these days it’s dominated by big businesses with a lot of influence and a lot of money. As citizens of a democracy, we have every right (and, some may argue, the responsibility) to be outraged when a large, powerful entity looks like it’s trying to gain control of a fundamentally democratic communications medium like the Internet. I just don’t think that this announcement represents anything close to such a power grab.

And that is my opinion on the subject. Release the trolls!


The Death of the KVM Switch

Posted: June 23rd, 2010 | Filed under: Computers

This post is the first in a series tentatively titled “First-World Problems: How I Use Way Too Many Computers”. It’s true, I use and administer way too many computers. Many times, I’ve had to interact with several different computers at once, whether it’s to download patches or because one machine has specialized hardware for a particular task. It’s sometimes useful for those computers to share a keyboard and mouse, particularly so I don’t have to switch back and forth between two keyboards all the time. Traditionally, people have accomplished this with KVM (Keyboard, Video and Mouse) switches. A KVM switch has two or more input connections and an output connection; simply hook your computers to the inputs and your keyboard, mouse and display to the output and voilà, you can share them.

I’m going to assert that KVM switches are awful. There are a couple reasons why:

  1. You have to switch back and forth between machines with a key combo. Often, this combo is not documented anywhere on the switch itself. Almost always, it’s non-modifiable. Always, it’s cumbersome.
  2. KVM switches are stuck in 1997. Most KVM switches use PS/2 and VGA connectors. Getting this to work with computers that use USB and DVI requires a bunch of separate dongles that only nerds like me have in abundance. Good luck finding a native USB/DVI KVM switch for under $150.
  3. More cables. Seriously, I’ve got enough cables cluttering up the back of my desk, I’d rather not introduce more cables if I can avoid it.

Thankfully, there are software solutions that allow you to bypass the need for a KVM switch altogether.

Synergy allows you to use a single keyboard and mouse among an arbitrary collection of computers. Synergy runs a server on the machine with the keyboard and mouse attached and clients on all the other machines. The user specifies how the displays of these multiple machines are arranged with respect to one another; for example, I could say that my laptop’s display is to the left of my workstation’s display. Synergy detects when the mouse hits the edge of a display and “switches” the keyboard and mouse from one computer to another based on the configured arrangement. So when my mouse hits the left edge of my workstation’s display, it is seamlessly transferred to the right edge of my laptop’s display. I’ve used Synergy this way for a while now, and I’ll never even think of buying a KVM just to share a keyboard and mouse again.
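The edge-switching idea is simple enough to model in a few lines. This is not Synergy’s code or its configuration format, just a toy Python sketch of the behavior described above; the `LAYOUT` table, screen names and screen widths are hypothetical.

```python
from typing import Optional

# Hypothetical layout: the laptop's screen sits to the left of the workstation's.
LAYOUT = {
    ("workstation", "left"): "laptop",
    ("laptop", "right"): "workstation",
}


def next_screen(current: str, x: int, width: int) -> Optional[tuple[str, str]]:
    """If the cursor hit an edge that has a neighbour, return (screen, entry edge)."""
    if x <= 0 and (current, "left") in LAYOUT:
        return LAYOUT[(current, "left")], "right"   # appear at the neighbour's right edge
    if x >= width - 1 and (current, "right") in LAYOUT:
        return LAYOUT[(current, "right")], "left"   # appear at the neighbour's left edge
    return None  # not at an edge, or no neighbour configured on that side
```

When the cursor reaches x = 0 on the workstation, `next_screen` hands control to the laptop and places the cursor on the laptop’s right edge; edges with no configured neighbour simply act as walls.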

Of course, Synergy doesn’t take care of sharing a single display between multiple computers. Most monitors have multiple inputs, so you can still pair Synergy with the monitor’s input-select button, but that’s cumbersome. Another alternative is to use VNC. VNC is a remote-display protocol implemented by a ton of different clients on all platforms (Google “VNC” for a list of the big ones). It essentially allows you to view and control another computer remotely. Those of you running Mac OS X Snow Leopard will probably use Screen Sharing for this, but under the covers it’s still VNC.

Both these solutions are cross-platform, pretty easy to set up and have a wide range of graphical helper applications and front-ends. They share the common disadvantage that they send data unencrypted, which means that any enterprising hacker could view your keystrokes as they travel over the network. Thankfully, it’s easy to solve this problem. There are a number of applications that will create a secure tunnel between two computers (Google “SSH tunnel” for more information). Alternative protocols to VNC like NX provide all the functionality of VNC over an encrypted connection. There are a lot of options, and the best part is that almost all of them are free.
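With OpenSSH, the usual way to do this is local port forwarding with `ssh -L`. The Python sketch below just builds that command; the user, host and port choices are hypothetical. VNC servers conventionally listen on port 5900 for the first display, and Synergy clients connect to the server on port 24800 by default, so the same pattern covers both.

```python
def ssh_tunnel_command(user: str, host: str, local_port: int, remote_port: int) -> list[str]:
    """Build an OpenSSH command forwarding localhost:local_port to remote_port on host.

    -N: run no remote command, just hold the tunnel open.
    -L: local port forwarding; traffic sent to local_port travels encrypted over
        SSH and is delivered to localhost:remote_port as seen from the remote machine.
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", f"{user}@{host}"]


if __name__ == "__main__":
    # Hypothetical host: afterwards, point your VNC client at localhost:5901.
    print(" ".join(ssh_tunnel_command("me", "workstation.example.edu", 5901, 5900)))
```

Once the tunnel is up, the VNC client talks to `localhost:5901`, and everything that crosses the network is inside the encrypted SSH connection.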

So in short, if you’re thinking of buying a KVM, save $100 in cables and dongles and use software instead.


Wolfram Alpha is My Master Now

Posted: June 6th, 2010 | Filed under: Computers

Part of my research lately has involved asking a lot of questions like “What’s 100 TB / X / (N*D) in MB per second?” and “How long does it take a 7200 RPM disk to go through half a revolution?” (reasons for this will be revealed later). Google Calculator is great for this. If you’ve never heard of Google Calculator, that’s because it’s built into Google Search. Try going to Google and searching for “10 hectares in square parsecs” to get an idea of what I mean. It supports tons of units, but its support for calculations involving bytes is lacking.
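The disk question, for example, is a one-liner once you write out the units. A minimal Python sketch; `half_revolution_ms` is a hypothetical helper name.

```python
def half_revolution_ms(rpm: float) -> float:
    """Milliseconds for a disk spinning at `rpm` to turn half a revolution.

    This is also the disk's average rotational latency: on average, the sector
    you want is half a turn away from the head.
    """
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2 * 1000


if __name__ == "__main__":
    for rpm in (5400, 7200, 15000):
        print(f"{rpm} RPM: {half_revolution_ms(rpm):.2f} ms")
```

At 7200 RPM a revolution takes 60 / 7200 s, or about 8.33 ms, so half a revolution is about 4.17 ms.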

Megabytes or Mebibytes?

As a bit of exposition: a common complaint when buying hard drives is “I just bought a 500 GB drive but it only has 465 GB of capacity when I hook it to my computer. I want my extra space back!” This discrepancy isn’t because hard drive companies want to screw you over (OK, they might be trying to screw you over, but not with this); it has to do with how computers count bytes. Drive manufacturers (and networking folks) measure in powers of 10, so a gigabyte is 1000 megabytes. Operating systems have traditionally measured storage in powers of 2, so a gigabyte is 1024 megabytes. So when you buy a 500 GB drive, your computer will tell you it’s a 465.7 GB drive, because the manufacturer gave you the powers-of-10 capacity and the OS displays it in powers of two. I know that Snow Leopard no longer does this to avoid confusion, but I’m not sure what Windows 7 does.

In an effort to resolve the confusion, the International Electrotechnical Commission (IEC) introduced a separate set of units based on powers of two, and kept powers of ten as the default. So according to the IEC, a megabyte is 1000 kilobytes, and a mebibyte is 1024 kibibytes. The ‘-bi’ was supposed to invoke the word “binary” in the user’s mind. Personally I think they deliberately did this to make computer scientists sound ridiculous. Once you get up high enough you hit yobibytes, for Pete’s sake; that sounds like something you’d give Scooby Doo’s second cousin Yobi Doo to make him cooperate.

Google Calculator recognizes the IEC unit names, but doesn’t follow the standard: if you ask it “how many bytes in a yottabyte” and “how many bytes in a yobibyte”, you get the same answer. This is intensely irritating if you care about powers of ten when you’re dealing with bytes.
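For reference, here is what a converter that honors the distinction looks like. A minimal Python sketch covering a handful of units; the `convert` function and the unit tables are my own illustration, not any calculator’s implementation.

```python
# Decimal (SI) prefixes versus the IEC binary prefixes.
DECIMAL = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "YB": 10**24}
BINARY = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "YiB": 2**80}
UNITS = {**DECIMAL, **BINARY}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a byte quantity between decimal and binary units."""
    return value * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    print(f"500 GB = {convert(500, 'GB', 'GiB'):.1f} GiB")  # the "missing" drive space
    print(f"1 YiB = {convert(1, 'YiB', 'YB'):.3f} YB")       # not the same as a yottabyte
```

The 500 GB drive works out to about 465.7 GiB, and a yobibyte is roughly 21% larger than a yottabyte, so treating them as equal is not a rounding error.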

Enter Wolfram Alpha

Wolfram Alpha is basically a thin veneer on top of Mathematica that allows you to access some of its simpler functionality for free. Mathematica is a mathematical toolbox that is so absurdly sophisticated and full-featured that I’m expecting it to gain sentience any minute. Just to give you an idea, the Mathematica documentation – complete with typesetting and figures – was written entirely in Mathematica. Scary.

I was hunting around for an alternative to Google Calculator’s unit conversion (and considering writing my own, such was my level of desperation) when I decided to ask Alpha. After all, you can ask Alpha “How many roads must a man walk down before you can call him a man?” so it must be able to do unit conversion, right? Turns out it can, and it honors the IEC specification! So if I ask it “how many bytes in a kilobyte”, not only will it give me powers of ten, it will ask me if that’s what I wanted and suggest kibibytes as an alternative!

It can also convert between decimal, binary (two’s complement and unsigned), hex and octal. Hot damn.
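If you want the same conversions offline, Python’s built-in formatting covers them. A small sketch, assuming an 8-bit width for two’s complement; the function names are made up for illustration:

```python
def describe(n: int, bits: int = 8) -> dict:
    """Show n in several bases; negative values use two's complement at `bits` width.

    Accepts anything from the most negative signed value up to the largest
    unsigned value that fits in `bits` bits.
    """
    if not -(1 << (bits - 1)) <= n < (1 << bits):
        raise ValueError(f"{n} does not fit in {bits} bits")
    pattern = n & ((1 << bits) - 1)  # the raw bit pattern (two's complement for negatives)
    return {
        "decimal": str(n),
        "binary": format(pattern, f"0{bits}b"),
        "hex": format(pattern, "x"),
        "octal": format(pattern, "o"),
    }

def from_twos_complement(bit_string: str) -> int:
    """Interpret a bit string as a signed two's-complement integer."""
    value = int(bit_string, 2)
    if bit_string[0] == "1":          # sign bit set: subtract 2**width
        value -= 1 << len(bit_string)
    return value

print(describe(-5))
print(from_twos_complement("11111011"))
```

describe(-5) gives the bit pattern 11111011 (hex fb, octal 373), and from_twos_complement("11111011") turns it back into -5; read unsigned, the same pattern is 251.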


Esoteric Tip of the Day #1: Dead Man’s Check

Posted: December 12th, 2009 | Author: | Filed under: Computers, Esoteric Tips | 10 Comments »

I’m responsible for the care and feeding of way too many computers. I say “way too many” because the probability of one of your computers doing something stupid on a Friday night increases in proportion to the number of computers that can do something stupid. Most of the time, the stupidity is routine (“Oh hey, what a surprise! Firefox is using all my RAM!”) but every so often, they surprise you. I’m going to post something here whenever I manage to fix one of these more “WTF”-type errors under the heading Esoteric Tip of the Day. Chances are you won’t care, but someone Google searching might, and I want to make this sort of information easy to find so that others won’t have to endure this same issue.

I noticed a couple days ago that my Mac mini (which acts as a storage server but will probably be doing more media serving once I get a better TV) was not visible to the rest of my home network. It turns out that the mini’s network interface was having serious issues, because the mini couldn’t even get an IP address. Doing a little digging into the log, I found a lot of log segments that look like this:

kernel  AppleYukon2: error - FATAL: SkGeStopPort() does not terminate (Rx)
kernel  AppleYukon2: error - Event queued in Init Level 0
configd[14] network configuration changed.
Firewall[66]  krb5kdc is listening from :::88 proto=6
Firewall[66]  krb5kdc is listening from 0.0.0.0:88 proto=6
kernel  Ethernet [AppleYukon2]: Link up on en0, 100-Megabit, Full-duplex, Symmetric flow-control, Debug [796d,6c0c,0de1,0200,4de1,4000]
configd[14] network configuration changed.
Firewall[66]  krb5kdc is listening from :::88 proto=6
Firewall[66]  krb5kdc is listening from 0.0.0.0:88 proto=6
kernel  AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel  AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel  AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel  AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel  AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel  AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - still nothing received, hard reset of chip

There is only one group of people who know exactly what these messages mean, and they are the guys who wrote the AppleYukon2 driver. Thankfully, a little Googling and some link clicking later, I had a plausible explanation: apparently this sort of error occurs when one of the networking-related preference files that OS X stores gets corrupted. This happened quite a bit when users upgraded to 10.5.7, and it also happened to me. The driver reads a corrupted preferences file, takes some ridiculous action, and wedges the underlying hardware in some kind of bizarre state that only a reboot can correct.

Thankfully, OS X can regenerate its preferences files. To do this cleanly:

  • Reboot holding down the Shift key. This will force the computer to boot in Safe Mode and (as a side-effect) rebuild its caches, which is a good thing if they’ve been populated with garbage from your corrupted preferences files, which they probably have been.
  • Delete the following files in /Library/Preferences/SystemConfiguration: NetworkInterfaces.plist, com.apple.airport.preferences.plist, com.apple.network.identification.plist
  • Reboot and re-apply your network settings (IP address, etc.)
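If you’d rather not delete the files outright, here’s a hedged Python sketch of the second step that moves them aside instead, so they can be restored later. It assumes it runs with administrator rights (for example via sudo) after the Safe Mode boot; quarantine_network_prefs and the backup location are hypothetical:

```python
import shutil
import time
from pathlib import Path

# Directory and file names taken from the steps above.
CONFIG_DIR = Path("/Library/Preferences/SystemConfiguration")
SUSPECT_FILES = [
    "NetworkInterfaces.plist",
    "com.apple.airport.preferences.plist",
    "com.apple.network.identification.plist",
]

def quarantine_network_prefs(backup_root: Path, config_dir: Path = CONFIG_DIR) -> list:
    """Move the suspect preference files into a timestamped folder under
    backup_root (instead of deleting them) and return their new paths."""
    backup_dir = backup_root / time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in SUSPECT_FILES:
        source = config_dir / name
        if source.exists():  # skip files that are already gone
            backup_dir.mkdir(parents=True, exist_ok=True)
            moved.append(Path(shutil.move(str(source), str(backup_dir / name))))
    return moved
```

For example, quarantine_network_prefs(Path("/Users/Shared/netprefs-backup")) (a hypothetical folder) moves whichever of the three files exist, after which you reboot as in the last step.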

Special thanks to Daniel Palmer, the poor soul who sat on the phone with Apple to get this resolved. Here’s the thread containing the solution.

Update 3/3/10: This problem has reared its ugly head once again, and I’ve found out a little more information about it:

When you set the speed and duplex settings of your wired network adapter manually rather than leaving them at “Automatically”, and you disable IPv6, the driver no longer triggers hard resets, for whatever reason. In my case, it triggers soft resets at precisely 6-minute intervals.
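To check whether your resets come at regular intervals too, you can pull the timestamps out of the log (the excerpt above has them stripped). This Python sketch assumes syslog-style lines such as “Mar  3 10:00:00 host kernel[0]: ...”, as found in /var/log/system.log, and supplies a year because syslog omits it; soft_reset_intervals is a hypothetical name:

```python
import re
from datetime import datetime

# Assumed syslog-style prefix: "Mon DD HH:MM:SS host process: message"
LINE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*deadmanCheck.*soft reset")

def soft_reset_intervals(lines, year=2010):
    """Return the seconds between consecutive deadmanCheck soft resets."""
    times = []
    for line in lines:
        match = LINE.search(line)
        if match:
            # Syslog leaves out the year, so a fixed one is supplied for parsing.
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Passing an open log file (for example open("/var/log/system.log")) works because the function just iterates over lines; resets at precisely 6-minute intervals show up as a list of 360.0 values.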

Also, as reader lafber pointed out, this problem only seems to occur when there is no traffic on the wired network. To work around it, I wrote a one-line AppleScript. The resulting application starts a command that pings my router every 30 seconds. The command itself runs in a detached screen session, so I don’t have to keep an application running in the foreground. The AppleScript looks like this:

do shell script "screen -m -d ping -i 30 192.168.1.1"

To get it to start at bootup, I saved that AppleScript script as an application and added the application to my default user’s Login Items in the Accounts system preference pane.
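For reference, the same keep-alive idea can be sketched in Python. This is an alternative to the AppleScript, not the setup described in this post; it assumes the system ping accepts -c 1 (the macOS and Linux versions do), reuses the router address from above, and keep_alive and ping_once are hypothetical names:

```python
import subprocess
import time

ROUTER = "192.168.1.1"   # router address from the AppleScript above
INTERVAL = 30            # seconds between pings

def ping_once(host: str) -> bool:
    """Send a single ping; return True if the host answered."""
    result = subprocess.run(["ping", "-c", "1", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def keep_alive(host=ROUTER, interval=INTERVAL, rounds=None, pinger=ping_once):
    """Ping host every `interval` seconds; run forever unless `rounds` is given."""
    sent = 0
    while rounds is None or sent < rounds:
        pinger(host)
        sent += 1
        if rounds is None or sent < rounds:  # no pointless sleep after the last ping
            time.sleep(interval)
    return sent
```

Calling keep_alive() loops forever, like the screen session; passing rounds=3 stops after three pings, and the pinger parameter lets you swap in a fake for testing.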


Decent Python Code Folding in TextMate

Posted: July 16th, 2009 | Author: | Filed under: Computers | 2 Comments »

TextMate is a great editor, and Python is a great programming language, but they both have their limitations.

One of TextMate’s nicer features is code folding, which allows you to collapse a block of code (a function, a conditional block, etc.) down to a single line in the editor. This often makes a large piece of code much easier to navigate. TextMate determines where code can be folded by evaluating each line of the code and tagging particular lines with start and end markers.

Doing code folding for Python this way (considering only a single line at a time) is impossible, since the only clear end of a function in Python is a reduction in indentation level on the next line. For example, here’s a function in C followed by code that calls it:

#include <stdio.h>

void foo(int value) {
   printf("%d", value + 5);
}
int main(void) { foo(24); return 0; }

Here’s the same code written in Python:

def foo(value):
   print("%d" % (value + 5))
foo(24)

As you can see, if you look at the code one line at a time, it’s not clear where the definition of foo in Python ends, whereas the closing curly brace (}) is clearly the end of foo’s definition in C.
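The lookahead that a single-line matcher can’t do is simple once you can see the following lines. Here’s a rough Python sketch (block_end is a hypothetical helper) that finds where an indented block ends; it ignores multi-line strings, bracket continuations and comments at lower indentation, all of which a real folding engine would have to handle:

```python
def block_end(lines, start):
    """Return the index just past the indented block opened at lines[start].

    A line belongs to the block if it is blank or indented more deeply than
    the opening line; the block ends at the first non-blank line that is not.
    """
    def indent(line):
        return len(line) - len(line.lstrip())

    base = indent(lines[start])
    end = start + 1
    for i in range(start + 1, len(lines)):
        if lines[i].strip() == "":
            continue            # blank lines never end a block on their own
        if indent(lines[i]) <= base:
            break               # dedent: the block is over
        end = i + 1
    return end
```

Given ["def foo(value):", "   print(value + 5)", "", "   return value", "foo(24)"] and a start of 0, it returns 4: the blank line stays inside the fold and foo(24) falls outside it.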

By default, TextMate punts: it treats the start of a function or class definition as the start of a foldable region and the first subsequent blank line as its end. This doesn’t really work well: any blank line inside a function ends the fold early, and function definitions without any blank lines aren’t really readable past a certain size.

There is, however, a way to hack your way around this issue. Another of TextMate’s killer features is that you can customize almost everything, including the definition for a given language.

Open the Language Editor through the “Bundles > Bundle Editor > Edit Languages…” menu item (as of TextMate 1.5.8, anyway).

Pull up the language definition for Python and find a line that looks like this:

foldingStopMarker = '^\s*$|^\s*\}|^\s*\]|^\s*\)|^\s*"""\s*$';

and replace it with this:

foldingStopMarker = '^\s*#\s*END_DEF ([a-zA-Z0-9_<]+)';

Now, if you want to fold a function definition in Python, stick a comment at the end of it:

def foo(value):
   print("%d" % (value + 5))
#END_DEF foo
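Before editing the grammar, you can sanity-check the new stop marker with Python’s re module. TextMate uses a different regex engine (Oniguruma), but this pattern only uses basic features that should behave the same in both:

```python
import re

# The replacement foldingStopMarker from above.
STOP = re.compile(r'^\s*#\s*END_DEF ([a-zA-Z0-9_<]+)')

for line in ["#END_DEF foo", "    # END_DEF bar_2", "#END_DEF", "print(foo)"]:
    match = STOP.search(line)
    print(repr(line), "->", match.group(1) if match else "no match")
```

Note that the pattern requires exactly one space between END_DEF and the name, so a comment like #END_DEF  foo (two spaces) won’t end a fold.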

Hopefully this helps out some of you who are using TextMate for Python programming.


Dealing with WordPress Database Inconsistency

Posted: May 20th, 2009 | Author: | Filed under: Computers | 1 Comment »

I just finished wrestling with a problem with WordPress. Editing a really old page tickled some sort of internal inconsistency in the database, which created a zombie page that could neither be edited nor deleted. I really wasn’t looking forward to crawling around in the database, so I did some googling and came up with Lester Chan’s brilliant WP-DBManager. One click of the “Repair” button allowed me to deliver the deathblow to the zombie page, saving me a boatload of time in the process. Thanks, Lester!


Preventing data loss

Posted: January 21st, 2009 | Author: | Filed under: Computers | 4 Comments »


I’ve always been the sort of person who saves receipts. Until very recently, I had an entire filing box full of receipts, warranty cards and software license keys. This system bothered me in much the same way that keeping paper copies of research papers did: they can’t be searched (did I file the receipt for my coffee table under C for coffee table, T for table, I for Ikea, or F for furniture?), and they take up space in my apartment, where space is pretty limited. There’s the additional problem of a paper receipt’s relative fragility: there’s only one copy of it, it can be burned, crumpled, crushed or soaked, it can fade into unreadability, and so on. The solution was pretty straightforward: I scanned everything in the box, gave the files descriptive names and let Spotlight index them. To provide some additional protection against data loss, I burned two copies of the newly-scanned PDFs onto two separate DVDs and mailed one to my parents. This got me thinking: sure, this way it’s much less likely that my important documents will be destroyed in the event of a disaster. Really, though, how safe is my data? More importantly, how safe can my data really be?

Some studies suggest that the lifespan of recordable CDs and DVDs is quite short (in the neighborhood of three to five years), which means that any “permanent” storage on DVDs would need to be refreshed at about that interval to remain readable. Now, given that most of what I burned on those DVDs doesn’t need to be kept for more than five years, that shouldn’t really be a problem. Also, these discs are kept in relatively controlled conditions: no direct sunlight, and the ambient temperature where they are doesn’t get much colder than about 55°F or much hotter than about 100°F. But what about things like photos and home videos? Stuff that is irreplaceable and would need to be stored over a period of, say, decades?

Every data retention scheme you can name has its breaking point: some condition that could irretrievably kill your data. In my opinion, the important factors are how likely or inevitable the failure is and whether it can be proactively avoided. Let’s take a look at a few ways to store data, sorted from least paranoid to “tin-foil hat” paranoid.

Single disk:

  • Data is lost when: the disk fails.
  • How likely/inevitable is this?: All disks will fail, it’s just a matter of when.
  • Proactive avoidance?: Unless you go to some outside storage source, none.

Scarily, this is what most people are relying on. Professional drive recovery companies might be able to get your data off the disk, but it will cost a pretty penny.

RAID array (multiple, redundant disks):

  • Data is lost when:
    • All redundant disks fail before the array can be rebuilt. Especially heinous if your disks all croak of the same hardware problem, or your power supply kills them by catching on fire.
    • (Hardware RAID only) RAID controller fails, no replacement or equivalent controller can be found
  • How likely/inevitable is this?: With enough diligence (and enough backup disks), this shouldn’t be too likely.
  • Proactive avoidance?: More redundant disks, software RAID, quick monitoring and early detection of drive errors, always having a spare drive or two handy.
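For the “quick monitoring” part, a small check can go a long way. This Python sketch assumes Linux software RAID (md), which reports array status in /proc/mdstat with one letter per member, U for up and _ for missing; other RAID systems report health differently, and degraded_arrays is a hypothetical name:

```python
import re

def degraded_arrays(mdstat_text: str) -> list:
    """Return names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)     # e.g. "md0 : active raid1 ..."
        if header:
            current = header.group(1)
        status = re.search(r"\[([U_]+)\]\s*$", line)  # e.g. "... blocks [2/1] [U_]"
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded
```

Running something like degraded_arrays(open("/proc/mdstat").read()) from cron could alert you as soon as an array loses a member, while there’s still time to swap in that spare drive.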

Burning to CD/DVD:

  • Data is lost when: the disc becomes unreadable (crushed, melted, thoroughly scratched, hacked into un-spinnable chunks, or some other loss of structural integrity)
  • How likely/inevitable is this?: Some might argue that it will happen eventually, but it’s a lot less likely than your disk failing, especially if you don’t expect to touch the data itself frequently.
  • Proactive avoidance?: More copies, stored in more places. Re-burning periodically.

Backup to off-site computer/external disk that’s kept somewhere else:

  • Data is lost when: both the storage system storing the original and the storage system storing the backup fail before an additional backup can be brought online
  • How likely/inevitable is this?: Unless you’re not paying attention to your backup for long periods of time or there’s a nuclear war, you’re probably OK. The amount of time you have to react depends, of course, on the storage setup of the two machines.
  • Proactive avoidance?: More than one backup, as widely distributed geographically as you can afford.
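An off-site copy only helps if it actually matches the original. A minimal Python sketch (compare_trees and file_digest are hypothetical helpers) that walks the original directory tree and flags files that are missing from the backup or whose SHA-256 hashes differ:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_trees(original: Path, backup: Path) -> dict:
    """Report files missing from the backup or whose contents differ."""
    report = {"missing": [], "different": []}
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        dst = backup / rel
        if not dst.is_file():
            report["missing"].append(str(rel))
        elif file_digest(src) != file_digest(dst):
            report["different"].append(str(rel))
    return report
```

It only checks one direction, so files that exist only in the backup aren’t reported.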

Cloud Storage (Amazon’s S3, others):

  • Data is lost when: hopefully never. Companies providing cloud storage services spend a lot of money making sure that this doesn’t ever happen. Amazon’s SLA for S3 doesn’t even mention data loss, just the percentage of time they guarantee that the service will be available.
  • How likely/inevitable is this?: Much like the nuclear war scenario, if S3 loses data it will probably be on the news, at least in Silicon Valley.
  • Proactive avoidance?: Multiple storage clouds, owned by multiple vendors. This is truly the peak of tinfoil-hat paranoia.
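To put an uptime guarantee in perspective, the arithmetic is simple. The 99.9% figure below is hypothetical, so check the SLA you’re actually under; note that the result is minutes of unavailability and says nothing about lost data:

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime guarantee over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100

# Hypothetical 99.9% guarantee over a 30-day month.
print(f"{allowed_downtime_minutes(99.9):.1f} minutes")
```

For the hypothetical 99.9% over 30 days, that works out to about 43.2 minutes.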

In my opinion, there are only two real downsides to the cloud storage approach. The first is that upload bandwidth, for most residential Internet customers in the United States, just plain sucks, and uploading a non-trivial amount of data is painfully slow. Fiber to the home (and a technology-friendly executive branch) might change this, but it won’t happen in the immediate future. Second, S3 costs money, both to upload data and to store it there. If you’re not willing (or able) to pay the bill, this isn’t for you.
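A back-of-the-envelope estimate shows why upload speed matters. This Python sketch uses hypothetical numbers (100 GB of data, a 1 Mbit/s upload link) and ignores protocol overhead, so real uploads would be somewhat slower:

```python
def upload_days(gigabytes: float, upload_mbps: float) -> float:
    """Days needed to upload `gigabytes` (decimal GB) at `upload_mbps` megabits/s."""
    bits = gigabytes * 10**9 * 8            # bytes -> bits
    seconds = bits / (upload_mbps * 10**6)  # megabits/s -> bits/s
    return seconds / 86400                  # seconds -> days

print(f"{upload_days(100, 1):.1f} days")
```

With those numbers it prints 9.3 days of continuous uploading.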

This is by no means an exhaustive list, and there are a lot of hybrid approaches. I back up my computer’s hard drives to external drives periodically, and those drives spend most of their time in my desk drawer. My various media (photos, music, digital copies of DVDs) are stored on a software RAID-1 in the media center PC. As an additional layer of protection, all my photos are on Flickr, all my music is synchronized with my iPod fairly regularly and my DVDs are, well, DVDs.

This setup, while pretty decent, is by no means foolproof. One major concern that I haven’t talked about here is data corruption, and I’ve got pretty much zero defense against that. Once Apple releases Snow Leopard, hopefully I’ll be able to transfer all my data over to ZFS volumes. I could nerd out over ZFS for a whole other post, but the short of it is that ZFS makes far fewer assumptions about data’s integrity and comes pretty close to eliminating the data corruption problem.
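ZFS checksums every block it writes and verifies those checksums when it reads data back, which is how it catches silent corruption. The Python sketch below only illustrates the detection half of that idea on an in-memory buffer; the block size and SHA-256 are stand-ins, not ZFS’s actual parameters, and nothing gets repaired:

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative only; not ZFS's record size

def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return a SHA-256 digest for each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def corrupted_blocks(data: bytes, expected: list, block_size: int = BLOCK_SIZE) -> list:
    """Return indexes of blocks that no longer match their stored checksums."""
    actual = block_checksums(data, block_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # Blocks that were added or lost (a length change) also count as corrupted.
    bad.extend(range(min(len(actual), len(expected)), max(len(actual), len(expected))))
    return bad

original = bytes(range(256)) * 64      # 16 KB of sample data, 4 blocks
stored = block_checksums(original)     # checksums recorded at write time
damaged = bytearray(original)
damaged[5000] ^= 0xFF                  # simulate one silently flipped byte
print(corrupted_blocks(bytes(damaged), stored))
```

Flipping the byte at offset 5000 makes corrupted_blocks report only block 1 and leave the other three alone.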