Archive for the 'Computers' Category

Esoteric Tip of the Day #1: Dead Man’s Check

  

I’m responsible for the care and feeding of way too many computers. I say “way too many” because the probability of one of your computers doing something stupid on a Friday night increases in proportion to the number of computers that can do something stupid. Most of the time, the stupidity is routine (“Oh hey, what a surprise! Firefox is using all my RAM!”), but every so often they surprise you. Whenever I manage to fix one of these more “WTF”-type errors, I’m going to post about it here under the heading Esoteric Tip of the Day. Chances are you won’t care, but someone searching Google might. I want to make this sort of information easy to find so that others won’t have to endure the same issue.

A couple of days ago I noticed that my Mac mini (which acts as a storage server but will probably be doing more media serving once I get a better TV) was not visible to the rest of my home network. It turned out that the mini’s network interface was in serious trouble: the mini couldn’t even get an IP address. Digging into the system log, I found a lot of segments that look like this:


kernel AppleYukon2: error - FATAL: SkGeStopPort() does not terminate (Rx)
kernel AppleYukon2: error - Event queued in Init Level 0
configd[14] network configuration changed.
Firewall[66] krb5kdc is listening from :::88 proto=6
Firewall[66] krb5kdc is listening from 0.0.0.0:88 proto=6
kernel Ethernet [AppleYukon2]: Link up on en0, 100-Megabit, Full-duplex, Symmetric flow-control, Debug [796d,6c0c,0de1,0200,4de1,4000]
configd[14] network configuration changed.
Firewall[66] krb5kdc is listening from :::88 proto=6
Firewall[66] krb5kdc is listening from 0.0.0.0:88 proto=6
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - still nothing received, hard reset of chip

Only one group of people knows exactly what these messages mean: the people who wrote the AppleYukon2 driver. Thankfully, after a little Googling and some link clicking, I had a plausible explanation. Apparently this sort of error occurs when one of the networking-related preference files that OS X stores gets corrupted. This happened to quite a few users when they upgraded to 10.5.7, and it happened to me. The driver reads the corrupted preferences, takes some ridiculous action and wedges the underlying hardware in a bizarre state that only a reboot can correct.

Thankfully, OS X can regenerate its preferences files. To do this cleanly:

  • Reboot while holding down the Shift key. This forces the computer to boot in Safe Mode and, as a side effect, rebuild its caches. That helps if the caches have been populated with garbage from your corrupted preference files, which they probably have been.
  • Delete the following files in /Library/Preferences/SystemConfiguration: NetworkInterfaces.plist, com.apple.airport.preferences.plist, com.apple.network.identification.plist
  • Reboot and re-apply your network settings (IP address and so on).
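You may want to keep the old files around in case something goes wrong. In that case, the deletion step can be scripted to move the three files into a backup folder instead of deleting them. This is a minimal sketch under stated assumptions and is not part of the original fix. It assumes Python 3. The function name and backup-folder layout are hypothetical. It has to run with administrator privileges (for example via sudo) to touch /Library/Preferences, and you would run it in Safe Mode before the final reboot.

```python
import shutil
from pathlib import Path

# The three files named in the steps above.
CORRUPTIBLE_PREFS = [
    "NetworkInterfaces.plist",
    "com.apple.airport.preferences.plist",
    "com.apple.network.identification.plist",
]


def move_prefs_aside(prefs_dir, backup_dir):
    """Move the network preference files from prefs_dir into backup_dir.

    Files that don't exist are skipped. Returns the names that were moved,
    so you can restore them by hand if needed.
    """
    prefs_dir = Path(prefs_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in CORRUPTIBLE_PREFS:
        source = prefs_dir / name
        if source.exists():
            shutil.move(str(source), str(backup_dir / name))
            moved.append(name)
    return moved


# Usage (as root; the backup location is your choice):
# move_prefs_aside("/Library/Preferences/SystemConfiguration", "/Users/Shared/prefs-backup")
```

Moving the files rather than deleting them costs nothing. OS X still sees them as missing on the next boot.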

Special thanks to Daniel Palmer, the poor soul who sat on the phone with Apple to get this resolved. Here’s the thread containing the solution.

Update 3/3/10: This problem has reared its ugly head once again, and I’ve found out a little more information about it:

Suppose you set the speed and duplex of your wired network adapter manually instead of leaving them on “Automatically”, and you also disable IPv6. In that case the driver, for whatever reason, stops triggering hard resets. In my case, it still triggers soft resets, at precise 6-minute intervals.
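You might want to see which kind of reset your own machine is logging, for instance to check whether the manual speed/duplex setting has stopped the hard resets. A short script can tally the deadmanCheck lines. This is a minimal sketch in Python 3. It assumes your log lines contain the same text as the excerpt earlier in this post. The function name is hypothetical, and which log file you feed it depends on your system.

```python
def count_deadman_resets(lines):
    """Tally the soft and hard chip resets logged by the AppleYukon2 driver.

    Assumes each reset appears as a line containing "deadmanCheck" plus
    "soft reset of chip" or "hard reset of chip", as in the excerpt above.
    Any timestamp or host-name prefix on the line is simply ignored.
    """
    counts = {"soft": 0, "hard": 0}
    for line in lines:
        if "deadmanCheck" not in line:
            continue  # unrelated log line
        if "hard reset of chip" in line:
            counts["hard"] += 1
        elif "soft reset of chip" in line:
            counts["soft"] += 1
    return counts


# Usage (the log file path is up to you):
# with open(path_to_log, errors="replace") as log:
#     print(count_deadman_resets(log))
```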

Also, as reader lafber pointed out, this problem only seems to occur when there is no traffic on the wired network. To work around it, I wrote a one-line AppleScript. The resulting application starts a command that pings my router every 30 seconds. The command runs in a detached screen session, so I don’t have to keep an application running in the foreground. The AppleScript looks like this:

do shell script "screen -m -d ping -i 30 192.168.1.1"

To get it to start at startup, I saved the AppleScript as an application and added that application to my default user’s Login Items in the Accounts system preference pane.

Decent Python Code Folding in TextMate

  

TextMate is a great editor, and Python is a great programming language, but they both have their limitations.

One of TextMate’s nicer features is code folding, which allows you to collapse a block of code (a function, a conditional block, etc.) down to a single line in the editor. This often makes a large piece of code much easier to navigate. TextMate decides where code can be folded by matching each line, one at a time, against two regular expressions from the language definition: a start marker and a stop marker.

Folding Python code this way (considering only one line at a time) is impossible, since the only clear end of a function in Python is a drop in indentation on some later line. For example, here’s a function in C followed by a call to it:

#include <stdio.h>

void foo(int value) {
    printf("%d\n", value + 5);
}

int main(void) {
    foo(24);
    return 0;
}


Here’s the same code written in Python:

def foo(value):
    print("%d" % (value + 5))

foo(24)


As you can see, if you look at the code one line at a time, it’s not clear where the definition of foo in Python ends, whereas the closing curly brace (}) is clearly the end of foo’s definition in C.
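To make the problem concrete: finding where a Python block ends means looking ahead to the next non-blank line with a smaller indent, and a single-line regular expression can’t do that. Here is a minimal sketch of that lookahead. The function `find_block_end` is hypothetical. It ignores complications such as multi-line strings, continuation lines and comments at odd indentation.

```python
def indent_of(line):
    """Number of leading whitespace characters (a tab counts as one)."""
    return len(line) - len(line.lstrip())


def find_block_end(lines, start):
    """Return the index of the last line of the block opened at lines[start].

    The block continues as long as later non-blank lines are indented more
    deeply than the header. Blank lines never end a block on their own.
    """
    header_indent = indent_of(lines[start])
    last = start
    for i in range(start + 1, len(lines)):
        if not lines[i].strip():
            continue  # blank line: keep looking
        if indent_of(lines[i]) <= header_indent:
            break  # dedent: the block ended before this line
        last = i
    return last


source = [
    "def foo(value):",
    "    x = value + 5",
    "",
    "    print(x)",
    "foo(24)",
]
print(find_block_end(source, 0))  # 3: the blank line does not end foo
```

Note that the decision about line 3 depends on line 4. That is exactly the information a one-line-at-a-time matcher never sees.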

By default, TextMate punts: it treats the start of a function or class definition as the start of a foldable region and the first subsequent blank line as its end. This doesn’t work well. Past a certain size, a function without any blank lines isn’t readable, so any function long enough to be worth folding almost certainly contains blank lines, and the fold stops at the first one.

There is, however, a way to hack around this issue. Another of TextMate’s killer features is that you can customize almost everything, including the definition of a given language.

Open the language editor through the “Bundles > Bundle Editor > Edit Languages…” menu item (as of TextMate 1.5.8, anyway).

Select the Python language definition and find a line that looks like this:

foldingStopMarker = '^\s*$|^\s*\}|^\s*\]|^\s*\)|^\s*"""\s*$';


and replace it with this:

foldingStopMarker = '^\s*#\s*END_DEF ([a-zA-Z0-9_<]+)';


Now, if you want to fold a function definition in Python, stick a comment at the end of it:

def foo(value):
    print("%d" % (value + 5))
#END_DEF foo
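You can check what the new stop marker will and won’t match before editing the bundle by trying the same pattern with Python’s `re` module. This is only an approximation, since TextMate uses the Oniguruma regex engine rather than Python’s. This particular pattern, however, only uses features the two engines share. `is_fold_stop` is a hypothetical helper.

```python
import re

# The replacement foldingStopMarker from the Python language definition.
FOLDING_STOP = re.compile(r'^\s*#\s*END_DEF ([a-zA-Z0-9_<]+)')

# The original stop marker, for comparison.
DEFAULT_STOP = re.compile(r'^\s*$|^\s*\}|^\s*\]|^\s*\)|^\s*"""\s*$')


def is_fold_stop(pattern, line):
    """True if the pattern would mark this line as the end of a fold."""
    return pattern.search(line) is not None


for line in ["#END_DEF foo", "    # END_DEF bar", "", "foo(24)"]:
    print(repr(line), is_fold_stop(FOLDING_STOP, line), is_fold_stop(DEFAULT_STOP, line))
```

The comparison also shows a trade-off. The original pattern ended folds at closing brackets and at a line containing only `"""`, and the replacement drops those alternatives. If you rely on those folds, one option (untested here) is to keep them and swap out only the blank-line alternative `^\s*$` for the END_DEF one.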


Hopefully this helps those of you who are using TextMate for Python programming.

Dealing with WordPress Database Inconsistency

  

I just finished wrestling with a problem with WordPress. Editing a really old page tickled some sort of internal inconsistency in the database, which created a zombie page that could be neither edited nor deleted. I really wasn’t looking forward to crawling around in the database, so I did some Googling and came up with Lester Chan’s brilliant WP-DBManager. One click of its “Repair” button let me deliver the deathblow to the zombie page, saving me a boatload of time in the process. Thanks, Lester!

Preventing data loss

  

OK, so it's not a forest fire, but you get the point.

I’ve always been the sort of person who saves receipts. Until very recently, I had an entire filing box full of receipts, warranty cards and software license keys. This system bothered me for the same reasons that keeping paper copies of research papers did. First, they can’t be searched (did I file the receipt for my coffee table under C for coffee table, T for table, I for Ikea or F for furniture?). Second, they take up space in my apartment, where space is pretty limited. A paper receipt is also fragile. There’s only one copy of it, and it can be burned, crumpled, crushed or soaked, or it can fade into unreadability.

The solution was pretty straightforward: I scanned everything in the box, gave the scans descriptive file names and let Spotlight index them. To provide some additional protection against data loss, I burned two copies of the newly scanned PDFs onto two separate DVDs and mailed one to my parents. This got me thinking. Sure, this way it’s much less likely that my important documents will be destroyed in a disaster. Really, though, how safe is my data? More importantly, how safe can my data really be?

Some studies suggest that the lifespan of recordable CDs and DVDs is quite short (in the neighborhood of three to five years). That means any “permanent” storage on DVDs would need to be re-burned at about that interval to stay readable. Now, most of what I burned on those DVDs doesn’t need to be kept for more than five years, so that shouldn’t really be a problem. Also, the discs are kept in relatively controlled conditions: no direct sunlight, and the ambient temperature where they are doesn’t get much colder than about 55 °F or much hotter than about 100 °F. But what about things like photos and home videos? That stuff is irreplaceable and would need to be stored for decades.

Every data retention scheme you can name has its breaking point, some condition that could irretrievably kill your data. In my opinion, the important factors are how likely (or inevitable) the failure is and whether it can be proactively avoided. Let’s take a look at a few ways to store data, sorted from least paranoid to “tin-foil hat” paranoid.

Single disk:

  • Data is lost when: the disk fails.
  • How likely/inevitable is this?: All disks will fail, it’s just a matter of when.
  • Proactive avoidance?: Unless you go to some outside storage source, none.

Scarily, this is what most people are relying on. Professional drive recovery companies might be able to get your data off the disk, but it will cost a pretty penny.

RAID array (multiple, redundant disks):

  • Data is lost when:
    • All redundant disks fail before the array can be rebuilt. Especially heinous if your disks all croak of the same hardware problem, or your power supply kills them by catching on fire.
    • (Hardware RAID only) RAID controller fails, no replacement or equivalent controller can be found
  • How likely/inevitable is this?: With enough diligence (and enough backup disks), this shouldn’t be too likely.
  • Proactive avoidance?: More redundant disks, software RAID, quick monitoring and early detection of drive errors, always having a spare drive or two handy.
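Here is a back-of-the-envelope sketch for a two-disk mirror. It puts a rough number on “before the array can be rebuilt” and shows why keeping a spare drive handy helps. The model assumes disks fail independently at a constant rate derived from an annual failure rate. Both that model and the 3% figure in the example are assumptions for illustration. Correlated failures like the ones described above (a shared defect, a burning power supply) make the real risk higher.

```python
import math

HOURS_PER_YEAR = 24 * 365


def mirror_loss_probability(annual_failure_rate, window_hours):
    """Probability that the surviving disk of a degraded mirror fails within
    window_hours, i.e. before a replacement is installed and rebuilt.

    Assumes a constant (exponential) failure rate that produces the given
    annual failure rate, and that the two disks fail independently.
    """
    rate_per_hour = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * window_hours)


# Hypothetical windows: a spare on the shelf (~1 day to swap and rebuild)
# versus ordering a replacement (~2 weeks).
for label, hours in [("spare on hand", 24), ("order a drive", 14 * 24)]:
    print(label, f"{mirror_loss_probability(0.03, hours):.4%}")
```

For windows this short, the risk grows roughly in proportion to the window length. Shrinking the window is therefore about as effective as buying more reliable disks.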

Burning to CD/DVD:

  • Data is lost when: the disk becomes unreadable (crushed, melted, thoroughly scratched, hacked into un-spinnable chunks, or some other loss of structural integrity)
  • How likely/inevitable is this?: Some might argue that it will happen eventually, but it’s a lot less likely than your disk failing, especially if you don’t expect to touch the data itself frequently.
  • Proactive avoidance?: More copies, stored in more places. Re-burning periodically.

Backup to off-site computer/external disk that’s kept somewhere else:

  • Data is lost when: both the storage system storing the original and the storage system storing the backup fail before an additional backup can be brought online
  • How likely/inevitable is this?: Unless you’re not paying attention to your backup for long periods of time or there’s a nuclear war, you’re probably OK. The amount of time you have to react depends, of course, on the storage setup of the two machines.
  • Proactive avoidance?: More than one backup, as widely distributed geographically as you can afford.

Cloud Storage (Amazon’s S3, others):

  • Data is lost when: hopefully never. Companies providing cloud storage services spend a lot of money making sure that this never happens. Amazon’s SLA for S3 doesn’t even mention data loss, just the minimum percentage of each month during which they guarantee the service will be available.
  • How likely/inevitable is this?: Much like the nuclear war scenario, if S3 loses data, it will probably be on the news, at least in Silicon Valley.
  • Proactive avoidance?: Multiple storage clouds, owned by multiple vendors. This is truly the peak of tinfoil-hat paranoia.

In my opinion, there are only two real downsides to the cloud storage approach. The first is that upload bandwidth, for most residential Internet customers in the United States, just plain sucks, and uploading a non-trivial amount of data is painfully slow. Fiber to the home (and a technology-friendly executive branch) might change this, but it won’t happen in the immediate future. Second, S3 costs money, both to upload data and to store it there. If you’re not willing (or able) to pay the bill, this isn’t for you.

This is by no means an exhaustive list, and there are a lot of hybrid approaches. I back up my computer’s hard drives to external drives periodically, and those drives spend most of their time in my desk drawer. My various media (photos, music, digital copies of DVDs) are stored on a software RAID-1 in the media center PC. As an additional layer of protection, all my photos are on Flickr, all my music is synchronized with my iPod fairly regularly and my DVDs are, well, DVDs.

This setup, while pretty decent, is by no means foolproof. One major concern that I haven’t talked about here is data corruption, and I’ve got pretty much zero defense against that. Once Apple releases Snow Leopard, hopefully I’ll be able to move all my data over to ZFS volumes. I could nerd out over ZFS for a whole other post, but the short of it is this. ZFS checksums everything it stores instead of assuming the disk handed back the right data, so it makes far fewer assumptions about data integrity. That comes pretty close to eliminating the data corruption problem.
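Until a checksumming file system is an option, one stopgap against silent corruption is to do by hand a small piece of what ZFS does automatically: record a checksum for every file and re-verify periodically. This is a minimal sketch assuming Python 3. The manifest format (a dict mapping each relative path to a SHA-256 hex digest) and the function names are hypothetical. Unlike ZFS, this can only detect corruption, not repair it. Repair still requires one of the redundant copies discussed above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root, manifest):
    """Return relative paths whose contents no longer match the manifest.

    Files listed in the manifest but now missing are reported too.
    """
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad
```

In practice, you would save the manifest next to each backup (for example with `json.dump`) and run `find_corrupted` against every copy now and then. A mismatch tells you which copy to restore from.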

Automatic Everything

  

This was going to be an extended rant on how people use databases where they shouldn’t. However, the more I wrote, the more I realized that the topic has been analyzed at length by the systems research community and the blogosphere at large, many of whom are far more knowledgeable than I am. So I’ll summarize my rant in a paragraph and then move on to more philosophical, “meta”-type comments.

Twitter’s architecture (as much as they’ve shown us) is a Ruby on Rails app backed by a MySQL database. This combination is the Golden Hammer of Web 2.0. A frighteningly large number of web application developers seem to follow the mantra, “If I need to store data, use SQL as a Big-Ass Table (no, not that Big-Ass Table). Who needs high-speed middleware? I’ll write everything in Ruby!” The problem is that schema design is as close to alchemy as CS gets and tuning databases is tedious and hard to do right. If you are writing something that must process tens of thousands of messages a day, do not think you can write it in an interpreted language and have it frequently converse with a database. If you think this will work, you are living in a magical dream world. I’m talking directly to you, Twitter, you poor sad whipping boy of the Web 2.0 universe. Please, for your own sake, rewrite Starling in C or C++ and use a more suitable back-end.

That concludes the synopsis of my multi-page rant of doom. Now, for the meta: if I were to write an essay for NPR’s This I Believe, the following would be that essay.

I believe in telling systems what I want, not how to get it, and having them give it to me as quickly as possible. I believe that programmers are lazy, and that middleware should give them the ability to do the right thing the easy way. I believe in intrinsic scalability and building on sound principles. I believe that the disk is evil and writing to it should be avoided until you have no other choice. I believe in most of what databases do and in the potential of what their descendant systems can and will do.

I believe in the awesome potential of automatic everything.

Messing with Thom Yorke’s head

  

Radiohead’s latest music video was shot without cameras. Instead, they used a combination of reflected light and lasers to generate clouds of points in 3D. Google was nice enough to provide the rest of the world with some of the 3D point cloud data collected for that music video. A big piece of that data is about 2100 frames of lead singer Thom Yorke’s head. A frame of the original data (when output via Processing) looks like this:

If you look closely, you’ll notice that the point cloud is really noisy around the edges. After a simple high-pass filter, that same frame looks like this:

That’s a little more manageable. I figured, why stop at points when you can have 3D surfaces? One of the more straightforward ways to make a 3D surface out of a bunch of points is to connect the points with triangles. A Delaunay triangulation chooses those triangles so that no point falls inside the circumcircle of any triangle. This is a really compute-intensive calculation, and I don’t exactly have a supercomputer on hand, so I did a lot of fudging and approximation. Even with all that fudging, each frame took as much as 5 minutes to render, and the process ran for most of last week while I was at work. That same frame looks like this when Delaunay-triangulated:

Notice that it’s a little noisy, which is mainly due to some approximation on my part as well as some leftover noise in the point cloud. The video below shows what happens when you sequence all 2100 frames together. Enjoy!
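For anyone curious why triangulation is so expensive without shortcuts, here is a deliberately naive sketch of the defining Delaunay property: a triangle belongs to the triangulation exactly when no other point lies strictly inside its circumcircle. Checking every triple of points against every other point takes O(n^4) time, which is why practical implementations use much cleverer algorithms. This is an illustration, not the code behind the video. It works on 2D points, so applying it to a point cloud assumes you first project the points onto a plane (for example, by dropping the depth coordinate). It also assumes no four points lie on a common circle. The function names are hypothetical.

```python
from itertools import combinations


def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc.

    Uses the standard 3x3 in-circle determinant, with its sign corrected
    for the orientation of abc.
    """
    rows = [(p[0] - d[0], p[1] - d[1]) for p in (a, b, c)]
    m = [(x, y, x * x + y * y) for x, y in rows]
    det = (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )
    return det * orientation(a, b, c) > 0


def naive_delaunay(points):
    """Return index triples (i, j, k) of all Delaunay triangles.

    Brute force, O(n^4). Collinear triples are skipped; assumes no four
    points are cocircular.
    """
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if orientation(a, b, c) == 0:
            continue  # degenerate (collinear) triple
        if not any(
            in_circumcircle(a, b, c, points[m])
            for m in range(len(points))
            if m not in (i, j, k)
        ):
            triangles.append((i, j, k))
    return triangles


# A kite: the short diagonal (between points 2 and 3) wins.
print(naive_delaunay([(0, 0), (4, 0), (2, 1), (2, -1)]))
```

With a couple of thousand points per frame, the n^4 growth makes this approach hopeless. That is why approximations (or a proper O(n log n) algorithm) are needed.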

This is the future – why are my updates still failing?

  

So anyone who has an iPhone or iPod Touch is probably aware that Apple’s update servers basically fell over today under the demand for the new iPhone firmware. Recently, Firefox’s update servers suffered exactly the same problem. Now, I’m sure these guys have a really expensive load balancer in front of their update server cluster, but why in the world are so many major companies still sending all their users to a single place for updates?

If I want to download an update from Software Update today on my home computers, I have to do it three times: once for my Mac mini (file server/backup server/media center), once for my laptop and once for my tower. The actual update binary is, in most cases, identical. To download the update only once, I’d have to find where Software Update keeps the update’s installer file, copy it to the other machines and run it there. In some cases, I have to download tens or hundreds of megabytes of files that could easily be transferred over my home network instead, saving both my time and the update provider’s money.

What’s most irritating about this is that it’s a completely solved problem. Blizzard, for example, distributes updates to World of Warcraft over BitTorrent. My roommate just started playing WoW again and had to install a patch (~2 GB) on two of his computers. He downloaded and installed the patch on the first computer, which took about an hour and a half. The download-and-install process on the second computer took all of about five minutes. The updater automatically recognized that a copy of the update existed on its local network and downloaded it peer-to-peer from the other machine.

Imagine if everyone interested in the iPhone patch could download it not only from Apple but from each other. After the first few hundred downloads (which would have to come directly from Apple), most of the remaining transfers would be peer-to-peer. If iTunes needs to authenticate the phone with Apple before installing, that’s fine. Authorization would put far less load on the servers, for a much shorter time, than downloading the patch does. Security, of course, is an issue with BitTorrent-style downloads, but there are relatively straightforward ways to deal with that.
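As a rough illustration of one such approach: the publisher distributes a small, trusted list of hashes, one per fixed-size piece of the update. Clients then accept a piece from any peer only if its hash matches. BitTorrent checks pieces this way, although the original BitTorrent protocol uses SHA-1 rather than the SHA-256 used here. The piece size and function names are hypothetical. The sketch also assumes the hash list itself arrives from a trusted source (for example, signed by the publisher or fetched over HTTPS).

```python
import hashlib

PIECE_SIZE = 256 * 1024  # hypothetical piece size: 256 KiB


def piece_hashes(data, piece_size=PIECE_SIZE):
    """Publisher side: SHA-256 digest of each fixed-size piece of the update."""
    return [
        hashlib.sha256(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(index, piece, trusted_hashes):
    """Client side: accept a piece from an untrusted peer only if it matches.

    trusted_hashes must come from the publisher; the pieces can come from
    anyone.
    """
    if not 0 <= index < len(trusted_hashes):
        return False
    return hashlib.sha256(piece).digest() == trusted_hashes[index]


def assemble(pieces, trusted_hashes):
    """Join verified pieces; raise if any piece is missing or tampered with."""
    if len(pieces) != len(trusted_hashes):
        raise ValueError("wrong number of pieces")
    for i, piece in enumerate(pieces):
        if not verify_piece(i, piece, trusted_hashes):
            raise ValueError(f"piece {i} failed verification")
    return b"".join(pieces)
```

Because each piece is checked on its own, a client can discard a bad piece from one peer and re-fetch just that piece elsewhere instead of restarting the whole download.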

I’m just saying it’s about time that someone did something about this, because it’s getting a little ridiculous.

Working for The Man

  


Here’s something you never thought you’d be hearing from me: I’m working for Microsoft this summer.

Well, I’m working for Microsoft Research this summer, anyway.

Wait, put out those torches, put down the pitchforks and let me explain.

Some of you may know that I’ve been somewhat … critical … of Microsoft in the past. My anti-Microsoft sentiments have mellowed in recent years, however. The Xbox 360 may be to blame for that. My hatred of Windows has also ebbed since I realized that building operating systems is hard and keeping them working is even harder. Also, it’s no longer my job to fix PCs running Windows (thank goodness), so I haven’t seen the Windows installations of security-casual college students in a while.

So why MSR? Microsoft Research is where a lot of the interesting corporate systems research is happening right now, and the project I’ll be working on is closely related to my current research. I’ll get to spend three months in the Seattle area during the three months of the year when the weather there is really nice. Compared to what I would make at UCSD in three months, it’s a huge chunk of cash. It’s a huge stack of win all the way around.

Cure for a scratchy speaker

  


I had a problem with my speakers for a while, and it was getting on my nerves so much today that I finally sat down and tried to fix it. Whenever you adjusted the volume on these speakers, sound would cut in and out on the left speaker unless you had it set to just the right volume or jiggled the knob for a few minutes. Even after lots of jiggling and turning, the knob was picky. Every once in a while the left speaker would just sort of stop working (and you’d wonder if you’d suddenly gone deaf in one ear).

I had figured from the beginning that the problem was the knob itself. The knob is just a potentiometer (an electronic doodad that variably limits current flow based on the position of the knob). Sometimes the contact between parts of a potentiometer can get gunked up, causing the entire works to go crazy. At least, that’s what the Internet tells me.

Love the technical explicitness of that last paragraph? Blame the fact that I hated my electronics course in college.

Anyway, I called up my dad and asked for his advice on the subject. I was ready to rip the speaker open and spray lubricant on the knob, which may or may not have been a good idea. He suggested just turning the knob through its entire range of motion about 50 times. It turns out that worked like a charm – the speakers are as responsive as the day I bought them. Rocket scientist dad for the win!

Leopard

  

 


Mac OS X 10.5 is here!

First off, a warning. Don’t buy the Leopard Family Pack. I did, because I’m a sucker. I paid an extra $100 for a sticker on my box and an extra word on my receipt. Seriously, Apple, at least give me a CD key or some paper authorization of my extra licenses. Don’t just give me a sticker. I can fake a sticker.

 

Here are some of the things I really like about Leopard:

  • NetInfo Manager is gone! I hated that program so much. It’s been replaced by about 3 different control panels, but things are where you’d expect them to be instead of in an obscure, atrociously ugly utility.
  • Frostier frosted glass menus! Eat it, Vista! 
  • Terminal finally has tabs. iChat also finally has tabs. It’s a tab fiesta.
  • Although Screen Sharing uses VNC, it’s not glacially slow; their compression algorithm isn’t half bad.
  • Based on my benchmarks Spotlight is exactly a gajillion times faster than it was in Tiger.
  • Believe it or not, Leopard actually runs really well on my four-and-a-half-year-old PowerBook G4. I am really surprised.
  • Apparently the OpenGL drivers are much improved, although I don’t really run games anymore.
  • NFS mounting (and auto-mounting for that matter) is no longer a gross hack. Hooray for nfs://
  • It’s shiny. Wait, you probably already knew that … but oh man it’s shiny.

I had a problem with my MacBook Pro where the keyboard would suddenly stop working for minutes at a stretch, but a clean reinstall seems to have solved it (fingers crossed on that one). I haven’t tried Time Machine yet, but I’m looking forward to it.
