Esoteric Tip of the Day #1: Dead Man’s Check

December 12th, 2009

I’m responsible for the care and feeding of way too many computers. I say “way too many” because the probability of one of your computers doing something stupid on a Friday night increases in proportion to the number of computers that can do something stupid. Most of the time, the stupidity is routine (“Oh hey, what a surprise! Firefox is using all my RAM!”) but every so often, they surprise you. I’m going to post something here whenever I manage to fix one of these more “WTF”-type errors under the heading Esoteric Tip of the Day. Chances are you won’t care, but someone Google searching might, and I want to make this sort of information easy to find so that others won’t have to endure this same issue.

I noticed a couple days ago that my Mac mini (which acts as a storage server but will probably be doing more media serving once I get a better TV) was not visible to the rest of my home network. It turns out that the mini’s network interface was having serious issues, because the mini couldn’t even get an IP address. Doing a little digging into the log, I found a lot of log segments that look like this:


kernel AppleYukon2: error - FATAL: SkGeStopPort() does not terminate (Rx)
kernel AppleYukon2: error - Event queued in Init Level 0
configd[14] network configuration changed.
Firewall[66] krb5kdc is listening from :::88 proto=6
Firewall[66] krb5kdc is listening from 0.0.0.0:88 proto=6
kernel Ethernet [AppleYukon2]: Link up on en0, 100-Megabit, Full-duplex, Symmetric flow-control, Debug [796d,6c0c,0de1,0200,4de1,4000]
configd[14] network configuration changed.
Firewall[66] krb5kdc is listening from :::88 proto=6
Firewall[66] krb5kdc is listening from 0.0.0.0:88 proto=6
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - still nothing received, hard reset of chip
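
As an aside, one quick way to see how often the driver is resetting the chip is to tally these messages. Here is a minimal Python sketch that counts soft and hard resets in a list of log lines. The function name is made up for illustration, and the pattern only assumes the message wording shown in the excerpt above.

```python
import re
from collections import Counter

# Matches the dead-man messages as worded in the log excerpt above.
DEADMAN = re.compile(
    r"AppleYukon2: .*deadmanCheck - (still )?nothing received, (soft|hard) reset of chip"
)

def count_resets(lines):
    """Return a Counter with the number of 'soft' and 'hard' resets seen."""
    counts = Counter()
    for line in lines:
        match = DEADMAN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Feeding it the lines of whichever log file the messages came from gives a rough picture of how bad things are before and after a fix.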

There is only one group of people who know exactly what these messages mean, and they are the guys that wrote the AppleYukon2 driver.  Thankfully, a little Googling and some link clicking later, I had a plausible explanation. Apparently this sort of error occurs when one of the networking-related preferences that OS X stores is corrupted somehow. This apparently happened quite a bit when users upgraded to 10.5.7, and it also happened to me. The driver reads a corrupted preferences file, takes some ridiculous action and wedges the underlying hardware in some kind of bizarre state that only a reboot can correct.

Thankfully, OS X can regenerate its preferences files. To do this cleanly:

  • Reboot holding down the Shift key. This will force the computer to boot in Safe Mode and (as a side-effect) rebuild its caches, which is a good thing if they’ve been populated with garbage from your corrupted preferences files, which they probably have been.
  • Delete the following files in /Library/Preferences/SystemConfiguration: NetworkInterfaces.plist, com.apple.airport.preferences.plist, com.apple.network.identification.plist
  • Reboot and re-apply your network settings (IP address, et al)
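
If deleting system files by hand makes you nervous, the middle step can be sketched in Python. Note that this version moves the files into a backup folder instead of deleting them, which is my own substitution and not part of the original fix. The function name and the backup folder are hypothetical; only the three file names come from the list above.

```python
import shutil
from pathlib import Path

# File names from the list above; everything else is illustrative.
NETWORK_PREFS = [
    "NetworkInterfaces.plist",
    "com.apple.airport.preferences.plist",
    "com.apple.network.identification.plist",
]

def set_aside_network_prefs(config_dir, backup_dir):
    """Move the network preference files from config_dir into backup_dir.

    Returns the names of the files that were actually found and moved.
    """
    config_dir, backup_dir = Path(config_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in NETWORK_PREFS:
        source = config_dir / name
        if source.exists():
            shutil.move(str(source), str(backup_dir / name))
            moved.append(name)
    return moved
```

Pointing config_dir at /Library/Preferences/SystemConfiguration will most likely require administrator privileges.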

Special thanks to Daniel Palmer, the poor soul who sat on the phone with Apple to get this resolved. Here’s the thread containing the solution.

Update 3/3/10: This problem has reared its ugly head once again, and I’ve found out a little more information about it:

When you set the speed and duplex settings of your wired network adapter manually rather than keeping the setting at “Automatically” and you disable IPv6, the driver no longer triggers hard resets for whatever reason. In my case, it triggers soft resets at precise 6-minute intervals.

Also, as reader lafber pointed out, this problem only seems to occur when there is no traffic on the wired network. To solve this problem, I wrote a one-line AppleScript script. The application starts a command that pings my router every 30 seconds. The command itself runs in a virtual terminal, so I don’t have to keep an application running in the foreground. The AppleScript looks like this:

do shell script "screen -m -d ping -i 30 192.168.1.1"

To get it to start at bootup, I saved that AppleScript script as an application and added the application to my default user’s Login Items in the Accounts system preference pane.
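
For anyone who would prefer not to use AppleScript, the same keep-alive idea can be sketched in Python. This is a substitute for the screen and ping -i combination above, not what I actually run: it calls ping once per interval from a loop. The function names and the rounds parameter are made up for illustration.

```python
import subprocess
import time

def ping_command(host, count=1):
    """Build the argument list for sending `count` echo requests to host."""
    return ["ping", "-c", str(count), host]

def keep_alive(host, interval_seconds=30, rounds=None):
    """Ping host every interval_seconds; loop forever when rounds is None."""
    sent = 0
    while rounds is None or sent < rounds:
        # Output is discarded; the only goal is to generate some traffic.
        subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        sent += 1
        if rounds is None or sent < rounds:
            time.sleep(interval_seconds)
```

Calling keep_alive("192.168.1.1") would mirror the 30-second interval used above. You would still need some way to launch it at login.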

Graduate Application Season

December 7th, 2009

It’s that time of the year again, the most wonderful time of the year: time for college seniors to start applying to graduate school.

Although I’m quite happy where I am now, I look back on grad application time as one of the most physically and emotionally exhausting times of my life. My advisor Amin Vahdat has just written a very thoughtful article on the graduate school application process, how to know if you really want to go to graduate school and how to get into a good program. I also wrote a related post last year on the more technical side of the application process.

To all the applicants this year, good luck! You should consider applying to UCSD (nudge nudge, wink wink).

Decent Python Code Folding in TextMate

July 16th, 2009

TextMate is a great editor, and Python is a great programming language, but they both have their limitations.

One of TextMate’s nicer features is code folding, which allows you to collapse a block of code (a function, a conditional block, etc.) down to a single line in the editor. This often makes a large piece of code much easier to navigate. TextMate determines where code can be folded by evaluating each line of the code and tagging particular lines with start and end markers.

Doing code folding for Python this way (considering only a single line at a time) is impossible, since the only clear end of a function in Python is a reduction in indentation level on the next line. For example, here’s a function in C followed by a statement:

int foo(int value) {
   printf("%d", value + 5);
}

foo(24);


Here’s the same code written in Python:

def foo(value):
   print "%d" % (value + 5)

foo(24)


As you can see, if you look at the code one line at a time, it’s not clear where the definition of foo in Python ends, whereas the closing curly brace (}) is clearly the end of foo’s definition in C.

By default, TextMate punts: it defines the start of a function or class definition as the start of a foldable region and the first subsequent blank line as the end of that foldable region.  This doesn’t really work well, because function definitions without any blank lines aren’t really readable past a certain size.
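
To make the limitation concrete, here is a rough Python sketch of what finding the end of a block actually requires: looking ahead at the indentation of later lines. This is not how TextMate works (that is the whole problem), and the function name is hypothetical.

```python
def block_end(lines, start):
    """Return the index of the last line of the block opened at lines[start].

    Blank lines are skipped, so the answer depends on lines that come
    after the one currently being examined.
    """
    def indent(line):
        return len(line) - len(line.lstrip())

    base = indent(lines[start])
    end = start
    for index in range(start + 1, len(lines)):
        line = lines[index]
        if not line.strip():
            continue  # a blank line alone says nothing about where the block ends
        if indent(line) <= base:
            break  # first non-blank line at or below the opening indentation
        end = index
    return end
```

A blank line in the middle of a function does not end the block here, which is exactly the case the blank-line heuristic gets wrong.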

There is, however, a way to hack your way around this issue. Another of TextMate’s killer features is that you can customize almost everything, including the definition for a given language.

Open the Language Editor through the “Bundles > Bundle Editor > Edit Languages…” menu item (as of TextMate 1.5.8, anyway).

Pull up the language editor for Python, find a line that looks like this:

foldingStopMarker = '^\s*$|^\s*\}|^\s*\]|^\s*\)|^\s*"""\s*$';


and replace it with this:

foldingStopMarker = '^\s*#\s*END_DEF ([a-zA-Z0-9_<]+)';


Now, if you want to fold a function definition in Python, stick a comment at the end of it:

def foo(value):
   print "%d" % (value + 5)
#END_DEF foo
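
If you want to sanity-check the two patterns before editing the bundle, they can be tried out in Python. Treat this as an approximation: TextMate uses its own regular expression engine, and Python’s re module is only standing in for it here. The helper name is made up.

```python
import re

# Patterns copied from the two foldingStopMarker lines quoted above.
OLD_STOP = re.compile(r'^\s*$|^\s*\}|^\s*\]|^\s*\)|^\s*"""\s*$')
NEW_STOP = re.compile(r'^\s*#\s*END_DEF ([a-zA-Z0-9_<]+)')

def stops_fold(pattern, line):
    """Return True if `line` would be tagged as the end of a foldable region."""
    return pattern.search(line) is not None
```

With the old pattern a blank line ends the fold; with the new one only the END_DEF comment does.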


Hopefully this helps out some of you who are using TextMate for Python programming.

Dealing with WordPress Database Inconsistency

May 20th, 2009

I just finished wrestling with a problem with WordPress. Editing a really old page tickled some sort of internal inconsistency in the database, which created a zombie page that could neither be edited nor deleted. I really wasn’t looking forward to crawling around in the database, so I did some googling and came up with Lester Chan’s brilliant WP-DBManager. One click of the “Repair” button allowed me to deliver the deathblow to the zombie page, saving me a boatload of time in the process. Thanks, Lester!

Preventing data loss

January 21st, 2009

OK, so it's not a forest fire, but you get the point.

I’ve always been the sort of person that saves receipts. Until very recently, I had an entire filing box full of receipts, warranty cards and software license keys. This system bothered me in much the same way that keeping paper copies of research papers did: they can’t be searched (did I file the receipt for my coffee table under C for coffee table, T for table, I for Ikea, F for furniture?), and they take up space in my apartment, where space is pretty limited. There’s the additional problem of a paper receipt’s relative fragility: there’s only one copy of it, and it can be burned, crumpled, crushed, soaked, can fade into unreadability, and so on. The solution was pretty straightforward; I scanned everything in the box, gave them descriptive file names and let Spotlight index them.  To provide some additional protection against data loss, I burned two copies of the newly-scanned PDFs onto two separate DVDs and mailed one to my parents. This got me thinking; sure, this way it’s much less likely that my important documents will be destroyed in the event of a disaster. Really though, how safe is my data? More importantly, how safe can my data really be?

Some studies suggest that the lifespan of recordable CDs and DVDs is quite short (in the neighborhood of three to five years), which means that any “permanent” storage on DVDs would need to be refreshed on about that interval to remain current. Now, given that most of what I burned on those DVDs doesn’t need to be kept for more than five years, that shouldn’t really be a problem. Also, these are kept in relatively controlled conditions – no direct sunlight, and the ambient temperature where they are doesn’t get much colder than about 55 °F or much hotter than about 100 °F. But what about things like photos and home videos? Stuff that is irreplaceable and would need to be stored over a period of, say, decades?

Every data retention scheme you can name has its breaking point, some condition that could irretrievably kill your data. In my opinion, the important factors are how likely (or inevitable) the failure is and whether it can be proactively avoided. Let’s take a look at a few ways to store data sorted from least paranoid to “tin-foil hat” paranoid.

Single disk:

  • Data is lost when: the disk fails.
  • How likely/inevitable is this?: All disks will fail, it’s just a matter of when.
  • Proactive avoidance?: Unless you go to some outside storage source, none.

Scarily, this is what most people are relying on. Professional drive recovery companies might be able to get your data off the disk, but it will cost a pretty penny.

RAID array (multiple, redundant disks):

  • Data is lost when:
    • All redundant disks fail before the array can be rebuilt. Especially heinous if your disks all croak of the same hardware problem, or your power supply kills them by catching on fire.
    • (Hardware RAID only) RAID controller fails, no replacement or equivalent controller can be found
  • How likely/inevitable is this?: With enough diligence (and enough backup disks), this shouldn’t be too likely.
  • Proactive avoidance?: More redundant disks, software RAID, quick monitoring and early detection of drive errors, always having a spare drive or two handy.

Burning to CD/DVD:

  • Data is lost when: the disk becomes unreadable (crushed, melted, thoroughly scratched, hacked into un-spinnable chunks, or some other loss of structural integrity)
  • How likely/inevitable is this?: Some might argue that it will happen eventually, but it’s a lot less likely than your disk failing, especially if you don’t expect to touch the data itself frequently.
  • Proactive avoidance?: More copies, stored in more places. Re-burning periodically.

Backup to off-site computer/external disk that’s kept somewhere else:

  • Data is lost when: both the storage system storing the original and the storage system storing the backup fail before an additional backup can be brought online
  • How likely/inevitable is this?: Unless you’re not paying attention to your backup for long periods of time or there’s a nuclear war, you’re probably OK. The amount of time you have to react depends, of course, on the storage setup of the two machines.
  • Proactive avoidance?: More than one backup, as widely distributed geographically as you can afford.

Cloud Storage (Amazon’s S3, others):

  • Data is lost when: hopefully never. Companies providing cloud storage services spend a lot of money making sure that this doesn’t ever happen. Amazon’s SLA for S3 doesn’t even mention data loss, just the minimum amount of time they guarantee that the service will be available per year.
  • How likely/inevitable is this?: Much like the nuclear war scenario, if S3 loses data it will probably be on the news, at least in Silicon Valley.
  • Proactive avoidance?: Multiple storage clouds, owned by multiple vendors. This is truly the peak of tinfoil-hat paranoia.
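
The “more copies” advice that keeps showing up in these lists can be put into rough numbers. The sketch below assumes that each copy is lost independently with the same probability, which is optimistic: as noted under RAID, disks can all die of the same hardware problem or the same fire. The function name and the 5% figure in the usage note are purely hypothetical.

```python
def chance_all_copies_lost(per_copy_loss, copies):
    """Probability that every copy is lost, assuming independent failures."""
    if not 0.0 <= per_copy_loss <= 1.0:
        raise ValueError("per_copy_loss must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return per_copy_loss ** copies
```

With a made-up 5% chance of losing any single copy over some period, one copy leaves a 5% chance of total loss, while three independent copies bring that down to 0.05 × 0.05 × 0.05, or about 0.0125%.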

In my opinion, there are only two real downsides to the cloud storage approach. The first is that upload bandwidth, for most residential Internet customers in the United States, just plain sucks, and uploading a non-trivial amount of data is painfully slow. Fiber to the home (and a technology-friendly executive branch) might change this, but it won’t happen in the immediate future. Second, S3 costs money, both to upload data and to store it there. If you’re not willing (or able) to pay the bill, this isn’t for you.

This is by no means an exhaustive list, and there are a lot of hybrid approaches. I back up my computer’s hard drives to external drives periodically, and those drives spend most of their time in my desk drawer. My various media (photos, music, digital copies of DVDs) are stored on a software RAID-1 in the media center PC. As an additional layer of protection, all my photos are on Flickr, all my music is synchronized with my iPod fairly regularly and my DVDs are, well, DVDs.

This setup, while pretty decent, is by no means foolproof. One major concern that I haven’t talked about here is data corruption, and I’ve got pretty much zero defense against that. Once Apple releases Snow Leopard, hopefully I’ll be able to transfer all my data over to ZFS volumes. I could nerd out over ZFS for a whole other post, but the short of it is that ZFS makes far fewer assumptions about data’s integrity and comes pretty close to eliminating the data corruption problem.

How to apply to graduate school without going insane

December 19th, 2008

A few friends of mine are applying to graduate school. This was without a doubt one of the most stressful and chaotic things I’ve ever done. Now that I’ve been in grad school for a while, I have a small understanding of how the application process works. My experience is limited, of course, to the programs to which I applied, so your mileage may vary. Also, I applied to ten(!) different graduate programs, so some of these solutions might not apply to you if you’re only applying to one or two.

Getting Organized

One of the most challenging parts of the grad school application process is knowing what everybody wants and keeping it all straight. Just about everybody wants a statement of purpose, but they have different guidelines as to what they want to see (at most X words vs. at most Y pages, single-spaced vs. double-spaced etc). Everyone wants recommendation letters, but they differ as to what sorts of recommendations they expect; do they accept recommendations from people you’ve worked with in an industrial setting, or do they want recommendations strictly from faculty members? The list goes on.

The very first thing I did when applying to graduate programs was to look at their programs’ websites and their application forms and try to answer a few questions:

  • How much is the application fee?
  • What do they say they want in a statement of purpose? (I wrote down exactly what they required there)
  • How many letters of recommendation, and from whom? Do they want them mailed or filed digitally? If mailed, where should I mail them?
  • Do they take GRE scores? If so, do they require a subject GRE? Which one? What are their GRE institution codes? (This last bit is important so you can tell The Testing Mafia where to send your scores)
  • How many copies of my transcript do they need? Do they want it mailed or filed digitally? If mailed, where should I mail them?
  • With which professors would I want to work? What projects have they done or are they doing that I find interesting? (If you can’t answer both of these questions, reconsider applying to this school)

Once I was done making that list (it took me an afternoon, round numbers) I made a folder on my computer called “Graduate School”. Inside that folder, I made two folders, “General Purpose” and “Schools”. The “General Purpose” folder would hold all the information that all schools seemed to want in some form or another – statement of purpose, resume, transcript, extracurricular activities list, work experience, and so on. The “Schools” folder contained a subfolder for each school, and housed not only the application materials for that school but also copies of any confirmation e-mails or webpages I received after completing the application (in case I needed to produce them later for some reason). This über-folder was backed up periodically to a server across town to protect against any major disasters.
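
For the scripting-inclined, that folder layout can be set up in one go. This is just a sketch: the function name is made up, and the school names in the usage note are placeholders.

```python
from pathlib import Path

def create_application_folders(root, schools):
    """Create "General Purpose" and one folder per school under root."""
    base = Path(root) / "Graduate School"
    (base / "General Purpose").mkdir(parents=True, exist_ok=True)
    for school in schools:
        (base / "Schools" / school).mkdir(parents=True, exist_ok=True)
    return base
```

Calling create_application_folders(".", ["School A", "School B"]) would create the “Graduate School” folder in the current directory with one subfolder per school.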

The Statement of Purpose

This is the big one. This is the portion of the application that the people for whom you want to do research are likely to read – think of it as your “elevator pitch” for yourself.

Some people say that you should tailor your statement of purpose for each university to which you apply – since I was applying to ten different universities, this tactic didn’t seem feasible. You can say “Oh, a graduate degree from Stanford is the reason I was brought into this world; I will remake the world in Leland Stanford’s image with the help of Professor So-and-so, who I worship as a god among men!” but the people reading your letter won’t buy it. If you’re applying to a program because you’re really interested in X, and a professor in that program is a leader in the field on X, and you’ve had prior research experience in X, then mention it. Otherwise, leave it out because it does you no good.

Be honest, both with yourself and with the university to which you’re applying. I didn’t consider graduate school seriously until my junior year in college, and a lot of the places to which I was applying expected lots of prior research experience from their applicants that I honestly didn’t have. I knew they would notice it, so I came right out and said in my statement (not in so many words, you understand), “Look, I know that I haven’t published anything and that my research experience is kind of slim, but I’m really excited about this stuff and I know that I’ll be able to meet and exceed your expectations”.

Remember that this is not like college applications; your materials are not, by and large, going to be read by some faceless bunch of professional college application readers. The people reading your application are probably among the people whose classes you will take and whose research you will do. More importantly, they will be the ones who will fund you and they want to know that they’ll be getting their money’s worth.

Recommendation Letters

So many “how to get into graduate school” websites say that the key with recommendation letters is to get the ball rolling early, and I agree with them. One thing they don’t tell you often enough (and that they should tell you) is that your college’s letter service is your friend. Professors, especially at “research universities”, usually don’t like being bothered about letters of recommendation by undergraduates. It means that, in addition to writing the letter itself, they have to fill out forms and get envelopes and stamps and it takes time away from their work. Your goal is to be as unobtrusive as possible, and the campus letter service helps tremendously with that.

Letter services offer your recommenders the ability to write their letter once and send it to the letter service office along with some identifying information. The letter service then keeps these letters and (for a nominal fee) ships them off to whomever you want without any further involvement from the recommenders. Another plus of this method is that you don’t have to worry about getting all of your recommenders to send everything in on time – just fill out a form online and you’re good to go.

In short, start early, and if your university has a letter service then use it.

Keeping Track of Deadlines

You may be the most diligent person on Earth and have all your applications done well in advance of the deadline. If you’re like most of us, applying for graduate school isn’t the only thing you’re doing – you’re busy. I wrote down all the deadlines for my applications (which fell in a six week window between the middle of December and the beginning of February) and put countdown timers on my Google homepage. Every time I opened a browser, there they were, a grid of about a dozen numbers that kept getting smaller. If that won’t keep you on track, nothing will.
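
If you would rather not depend on a homepage widget, the countdown itself is easy to compute. Here is a minimal sketch; the function name is hypothetical, and the schools and dates in the usage note are placeholders.

```python
from datetime import date

def days_remaining(deadlines, today):
    """Return (name, days left) pairs, nearest deadline first."""
    countdown = [(name, (due - today).days) for name, due in deadlines.items()]
    return sorted(countdown, key=lambda item: item[1])
```

For example, with deadlines of December 15 for “School A” and February 1 for “School B”, calling it on December 1 gives 14 and 62 days respectively.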

I hope that my experiences with this horrible, tedious process will prove useful to someone. Godspeed, applicants.

Victory

November 5th, 2008

I really don’t know what to say. Thank you to everyone who worked so hard to make this amazing turn of events (that I never thought would actually happen) possible. I know that the words “hope” and “change” have been thrown around pretty liberally by both sides in the past few weeks, but I really hope that this is a sign that the insanity of the past few years is slowly coming to an end.

Proposition 8 and its sister propositions in several other states passed, and I think that’s unfortunate, but expected. It shows, I think, that Americans cannot cleanly separate their politics from their religion. I’m sure there will be counter-propositions and counter-counter-propositions ad infinitum until someone decides to amend the Constitution. We’ll just have to wait and see.

I congratulate President-Elect Obama (that’s still sort of surprising to be able to say, isn’t it?) on his well-earned and decisive victory. Now comes the time when you make good on the promises that got you elected. Please don’t let us down.

Update: I don’t want that “cleanly separate politics from religion” bit to be construed as a slight to religion or the religious. I totally understand how deeply religious people must have been of two minds about this issue, and what I meant by that statement was that people can’t vote on something like gay marriage without being influenced by factors like their religious convictions and that this is exactly why this issue will continue to oscillate forever until some decision is made at the national level.

Attacking religious people because you’re against Prop. 8 is just as bad, in my opinion, as attacking non-religious people because you’re for Prop. 8, and I apologize if my remarks were in any way misconstrued.

My Digital Transition

November 1st, 2008

After coming back from Microsoft with another bundle of printed research papers in hand I found that, in the course of a year, I’d amassed a stack of read (and needing-to-be-read) research papers that filled a set of binders almost a foot thick. This would be fine if I had categorized them and knew exactly what went where, but I hadn’t and I didn’t. In fact, I had no idea what was in those binders. Furthermore, on several occasions I’ve found myself saying, “Damn, I know there’s a paper about this that I’ve got in these binders …” and not finding anything after flipping through them for about 10 minutes. “Self,” I said to myself, “there’s got to be a better way!”

I identified four problems that needed to be solved:

  • A foot of paper a year, if it continued growing at that rate, would be as tall as me before I graduated. That just isn’t sustainable.
  • Even if I kept all six feet of paper, it would be impossible to find any one paper in that stack.
  • Often, I’m looking for a particular topic or a group of related papers rather than a single paper.
  • I write notes in my papers, and I’d like to retain the annotating flexibility of scribbling on a paper with a pencil when transitioning to a digital system.

At this point, I think I’ve got all but the last one figured out.

I had considered just getting a Fujitsu ScanSnap document scanner, as it translates paper directly into PDFs, but $500 for a document scanner was way too pricey. One of the things I had going for me was that all of the research papers I’d printed started out as PDFs. The easiest and cheapest solution was to find the original PDFs, index them, add any annotations I’d put on the papers beforehand, and recycle the originals. In the end I chose to not transfer the annotations; frankly, I couldn’t decipher most of them and it would have taken too much time. Armed with Google Scholar I was able to find PDFs (and bibliographic information) for my entire paper stack in about 90 minutes.

Now that all the PDFs were downloaded, I inserted them all into Referencer. Two nice things about Referencer are its ability to store bibliographic data (in the form of BibTeX citations) along with papers and its ability to associate papers with tags, descriptive words or phrases (similar to how del.icio.us does bookmarks). This really helps when searching for papers that fit a given topic and is a lot more flexible than any fixed filing system. It also allows you to give each paper a text “note”; I find that, if I take notes on a paper, this forces me to be more concise and structured in writing my thoughts about it than scribbling on a page’s margins would be.
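
To show why tags beat a fixed filing system, here is a toy tag index in Python. It has nothing to do with how Referencer is actually implemented; the class and method names are made up purely for illustration.

```python
from collections import defaultdict

class PaperLibrary:
    """A tiny in-memory index from tags to paper titles."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, title, tags):
        """File a paper under every one of its tags (case-insensitive)."""
        for tag in tags:
            self._by_tag[tag.lower()].add(title)

    def with_tags(self, *tags):
        """Return the titles carrying all of the given tags, sorted."""
        if not tags:
            return []
        sets = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        return sorted(set.intersection(*sets))
```

A paper filed under both “storage” and “consistency” turns up in a search for either tag or for both together, something a single folder hierarchy cannot offer.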

Now that the papers were digitized and tagged, they needed to be searchable. If there’s one thing that Google and Spotlight have taught me, it’s that if it’s not searchable, I won’t find it. Beagle does an admirable job of indexing the text of all my PDFs automatically.

The last step in the process was to make sure that when(!) my hard drive died I wouldn’t lose all my paper data. This was relatively easy, since UCSD just got a shiny new NetApp file server with a boatload of redundant storage. I’ve set up a cron job to synchronize my home folder to that server every night at midnight.

Now I’ve got a much more accessible paper library that’s really easy to maintain. The march toward paperlessness doesn’t stop there, however; just a couple weeks ago I recycled several pounds of software manuals that, if I ever needed them again, I could find online. As a result I’ve got one less storage box in my closet which is a big help given that I live in a pretty small space. I have a feeling my “important documents filing box” is next.

Automatic Everything

August 21st, 2008

This was going to be an extended rant on how people use databases where people shouldn’t use databases, but the more I wrote the more I realized that this had been analyzed quite a bit by many in the systems research community and blogosphere at large, many members of which are far more knowledgeable than I. So I’ll summarize my rant in a paragraph and then move onto more philosophical, “meta”-type comments.

Twitter’s architecture (as much as they’ve shown us) is a Ruby on Rails app backed by a MySQL database. This combination is the Golden Hammer of Web 2.0. A frighteningly large number of web application developers seem to follow the mantra, “If I need to store data, use SQL as a Big-Ass Table (no, not that Big-Ass Table). Who needs high-speed middleware? I’ll write everything in Ruby!” The problem is that schema design is as close to alchemy as CS gets and tuning databases is tedious and hard to do right. If you are writing something that must process tens of thousands of messages a day, do not think you can write it in an interpreted language and have it frequently converse with a database. If you think this will work, you are living in a magical dream world. I’m talking directly to you, Twitter, you poor sad whipping boy of the Web 2.0 universe. Please, for your own sake, rewrite Starling in C or C++ and use a more suitable back-end.

That concludes the synopsis of my multi-page rant of doom. Now, for the meta: if I were to write an essay for NPR’s This I Believe, the following would be that essay.

I believe in telling systems what I want, not how to get it, and having them give it to me as quickly as possible. I believe that programmers are lazy, and that middleware should give them the ability to do the right thing the easy way. I believe in intrinsic scalability and building on sound principles. I believe that the disk is evil and writing to it should be avoided until you have no other choice. I believe in most of what databases do and in the potential of what their descendant systems can and will do.

I believe in the awesome potential of automatic everything.

CD Recommendation of the Epoch

August 14th, 2008

20 Minute Loop uses harmony as a weapon of mass catchiness. I approve. Here’s a taste:

20 Minute Loop – Our William Tell
