Posted: February 12th, 2011 | Author: Alex | Filed under: Advice (Unsolicited) | Comments Off
I can’t overstate the importance of doing backups. Your data is important, and it should be protected. For the next couple of posts, I’m going to do a deep dive into backup strategy and how the changing face of computing is changing the way we have to think – or in some cases not think – about backups.
Why bother with backups?
Simple: someday, your hard drive will die. Backups are not like insurance; what you are preparing for is not a hypothetical situation. This is where I start to sound like a jabbering paranoid, but trust me, it really sucks to lose data.
I wrote a post a couple of years ago that examined a couple of different ways to do backups. I’ve changed my opinions somewhat since then, so hopefully this adds something to the points raised in that post.
What is a backup, really?
It’s easier to define what a backup is by defining what a backup isn’t.
I should preface this by saying that anything is better than nothing. You may not be able to feasibly satisfy a backup by my definition for all of your data, but the more of these points you’re able to hit, the better off you’ll be. With every layer of protection you put in place, the probability of losing data goes down. It never quite goes to zero, but it gets pretty close.
Your data is not truly backed up if:
- Your data isn’t stored on two different drives.
- Your data isn’t stored on two different file systems.
- Your data isn’t stored in two different computers.
- Your data isn’t stored in two geographically distant locations.
- Your data isn’t duplicated automatically.
- You can’t make at least one of the copies go back in time.
Two different drives is a no-brainer: if a hard drive fails, the data on that drive is probably gone forever. Putting your data on two different drives allows your data to survive the failure of one of them.
I say “probably gone forever” because I’m purposefully ignoring filesystem-level recovery systems like Norton Ghost and drive-level recovery services like Disk Doctors. I’m doing this because they cost lots of money and time and are not necessarily guaranteed to work.
Storing your data on two different file systems protects against stupid mistakes and computer compromise. If someone logs into your computer and deletes your data or a software bug hoses your file system, even if that data was stored on a RAID 1 array, it’s gone. It’s better to treat a RAID 1 array as a really robust single drive rather than as a complete backup solution by itself.
Storing your data on two different computers is an extension of the same protective strategy. If someone logs in to one computer and deletes data on all the drives on that computer, that data is gone unless it was stored somewhere else. Similarly (and somewhat more commonly) if your power supply goes kaboom and melts your computer and all the copies of that data are inside that one computer, that data is gone.
If you only go this far, you’re pretty well-protected against the loss of a single drive or a single machine. The next layer of protection, backing up in multiple places that are separated by distance, protects you against catastrophe, i.e. your house is destroyed or you get robbed.
If you’ve gotten this far and are diligent, the only things that are likely to kill your data are a major natural disaster or lots of different things failing at once.
The next point, automatic backup, is really a question of convenience. Personally, I don’t want to spend all day babysitting different copies of my data. You should find a backup solution that works automatically and reliably in the background and, more importantly, tells you quickly when it’s not working. The whole situation falls apart if you think you’re doing backups but you’re not.
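One way to get the “tells you quickly when it’s not working” property is a small check that runs on a schedule and complains when the newest file in the backup destination is too old. What follows is a minimal sketch, not a feature of any particular backup tool; the function names, the backup directory and the 24-hour threshold are all assumptions made for illustration.

```python
import os
import time


def newest_mtime(backup_dir):
    """Return the most recent modification time (seconds since the epoch)
    of any file under backup_dir, or None if it contains no files."""
    newest = None
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            mtime = os.path.getmtime(os.path.join(root, name))
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def backup_is_stale(backup_dir, max_age_hours=24, now=None):
    """True if the backup holds no files, or its newest file is older
    than max_age_hours (an assumed threshold, tune to taste)."""
    now = time.time() if now is None else now
    newest = newest_mtime(backup_dir)
    if newest is None:
        return True
    return (now - newest) > max_age_hours * 3600
```

Run something like this from a scheduler and have it e-mail you whenever `backup_is_stale("/Volumes/Backup")` returns True (the path is hypothetical). The important part is that silence means “working” and any failure makes noise.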
The last point, time travel, is something that I think a lot of people overlook. If you do something stupid like accidentally delete data that you wanted to keep, it’s possible that the automatic backup system you’ve got in place will back up your mistake and delete the backed-up copies of the data as well. Going back to an arbitrary past copy of your data isn’t really all that necessary in my opinion, but you should at least be able to go back in time 24 hours. Most backup systems allow you to do this, at least to some degree.
Paranoia In Action
Next time, I’ll give some practical examples, differentiate static data from dynamic data, and show how your data is probably easier to back up than you think.
Posted: January 27th, 2011 | Author: Alex | Filed under: Esoteric Tips | 1 Comment »
In this second Esoteric Tip, I focus on a way to improve the way that Terminal.app handles ANSI colors.
There is a set of standard escape sequences that most terminals support and that correspond to the same actions. If you’re really interested in all the gruesome history of the standard, I encourage you to read the Wikipedia page on the subject. Of particular interest in this post are commands of the form “color the text a certain color starting now”. A program can issue these commands to the terminal as it prints; this is what commands like the ridiculously useful colordiff use to output to the terminal in color and emphasize things that they think need emphasizing. Unfortunately, while Terminal.app understands these color commands, there’s no way of customizing the colors used, even though there is support for customizing background and text color. This is particularly annoying if you have a custom Terminal color scheme (which I do, more on that in a minute) and your background is, say, gray; dark green on gray looks terrible.
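To make the “color the text a certain color starting now” commands concrete, here is a small sketch of what a program actually writes to the terminal. The codes 30 through 37 are the standard foreground color codes; the helper name `colorize` and the dictionary are my own invention for illustration, not anything colordiff itself uses.

```python
ESC = "\033"  # octal 33, the Escape character

# Standard "set foreground color" codes for the eight normal ANSI colors.
COLOR_CODES = {
    "black": 30, "red": 31, "green": 32, "yellow": 33,
    "blue": 34, "magenta": 35, "cyan": 36, "white": 37,
}


def colorize(text, color):
    """Wrap text in a 'start this color' sequence and a 'reset' sequence."""
    return f"{ESC}[{COLOR_CODES[color]}m{text}{ESC}[0m"


if __name__ == "__main__":
    # Print one sample line per color so you can judge your color scheme.
    for name in COLOR_CODES:
        print(colorize(f"this is {name}", name))
```

Note that the program only says “green”; which actual shade of green shows up is entirely up to the terminal, which is why the mapping has to be fixed on the Terminal.app side.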
The only way to fix this at the moment is to use SIMBL and essentially hack Terminal.app to do what you want. There’s a lovely little SIMBL plugin called TerminalColours that lets you change which actual color each ANSI color name maps to.
I like color schemes that are easy on the eyes (understand, I spend a lot of time looking at terminals and text editors) and I’ve grown particularly fond of the Zenburn color scheme. I use it in Emacs all the time, and now I can use it in Terminal.app as well. I found a Zenburn Terminal.app theme, but it didn’t customize ANSI colors and thus looked pretty awful on color terminals.

Blech.
After installing SIMBL and the TerminalColours plug-in, I rooted around in the Zenburn color scheme file that I use for Emacs to determine the right colors to use. Here they are for convenience, in the form “Color: Red, Green, Blue”
| Color | Normal Colors | Light Colors |
| --- | --- | --- |
| Black | 0, 0, 0 | 112, 144, 128 |
| Red | 204, 147, 147 | 220, 163, 163 |
| Green | 127, 159, 127 | 143, 178, 143 |
| Yellow | 224, 207, 159 | 240, 223, 175 |
| Blue | 140, 208, 211 | 148, 191, 243 |
| Magenta | 220, 140, 195 | 236, 147, 211 |
| Cyan | 147, 224, 227 | 147, 224, 227 |
| White | 220, 220, 204 | 255, 255, 255 |
After modifying the colors with the handy menu provided by TerminalColours, the terminal is actually readable:

Update: You can download my modified Zenburn terminal theme here (right-click + Save As works best).
Posted: January 27th, 2011 | Author: Alex | Filed under: Esoteric Tips | Comments Off
I spend a lot of time on the command-line, mostly for running experiments and running various scripts over log files. OS X’s Terminal.app has gotten a lot better in the last few years, but there are a couple of things that don’t quite work out of the box. The next two tips in the Esoteric Tips series focus on fixing some of those quirks and making Terminal.app a little more usable.
Ctrl+Arrow Keys
I spent a couple years working on a Linux box during the day, so I got used to a common feature of Gnome’s terminals: if I held down the Ctrl key and pressed the arrow keys, I would go forward and backward by a word. This is especially useful, for example, if you just executed ./foo.py dir1 var2 var3 and want to execute ./foo.py dir2 var2 var3; just hit the up arrow to go to the previous command in your history (I’m assuming bash here), hit Ctrl + left arrow a couple times to go back two words, and change dir1 to dir2. Fewer keystrokes, takes less time, and your hands hurt less. Key combinations like this send an escape sequence to the terminal, typically octal 33 (the Escape key, or from the terminal’s point of view “treat the next character(s) as a command”) followed by a character or two indicating what action to perform. The terminal looks up the character in a table and performs the appropriate action (in our case, move the terminal’s cursor left or right one word).
If you are used to this working, switching to Terminal.app will be a bit of a shock because the escape sequence sent when Ctrl+arrow keys are pressed doesn’t do the right thing; instead, it inserts a capital C or a capital D wherever your cursor happens to be at the time. So that previous example would leave my command line looking like this:
./foo.py dir1 var2 var3CDCDCD
This is pretty irritating. Fortunately, there’s an easy fix.
In Terminal, go to Preferences. In the Settings tab, there’s a sub-tab called Keyboard that contains a mapping of key combinations to the control characters they output. Here’s what that window looks like:

The two keyboard combinations you care about are “control cursor left” (highlighted in the above picture) and “control cursor right”. First, select “control cursor left” and click “Edit”. You’ll get a dialog box that looks like this:

Select the text in the text box under Action (that currently contains “\033[5D”) and press the Escape key followed by the ‘b’ key. The dialog box should look like this:

Click OK.
What you’ve just done is told Terminal.app to send the Escape key (octal 33) followed by the character ‘b’ whenever you press Ctrl+left arrow, which has the effect of moving the terminal cursor to the left one word.
Do a similar sequence of steps for control cursor right, except enter Escape + f instead of Escape + b. Your preferences window should look like this:

Now, Ctrl+left/right arrow should do what you (well, at least what I) expect them to do.
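If you’re curious what those two key bindings actually send, this little sketch spells out the bytes. The constant and function names are made up for illustration; the only facts it relies on are that Escape is octal 33 and that bash’s default (Emacs-style) line editing treats Escape followed by b or f as “back one word” and “forward one word”.

```python
ESC = "\033"  # octal 33 == decimal 27 == hex 0x1b

# The sequences entered in Terminal.app's Keyboard preferences.
WORD_LEFT = ESC + "b"   # Escape, then 'b': move back one word
WORD_RIGHT = ESC + "f"  # Escape, then 'f': move forward one word


def describe(seq):
    """Render each character of a key sequence as a three-digit octal code."""
    return " ".join(f"{ord(ch):03o}" for ch in seq)


if __name__ == "__main__":
    print("ctrl+left sends: ", describe(WORD_LEFT))
    print("ctrl+right sends:", describe(WORD_RIGHT))
```

Seeing the sequences as plain bytes makes it clearer why the default binding misbehaved: the shell simply didn’t recognize the sequence Terminal.app was sending, so part of it ended up on the command line as ordinary text.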
Posted: November 22nd, 2010 | Author: Alex | Filed under: Advice (Unsolicited), Computers, Opinions (Uninformed) | 2 Comments »

Thanks to PanelsOnPages.com for the image
On November 1st at 9 AM, online registration for 4-day passes to San Diego Comic Con began. By 9:05 AM, the massive volume of registering attendees caused the registration system to become inaccessible. By 10:30 AM, Comic Con International closed the registration site down, claiming that it would re-open in three weeks.
At 6 AM PST this morning, registration re-opened. By 6:05, the registration site was once again inaccessible due to overwhelming traffic volume. A lucky few managed to get all the way through the registration process, but most of us were left repeatedly retrying until, at 7:30 AM, the registration site was once again closed.
In the grand scheme of things, this was not a disaster. However, it inconvenienced a large number of people, myself included. In my opinion, both of these problems were entirely preventable.
I do research in the design of scalable, high-performance large-scale systems. Many people in my field work on technologies that are designed to prevent exactly the kind of failure to operate at scale that occurred this morning. While I am by no means an expert, I feel that I can speak from an informed position about scalability issues like this one. In this post, I’m going to speculate on what happened, and what might be done to fix the problem inexpensively.
Disclaimer: I don’t work for Event Planning International Corporation (EPIC) or Comic Con International (CCI), and I was only an external observer of their registration meltdown, so I don’t know exactly what occurred. I’ve seen problems like this documented enough times that I think I can guess what really happened from that documentation and personal experience. I also don’t work for Amazon or any of the other companies mentioned favorably in this post, although I have clearly consumed vast quantities of the cloud computing Kool-Aid.
I’m going to assume some things about EPIC’s architecture. In particular, I’ll assume they have a single well-provisioned database system and a handful of web servers, possibly with a load balancer sitting in front of them and distributing requests evenly among them. This is how most small websites usually look.
“What we’ve got here is a failure to communicate.”
In my mind, the timeline of the meltdown looked something like this:
6 AM: CCI posts a link to the registration site on its homepage. Frantically refreshing nerds see the green box and click it, presenting them with step one of five: enter your name and e-mail address.
Over the next 90 seconds or so, several hundred people open connections to the registration site. Since the page is just vanilla HTML and maybe a little PHP or JavaScript, the web server(s) cache the first page and everything runs fairly smoothly: it’s all being served out of memory and all is right with the world.
People start clicking the “Next” button to proceed to step two. Each click of the “Next” button causes an HTTP POST request to be sent to a PHP script housed on one of the web servers. This script inserts some data into the database indicating that a person whose name is X and whose e-mail address is Y has reserved an attendee slot for the next few minutes – this is mainly there so that you don’t register more people than you can fit in the convention hall. While this is happening for the first few hundred requests, several thousand more registrants are about to hit the “Next” button and start this process for their registrations. The database server starts fielding thousands of connections and inserting thousands of rows into the registration lock table at once. It starts to run out of memory and starts swapping, or it hits 100% CPU utilization, or its disks are seeking all over the place. It gets slower and slower. Eventually, it effectively stops returning responses to queries.
So now the database server is effectively hosed. The web servers continue to issue database queries anyway, and they have an increasing backlog of PHP scripts waiting for their database queries to return. The web servers’ memories fill up with session state, the PHP scripts’ stacks, TCP connection state, etc. Eventually, the web servers run out of memory, they start swapping and their performance essentially drops to zero. Requests for page two begin to time out. “500: internal server error” begins popping up on the screens of frustrated nerds around the globe. These users furiously hit “Refresh” hoping that the website will come back to life, which creates new requests and only makes the problem worse.
At this point, sysadmins are running around like their hair is on fire trying to get the problem under control. They try every trick in the book, but nothing works. Frantic phone calls are made. Servers are powered off in hopes that demand will recede if the server is inaccessible for a period of time. Demand does not recede.
After about an hour, the decrease in volume from users giving up releases sufficient system resources for some lucky individuals to be able to advance to step 2: enter your address and phone number. Hundreds to thousands of users do just that and then click “Next” on step 2′s page within about 30 seconds of one another. This issues a flood of HTTP POSTs to another PHP script that is supposed to insert the information contained in the POST into the database and associate it with the name and e-mail address that the user entered in step 1. The problem returns with a vengeance and the servers fall over again. Few users make it to step 3 successfully.
7:30 AM: CCI orders the site closed, claiming that they will be “investigating their registration options”. EPIC (presumably) loses a high-profile client.
Cloud Computing to the Rescue
So, what happened here? Fundamentally, EPIC’s servers were not sufficiently well-provisioned to handle the load presented to them by SDCC’s registrants. The servers couldn’t handle the strain, and so they ground to a screeching halt.
How can we solve problems like this? One way is to buy more and more well-provisioned computers and spread the load across them until the load on a given machine becomes manageable. Unfortunately, this typically involves a lot of up-front and long-term costs: you need to buy the computers, find some place to put them where they’ll remain cool and dry, and fix them when they break. Additionally, when your servers are talking to a database, the server hosting the database must be doubly well-provisioned, often at significant additional cost. Oracle makes a lot of its money selling ridiculously well-provisioned database servers the size of a refrigerator for hundreds of thousands of dollars apiece.
Getting enough computers to get the job done does not need to be expensive, however. Pay-as-you-go “Infrastructure as a Service” (IaaS) systems – one of the many classes of systems classified under the blanket term “cloud computing” by the world’s IT pundits – were designed to solve this exact problem.
I’m going to focus on Amazon’s EC2 and RDS here for the sake of example and because they’re the most popular services of their kind, but many other IaaS offerings exist. Joyent and Media Temple are two other great examples of IaaS companies with a strong presence in the marketplace, especially among startups that need affordable, scalable hosting solutions.
Let’s suppose that you need to register 24,000 people (~20% of attendees at last year’s con) and expect peak demand to last around 24 hours. You buy time on 50 “high-memory extra-large” EC2 instances (17.1 GB of RAM, dual-core processors) to use as web servers and ten “high-memory quadruple extra large” RDS database servers (68 GB of RAM, 8-core processors, “high I/O capacity”) to do your query processing. You reserve them on-demand, which essentially means you pay more per-hour but you only pay for what you use. Let’s suppose conservatively that you’ll store 20 GB of data to these database servers (that’s almost 1 MB of data per user, which I’m guessing is more than enough to store basic registration information) and that you’ll read and write every byte of it once in 24 hours. Let’s assume that the bank’s credit card transaction processing servers will scale to meet the load; after all, they handle millions of transactions a day routinely (and are extremely well-provisioned because of it).
If you use all of it – that means all 50 web servers and all 10 database servers for 24 hours straight – it’ll cost you about $950 at the end of the month. If you only use the database instances (for 24 hours straight, and read, write and store 20 GB), it’ll cost about $320. (This is according to AWS’s cost calculator.) That’s far less than the retail cost of the parts in even one of the aforementioned servers.
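For the skeptical, here is the back-of-the-envelope arithmetic behind those figures, using only the numbers quoted above. One assumption is baked in and worth flagging: I treat the difference between the two totals as being entirely web-server time, which is an inference from the totals rather than a price quoted by Amazon.

```python
WEB_SERVERS = 50
DB_SERVERS = 10
HOURS = 24
TOTAL_COST = 950.0     # dollars: all 60 instances for 24 hours
DB_ONLY_COST = 320.0   # dollars: database instances plus 20 GB stored, read and written
REGISTRANTS = 24_000
DATA_GB = 20

web_instance_hours = WEB_SERVERS * HOURS   # 1200 instance-hours
db_instance_hours = DB_SERVERS * HOURS     # 240 instance-hours

# Assumption: everything that isn't database cost is web-server cost.
web_cost = TOTAL_COST - DB_ONLY_COST                  # 630 dollars
web_cost_per_hour = web_cost / web_instance_hours     # 0.525 dollars per instance-hour

cost_per_registrant = TOTAL_COST / REGISTRANTS        # roughly 4 cents
mb_per_registrant = DATA_GB * 1024 / REGISTRANTS      # roughly 0.85 MB

if __name__ == "__main__":
    print(f"web instance-hours:       {web_instance_hours}")
    print(f"implied web $/inst-hour:  {web_cost_per_hour:.3f}")
    print(f"cost per registrant:      ${cost_per_registrant:.3f}")
    print(f"storage per registrant:   {mb_per_registrant:.2f} MB")
```

In other words, the whole deliberately over-provisioned setup works out to about four cents per registrant.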
Need more machines? Buy time on more machines for a few tens of dollars; Amazon’s magical network of services can even balance the load across them for you automatically if you ask it nicely. Don’t need all ten of those database servers you reserved? Only turn on five of them and leave the others alone. Demand starting to die down after 18 hours instead of 24? Start shutting machines down. Magical.
This kind of flexibility and “magic” provisioning of hardware is what makes cloud computing such a compelling way to solve website scalability problems like the one plaguing Comic Con, and it’s one of the reasons why the research and industrial communities alike are so excited about it. I hope that situations like this one will encourage more companies to leverage these sorts of technologies to deploy more scalable websites.
Posted: August 22nd, 2010 | Author: Alex | Filed under: Esoteric Tips, Fun and Games | Comments Off
I have an XBox 360 with a couple of wireless controllers that have rechargeable battery packs and this thing called a “Play ‘n Charge” cable that allows the controllers to charge over USB while still being usable. I picked up one of my controllers after a long (read: months) period of not using it to find that it had completely discharged. Not only that, but it wouldn’t even charge over the cable; the charge indicator light turned from red (charging) to green (charged) almost immediately. For a long time, I thought I just had a faulty battery and ran off of AAs. Then I found this thread.
Disclaimer: I am not an electrical engineer. Well, technically I am, but seriously … I’m not an electrical engineer. I may have gotten the details of this wrong.
All rechargeable batteries will lose their charge if not charged in a long enough period of time. Apparently, if a battery gets sufficiently discharged, it will have trouble accepting a charge at all. The XBox will then become confused, interpreting the inability to pass current through the battery to charge it as the battery being fully charged (hence the “a few seconds of red light, followed by a green light” phenomenon).
Here’s the two-step solution:
- Plug the controller (with rechargeable battery attached) into the XBox with the Play ‘n Charge cable. The light on the end of the cable should turn on and be red.
- If the light stays red for a couple minutes without turning green, keep the controller plugged in. Otherwise, unplug the controller from the Play ‘n Charge cable, count slowly to five, and go back to step 1.
After some number of tries (nominally fewer than 50), the XBox and battery should finally figure it out and charging should begin.
Posted: December 12th, 2009 | Author: Alex | Filed under: Computers, Esoteric Tips | 10 Comments »
I’m responsible for the care and feeding of way too many computers. I say “way too many” because the probability of one of your computers doing something stupid on a Friday night increases in proportion to the number of computers that can do something stupid. Most of the time, the stupidity is routine (“Oh hey, what a surprise! Firefox is using all my RAM!”) but every so often, they surprise you. I’m going to post something here whenever I manage to fix one of these more “WTF”-type errors under the heading Esoteric Tip of the Day. Chances are you won’t care, but someone Google searching might, and I want to make this sort of information easy to find so that others won’t have to endure this same issue.
I noticed a couple days ago that my Mac mini (which acts as a storage server but will probably be doing more media serving once I get a better TV) was not visible to the rest of my home network. It turns out that the mini’s network interface was having serious issues, because the mini couldn’t even get an IP address. Doing a little digging into the log, I found a lot of log segments that look like this:
kernel AppleYukon2: error - FATAL: SkGeStopPort() does not terminate (Rx)
kernel AppleYukon2: error - Event queued in Init Level 0
configd[14] network configuration changed.
Firewall[66] krb5kdc is listening from :::88 proto=6
Firewall[66] krb5kdc is listening from 0.0.0.0:88 proto=6
kernel Ethernet [AppleYukon2]: Link up on en0, 100-Megabit, Full-duplex, Symmetric flow-control, Debug [796d,6c0c,0de1,0200,4de1,4000]
configd[14] network configuration changed.
Firewall[66] krb5kdc is listening from :::88 proto=6
Firewall[66] krb5kdc is listening from 0.0.0.0:88 proto=6
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - nothing received, soft reset of chip
kernel AppleYukon2: 00000000,00000000 sk98nif - deadmanCheck - still nothing received, hard reset of chip
There is only one group of people who know exactly what these messages mean, and they are the guys that wrote the AppleYukon2 driver. Thankfully, a little Googling and some link clicking later, I had a plausible explanation. Apparently this sort of error occurs when one of the networking-related preferences that OS X stores is corrupted somehow. This apparently happened quite a bit when users upgraded to 10.5.7, and it also happened to me. The driver reads a corrupted preferences file, takes some ridiculous action and wedges the underlying hardware in some kind of bizarre state that only a reboot can correct.
Thankfully, OS X can regenerate its preferences files. To do this cleanly:
- Reboot holding down the Shift key. This will force the computer to boot in Safe Mode and (as a side-effect) rebuild its caches, which is a good thing if they’ve been populated with garbage from your corrupted preferences files, which they probably have been.
- Delete the following files in /Library/Preferences/SystemConfiguration: NetworkInterfaces.plist, com.apple.airport.preferences.plist, com.apple.network.identification.plist
- Reboot and re-apply your network settings (IP address, et al)
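If deleting system files outright makes you nervous, the second step above can be done as “move aside” instead of “delete”. This is a minimal sketch under my own assumptions: the function name and the idea of a backup directory are mine, not part of the fix described in the thread, and on a real machine it would need to run with administrator privileges.

```python
import shutil
from pathlib import Path

# Directory and file names as listed in the steps above.
PREFS_DIR = Path("/Library/Preferences/SystemConfiguration")
PLIST_NAMES = [
    "NetworkInterfaces.plist",
    "com.apple.airport.preferences.plist",
    "com.apple.network.identification.plist",
]


def set_aside_network_prefs(prefs_dir, backup_dir):
    """Move the network preference files out of prefs_dir into backup_dir.

    Returns the names of the files that were actually moved; files that
    don't exist are skipped silently.
    """
    prefs_dir, backup_dir = Path(prefs_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in PLIST_NAMES:
        src = prefs_dir / name
        if src.exists():
            shutil.move(str(src), str(backup_dir / name))
            moved.append(name)
    return moved
```

Calling `set_aside_network_prefs(PREFS_DIR, "/Users/Shared/netprefs-backup")` (the backup path is hypothetical) leaves you with copies to restore if regenerating the preferences doesn’t help.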
Special thanks to Daniel Palmer, the poor soul who sat on the phone with Apple to get this resolved. Here’s the thread containing the solution.
Update 3/3/10: This problem has reared its ugly head once again, and I’ve found out a little more information about it:
When you set the speed and duplex settings of your wired network adapter manually rather than keeping the setting at “Automatically” and you disable IPv6, the driver no longer triggers hard resets for whatever reason. In my case, it triggers soft resets at precise 6-minute intervals.
Also, as reader lafber pointed out, this problem only seems to occur when there is no traffic on the wired network. To solve this problem, I wrote a one-line AppleScript. The application starts a command that pings my router every 30 seconds. The command itself runs in a virtual terminal, so I don’t have to keep an application running in the foreground. The AppleScript looks like this:
do shell script "screen -m -d ping -i 30 192.168.1.1"
To get it to start at bootup, I saved that AppleScript script as an application and added the application to my default user’s Login Items in the Accounts system preference pane.
Posted: December 7th, 2009 | Author: Alex | Filed under: Advice (Unsolicited), School | Comments Off
It’s that time of the year again, the most wonderful time of the year: time for college seniors to start applying to graduate school.
Although I’m quite happy where I am now, I look back on grad application time as one of the most physically and emotionally exhausting times of my life. My advisor Amin Vahdat has just written a very thoughtful article on the graduate school application process, how to know if you really want to go to graduate school and how to get into a good program. I also wrote a related post last year on the more technical side of the application process.
To all the applicants this year, good luck! You should consider applying to UCSD (nudge nudge, wink wink).
Posted: December 19th, 2008 | Author: Alex | Filed under: Advice (Unsolicited) | Comments Off
A few friends of mine are applying to graduate school. This was without a doubt one of the most stressful and chaotic things I’ve ever done. Now that I’ve been in grad school for a while, I have a small understanding of how the application process works. My experience is limited, of course, to the programs to which I applied, so your mileage may vary. Also, I applied to ten(!) different graduate programs, so some of these solutions might not apply to you if you’re only applying to one or two.
Getting Organized
One of the most challenging parts of the grad school application process is knowing what everybody wants and keeping it all straight. Just about everybody wants a statement of purpose, but they have different guidelines as to what they want to see (at most X words vs. at most Y pages, single-spaced vs. double-spaced etc). Everyone wants recommendation letters, but they differ as to what sorts of recommendations they expect; do they accept recommendations from people you’ve worked with in an industrial setting, or do they want recommendations strictly from faculty members? The list goes on.
The very first thing I did when applying to graduate programs was to look at their programs’ websites and their application forms and try to answer a few questions:
- How much is the application fee?
- What do they say they want in a statement of purpose? (I wrote down exactly what they required there)
- How many letters of recommendation, and from whom? Do they want them mailed or filed digitally? If mailed, where should I mail them?
- Do they take GRE scores? If so, do they require a subject GRE? Which one? What are their GRE institution codes? (This last bit is important so you can tell The Testing Mafia where to send your scores)
- How many copies of my transcript do they need? Do they want it mailed or filed digitally? If mailed, where should I mail them?
- With which professors would I want to work? What projects have they done or are they doing that I find interesting? (If you can’t answer both of these questions, reconsider applying to this school)
Once I was done making that list (it took me an afternoon, round numbers) I made a folder on my computer called “Graduate School”. Inside that folder, I made two folders, “General Purpose” and “Schools”. The “General Purpose” folder would hold all the information that all schools seemed to want in some form or another – statement of purpose, resume, transcript, extracurricular activities list, work experience, and so on. The “Schools” folder contained a subfolder for each school, and housed not only the application materials for that school but also copies of any confirmation e-mails or webpages I received after completing the application (in case I needed to produce them later for some reason). This über-folder was backed up periodically to a server across town to protect against any major disasters.
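If you’d rather not click through a file browser ten times, the folder layout described above can be created in one go. This is only a sketch of that layout; the function name and the school names in the usage example are placeholders I made up.

```python
from pathlib import Path


def make_application_folders(base, schools):
    """Create 'Graduate School/General Purpose' and one
    'Graduate School/Schools/<school>' folder per school under base.
    Returns the path of the top-level 'Graduate School' folder."""
    root = Path(base) / "Graduate School"
    (root / "General Purpose").mkdir(parents=True, exist_ok=True)
    (root / "Schools").mkdir(parents=True, exist_ok=True)
    for school in schools:
        (root / "Schools" / school).mkdir(exist_ok=True)
    return root
```

For example, `make_application_folders(Path.home() / "Documents", ["School A", "School B"])` gives you an empty skeleton to start filing things into.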
The Statement of Purpose
This is the big one. This is the portion of the application that the people for whom you want to do research are likely to read – think of it as your “elevator pitch” for yourself.
Some people say that you should tailor your statement of purpose for each university to which you apply – since I was applying to ten different universities, this tactic didn’t seem feasible. You can say “Oh, a graduate degree from Stanford is the reason I was brought into this world; I will remake the world in Leland Stanford’s image with the help of Professor So-and-so, who I worship as a god among men!” but the people reading your letter won’t buy it. If you’re applying to a program because you’re really interested in X, and a professor in that program is a leader in the field on X, and you’ve had prior research experience in X, then mention it. Otherwise, leave it out because it does you no good.
Be honest, both with yourself and with the university to which you’re applying. I didn’t consider graduate school seriously until my junior year in college, and a lot of the places to which I was applying expected lots of prior research experience from their applicants that I honestly didn’t have. I knew they would notice it, so I came right out and said in my statement (not in so many words, you understand), “Look, I know that I haven’t published anything and that my research experience is kind of slim, but I’m really excited about this stuff and I know that I’ll be able to meet and exceed your expectations”.
Remember that this is not like college applications; your materials are not, by and large, going to be read by some faceless bunch of professional college application readers. The people reading your application are probably among the people whose classes you will take and whose research you will do. More importantly, they will be the ones who will fund you and they want to know that they’ll be getting their money’s worth.
Recommendation Letters
So many “how to get into graduate school” websites say that the key with recommendation letters is to get the ball rolling early, and I agree with them. One thing they don’t tell you often enough (and that they should tell you) is that your college’s letter service is your friend. Professors, especially at “research universities”, usually don’t like being bothered about letters of recommendation by undergraduates. It means that, in addition to writing the letter itself, they have to fill out forms and get envelopes and stamps and it takes time away from their work. Your goal is to be as unobtrusive as possible, and the campus letter service helps tremendously with that.
Letter services offer your recommenders the ability to write their letter once and send it to the letter service office along with some identifying information. The letter service then keeps these letters and (for a nominal fee) ships them off to whomever you want without any further involvement from the recommenders. Another plus of this method is that you don’t have to worry about getting all of your recommenders to send everything in on time – just fill out a form online and you’re good to go.
In short, start early, and if your university has a letter service then use it.
Keeping Track of Deadlines
You may be the most diligent person on Earth and have all your applications done well in advance of the deadline. If you’re like most of us, though, applying for graduate school isn’t the only thing you’re doing – you’re busy. I wrote down all the deadlines for my applications (which fell in a six-week window between the middle of December and the beginning of February) and put countdown timers on my Google homepage. Every time I opened a browser, there they were, a grid of about a dozen numbers that kept getting smaller. If that won’t keep you on track, nothing will.
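The countdown grid is easy to reproduce without a homepage widget. Here is a minimal sketch; the function name is mine, and the school names and dates in the usage example are invented purely for illustration.

```python
from datetime import date


def days_remaining(deadlines, today):
    """Map each school to the number of days until its deadline,
    ordered with the most urgent deadline first."""
    remaining = {school: (due - today).days for school, due in deadlines.items()}
    return dict(sorted(remaining.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # Hypothetical deadlines, not the ones I actually had.
    example = {
        "School B": date(2009, 2, 1),
        "School A": date(2008, 12, 15),
    }
    for school, days in days_remaining(example, date.today()).items():
        print(f"{school}: {days} days")
```

Passing `today` in explicitly, rather than reading the clock inside the function, makes the calculation easy to check by hand; a negative number means the deadline has already passed.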
I hope that my experiences with this horrible, tedious process will prove useful to someone. Godspeed, applicants.