Two Months Working with GitHub: A Retrospective

Posted: December 13th, 2011 | Author: | Filed under: Opinions (Uninformed) | 1 Comment »

For a long time, my thesis work was housed in a Mercurial repository on the department NFS server. We had a mailing list where team members could discuss ideas and report bugs. Ideas, to-dos and “bug reports” were kept in a combination of e-mail logs, Google Sites, Google Docs, and various text notes. We tried getting code review to work with a few different tools, but I quickly grew tired of keeping them running. About two months ago, in response to the team’s increasing size and in preparation for releasing the code to some teams within UCSD, we decided to move the project to a private repository on GitHub.

All the cool kids seem to be hosting their source code in GitHub these days, and it’s not that surprising. GitHub’s UI is pretty, responsive, and knows when to stay out of your way. Adding things like post-commit hooks is really straightforward. It’s “social” without being in-your-face about it all the time. We liked the idea of having code review, issue tracking, and a wiki all in one place that we didn’t have to maintain ourselves. Also, we liked the idea of having a private repository that could be made public with the flick of a switch [1].

Switching over to GitHub was a big adjustment, but in the end I think we made the right decision.

The one big thing that made us hesitant to switch over to GitHub was Git. Nobody on the team was really familiar with it, and frankly it looked kind of scary compared to Mercurial. Lots of commands, the ability to edit history, etc. It looked really heavyweight. I can say though, that in the long run I’m really glad that we switched to Git, GitHub or no GitHub. Git’s technical complexity is a little daunting at first, and I’m still frightened of rebasing, but the transition was a lot less rocky than I’d expected it would be and I’m happy enough with Git that I find myself picking it over Mercurial even for projects that (for various reasons) aren’t hosted in GitHub.

Moving from “hack, hack, commit, push, hack” to “branch, hack, send a pull request, branch, hack” also seems to have been an easier adjustment than I expected. Once we got comfortable with the idea that branching and merging frequently was OK, things went pretty smoothly. Most of us are old school(?) CVS and Subversion guys who remember when branching was largely more trouble than it was worth (tree conflicts, anyone?), so dealing with many branches at once was a bit of a shock. As it turns out, frequent branching has been more of a boon than a burden. git branch and git stash together have changed my whole workflow. Being able to jump back and forth between tasks without having to have a dozen copies of the repository littering my directory tree is really liberating.

By far my favorite piece of GitHub’s tool suite is the code review system. Having every line of code looked at by at least one other person before it gets committed has really improved our code quality. It also helps to prevent any single person from being the only one who knows how a given module works. Being able to have conversations about parts of the code with other team members and having those comments show up in context is a huge win over ad-hoc fumbling with e-mail (“on line 20 of foo.cc: this thing should change this way; on line 25 of foo.cc: that other thing should change too”, etc).

I’m really impressed with GitHub overall, so much so that I’ve been slowly migrating repositories to them from other services. If you’re looking for a place to host your team’s code, they’ve got a really solid offer and you can’t beat the price (especially considering it’s free for public repositories).


  1. We’re going to open source this work sometime before I graduate. Just not now. It’s not ready now. 

SDCC’s split personality

Posted: July 30th, 2011 | Author: | Filed under: Opinions (Uninformed), Ranting | Comments Off

SDCC seems to be spiritually divided into two separate conventions. There’s the part of the convention that’s about comic books. I’m talking about artists exhibiting their work or trying to get their portfolios noticed, collectors buying and selling and getting stuff appraised. I imagine that this aspect of the convention operates in much the same way it did 40 years ago when the convention was first getting started. There’s a part of it that seems static and ageless in a weird way.

On the other hand, there’s the part of the convention that’s about “popular art”, which is pretty much a euphemism for “stuff nerds like”. This is the part of the convention with toys and games and movies and celebrity signings and Star Wars, so much Star Wars.

I think that both of these sub-conventions are important parts of what comic con has become. The “popular art” portion has become significantly larger than the comics portion, however, and that separation is getting more and more pronounced by the year. The only place I was ever able to freely move around on the show floor was Artists’ Alley (where individual artists can show off their portfolios), because it was pretty much deserted. Similarly with the vendors with endless boxes of bagged-and-boarded single issues; there just wasn’t anybody there. I don’t view that intrinsically as a bad thing, but I fail to see why some of these artists and vendors keep coming to the convention given how expensive it must be to get space on the floor. Maybe there’s an industrial interaction happening there (vendor-to-vendor, or artist-to-artist) that I’m just not seeing.

The thing that’s special for me about conventions like comic con is that it gives you, as a fan, the chance to interact with the people who make the things that you love. Whether that’s comic books, video games, TV shows or movies, you can actually meet and support (either directly through your cash, or indirectly through your public, vocal enthusiasm) the creators themselves.

There’s always going to be the marketing side of the con – after all, people are there so that you can buy their stuff, otherwise they wouldn’t have bothered coming all this way. That said, there’s a big difference between marketing in support of the creatives and marketing in support of the faceless mega-corporations, and you can definitely see some of the latter sneaking their way in. For some reason I find it really hard to pin down exactly how I feel about this, but some (not many, but some) of the large vendors had this tangible feeling of insincerity to their entire presentation, like they were hyper-optimizing for comic-con’s stereotypical target demographic (socially-awkward men in their 20s and early 30s who like bacon, apparently?) and were hoping desperately that we’d fall for it.

My only real concern with SDCC’s growing popularity is that the cost of maintaining the convention will eventually price out everyone but the faceless mega-corporations. I think that day is a long way off, if it even happens at all, but it’s still a concern.


Managing the Information Flood

Posted: July 11th, 2011 | Author: | Filed under: Computers, Opinions (Uninformed) | Comments Off

We’re constantly being bombarded with information. Good or bad, the amount of information reaching us is only going to increase. We need to be able to filter it or we won’t have time for anything else. Unfortunately, this filtering is hard for computers to do well.

If you’re subscribed to even a handful of RSS feeds, you might get hundreds of new items streaming into your feed reader every day. Add social networks into the mix and you’ve got hundreds of people streaming pieces of their lives to you constantly. And then there’s e-mail. Everybody seems to hate e-mail because of how much of it they get every day. Entire books have been written about how to effectively manage e-mail, but their tips and tricks all seem to be different forms of sophisticated filtration.

Filtration assumes that not all items you receive over these channels are equally important to you. The filter’s job is to sort items in decreasing order of importance and hide items that are completely unimportant. In the past, newspapers and magazines served as filters, but they were inherently static and didn’t update very frequently. The name of the game in the 21st century is ridiculously fine-grained customization in as close to real-time as possible. Making this work in a straightforward, automated way is really hard.
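To make that definition concrete, here is a minimal sketch of a filter that sorts by importance and hides the unimportant. The `Item` class, the `importance` score, and the `threshold` value are all hypothetical; coming up with a good score is exactly the hard part.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    importance: float  # hypothetical score in [0, 1]; computing it is the hard part


def filter_items(items, threshold=0.2):
    """Hide items scoring below the threshold, then sort the rest by
    decreasing importance."""
    visible = [item for item in items if item.importance >= threshold]
    return sorted(visible, key=lambda item: item.importance, reverse=True)


inbox = [
    Item("Newsletter you never read", 0.05),
    Item("Note from your advisor", 0.9),
    Item("Friend's vacation photos", 0.4),
]
ranked_titles = [item.title for item in filter_items(inbox)]
```

The newsletter falls below the threshold and disappears; the other two come back with the advisor’s note first.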

Say you only want to subscribe to RSS items from a site that covers a particular topic. If you’re really lucky, the site has many available RSS feeds categorized by topic. If you’re only slightly lucky, the site’s items will each have an easy-to-find list of categories or a well-defined system for titling items, either of which can be extracted and used to filter the feed by something like Yahoo Pipes. Unfortunately, anything beyond the most basic filtration with such systems isn’t really effective unless you know regular expressions, and even then it’s a bit limited. Also, you’re limited to filtering by those categories – if you want a custom subset of items from a given category, you’re stuck.
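For the “slightly lucky” case, a category-plus-title filter might look something like the sketch below. It assumes the feed has already been parsed into plain dictionaries with `title` and `categories` keys; those field names and the `filter_feed` helper are made up for illustration and are not how Yahoo Pipes works internally.

```python
import re


def filter_feed(items, categories=None, title_pattern=None):
    """Keep items that match at least one wanted category (if given)
    and whose title matches the regular expression (if given)."""
    wanted = {c.lower() for c in categories} if categories else None
    pattern = re.compile(title_pattern, re.IGNORECASE) if title_pattern else None
    kept = []
    for item in items:
        item_categories = {c.lower() for c in item.get("categories", [])}
        if wanted is not None and not (wanted & item_categories):
            continue
        if pattern is not None and not pattern.search(item.get("title", "")):
            continue
        kept.append(item)
    return kept


# Hypothetical, already-parsed feed items.
feed = [
    {"title": "Review: New Kindle", "categories": ["Hardware", "Reviews"]},
    {"title": "Kernel 3.0 released", "categories": ["Linux"]},
    {"title": "Review: Git hosting compared", "categories": ["Reviews"]},
]
reviews_about_git = filter_feed(feed, categories=["reviews"], title_pattern=r"\bgit\b")
```

Note that the “custom subset of a category” only works here because the title happens to contain a usable keyword, which is the limitation described above.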

This problem stems from the fact that most of the information on the web has a really complicated structure. Some might even say that it’s “unstructured”, but that isn’t really true. Everything that people produce is at least partially structured; it’s just that the structure is really complicated and it’s really hard to get computers to make any sense of it. Being able to look at a block of text and accurately determine what it’s talking about is a huge challenge that forms the basis for entire areas of active research.

Fortunately, filtering information by importance on a source-by-source basis is a little easier. Google+’s circles are a step in the right direction here, allowing you to filter the social network fire-hose that comes in (and goes out) to include only people in a given context. Twitter’s lists are another example of a fairly low-effort information filtration system, although they only work on your incoming data. The problem with these mechanisms, however, is that you have to manage them manually. Friends must be moved into and out of lists or circles or whatever other mechanism. There aren’t a lot of good ways available to say “please infer how I know these people and prompt me if you’re not really sure”. Doing things like this is yet another active area of research.

Clearly we haven’t gotten this problem completely figured out, but it’s really important that we do. As a society, we’re already experiencing information overload, and I fear that we’ll eventually feel like we only have two choices: be overwhelmed by information or simply live without it.


Computer Science Goes to Hollywood

Posted: June 11th, 2011 | Author: | Filed under: Computers, Opinions (Uninformed) | Comments Off

Update: Looks like Matt Welsh posted on the same topic while I was writing this. Check out his perspective, it’s a good read.

I just read an article in the New York Times titled, “Computer Studies Made Cool, on Film and Now on Campus” talking about the recent upswing in enrollment in computer science that, the article claims, is due in part to Hollywood’s favorable portrayal of the tech world in films like The Social Network. They also say that computer science programs are shifting away from a curriculum focused on theory toward one that integrates more applications and implementation in an effort to boost the major’s appeal and keep enrollment trending upward.

The fact that computer science enrollment is on the upswing is good; the US (and the world in general) needs more computer scientists. One missing piece of the puzzle that isn’t explored in great detail in the article, however, is these new students’ motivations for pursuing a degree in computer science. One thing that makes me anxious is the segment of new students who are becoming computer scientists to “become the next Mark Zuckerberg”. Spun one way, this means “create something that millions of people use every day”, which is great. Computer science is most exciting, in my opinion, because it enables you to create great things out of virtually nothing. Spun another way, this means “make billions of dollars and have a movie made about you that’s scored by that guy from Nine Inch Nails”. This is disconcerting. Becoming a computer scientist to become the next Mark Zuckerberg is like playing basketball to become the next Michael Jordan. It will happen to a handful of people, and takes a lot of hard work, dedication and luck, but the vast majority won’t even come close. If you’re not playing basketball mostly for the love of the game, you’re setting yourself up for a lifetime of frustration and disappointment. Same thing goes for CS.

I’m also not convinced that “banishing the perception of the computer scientist as a geek typing code in a basement” is entirely a good idea. I hate to break it to you, but most of computer science is geeks typing code in basements. Well, not necessarily in basements. Facebook was created and continues to be created by geeks typing code in office buildings. Apple was created by geeks typing code (and soldering stuff together) in a garage. Name a technology company whose products you use every day, and it started with a bunch of hackers cranking out code in a dumpy little room somewhere. This is like trying to banish the perception of the blacksmith as a big sweaty guy who spends most of his time in front of a furnace making things out of hot iron – sorry folks, but that’s just the way it is. Blacksmiths don’t just wave their arms and cause awesome swords to appear all day.

That’s not to say that our public image couldn’t use a bit of improvement. In the past, computer scientists were portrayed in film and television as a bunch of pale, asocial man-children with an over-fondness for Mountain Dew and Cheetos. If we’re getting away from that stereotype, then that’s fantastic, but by and large I don’t think we are. I withhold judgment on The Social Network (I haven’t seen it yet, but it’s in my Netflix queue), but most of what I’ve seen has just shifted to portraying us as a bunch of ludicrously wealthy, pale, asocial man-children.

It’s true that the theoretical fundamentals of computer science are hard to make exciting without applications. However, if we swing in the other direction, over-emphasizing application at the expense of fundamentals just to enhance our appeal, we run the risk of over-narrowing the major’s focus and making it less useful and less educational. If that happens, I’m worried about the quality and quantity of computer scientists (who actually like what they do) that we’ll produce. I don’t think it will be a problem at places like UW, USC and Stanford, but the reality is that most people don’t go to those places.

I originally pursued computer science because I liked messing around with computers and I thought I wanted to make video games. If I had taken a major in game design in college, I might be working on the next Madden at Electronic Arts and hating my life. The breadth of my undergraduate education exposed me to areas of the field that I didn’t know existed. Now I sort things really fast every day, and I’m really enjoying myself. And I spend most of my time writing code, although I don’t do it in a basement.


Kindle: First Impressions

Posted: March 21st, 2011 | Author: | Filed under: Computers, Opinions (Uninformed) | 2 Comments »

I decided to buy a Kindle last week, for a few reasons. I really wanted to start getting into e-books; they’re cheaper than hardcover for new releases now, I don’t have to wait for them to get delivered and they don’t take up space. Although I probably would have preferred an iPad if money were no object, I’m not really willing to spend $500 on a tablet when I already have a laptop.

I bought the WiFi-only model (didn’t really see myself needing the 3G version) and have been fiddling around with it for a couple of days now. My first impressions are pretty favorable.

I’m really surprised at how fast the e-ink display refreshes. I’d played around with an earlier-generation Kindle and a Sony e-book reader for all of about a minute years ago, and was really turned off by the refresh speed on the display. No such problems with the latest-gen Kindle, at least when it comes to reading and menu navigation; there are times when I find myself getting ahead of it, but most of that is off of my typical operating path.

The Amazon marketing hype on the display isn’t too far off; it really does look a lot like paper and is pretty easy to read without a light on (although I’m trying to save my eyes by not reading in dim light these days). I haven’t tried it in direct sunlight yet.

The quality of e-books on the Kindle varies depending on what you’re reading. Some publishers didn’t really put a lot of effort into annotating things like chapters in their books, which makes navigation a challenge; Kindle’s navigation works by jumping to “locations” rather than pages, so you often have to search for the location corresponding to a page rather than the page itself if you don’t have a bookmark handy. I’ve only had this problem for the freebies on the Kindle store; the books I’ve actually paid for have pretty well-groomed metadata.

The fact that the Kindle doesn’t support the EPUB standard and instead uses its own DRMed format is irritating, certainly, but I feel like I can live with it, especially since converting EPUB to their proprietary format is supposed to be fairly straightforward. I’m pretty confident that eventually they’ll do the same thing the iTunes Music Store did and drop DRM entirely, or at least support EPUB natively with a software update.

I’m pretty happy with the Kindle so far. Does anyone else out there have one of these things? Any tips and tricks I should know about?


On the End of the Space Shuttle

Posted: February 27th, 2011 | Author: | Filed under: Opinions (Uninformed), Ranting | Comments Off

Image from National Geographic

Earlier this week, the shuttle Discovery lifted off for the last time. Hopefully it will return safely to Earth on March 7th, and after that it will probably end up in a museum somewhere. The end of its flight will mark the effective end of the space shuttle program, and we don’t really have anything lined up to replace it.

Some people seem to think that this is a tragic loss for the United States. Personally, I don’t think the end of the shuttle program is such a horrible thing.

I’ll admit, I’m biased; my dad works for JPL, and JPL doesn’t really do any manned missions. That said, my opinions here are my own and should not be construed as his or anyone else’s.

In order to ask why it’s bad that we’re losing the shuttles, we must first ask why we had the shuttles in the first place. Mainly, they’re for moving things from the ground into low-Earth orbit. Those things might be people, or satellites like the Hubble Space Telescope, or pieces of the International Space Station (ISS). People are usually along for the ride to help deploy the equipment aboard, or to put it together once they’re in microgravity, or do experiments and run tests either in the shuttle or aboard the ISS.

In my opinion, those satellites, that space station, those experiments and tests, are in the service of one overarching goal – to do good science. I don’t think that necessarily requires putting people in space, though.

This isn’t to discount the contributions of the manned space program; the fact that we’ve been able to keep people in space for long periods of time mere decades after first sending anyone into orbit is a remarkable testament to human achievement. We also know that, eventually, we’re going to have to leave the planet (if we wait long enough, after all, the sun is going to eventually burn out), and seeing how people react to time in space helps to figure out what challenges we’re going to face in our eventual exodus.

That said, it’s unclear how many of our unanswered questions can’t be answered on the ground. Suppose that the goal of the manned space program is to put a man on Mars. A successful manned mission to Mars depends on advances in materials science, propulsion technology, and the production of clean, plentiful energy, as well as studies of the psychological and physical effects of months of isolation with only your flight-mates for company. None of these things requires shooting people into space. Asking what the loss of the shuttle fleet will do to our chances of getting to Mars first is like asking how your child’s first word will affect their chances of getting into college. It’s not even clear how we’ll get there yet, let alone when that’s going to happen. In all likelihood, any mission to Mars will have to be multi-national; it will simply be too expensive in terms of money and resources for any one country to handle alone.

If you’re worried about America’s supremacy in space, you shouldn’t worry all that much just yet. NASA remains the most advanced space program in the world. We paid for most of the ISS. We’ve got ground- and space-based telescopes and imagers and orbiters that have traveled to the outer edges of the solar system and explored the planets. We’ve got probes running around the surface of Mars. If the goal is to do good science, all these things are doing good science right now and in that respect, the US is second to none.

Next, let’s look at relative cost. According to nasa.gov, it costs about $450 million to launch a single space shuttle mission. The total budget for the shuttle program over its lifetime exceeds $160 billion. By contrast, the entire cost of the Mars Exploration Rovers – building, launching, landing and performing the primary mission – was $820 million (at least according to Wikipedia). The Cassini-Huygens mission to Saturn cost the US about $2.6 billion for the whole package (again, Wikipedia). These missions have been doing good science for years for the cost of shooting humans into space for a matter of weeks.
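To put those figures side by side, here is a quick back-of-the-envelope calculation that expresses each robotic mission in “shuttle launches”. It uses only the numbers quoted above; the variable names are mine.

```python
SHUTTLE_LAUNCH_COST = 450e6  # dollars per shuttle mission, as quoted above

# Mission costs in dollars, as quoted above.
missions = {
    "Mars Exploration Rovers": 820e6,
    "Cassini-Huygens (US share)": 2.6e9,
}

# How many shuttle launches would each whole mission have paid for?
launch_equivalents = {
    name: cost / SHUTTLE_LAUNCH_COST for name, cost in missions.items()
}

for name, launches in launch_equivalents.items():
    print(f"{name}: about {launches:.1f} shuttle launches")
```

By these numbers the rovers cost a little under two launches and Cassini-Huygens a little under six.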

Another criticism of the shuttle fleet’s retirement is that it will discourage young people in the United States from becoming scientists and engineers. I have an admittedly cynical view here.

When I was a kid, the shuttle program didn’t make me want to be an engineer. It made me want to be an astronaut. If you see a fighter jet streak by overhead and you’re 10, you don’t think “I want to build jet engines!”, you think “I want to fly jets!”  What cultivates a desire to become an engineer is an innate fascination with how things work and a strong desire to either understand why things work, or build things that work, or both. What got me excited about engineering as a kid was my dad showing me up-close pictures of Jupiter and Saturn and knowing that he helped build the thing that was out there, really far away, taking those pictures.

People who weren’t as fortunate as I was to grow up in an environment where interest in science and engineering is cultivated can face the cold hard truth – that so-called STEM (science, technology, engineering and math) disciplines are where the money is and where the jobs are. In the current economic environment, that’s as much motivation as they’ll need if they’re even a little bit interested. Not that they should be solely motivated by money, but it’s a pretty good incentive.

If we’re only in the business of putting people into space for the sake of saying “America puts people into space all by ourselves, look how mighty we are”, without regard to why precisely we’re putting them there or what they expect to accomplish, we’re wasting our time and energy on a vanity project. In short, chill out – America is still doing stuff in space and we’re still doing good science. As long as our representatives in Washington understand that the space program isn’t just the space shuttle and don’t completely strip NASA of funding, that’s unlikely to change in the near term.


Analyzing The Comic-Con Registration Meltdown

Posted: November 22nd, 2010 | Author: | Filed under: Advice (Unsolicited), Computers, Opinions (Uninformed) | 2 Comments »

Thanks to PanelsOnPages.com for the image

On November 1st at 9 AM, online registration for 4-day passes to San Diego Comic Con began. By 9:05 AM, the massive volume of registering attendees caused the registration system to become inaccessible. By 10:30 AM, Comic Con International closed the registration site down, claiming that it would re-open in three weeks.

At 6 AM PST this morning, registration re-opened. By 6:05, the registration site was once again inaccessible due to overwhelming traffic volume. A lucky few managed to get all the way through the registration process, but most of us were left repeatedly retrying until, at 7:30 AM, the registration site was once again closed.

In the grand scheme of things, this was not a disaster. However, it inconvenienced a large number of people, myself included.  In my opinion, both of these problems were entirely preventable.

I do research in the design of scalable, high-performance large-scale systems. Many people in my field work on technologies that are designed to prevent exactly the kind of failure to operate at scale that occurred this morning. While I am by no means an expert, I feel that I can speak from an informed position about scalability issues like this one. In this post, I’m going to speculate on what happened, and what might be done to fix the problem inexpensively.

Disclaimer: I don’t work for Event Planning International Corporation (EPIC) or Comic Con International (CCI), and I was only an external observer of their registration meltdown, so I don’t know exactly what occurred. I’ve seen problems like this documented enough times that I think I can guess what really happened from that documentation and personal experience. I also don’t work for Amazon or any of the other companies mentioned favorably in this post, although I have clearly consumed vast quantities of the cloud computing Kool-Aid.

I’m going to assume some things about EPIC’s architecture. In particular, I’ll assume they have a single well-provisioned database system and a handful of web servers, possibly with a load balancer sitting in front of them and distributing requests evenly among them. This is how most small websites usually look.

“What we’ve got here is a failure to communicate.”

In my mind, the timeline of the meltdown looked something like this:

6 AM: CCI posts a link to the registration site on its homepage. Frantically refreshing nerds see the green box and click it, presenting them with step one of five: enter your name and e-mail address.

Over the next 90 seconds or so, several hundred people open connections to the registration site. Since the page is just vanilla HTML and maybe a little PHP or JavaScript, the web server(s) cache the first page and everything runs fairly smoothly; everything’s being served out of memory and all is right with the world.

People start clicking the “Next” button to proceed to step two. Each click of the “Next” button causes an HTTP POST request to be sent to a PHP script housed on one of the web servers. This script inserts some data into the database indicating that a person whose name is X and whose e-mail address is Y has reserved an attendee slot for the next few minutes – this is mainly there so that you don’t register more people than you can fit in the convention hall. While this is happening for the first few hundred requests, several thousand more registrants are about to hit the “Next” button and start this process for their registrations. The database server starts fielding thousands of connections and inserting thousands of rows into the registration lock table at once. It starts to run out of memory and starts swapping, or it hits 100% CPU utilization, or its disks are seeking all over the place. It gets slower and slower. Eventually, it effectively stops returning responses to queries.
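To make the “reserved slot” idea concrete, here is a minimal in-memory sketch of what I imagine the registration lock table does. This is purely a guess at the logic, not EPIC’s actual code: the `SlotHolds` class, its method names, and the five-minute hold are all hypothetical, and a real system would keep this state in the database and also count completed registrations.

```python
import time


class SlotHolds:
    """In-memory stand-in for the hypothetical registration lock table."""

    def __init__(self, capacity, hold_seconds=300):
        self.capacity = capacity          # how many attendees fit in the hall
        self.hold_seconds = hold_seconds  # how long a slot stays reserved
        self._holds = {}                  # e-mail address -> expiry timestamp

    def _expire(self, now):
        # Drop holds whose time has run out so their slots free up.
        self._holds = {e: t for e, t in self._holds.items() if t > now}

    def reserve(self, email, now=None):
        """Try to hold a slot for this e-mail; return True on success."""
        now = time.time() if now is None else now
        self._expire(now)
        if email not in self._holds and len(self._holds) >= self.capacity:
            return False  # hall is full of live holds
        # New hold, or a refresh of an existing one.
        self._holds[email] = now + self.hold_seconds
        return True


holds = SlotHolds(capacity=2, hold_seconds=300)
first = holds.reserve("a@example.com", now=0)
second = holds.reserve("b@example.com", now=1)
third = holds.reserve("c@example.com", now=2)    # no slots left
later = holds.reserve("c@example.com", now=400)  # earlier holds have expired
```

Every call is a write against shared state, which is why thousands of simultaneous “Next” clicks land on the database all at once.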

So now the database server is effectively hosed. The web servers continue to issue database queries anyway, and they have an increasing backlog of PHP scripts waiting for their database queries to return. The web servers’ memories fill up with session state, the PHP scripts’ stacks, TCP connection state, etc. Eventually, the web servers run out of memory, they start swapping and their performance essentially drops to zero. Requests for page two begin to time out. “500: internal server error” begins popping up on the screens of frustrated nerds around the globe. These users furiously hit “Refresh” hoping that the website will come back to life, which creates new requests and only makes the problem worse.
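The feedback loop is easy to see in a toy model. The sketch below is not a measurement of anything; the arrival rate, server capacity, and the fraction of waiting users who hit “Refresh” each second are invented numbers, and `simulate_backlog` is a made-up helper.

```python
def simulate_backlog(arrivals_per_sec, capacity_per_sec, retry_fraction, seconds):
    """Return the number of requests still waiting at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # New visitors, plus a fraction of already-waiting users hitting Refresh,
        # each of which creates an additional request.
        incoming = arrivals_per_sec + int(backlog * retry_fraction)
        backlog = max(0, backlog + incoming - capacity_per_sec)
        history.append(backlog)
    return history


# Under capacity: nothing queues up.
healthy = simulate_backlog(100, 150, 0.5, 4)
# Slightly over capacity: refreshes compound the backlog every second.
overloaded = simulate_backlog(200, 150, 0.5, 4)
```

With demand below capacity the backlog stays at zero; with demand only a third above capacity, the refreshes roughly double the backlog every second.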

At this point, sysadmins are running around like their hair is on fire trying to get the problem under control. They try every trick in the book, but nothing works. Frantic phone calls are made. Servers are powered off in hopes that demand will recede if the server is inaccessible for a period of time. Demand does not recede.

After about an hour, the decrease in volume from users giving up releases sufficient system resources for some lucky individuals to be able to advance to step 2: enter your address and phone number. Hundreds to thousands of users do just that and then click “Next” on step 2’s page within about 30 seconds of one another. This issues a flood of HTTP POSTs to another PHP script that is supposed to insert the information contained in the POST into the database and associate it with the name and e-mail address that the user entered in step 1. The problem returns with a vengeance and the servers fall over again. Few users make it to step 3 successfully.

7:30 AM: CCI orders the site closed, claiming that they will be “investigating their registration options”. EPIC (presumably) loses a high-profile client.

Cloud Computing to the Rescue

So, what happened here? Fundamentally, EPIC’s servers were not sufficiently well-provisioned to handle the load presented to them by SDCC’s registrants. The servers couldn’t handle the strain, and so they ground to a screeching halt.

How can we solve problems like this? One way is to buy more and more well-provisioned computers and spread the load across them until the load on a given machine becomes manageable. Unfortunately, this typically involves a lot of up-front and long-term costs: you need to buy the computers, find some place to put them where they’ll remain cool and dry, and fix them when they break. Additionally, when your servers are talking to a database, the server hosting the database must be doubly well-provisioned, often at significant additional cost. Oracle makes a lot of its money selling ridiculously well-provisioned database servers the size of a refrigerator for hundreds of thousands of dollars apiece.

Getting enough computers to get the job done does not need to be expensive, however.  Pay-as-you-go “Infrastructure as a Service” (IaaS) systems – one of the many classes of systems classified under the blanket term “cloud computing” by the world’s IT pundits – were designed to solve this exact problem.

I’m going to focus on Amazon’s EC2 and RDS here for the sake of example and because they’re the most popular services of their kind, but many other IaaS offerings exist. Joyent and Media Temple are two other great examples of IaaS companies with a strong presence in the marketplace, especially among startups that need affordable, scalable hosting solutions.

Let’s suppose that you need to register 24,000 people (~20% of attendees at last year’s con) and expect peak demand to last around 24 hours. You buy time on 50 “high-memory extra-large” EC2 instances (17.1 GB of RAM, dual-core processors) to use as web servers and ten “high-memory quadruple extra large” RDS database servers (68 GB of RAM, 8-core processors, “high I/O capacity”) to do your query processing. You reserve them on-demand, which essentially means you pay more per-hour but you only pay for what you use. Let’s suppose conservatively that you’ll store 20 GB of data to these database servers (that’s almost 1 MB of data per user, which I’m guessing is more than enough to store basic registration information) and that you’ll read and write every byte of it once in 24 hours. Let’s assume that the bank’s credit card transaction processing servers will scale to meet the load; after all, they handle millions of transactions a day routinely (and are extremely well-provisioned because of it).

If you use all of it – that means all 50 web servers and all 10 database servers for 24 hours straight – it’ll cost you about $950 at the end of the month. If you only use the database instances (for 24 hours straight, and read, write and store 20 GB), it’ll cost about $320. (This is according to AWS’s cost calculator.) That’s far less than the retail cost of the parts in even one of the aforementioned refrigerator-sized database servers.
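For the curious, here is a rough Python sketch of what those two totals imply per machine-hour and per registrant. Every input comes from the scenario above; the function name, the dictionary keys, and the assumption that the web servers' share is simply the difference between the two totals are mine. The resulting rates are back-of-the-envelope averages (storage and I/O charges are folded in), not Amazon's actual price list.

```python
# Inputs are the figures quoted in the scenario above.
# Nothing here is an actual AWS price; the rates are derived averages.
REGISTRANTS = 24_000
HOURS = 24
WEB_SERVERS = 50
DB_SERVERS = 10
DATA_GB = 20
TOTAL_COST = 950.0    # dollars: web servers plus database servers
DB_ONLY_COST = 320.0  # dollars: database servers, storage and I/O only


def implied_breakdown():
    """Back out per-unit figures from the two quoted totals.

    Assumption: the web servers' share is the total minus the
    database-only figure.
    """
    web_cost = TOTAL_COST - DB_ONLY_COST
    return {
        "mb_per_user": DATA_GB * 1024 / REGISTRANTS,
        "web_cost": web_cost,
        "web_rate_per_instance_hour": web_cost / (WEB_SERVERS * HOURS),
        "db_rate_per_instance_hour": DB_ONLY_COST / (DB_SERVERS * HOURS),
        "cost_per_registrant": TOTAL_COST / REGISTRANTS,
    }


if __name__ == "__main__":
    for name, value in implied_breakdown().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the whole exercise works out to roughly four cents per person registered, which is the number I'd wave at anyone who claims that handling a day of peak demand has to be expensive.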

Need more machines? Buy time on more machines for a few tens of dollars; Amazon’s magical network of services can even balance the load across them for you automatically if you ask it nicely. Don’t need all ten of those database servers you reserved? Only turn on five of them and leave the others alone. Demand starting to die down after 18 hours instead of 24? Start shutting machines down. Magical.

This kind of flexibility and “magic” provisioning of hardware is what makes cloud computing such a compelling way to solve website scalability problems like the one plaguing Comic Con, and it’s one of the reasons why the research and industrial communities alike are so excited about it. I hope that situations like this one will encourage more companies to leverage these sorts of technologies to deploy more scalable websites.


Thoughts on Neutralitygate

Posted: August 10th, 2010 | Author: | Filed under: Computers, Opinions (Uninformed) | Comments Off

There has been a lot of sound and fury today (well, technically yesterday) surrounding Verizon and Google’s joint net neutrality policy announcement. I didn’t really know about it until Tim brought it to my attention. He wanted to know my thoughts about this, and blogs are practically designed for uninformed opinions, so why not?

In the interest of full disclosure, I’ve worked at Google in the past. Several friends of mine are either current or former Googlers. Money from Google’s generous intern salary paid for all the furniture in my living room. I’m totally biased, I freely admit. The rest of this post will also probably reveal my profound ignorance, but that’s why there are comments.

Frankly, I’m surprised at the amount of rage that I’m picking up about this announcement. This isn’t a net-neutrality-crushing alliance between Google and Verizon. It’s not binding. There’s nothing in the language that says that it’s not open for discussion and feedback. This is basically these two companies saying, “This is how we think we’re going to deal with the net neutrality thing right now.”

The notion of allowing carriers to “offer new types of services that are not part of the open Internet” is also not all that surprising to me. In my view, net neutrality says “if two computers on the Internet want to talk to one another, the carriers providing Internet access to either of the two shouldn’t be able to stop them”. By “shouldn’t be able to stop them”, I mean that my computer shouldn’t be unable to talk to Google’s computers because I’m on a “Microsoft-friendly” ISP. What Google and Verizon seem to be proposing here is that companies should be able to create services that don’t operate over the public Internet as long as those services don’t circumvent net neutrality. It’s not really clear what they mean by “not part of the open Internet” here; are we saying that the computers making up this service aren’t world-routable? If so, then companies are already doing this internally. Large corporate intranets could be seen as an example of a service that operates separately from the open Internet. Do they mean creating a parallel network over which these services run? In that case, data center networks seem to fit that description pretty well, and clearly companies are already providing dedicated networks for data centers, or just buying their own. Again, on the face of it I’m not too terribly distressed.

Most of the rage seems to be directed at the fact that they don’t think that wireless access to the Internet should be protected in the same way that wired access is. Judging by Google’s blog post on the subject (see how biased I am?) their justification seems to be that they don’t know what the mobile Internet really looks like yet. It seems to me that this is basically a political compromise; Google caves somewhat on the wireless issue so that Verizon will agree to come on board with the agreement. Notice that the transparency requirement opens the door for any company that tries to pull any anti-neutrality funny business to be publicly reamed by their customers (you need only look at the Comcast P2P filtering scandal to see how much good press anti-neutrality practices generate for the telcos that implement them). Google promises to not be non-neutral in access to its properties, and it and Verizon get to be great friends; everybody wins.

Are Google and Verizon both huge multinational corporations that are interested primarily in pulling in a lot of cash? Absolutely. Does the language of this announcement open up the possibility of abuse, corruption and rampant mismanagement? Of course, but so does every other agreement between more than one person. Does this announcement violate Google’s “Don’t be evil” motto? Hardly. To me, this just sounds like Verizon and Google getting all buddy-buddy and announcing the sort of long-term, under-defined recommendations that make PR people all warm and fuzzy because they don’t actually require you to promise to do anything. Everything I’m hearing from Google indicates they’re really self-conscious about this whole thing (“it’s not an agreement”, “we are still committed to net neutrality”, “we’re not sure what the mobile market looks like, so we have to keep our options open”), which gives me some comfort; the second you get a “so what?” announcement out of a company Google’s size (see Comcast’s P2P shenanigans, Antennagate), it’s time to be concerned. Yes, I know I’m totally defending Google here. I told you, I’m biased.

It might sound like I don’t care about net neutrality, but nothing could be further from the truth. People are absolutely right when they say that the Internet is the most democratic medium ever conceived, and it’s really important that it stays that way. The Internet is big business, and these days it’s dominated by big businesses with a lot of influence and a lot of money. As citizens of a democracy, we have every right (and, some may argue, the responsibility) to be outraged when a large, powerful entity looks like it’s trying to gain control of a fundamentally democratic communications medium like the Internet. I just don’t think that this announcement represents anything close to such a power grab.

And that is my opinion on the subject. Release the trolls!