Algorithms (for Hanging) with Friends

Posted: January 15th, 2012 | Author: | Filed under: Fun and Games | 1 Comment »

For the past few months, I’ve been playing Hanging with Friends with a few friends of mine. Hanging with Friends is a modified version of Hangman: your opponent gives you a word with one of the letters filled in and the rest left as blanks, and you have to guess the word one letter at a time without making too many mistakes. Once you try solving a word, you’re given a set of letters that you can use to construct a word that your opponent in turn tries to solve. The first person to fail to guess five words loses.

On a recent long, boring drive up I-5 I found myself thinking about Hanging with Friends, and what it would take to have a computer both solve Hanging with Friends puzzles well and come up with good puzzles of its own.

Solving Puzzles

Solving Hanging with Friends puzzles is pretty straightforward from an algorithmic point of view, as long as you have the dictionary of possible words handy. The program is given a template for a word as input (?op?er, for example, where ? represents an unknown letter) and is expected to produce the next letter that the player should guess. One possible solution starts by filtering the dictionary for words that match the template; for ?op?er, “copter” would match the template but “spam” or “bacon” would not. It then counts the number of times each guessable letter occurs in this set of candidate words, and returns the letter with the largest count. The idea here is that an incorrect guess eliminates the largest number of candidate words.
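Here is a minimal sketch of that solver in Python. It is not the code from hanging-tools; the names matches and next_guess are made up for illustration. It assumes that a revealed letter shows every place it occurs in the word (as in classic Hangman), and it counts each letter once per candidate word rather than once per occurrence, which is one reasonable reading of "largest count".

```python
from collections import Counter


def matches(template, word, guessed=frozenset()):
    """Return True if word fits the template, where '?' is an unknown letter."""
    if len(word) != len(template):
        return False
    revealed = set(template) - {"?"}
    for slot, letter in zip(template, word):
        if slot == "?":
            # Assumption: revealed letters show all of their positions, and
            # wrongly guessed letters are absent, so neither can hide here.
            if letter in revealed or letter in guessed:
                return False
        elif slot != letter:
            return False
    return True


def next_guess(template, dictionary, guessed=frozenset()):
    """Pick the unguessed letter that appears in the most candidate words."""
    candidates = [w for w in dictionary if matches(template, w, guessed)]
    counts = Counter()
    for word in candidates:
        # Count each letter once per word, skipping letters already known.
        counts.update(set(word) - set(template) - set(guessed))
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically.
    return min(counts, key=lambda letter: (-counts[letter], letter))
```

Given a word list containing copter, copier, gopher, mopier and dopier, next_guess("?op?er", words) picks "i", since three of the five candidates contain it.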

If you don’t have a dictionary (or have to operate on the assumption that the dictionary isn’t perfect), things get a lot harder. You could do arbitrarily sophisticated things here; one thing that comes to mind is using a dictionary to learn what groups of letters tend to go together in English words and guess those in an intelligent way. For example, if you’re given a word whose last three letters are “i??”, you might want to guess n, g, t, m or e because of the common letter groups “ing”, “int”, “ine” and “ime”, or if you’re given a word where the revealed letter isn’t a vowel, you might want to guess each of the vowels. You probably would want to take frequency of occurrence of these common letter groupings in the dictionary into account here as well. What you’d probably want is to train some sort of learning system to do this; there are a ton of different possibilities here.
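As a rough illustration of the letter-group idea, here is a small Python sketch that counts word endings in whatever word list you do have and ranks the letters that could fill an ending like “i??”. It is far simpler than a real learning system, and the function names ending_counts and rank_fillers are invented for this example.

```python
from collections import Counter


def ending_counts(words, n=3):
    """Count how often each n-letter ending appears in the word list."""
    return Counter(w[-n:] for w in words if len(w) >= n)


def rank_fillers(pattern, counts):
    """Rank letters for the '?' slots of an ending pattern such as 'i??'."""
    scores = Counter()
    for ending, count in counts.items():
        if len(ending) != len(pattern):
            continue
        # The ending must agree with every known letter in the pattern.
        if all(p == "?" or p == e for p, e in zip(pattern, ending)):
            for p, e in zip(pattern, ending):
                if p == "?":
                    scores[e] += count
    # Most frequent letters first; ties are broken alphabetically.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [letter for letter, _ in ordered]
```

Weighting each letter by how often its ending occurs is the "take frequency into account" part; a fancier version might look at letter groups anywhere in the word, not just at the end.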

Making puzzles

Picking good words to give your opponent is a little more tricky. Abstractly, you may want to pick words that are hard to solve. Typically these are words that are long, probably uncommon, and have a large number of distinct letters, since those words give your opponent more opportunities to make a mistake. You might also want to pick words with high scores. Hanging with Friends rewards you with some amount of in-game currency for each 200 points scored when creating puzzles for your opponent (using Scrabble-style scoring with letter scores and position-based score multipliers). This in-game currency can be exchanged for power-ups that make a word easier to guess by eliminating letters when you’re the one solving, or feeding you hard-to-solve words when you’re making the puzzle. In-game currency can also be used to purchase various cutesy avatars and whatnot in their in-game store. Of course you can purchase in-game currency with real money (this is where Zynga’s signature brand of devious social engineering rears its ugly head), but with two copies of an algorithm like this and a colluding friend, you could mine in-game currency as efficiently as possible. Not that I’m suggesting such a thing, of course.

To keep things as generic as possible, what you’re really looking for is the ability to pick a word with the maximum value for some cost function from the set of words that can be generated using the tiles you’re given. The algorithm when cast this way is also pretty straightforward if you have a dictionary at hand: filter the dictionary to the set of words you can legally play, give each word a score, and pick the highest-scoring word.
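That filter-score-pick loop might look something like the Python sketch below. The cost function is passed in, so you can swap in Scrabble-style scoring, rarity, or anything else. The difficulty function here is only a hypothetical stand-in that favors words with many distinct letters; it is not the game’s scoring, and none of these names come from hanging-tools.

```python
from collections import Counter


def can_play(word, tiles):
    """Return True if word can be spelled using each tile at most once."""
    needed = Counter(word)
    available = Counter(tiles)
    return all(available[letter] >= n for letter, n in needed.items())


def best_word(tiles, dictionary, score):
    """Return the playable word with the highest score, or None if none fit."""
    playable = [w for w in dictionary if can_play(w, tiles)]
    return max(playable, key=score, default=None)


def difficulty(word):
    """Hypothetical cost function: more distinct letters first, then length."""
    return (len(set(word)), len(word))
```

With the tiles "copterab" and a tiny dictionary of copter, bacon, spam, taco and crab, best_word picks "copter": bacon and spam need letters that aren’t on the rack, and copter has the most distinct letters of what’s left.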

On Word Commonness

One additional ingredient that’s missing from the above algorithms is some notion of the commonness of a word. Intuitively, words that aren’t commonly used are harder to guess; in the above example of “?op?er”, I’m more likely to guess “copter”, “copier”, or “gopher” than I am to guess “mopier” or “dopier” because I use the former set of words more often than I use the latter. Word commonness is a tricky thing to get right; even if you had a corpus of information the size of the Internet from which to draw (which companies like Google do), the problem is complicated by the fact that many people have trouble with spelling and sometimes use one word when they mean to use another. Another complication is that the frequency of some words can vary substantially based on context; for example, computer scientists use the words “ping”, “packet” and “hash” more frequently than other people do. That said, a good first approximation would be to count the frequency of each of the dictionary words in the largest corpus of English text you could find.
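That first approximation is only a few lines of Python. This sketch assumes the corpus fits in memory as a single string and that a "word" is just a run of the letters a to z, which ignores all of the spelling and context problems mentioned above. The function names are made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(corpus_text, dictionary):
    """Count how often each dictionary word appears in the corpus text."""
    vocabulary = set(dictionary)
    tokens = re.findall(r"[a-z]+", corpus_text.lower())
    return Counter(token for token in tokens if token in vocabulary)


def rank_by_commonness(candidates, frequencies):
    """Sort candidate words from most to least common in the corpus."""
    # Words never seen in the corpus count as zero; ties sort alphabetically.
    return sorted(candidates, key=lambda w: (-frequencies[w], w))
```

A solver could use the ranking to prefer guesses that fit the common candidates, and a puzzle maker could use it the other way around to pick the words at the bottom of the list.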

The Code

I went ahead and implemented a simple version of the algorithms described above and put it on GitHub as hanging-tools.

Disclaimer (so that people won’t immediately stop playing Hanging with Friends with me): I didn’t use these tools to cheat in any of my Hanging with Friends games, although I did feed it old puzzles as test cases.

If you’re interested in playing around with this (or fixing my bugs!) I’d be interested to see what you do with it. That repository also comes with a copy of the ENABLE dictionary, which Zynga apparently uses as the basis for their word database.


Year in Review

Posted: December 31st, 2011 | Author: | Filed under: Random | No Comments »

Last year, I made a New Year’s resolution to post something on this blog every week for a whole year. At the time I made that resolution, that meant that I’d have to write 50 blog posts before the end of the year. If I count correctly, this post makes 46 total. Not perfect, but not bad. At one point, I even had a buffer that was several weeks long!

One of the things that I was curious about going in was what the effect of an increased post volume would be on my pageviews. According to Google Analytics, my number of visits increased by 40% from this time last year. Most of my traffic came from search engines or referrals from Facebook, and most people came for my esoteric tips or my crotchety advice.

For those who like looking at graphs (because really, who doesn’t?), here’s my weekly “views” every week this year (blue) vs. last year (orange):

Truth be told, I started to run out of things to say sometime in November. We’ll see if I keep this post-a-week thing going into 2012, but it’s been an interesting experiment at the very least. Happy New Year!


Holiday Chipmusic

Posted: December 24th, 2011 | Author: | Filed under: Music | No Comments »

Happy holidays! By now, you might be sick of being bombarded with the same 15 saccharine, dated holiday songs by mall loudspeakers and radio stations everywhere. As my gift to you, Internet, let me point you at something different – that’s right, it’s holiday chipmusic time!

Doctor Octoroc – 8-Bit Jesus: Holiday classics in the style of video game classics. Includes such gems as “Have Yourself a Final Little Fantasy”, “Bubbles We Have Heard on Bobble”, and “Super Jingle Bros.”.

George and Jonathan – The Best Christmas: The dynamic duo of pxtone wizardry bring their infectious energy to the holiday season with a Christmas-inspired album.

Rush Coil – 8-bit Christmas: Rush Coil covers Christmas music. Deck the halls … and defeat the Master Robots.

EvilWezil – Carol of the Bells: The inimitable EvilWezil takes on a holiday classic, and produces this in a day. EvilWezil cannot be stopped!


Sonic CD – Seriously, Just Play It

Posted: December 22nd, 2011 | Author: | Filed under: Fun and Games | No Comments »

Sonic hasn’t had all that great of a run for most of the past ten years or so. The franchise was handed off to a bunch of different teams in the 2000s, each of which had their own vision of what a Sonic game should look like. Unfortunately, each of those visions was more mediocre than the last. Recently though, the future’s been looking a little brighter for our spiky blue hero. Sega put a new guy in charge of Sonic’s future direction last year, and one of the first things his team did was pull all the recent “average” Sonic titles from the shelves. Shortly thereafter, they released Sonic Colors, Sonic 4 Episode 1 and Sonic Generations, and they all got pretty good reviews (for next-gen Sonic titles, anyway). Sonic 4 Episode 1 and Sonic Generations both attempt to re-attract the “older” generation of Sonic gamers who remember a time when Sonic titles were known for being good instead of, well, laughably awful.

The next big thing in Sonic’s “everything old is new again” renaissance is the re-release of Sonic CD on PC, XBox 360, PS3 and iOS devices. I’ve already played through Sonic CD several times (in emulators and on the Sega CD), but I bought it again for XBox 360, and I’m really glad I did.

Sonic CD is in many ways the spiritual successor to the original Sonic the Hedgehog, even though it was released a few months after Sonic 2. Most of Sonic’s sprites are re-used from Sonic 1, special stages are accessible at the end of the level rather than at checkpoint lampposts, and each zone has three acts rather than two. Sonic CD takes advantage of the ill-fated Sega CD’s hardware, sporting a CD-quality soundtrack (which was a novelty in 1993) and levels that are absolutely massive by comparison to those in the first couple Sonic games.

The big novel gameplay mechanic in Sonic CD is time travel (because really, what series doesn’t get better with the inexplicable addition of time travel?). Each level in Sonic CD is playable in three time periods: past, present, and future. Sonic gets between time periods by hitting a Time Warp sign (conveniently labeled Past or Future) and then running at top speed for a few seconds without stopping. The future is further subdivided into the “good” future and the “bad” future. Dr. Robotnik has put a machine in the past on each level that powers his badnik army; if Sonic travels to the past, then finds and destroys this machine, the future is saved (unlocking the “good” future); if not, Robotnik has taken over in the future and it isn’t a very happy place (the “bad” future). If you unlock the good future in every level or collect all the Time Stones (the stand-ins for that classic Sonic MacGuffin, the Chaos Emeralds), you get the good ending.

Sonic CD is pretty much tied with Sonic the Hedgehog 2 on my list of the best Sonic games of all time. The fact that every level is essentially four levels is a testament to the enormous amount of room the designers had to work with on the Sega CD. The different time periods aren’t just palette swaps either; they actually went through the effort to modify the graphics and level design and write different background tracks for each one. The soundtrack (while very ’90s) is pretty great in both Japanese and English versions (the Japanese boss music is by far my favorite). The “go really fast for a while without stopping” requirement for time travel actually inspired some really inventive level design; speed traps and loops that were just nuisances before become things the player actively seeks out. Seeking out Dr. Robotnik’s machines in the past puts a much larger emphasis on exploration, although you can still race through the levels at full speed if you want. It’s got “replay value” in spades.

What really sets Sonic CD apart from any other Sonic game from that era are the boss battles.

Typically a boss battle in a Sonic game is a pretty standard affair. Dr. Robotnik shows up in some kind of mech whose design is inspired by the level’s overall theme (i.e. if you’re in an ice level, that mech’s gonna have a freeze ray), you hit him seven or eight times while dodging his attacks, most of his mech explodes and he runs away. Sonic CD’s bosses are in that same vein, but are a lot less conventional. You’re still fighting Robotnik in a mech most of the time, but in one level he’s at the top of a giant diabolical pinball machine and you only have to reach him and hit him once to end the whole thing. In another, he locks you in a room containing a fearsome death trap and watches in mounting dismay as the death trap slowly tears itself (and his conveniently placed adjacent control room) apart. In still another, he traps you underwater only to make the critical mistake of making a shield for his mech out of air bubbles. Really inspired and original stuff, some of the best of the whole series.

Oh, and did I mention that in one level there’s a shrink ray?

Yeah, in one level? There’s a shrink ray.

To their massive credit, Sega has really done Sonic CD justice with this re-release. Rather than sticking a huge border around the old VGA graphics like a lot of 16-bit game conversions, they actually took the time to port it to 16:9 native. Many people think the US soundtrack wasn’t as good as the Japanese soundtrack, so they included both soundtracks. They even added Tails in as a playable character. I mean seriously. That’s pretty awesome.

Seriously. Play this game. It’s well worth the $3-$5 you’ll pay for it on your platform of choice.


Two Months Working with GitHub: A Retrospective

Posted: December 13th, 2011 | Author: | Filed under: Opinions (Uninformed) | 1 Comment »

For a long time, my thesis work was housed in a Mercurial repository on the department NFS server. We had a mailing list where team members could discuss ideas and report bugs. Ideas, to-dos and “bug reports” were kept in a combination of e-mail logs, Google Sites, Google Docs, and various text notes. We tried getting code review to work with a few different tools, but I quickly grew tired of keeping them running. About two months ago, in response to the team’s increasing size and in preparation for releasing the code to some teams within UCSD, we decided to move the project to a private repository on GitHub.

All the cool kids seem to be hosting their source code in GitHub these days, and it’s not that surprising. GitHub’s UI is pretty, responsive, and knows when to stay out of your way. Adding things like post-commit hooks is really straightforward. It’s “social” without being in-your-face about it all the time. We liked the idea of having code review, issue tracking, and a wiki all in one place that we didn’t have to maintain ourselves. Also, we liked the idea of having a private repository that could be made public with the flick of a switch [1].

Switching over to GitHub was a big adjustment, but in the end I think we made the right decision.

The one big thing that made us hesitant to switch over to GitHub was Git. Nobody on the team was really familiar with it, and frankly it looked kind of scary compared to Mercurial. Lots of commands, the ability to edit history, etc. It looked really heavyweight. I can say though, that in the long run I’m really glad that we switched to Git, GitHub or no GitHub. Git’s technical complexity is a little daunting at first, and I’m still frightened of rebasing, but the transition was a lot less rocky than I’d expected it would be and I’m happy enough with Git that I find myself picking it over Mercurial even for projects that (for various reasons) aren’t hosted in GitHub.

Moving from “hack, hack, commit, push, hack” to “branch, hack, send a pull request, branch, hack” also seems to have been an easier adjustment than I expected. Once we got comfortable with the idea that branching and merging frequently was OK, things went pretty smoothly. Most of us are old school(?) CVS and Subversion guys who remember when branching was largely more trouble than it was worth (tree conflicts, anyone?), so dealing with many branches at once was a bit of a shock. As it turns out, frequent branching has been more of a boon than a burden. git branch and git stash together have changed my whole workflow. Being able to jump back and forth between tasks without having to have a dozen copies of the repository littering my directory tree is really liberating.

By far my favorite piece of GitHub’s tool suite is the code review system. Having every line of code looked at by at least one other person before it gets committed has really improved our code quality. It also helps to prevent any single person from being the only one who knows how a given module works. Being able to have conversations about parts of the code with other team members and having those comments show up in context is a huge win over ad-hoc fumbling with e-mail (“on line 20 of foo.cc: this thing should change this way; on line 25 of foo.cc: that other thing should change too”, etc).

I’m really impressed with GitHub overall, so much so that I’ve been slowly migrating repositories to them from other services. If you’re looking for a place to host your team’s code, they’ve got a really solid offer and you can’t beat the price (especially considering it’s free for public repositories).


  1. We’re going to open source this work sometime before I graduate. Just not now. It’s not ready now. 

MarkupServe – A DIY Evernote Alternative

Posted: November 28th, 2011 | Author: | Filed under: Useful Software | 1 Comment »

I’ve gushed at length in the past about how much I love Evernote. I upgraded to Evernote Pro about half a year ago so that I could attach arbitrary file types to notes. After a while, though, a couple of things about my workflow with Evernote were starting to irk me.

First, I wasn’t even coming close to using Evernote Pro’s generous data cap. Second, I wasn’t sure that paying $5 a month so that I could attach a PDF to a note was worth it given how infrequently I was attaching PDFs to notes. Third, and maybe most critically, their WYSIWYG editor has always kind of bugged me; lists never really indent the right way, sometimes formatting leaks to the next paragraph in unpredictable ways, etc. It’s just really hard generally to get precision control over how your text looks in Evernote’s editor (at least on the Mac, I have no experience with the Windows editor). Its lack of Linux compatibility isn’t a problem for me anymore, but it might become a problem if I start working on Linux boxes again in the future.

I figured the “paying for space I don’t use” problem would be solved if I could find something that would sync with Dropbox. I have a couple GB of free space with them that is really underutilized and I would prefer using that space instead of paying for more underutilized space. For the “imprecise editing” problem, I decided that what I really wanted was the ability to write notes in a markup language like Markdown (which I also love and have gushed about here), do my writing in Emacs, and be able to view the notes in HTML easily.

I couldn’t find any pre-existing solution that I really liked, so I decided to roll my own.

Dropbox support wasn’t a problem; I just exported my notes from Evernote, used html2text to convert the exported notes into Markdown, and moved the folder containing the converted notes to my Dropbox folder, where it happily synchronized. Dropbox, you are awesome.

At first, I used Marked to render and view notes individually. Marked is a great app, but I wanted to be able to interact with rendered versions of my files more quickly. I had thought about writing up a read-only filesystem with FUSE that would do the rendering of Markdown to HTML transparently and just present a filesystem of HTML files, but that sounded like overkill even for me. I figured that writing up a simple web server with Bottle to display the rendered files would get me to a working solution much more quickly. After a couple hours of coding on Thanksgiving, I had MarkupServe working.

MarkupServe is pretty basic at this point. It’s given a directory tree containing markup files and presents that directory tree as a directory listing similar to the one httpd gives you if you don’t have an index page. It has simple keyword search (which just runs grep at the root of the directory tree and HTML-ifies the results) and renders notes to HTML on-the-fly when they’re clicked on. I made MarkupServe extensible enough that it should support more than just Markdown, in case others would find something like this useful but want to use other markup languages. It’s not the fastest or prettiest thing ever, but it works.

The last hurdle was making attaching files in Emacs comfortable. Evernote exports all attachments for a note note.html in a directory named note.resources, so I figured I’d stick to that convention. I made MarkupServe ignore directories that ended in .resources so that it wouldn’t clutter up the file listing. Then I wrote a couple little elisp functions that create a .resources directory for a note and “attach” a file by copying it into the appropriate .resources directory and inserting a Markdown-style link to the new copy. I’ve posted those functions in a GitHub gist if you’re interested in looking at those.
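For the curious, the .resources convention boils down to something like the Python sketch below. This isn’t lifted from MarkupServe; the function names and the list of file extensions are assumptions made for the example.

```python
import os


def list_notes(root, extensions=(".md", ".markdown")):
    """Return note paths relative to root, skipping *.resources directories."""
    notes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into
        # attachment directories, so their contents never get listed.
        dirnames[:] = sorted(d for d in dirnames if not d.endswith(".resources"))
        for name in sorted(filenames):
            if name.endswith(extensions):
                full_path = os.path.join(dirpath, name)
                notes.append(os.path.relpath(full_path, root))
    return notes


def resources_dir(note_path):
    """Return the attachment directory for a note (note.md -> note.resources)."""
    base, _extension = os.path.splitext(note_path)
    return base + ".resources"
```

Keeping attachments next to the note in a predictably named directory means the links stay relative, so the whole tree can move between machines (or into Dropbox) without breaking anything.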

One big piece that this system is missing is a mobile app. Evernote’s iPhone app is terrific, and I’m going to miss it. At this point, my solution is to use Elements to edit notes and add photos taken with my phone’s camera to notes manually if the need arises. It’s sort of awkward, but I used the image attachment feature so infrequently in Evernote that I’m not really concerned about it. The system also lacks Evernote’s slick image OCR capability, but that was another feature I never really used (my handwriting’s pretty awful and the OCR could never really parse it well).

I’m sure I’ll tweak this setup considerably as I get more experience with it, but it was surprisingly quick to throw together and has proven really useful so far. Hopefully open-sourcing this stuff will help any other people who might have a similar itch to scratch.


Escape to New York

Posted: November 24th, 2011 | Author: | Filed under: Random | No Comments »

Big buildings are big

I got the opportunity to spend some time in New York a few weeks ago. Officially, I was there to attend Hadoop World, but I flew up a few days early to visit friends and see the city. I hadn’t been there since middle school (and I can’t really say I’ve been to New York if I was twelve years old and with a tour group). My friend Mangesh and his roommate graciously let me crash on their couch for a couple of days until my official (and reimbursable!) conference hotel room was available. They are wonderful people.

I had a couple of days to walk around and enjoy the city. Spent most of Sunday at the American Museum of Natural History. The museum was just as great as I remember it being back in 1997, and their newly-renovated planetarium is absolutely spectacular.

I spent most of Monday wandering around Manhattan. Since a lot of the touristy things like museums are closed on Mondays, I spent a good deal of time in Central Park and wandered around Times Square and Rockefeller Center. There’s this amazingly stark contrast between Central Park and the rest of Manhattan. The decision to put the park there rather than developing that land was a really inspired choice on the part of city planners; it was a great place to wander around and relax after spending so much time getting jostled on the streets. The weather was perfect while I was there, which was a lucky break considering that the weather is apparently pretty erratic there in November.

I spent Tuesday and Wednesday at Hadoop World, hobnobbing with fellow big data geeks and interested businesspeople, talking about the state of Hadoop and the big data landscape in general. Since I’m in this weird pre-thesis-writing-but-thinking-about-graduation state right now, I spent much of the time getting a feel for what the various companies in the big data space were up to and doing a bit of shameless evangelizing of our group’s work with Themis (our follow-on work that has grown out of TritonSort). My friend Yanpei (who is now wrapping up a Ph.D. at Cal) gave a great talk along with Todd Lipcon from Cloudera on measuring and improving Hadoop’s performance that you should check out if you’re interested in that kind of thing.

Tuesday night was spent at EMC/Greenplum’s bowling-for-charity event, in which I affirmed that a) I am not that great of a bowler and b) tech companies know how to throw a party. The attendees (with the help of sponsors) ended up raising over $20,000 for Artists for Elephants, which is also pretty cool. If the choice of charity sounds random, it’s worth noting that Hadoop’s mascot is a little yellow elephant. We are actively considering adopting an adorable mascot for the Themis project.

In general, my impression of New York was a lot more positive than I thought it would be. I’m not a huge fan of crowds, but for some reason Manhattan felt a lot less crowded than I imagined it would be, although there were still a ton of people everywhere. I can definitely see why people want to live in New York; there’s so much cultural diversity and so many things to do and see there. On the other hand, the ridiculously high cost of living (granted, my exposure to the city at this point has been Manhattan, which is probably not representative) and the constant crowds would kind of deter me from moving there.

I took a ton of photos during my trip that are up on Flickr.


tmux: the new hotness for terminal management

Posted: November 3rd, 2011 | Author: | Filed under: Useful Software | No Comments »

Some time ago I talked about using GNU Screen to effectively manage a bunch of terminal windows. Turns out screen has some serious competition: tmux.

One thing that’s pretty cool about tmux is the way it handles windows. tmux uses a client-server model: a single tmux server process owns every session and window, and each terminal you attach is just a client of that server. Because windows belong to the server rather than to any one terminal, you can do things like move a window between sessions or link the same window into multiple sessions. I haven’t run into a situation where this was useful yet, but it’s nice to know that it’s possible.

The thing that I have found useful is the fact that tmux is reasonably scriptable. In order to get the list of windows in a screen session, I had to create a dummy window, stuff the real window list into it, dump the contents of the dummy window to a file and parse that file. With tmux, I call

tmux list-windows

from any window in the session, or

tmux list-windows -t <session name>

outside the session, and parse the output. Not only that, but tmux displays none of the weird “I can only make calls that change a session once per second” problems that I’ve been seeing in practice with screen.
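Here’s roughly what the parsing side can look like in Python. This is a sketch rather than the code from tmux-screnum; it assumes a tmux version whose list-windows accepts the -F format option, which lets you ask for just the window index and name instead of picking apart the default output.

```python
import subprocess


def parse_windows(output):
    """Parse lines of the form '<index> <name>' into (index, name) pairs."""
    windows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the first space only, since window names may contain spaces.
        index, _, name = line.partition(" ")
        windows.append((int(index), name))
    return windows


def list_windows(session=None):
    """Ask tmux for the windows of the current (or the named) session."""
    command = ["tmux", "list-windows", "-F", "#{window_index} #{window_name}"]
    if session is not None:
        command += ["-t", session]
    return parse_windows(subprocess.check_output(command, text=True))
```

Keeping the parsing separate from the subprocess call makes it easy to test against canned output without a running tmux server.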

Like screen, tmux won’t auto-sort windows by name or make their numbering contiguous. My py-screnum tool for sorting and compacting Screen windows takes several seconds to re-arrange windows and is 95 lines of Python (according to sloccount). The analog for tmux, tmux-screnum, is only 57 lines, and re-arranges windows instantly. I won’t claim that either of these programs is minimal, but 40% fewer lines of code and faster is a winning combination.

There are times when I still have to use one or the other; many non-BSD machines come with screen installed by default but not tmux. I’m starting to prefer tmux whenever I have a choice, though.


Pianobar – Command-Line Pandora Client

Posted: October 31st, 2011 | Author: | Filed under: Useful Software | No Comments »

I get a lot of use out of my Pandora account, but I’ve never really liked their desktop client. Adobe AIR has always seemed too much like Flash to me – too bloated, too flaky. Pianobar is a great alternative to the native desktop client.

Pianobar is a command-line interface to Pandora; no GUI, just text. This means I can stick it in a detached screen or tmux session and it doesn’t take up space when I don’t need to look at it. This is a huge plus, especially when I’m working on my laptop and don’t have a lot of screen real estate to spare.

Even better, pianobar can be configured to call a script whenever certain events occur – when the track changes, for example. This is really cool because it opens up all kinds of possibilities for integrating Pandora playback with other applications and services.

I wrote a little script called Pianogrowl a while back (to be fair, I ported someone else’s Bash script and added a couple extra features) that displays a Growl notification containing the album art, title, artist and album for the currently-playing track whenever the track changes. It also displays a notification if pianobar ever has connection or authentication problems. Being able to do that with Pandora tracks is pretty cool; I wish more applications had these kinds of hooks exposed out of the box.
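To give a flavor of what an event script has to do, here is a minimal Python sketch. It is not Pianogrowl itself. It assumes, as I understand pianobar’s event command interface, that the event name arrives as the first command-line argument and the track details arrive on standard input as key=value lines; the key names (title, artist, album) and the "songstart" event name are assumptions to check against pianobar’s documentation.

```python
def parse_event(lines):
    """Turn key=value lines into a dict, ignoring lines without an '='."""
    info = {}
    for line in lines:
        # Split on the first '=' only, since values may contain '=' themselves.
        key, separator, value = line.rstrip("\n").partition("=")
        if separator:
            info[key] = value
    return info


def describe(event, info):
    """Build a one-line notification for track changes; None otherwise."""
    if event != "songstart":  # assumed event name for a new track
        return None
    return "{} by {} ({})".format(
        info.get("title", "?"),
        info.get("artist", "?"),
        info.get("album", "?"),
    )
```

A script wired up as the event command would call describe(sys.argv[1], parse_event(sys.stdin)) and hand the result to whatever notifier you like.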

If you use Pandora a lot and spend a lot of time in the console, you might want to give pianobar a try.


Secure Your SSH Keys

Posted: October 16th, 2011 | Author: | Filed under: Advice (Unsolicited) | No Comments »

Until about a week ago, I used password-less SSH key pairs. I would keep a private key on a machine and stick its corresponding public key in authorized_keys files for my accounts everywhere else so that I would be able to log into any machine from any other without using a password. I figured the only way something really bad could ever happen is if someone were to get ahold of one of my private keys – and what are the odds of that happening, right?

Turns out Murphy’s Law applies here too. Somehow, someone got ahold of one of my private keys last week. I won’t go into the gory details of how, mostly because I don’t know exactly how they did it. I spent a couple days last week frantically changing every password I have and regenerating all of my key pairs. I then nuked the offending machine from orbit.

To say the whole event was unsettling is the mother of all understatements. The fact of the matter is that I have no idea what they did, what they took or didn’t take, what files they accessed … it’s that lack of information that’s the most terrifying. I’m not a computer security expert by any means, but I’m not a complete layman; I took a lot of precautions on the server in question to make sure this wouldn’t happen, and it did anyway.

This whole event triggered a lot of research and soul-searching. Here are some thoughts/recommendations that have come out of that process.

One of the blogs I read on the subject said that password-less SSH keys are “like credit cards without PIN numbers”. The analogy is pretty appropriate, I think. Let my mistake serve as an example – just don’t use them. They just aren’t safe.

You should generate separate, strong key pairs for each machine you use. I used 1024-bit DSA keys before, but I’m using 4096-bit RSA key pairs everywhere now. 4096-bit keys have modest performance overheads relative to 2048-bit RSA keys (ssh-keygen’s default), and should still be really hard to compromise by brute force for the next couple of decades barring some major theoretical advance or access to an enormous botnet. Nothing’s foolproof, of course, but it’s a start.

The keys you generate should have distinct passwords. That is, keys’ passwords should be both distinct from your normal password for that machine and distinct from one another.

Use ssh-agent to save yourself the headache of providing a password every time. The web is littered with tutorials on how to use ssh-agent, so I won’t talk about that here. These days, I start an ssh-agent process when I log in, and make sure it gets terminated when I log out. See this GitHub gist for the appropriate incantations to make this happen. On OS X, it’s even easier; ssh-agent has been nicely integrated into Keychain and launchd since Leopard. Really, there’s no excuse not to use ssh-agent.

Finally, minimize your attack surface. If a service can run in a separate user account with no privileges, it should. If a service must run as root, run it in a sandbox. If a port doesn’t absolutely have to be open, it shouldn’t be open. If a machine can still do its job while not being world-routable, make it non-world-routable.

Personally, I’m done running my own publicly-visible servers. Unless you’re in the business of running and securing servers (and have the know-how to keep them secure), I just don’t see how it’s worth the trouble.