Posted: May 12th, 2012 | Author: Alex | Filed under: Useful Software | No Comments »
I’ve struggled with research paper management for years. I gushed at length about Mendeley last year, but lately I’ve been having some of the same problems with Mendeley that I had with Evernote.
Its metadata lookup functionality, while convenient when it works, doesn’t seem to work that often for me anymore; this might be because I’m looking up newer papers or papers in an area that they don’t have a lot of metadata coverage for, who knows. Mendeley’s note-taking feature leaves a lot to be desired; you essentially have a Notepad window that you’re writing text down in. Too much formatting is distracting, I’ll admit, but I’d like to be able to at least bold and italicize things every once in a while. I also ran into the ceiling of Mendeley’s free space and had to start paying for extra storage (if you’ve read my markupserve post this is probably sounding familiar), which is irksome considering that I’ve got so much free Dropbox space available. Exporting BibTeX entries for individual papers in Mendeley is a lot more tedious than I think it should be. The social features of Mendeley would be interesting if any of my friends actually used them, but they don’t, so I don’t.
So last month, I sat down and figured out what I wanted – what I really wanted – out of a paper management system based on what I’d done with papers for the past five years. The list basically came out like this:
- I want to be able to take notes on a paper, both outside and inside of the PDF. It would be nice if I could do things like add images, formatting and links to notes occasionally.
- I want to be able to generate BibTeX for a single paper or groups of papers quickly.
- I want to synchronize everything across computers without having to think about it.
Everything else, I felt, was a secondary concern.
Taking notes inside a PDF was taken care of – the PDF standard has supported inline annotations like highlights and comments for a long time. Taking rich notes outside a PDF is also not that hard – I could do what I’ve been doing with markupserve (Markdown, redcarpet and emacs). Generating BibTeX without dealing with BibTeX’s quirkiness required a metadata format that would be easy to convert into BibTeX. Thankfully, YAML’s a pretty painless way to store information like this. As long as this whole thing stayed on the file system, I could keep it synchronized with Dropbox. I figured that search wasn’t that big of a deal; as long as the PDFs were searchable and everything else was plaintext, I could just search with Finder.
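To give a feel for why a flat metadata format converts to BibTeX so easily, here is a minimal sketch in Python. It is not paper-pile's actual code or file layout: the function name, the field names and the sample entry are all made up for illustration, and the metadata is assumed to have already been loaded from YAML into a plain dictionary.

```python
def to_bibtex(key, entry_type, fields):
    """Render one BibTeX entry from a plain metadata mapping.

    Hypothetical helper: no escaping of special or accented characters
    is attempted, so treat the output as a starting point only.
    """
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if isinstance(value, (list, tuple)):
            # BibTeX separates multiple authors with the word "and".
            value = " and ".join(value)
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Made-up metadata, shaped the way a small YAML file might load.
paper = {
    "author": ["A. Author", "B. Author"],
    "title": "An Example Paper",
    "year": 2012,
}
print(to_bibtex("author2012example", "article", paper))
```

Because each paper is just a key plus a mapping, printing entries for several papers is a loop over keys, which is what makes piping a list of keys through xargs practical.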
The only thing I had left to write was the thing that kept PDF, notes and metadata in one place and allowed me to manipulate them and generate BibTeX. I wrote paper-pile to do this. Once paper-pile was relatively stable, I wrote a quick-and-dirty Python script that loaded my Mendeley library into paper-pile by parsing Mendeley’s BibTeX dump. After paper-pile’s basic functionality was done, I wrote a simple web server that would display formatted notes using the same libraries I used for markupserve. From start to finish, I figure it took me a couple weeknights to get it all the way I wanted it; most of the complexity was getting Mendeley’s library to import properly.
It’s not perfect – for example, I’m pretty sure all hell is going to break loose when I generate BibTeX for a paper whose author’s name has an accented character – but it gets the job done. When I encounter a bug or a feature that I suddenly wish paper-pile had, I just add it in. If I want to get BibTeX for a bunch of papers at once, I just list the papers’ paper-pile keys and pipe the list through xargs. No impedance mismatches, no GUIs getting in the way, and I’m in no danger of running out of room for papers on Dropbox.
If this sounds like something you’d want to use yourself, paper-pile is available on GitHub. I make no guarantees as to its performance or correctness (and you probably shouldn’t make the web UI world-visible), but I use it myself every day, and that has to count for something.
Posted: November 28th, 2011 | Author: Alex | Filed under: Useful Software | 1 Comment »
I’ve gushed at length in the past about how much I love Evernote. I upgraded to Evernote Pro about half a year ago so that I could attach arbitrary file types to notes. After a while, though, a couple of things about my workflow with Evernote were starting to irk me.
First, I wasn’t even coming close to using Evernote Pro’s generous data cap. Second, I wasn’t sure that paying $5 a month so that I could attach a PDF to a note was worth it given how infrequently I was attaching PDFs to notes. Third, and maybe most critically, their WYSIWYG editor has always kind of bugged me; lists never really indent the right way, sometimes formatting leaks to the next paragraph in unpredictable ways, etc. It’s just really hard generally to get precision control over how your text looks in Evernote’s editor (at least on the Mac, I have no experience with the Windows editor). Its lack of Linux compatibility isn’t a problem for me anymore, but it might become a problem if I start working on Linux boxes again in the future.
I figured the “paying for space I don’t use” problem would be solved if I could find something that would sync with Dropbox. I have a couple GB of free space with them that is really underutilized and I would prefer using that space instead of paying for more underutilized space. For the “imprecise editing” problem, I decided that what I really wanted was the ability to write notes in a markup language like Markdown (which I also love and have gushed about here), do my writing in Emacs, and be able to view the notes in HTML easily.
I couldn’t find any pre-existing solution that I really liked, so I decided to roll my own.
Dropbox support wasn’t a problem; I just exported my notes from Evernote, used html2text to convert the exported notes into Markdown, and moved the folder containing the converted notes to my Dropbox folder, where it happily synchronized. Dropbox, you are awesome.
At first, I used Marked to render and view notes individually. Marked is a great app, but I wanted to be able to interact with rendered versions of my files more quickly. I had thought about writing up a read-only filesystem with FUSE that would do the rendering of Markdown to HTML transparently and just present a filesystem of HTML files, but that sounded like overkill even for me. I figured that writing up a simple web server with Bottle to display the rendered files would get me to a working solution much more quickly. After a couple hours of coding on Thanksgiving, I had MarkupServe working.
MarkupServe is pretty basic at this point. It’s given a directory tree containing markup files and presents that directory tree as a directory listing similar to the one httpd gives you if you don’t have an index page. It has simple keyword search (which just runs grep at the root of the directory tree and HTML-ifies the results) and renders notes to HTML on-the-fly when they’re clicked on. I made MarkupServe extensible enough that it should support more than just Markdown, in case others would find something like this useful but want to use other markup languages. It’s not the fastest or prettiest thing ever, but it works.
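The search feature is simple enough to sketch. MarkupServe itself shells out to grep; the version below swaps that for a pure-Python directory walk so the example stays self-contained, and the function names are assumptions rather than anything from the MarkupServe source.

```python
import html
import os


def search_notes(root, keyword):
    """Return (relative_path, line_number, line) for every matching line under root."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a predictable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if keyword in line:
                            relative = os.path.relpath(path, root)
                            hits.append((relative, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it rather than fail the search
    return hits


def hits_to_html(hits):
    """Turn search hits into an HTML list, escaping paths and matched lines."""
    items = []
    for path, number, line in hits:
        href = html.escape(path, quote=True)
        items.append(f'<li><a href="{href}">{href}:{number}</a> {html.escape(line)}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Escaping the matched line matters here, since notes written in a markup language often contain raw angle brackets.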
The last hurdle was making attaching files in Emacs comfortable. Evernote exports all attachments for a note note.html in a directory named note.resources, so I figured I’d stick to that convention. I made MarkupServe ignore directories that ended in .resources so that it wouldn’t clutter up the file listing. Then I wrote a couple little elisp functions that create a .resources directory for a note and “attach” a file by copying it into the appropriate .resources directory and inserting a Markdown-style link to the new copy. I’ve posted those functions in a GitHub gist if you’re interested in looking at those.
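The real functions are elisp and live in the gist, but the attach step itself is small. As a rough Python rendition of the same idea (the function name and the exact link format are assumptions, not a translation of the gist):

```python
import os
import shutil


def attach(note_path, file_path):
    """Copy file_path into the note's .resources directory.

    Returns a Markdown-style link to the copy, relative to the note's
    own directory, e.g. [paper.pdf](note.resources/paper.pdf).
    """
    base, _extension = os.path.splitext(note_path)
    resources = base + ".resources"
    os.makedirs(resources, exist_ok=True)  # create the directory on first attach
    copied = shutil.copy(file_path, resources)
    name = os.path.basename(copied)
    return f"[{name}]({os.path.basename(resources)}/{name})"
```

Keeping the link relative means the note and its attachments can move together, which is what lets Dropbox sync the whole tree without breaking anything.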
One big piece that this system is missing is a mobile app. Evernote’s iPhone app is terrific, and I’m going to miss it. At this point, my solution is to use Elements to edit notes and add photos taken with my phone’s camera to notes manually if the need arises. It’s sort of awkward, but I used the image attachment feature so infrequently in Evernote that I’m not really concerned about it. The system also lacks Evernote’s slick image OCR capability, but that was another feature I never really used (my handwriting’s pretty awful and the OCR could never really parse it well).
I’m sure I’ll tweak this setup considerably as I get more experience with it, but it was surprisingly quick to throw together and has proven really useful so far. Hopefully open-sourcing this stuff will help any other people who might have a similar itch to scratch.
Posted: November 3rd, 2011 | Author: Alex | Filed under: Useful Software | Comments Off
Some time ago I talked about using GNU Screen to effectively manage a bunch of terminal windows. Turns out screen has some serious competition: tmux.
One thing that’s pretty cool about tmux is the way it handles windows. tmux’s model is purely client-server: a single tmux server process owns every session and its windows, and the terminals you attach are clients of that server. This allows you to do things like move a window between sessions or attach the same window in multiple sessions. I haven’t run into a situation where this was useful yet, but it’s nice to know that it’s possible.
The thing that I have found useful is the fact that tmux is reasonably scriptable. In order to get the list of windows in a screen session, I had to create a dummy window, stuff the real window list into it, dump the contents of the dummy window to a file and parse that file. With tmux, I call
tmux list-windows
from any window in the session, or
tmux list-windows -t <session name>
outside the session, and parse the output. Not only that, but tmux displays none of the weird “I can only make calls that change a session once per second” problems that I’ve been seeing in practice with screen.
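As a sketch of what "parse the output" can look like, the snippet below asks tmux for one window per line using its -F format option with the window_index and window_name variables, then splits each line. This is an illustration rather than the code in tmux-screnum; the function names are made up, and it assumes a tmux new enough to support -F on list-windows.

```python
import subprocess

# One window per line: "<index> <name>".
WINDOW_FORMAT = "#{window_index} #{window_name}"


def parse_windows(output):
    """Parse list-windows output into a list of (index, name) tuples."""
    windows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the first space only; window names may contain spaces.
        index, _, name = line.partition(" ")
        windows.append((int(index), name))
    return windows


def list_windows(session=None):
    """Run tmux and return its windows; needs a running tmux server."""
    command = ["tmux", "list-windows", "-F", WINDOW_FORMAT]
    if session is not None:
        command += ["-t", session]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_windows(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it easy to test against canned output without a tmux server running.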
Like screen, tmux won’t auto-sort windows by name or make their numbering contiguous. My py-screnum tool for sorting and compacting Screen windows takes several seconds to re-arrange windows and is 95 lines of Python (according to sloccount). The analog for tmux, tmux-screnum, is only 57 lines, and re-arranges windows instantly. I won’t claim that either of these programs is minimal, but 40% fewer lines of code and faster is a winning combination.
There are times when I still have to use one or the other; many non-BSD machines come with screen installed by default but not tmux. I’m starting to prefer tmux whenever I have a choice, though.
Posted: October 31st, 2011 | Author: Alex | Filed under: Useful Software | Comments Off
I get a lot of use out of my Pandora account, but I’ve never really liked their desktop client. Adobe AIR has always seemed too much like Flash to me – too bloated, too flaky. Pianobar is a great alternative to the native desktop client.
Pianobar is a command-line interface to Pandora; no GUI, just text. This means I can stick it in a detached screen or tmux session and it doesn’t take up space when I don’t need to look at it. This is a huge plus, especially when I’m working on my laptop and don’t have a lot of screen real estate to spare.
Even better, pianobar can be configured to call a script whenever certain events occur – when the track changes, for example. This is really cool because it opens up all kinds of possibilities for integrating Pandora playback with other applications and services.
I wrote a little script called Pianogrowl a while back (to be fair, I ported someone else’s Bash script and added a couple extra features) that displays a Growl notification containing the album art, title, artist and album for the currently-playing track whenever the track changes. It also displays a notification if pianobar ever has connection or authentication problems. Being able to do that with Pandora tracks is pretty cool; I wish more applications had these kinds of hooks exposed out of the box.
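For anyone curious what such a hook script has to do, here is a minimal sketch. As far as I know, pianobar runs the configured event command with the event name as its first argument and writes key=value lines (artist, title, album and so on) to the script's standard input; treat those details, and the function names below, as assumptions to check against the pianobar man page. This is not Pianogrowl itself, and it only builds the message text rather than talking to Growl.

```python
def parse_event(lines):
    """Collect key=value lines, as read from standard input, into a dict."""
    fields = {}
    for line in lines:
        key, separator, value = line.rstrip("\n").partition("=")
        if separator:  # ignore lines without an "="
            fields[key] = value
    return fields


def describe(event, fields):
    """Build notification text for a track change; None for other events."""
    if event != "songstart":
        return None
    title = fields.get("title", "Unknown title")
    artist = fields.get("artist", "Unknown artist")
    album = fields.get("album", "Unknown album")
    return f"{title} by {artist} ({album})"
```

A real hook would call parse_event(sys.stdin) with sys.argv[1] as the event name and hand the resulting text to whatever notifier is available.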
If you use Pandora a lot and spend a lot of time in the console, you might want to give pianobar a try.
Posted: September 10th, 2011 | Author: Alex | Filed under: Computers | Comments Off
Sometimes when I’m bored or I’m losing focus at work, I’ll start doing what I call the “social network polling loop”:
- Repeat until several loops proceed without update:
- Check e-mail
- Refresh Facebook
- Refresh Google Plus
- Refresh Reddit
- Refresh Google Reader
- Load latest tweets on Twitter
- Check website stats on Google Analytics
It’s almost a reflex action, and it’s one that eats time. I’ve been trying to get myself out of the habit; it wastes time that I should be spending doing something productive. In addition to polling sites like these, I find that I spend far too much time every day looking at them when nothing has changed.
Thankfully, software can be a big help on both fronts here.
Eliminating the Need to Poll
There’s a lot to be said for just stopping yourself from polling websites in the first place. The knee-jerk reaction to this sort of approach is, “But what if I miss something?”
I’ve talked about RSS feeds here before; they’re a really good way to stay on top of changes to websites without polling them. Unfortunately, many social networking sites don’t make RSS feeds of their content available. I’ve basically given up checking Facebook regularly because of this; I’ll only look at it when the mobile app pings me or it sends me e-mail.
Other sites make RSS feeds available, but put them behind an authentication mechanism. Sadly, Google Reader still lacks support for authenticated RSS feeds. This is kind of a drag, since most major user-specific feeds are behind some sort of authentication these days.
My typical workaround for something like this is to build a wrapper around the protected RSS feed in Yahoo! Pipes. The wrapper performs the authentication and reads out the resulting RSS. After subscribing to the pipe’s private URL, I’ve got a feed that Google Reader will be able to read. The thing that’s great about Yahoo! Pipes is its ability to pass the feed through all manner of operators (filters, joins, and so on). This is great if you want to only get news on a particular topic from a site that only provides one “firehose” feed.
Changing the Access Method
Twitter is one of those services that lends itself quite well to polling; follow enough people and you’re guaranteed to be receiving at least one update every couple of minutes. They even make it easy to leave Twitter open and tab back in to load new tweets every few minutes “so that you don’t miss anything”.
Getting the Twitter RSS feed set up was pretty easy thanks to Steffen Grunwald’s status feed service; I was worried that I’d have to mess around with OAuth to make it work, but thankfully Steffen did the hard work for me.
Once the feed was up I found that there were just too many incoming tweets for me to get through, so I passed the feed through Yahoo! Pipes. Specifically, I kept only the tweets that contain a link or a question mark and filtered out everything else. This essentially creates a “tweets that are asking questions or sharing a link to something” feed, which are the tweets I would least like to miss. I might expand this to include retweets at some point, but usually retweets include links anyway, so it works pretty well as-is.
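The filter itself lives in Yahoo! Pipes, but the rule is simple enough to express in a few lines. A rough Python equivalent, with a made-up function name and a deliberately crude idea of what counts as a link:

```python
import re

# Crude link detector: anything starting with http:// or https://.
LINK_PATTERN = re.compile(r"https?://\S+")


def worth_keeping(tweet_text):
    """Keep tweets that share a link or ask a question."""
    has_link = LINK_PATTERN.search(tweet_text) is not None
    has_question = "?" in tweet_text
    return has_link or has_question


tweets = [
    "Just had lunch.",
    "Anyone know a good tmux tutorial?",
    "New post up: http://example.com/post",
]
kept = [tweet for tweet in tweets if worth_keeping(tweet)]
```

The question-mark test is blunt (a rhetorical "really?" gets through too), but for a feed whose goal is not missing things, false positives are the cheaper mistake.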
Upper-Bounding the Time Suck
When it comes to quashing this social network poll loop, the spirit is willing but the flesh is often weak. This is where enforcement comes in. In Chrome, I use the StayFocusd plugin to limit myself to 15 minutes of total social network/feed reader time between 8 AM and 8 PM. StayFocusd isn’t very feature-rich (it doesn’t support multiple block sets with different timings, for example), but it serves its purpose pretty well. Whenever I have to bypass the block, I can just open a window in Incognito mode or disable the plugin. Unfortunately it still takes a good deal of willpower to keep myself from abusing that ability.
E-Mail, The Time Waster Du Jour
The one piece of the polling loop that I haven’t managed to remove quite yet is e-mail. Usually I don’t poll my e-mail account, but I do get a lot of mail and have gotten myself into the habit of reading and/or responding to it pretty quickly after I receive it. I’m convinced that the frequent new e-mail notifications I keep getting are distracting, but the nature of my job and the way my co-workers and I typically use e-mail make checking my e-mail only twice a day impractical. If any of you have strategies or experiences with this, I’d really like to hear them.
Posted: September 3rd, 2011 | Author: Alex | Filed under: Advice (Unsolicited), Computers | Comments Off
Last week, I talked about the bathtub curve and what it can tell you about bad hard drive reviews. I’m going to expand on that a little this week and talk about how replacing your drive doesn’t necessarily mean you’re solving the problem. Then we’ll briefly touch on another common source of consumer angst, hard drive sizes.
Correlated Failures
A common pattern in one-star hard drive reviews is the following:
First drive failed, sent it back. Replacement failed two weeks later. You computer people are all monsters. I’m going back to using a typewriter.
If you buy a drive from a company and it hits the wrong end of the bathtub curve, they will usually replace it. This is basically what hard drive warranties are for: they prevent customers on the wrong side of the bathtub from getting screwed over. Unfortunately, they will probably just pull the next hard drive box off the wall and send you that one. Those two drives probably arrived at their warehouse on the same shipping pallet, which probably means that they were manufactured and left the factory at approximately the same time. If there was an unnoticed defect in that particular production batch, you’re much more likely to see the same problem on the replacement that you had with the original.
Incidentally, this is why you should never buy multiple instances of the same drive at the same time if you’re planning on building a RAID array with them; correlated failures might come back and bite you in a big way.
Drive Sizes Lie to You
Stop me if you’ve heard this complaint before:
I bought a 500GB hard drive, but it’s only got 465.7GB of space! I want my 34.3GB back!
I talked about this last year in the context of Wolfram Alpha. The short answer is that drive manufacturers advertise capacities in powers of ten (a gigabyte is 10^9 bytes), while most operating systems measure them in powers of two (2^30 bytes at a time) and still label the result “GB”.
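The numbers in that complaint fall straight out of the two definitions. A quick check in Python (the function name is just for illustration):

```python
GB = 10**9   # gigabyte as drive manufacturers count it
GIB = 2**30  # gibibyte, the unit many operating systems actually measure in


def advertised_gb_to_gib(gigabytes):
    """Convert an advertised capacity in GB to the same number of bytes in GiB."""
    return gigabytes * GB / GIB


reported = advertised_gb_to_gib(500)
print(round(reported, 1))        # 465.7
print(round(500 - reported, 1))  # 34.3
```

Nothing went missing: it is the same number of bytes counted in a bigger unit, and the gap grows with capacity because the two units differ by about 7%.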
Operating system vendors seem to be converging on lying to their customers rather than confusing them; Apple’s Disk Utility, for example, gives capacities in powers of two and units in powers of ten (500GB when it’s really 500 GiB). In my opinion, this is like setting the value of pi to 3.2; not only does it mask the problem, it hides some of the fundamental truths underlying it.
Posted: August 27th, 2011 | Author: Alex | Filed under: Advice (Unsolicited), Computers | Comments Off
Last week, one of my external drives failed, and another indicated that it’s about to die by failing a read and causing my RAID volume to degrade. Neither of these failures was surprising; both drives were well outside of their warranty periods. The way these drives failed and the (sadly ongoing) quest to replace them has brought up a couple of things that I’ll talk about here.
Failed drives mean shopping for replacements. When it comes to external hard drives, we seem to be presented with a multitude of choices, none of which are good. Judging by reviews on NewEgg, external consumer-grade hard drives are some combination of:
- Unreliable
- Slow
- Feature-poor
- Plagued with awful customer support
I was surprised at how many of the one- and two-star reviews for hard drives on NewEgg (and virtually everywhere else that sells drives) display some of the same common misconceptions. It’s a sad indicator that as an industry, we still haven’t figured out how to make computers anything less than magical and inscrutable to the average consumer. I’m going to lay out a couple of those misconceptions in the next couple of posts. They’ve doubtlessly been rehashed elsewhere, but these are things that deserve repeating.
The Bathtub Curve
If you were to plot failure rate of hard drives versus time on a graph, the graph would probably look like the blue line in the graph below (thanks, Wikipedia!):

This blue line is what’s referred to in reliability engineering as a bathtub curve, because its shape is evocative of a bathtub. In plain English, the bathtub curve basically says
- Things that are shipped with defects usually fail early.
- Things that work as designed still eventually wear out.
- In the middle, anything can happen, but failure is less likely.
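One common way to model those three regimes, offered here as an illustration and not as measured drive data, is to add up three failure rates: a decreasing Weibull hazard for early defects, a constant rate for random failures, and an increasing Weibull hazard for wear-out. Every parameter value below is invented purely to produce the shape.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate at time t > 0.

    shape < 1 gives a rate that falls over time (early failures);
    shape > 1 gives a rate that rises over time (wear-out).
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t, early=(0.5, 1.0), wear_out=(5.0, 10.0), random_rate=0.05):
    """Sum of early-failure, random and wear-out hazards (illustrative values only)."""
    return weibull_hazard(t, *early) + random_rate + weibull_hazard(t, *wear_out)


for years in (0.1, 5.0, 15.0):
    print(years, round(bathtub_hazard(years), 3))
```

With these made-up parameters the rate is high at the start, lowest in the middle and high again at the end, which is the bathtub.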
Many one-star NewEgg reviews I came across were some variant of:
Drive fails after X days of use. What a piece of crap. I’m never buying from this company again.
These are people who have unfortunately hit the wrong end of the bathtub curve.
Why does this happen? Well, some of it has to do with manufacturing; with something this intricate there will inevitably be defects, regardless of how much quality assurance you put into it. Some of it might have to do with what happens to the drives during shipping. Sometimes there is actually a systemic defect in a particular model or production batch that goes undetected by quality assurance; this usually results in a class action lawsuit months or years down the road.
The best bet, as I’ve stated here several times in the past, is to never assume that the drive will last another day. I was shocked at the number of times I read a review like this:
Bought this drive and it died three days later. Now 50,000 photos of my cat Muffins are gone. I hate you, Seagate, and so does Muffins.
So you bought this drive, and copied your photos to it, and then … you deleted the originals?! I’ve said it before and I’ll say it again: if there’s only one copy, it is only a matter of time before you lose that data.
Next week: why the replacement for your failed drive is more likely to fail, and why hard drive manufacturers are lying to you.
Posted: August 19th, 2011 | Author: Alex | Filed under: Useful Software | 3 Comments »
OK, so I’m about 7 years late to the party on this one, but man oh man do I love Markdown.
I spend a lot of time dealing with text, but most of the time it’s text designed to be consumed by compilers and interpreters rather than people. When I write people-facing text, it’s almost always in LaTeX. In the process of dealing with these kinds of writing tasks, I’ve become really intolerant of WYSIWYG text editors. They’re just not precise enough.
Evernote’s a notorious culprit here. I tell it to bold a line, it bolds the next blank line too. I change the font, it gets changed back in weird places. Bulleted lists sometimes re-bullet or re-indent themselves in weird ways. It’s irritating. I’ve had similar problems with WordPress’ visual editor.
In short, I’m one of those weird people who doesn’t care what it looks like on-screen while I’m editing it as long as the finished product looks like I want it to look.
Markdown brings the kind of precision I’m used to in LaTeX to the realm of writing HTML. One thing that it loses is all the markup (hence the name, I suppose). For example:
**bold** produces bold text, *italicized* produces italicized text. Similarly terse, readable syntax for headers, links, images, and so on. All of the common stuff that you’re used to when writing HTML, without … well, the HTML.
Apparently Markdown got huge a few years ago, and obsessive programmers like me have integrated it into all sorts of things. There’s a Markdown plugin for WordPress (in which I’m editing this post). Of course, there’s a Markdown mode for Emacs. The one piece of my daily routine that lacks Markdown is Evernote, unfortunately. I think I might be able to get around that with a clever combination of evernote-mode and markdown-mode. If I can figure something out, I’ll post it here.
Posted: August 13th, 2011 | Author: Alex | Filed under: Computers | 2 Comments »
Ever since I started exploring the more advanced features of GNU Screen, I’ve been using it constantly. For years, I never really thought of it as much more than a way to keep a shell open when I switched machines. I was also turned off by its appropriation of ctrl-A for commands, since it overwrites “jump to beginning of line” in Bash and emacs. You can still get at it, but the shortcut is ctrl-A + A, which never felt anything less than completely awkward.
Thankfully, Keaton sent me a copy of his screenrc which contains, among other things, an escape-sequence remapping command. I changed the control code to ctrl-O, since I couldn’t find any command sequence that conflicted with it (by adding escape ^Oo to my .screenrc).
I’ve posted the rest of my .screenrc file as a GitHub Gist.
Screen’s Hidden Gems
I used to just use screen for attaching and detaching single terminal windows, if I needed to leave a process running in a shell for example. In the past couple of months, I’ve gotten more of a taste of its capabilities, which are numerous. Among its more awesome features:
- Multiple Terminal windows per session that can be attached and detached as a group.
- Horizontal and vertical screen splitting – just like iTerm and Terminator, but screen predates them both by years and allows you to attach and detach terminals from panes at will.
- Notification on activity and inactivity (that is, have screen notify you when a terminal starts or stops producing output).
I honestly wish that I’d started to use screen more heavily a long time ago.
Fixing Window Fragmentation
I typically have a lot of windows open in a screen session at a time. If you close and open windows a lot, you’ll find that the windows’ numbers start getting fragmented. Rather than having windows numbered 1,2,3,4, for example, you’ll have windows numbered 1,3,4,7. Maybe this pegs me as an obsessive, but that’s sort of irritating. Also, I’d like to have the windows sorted by name. Because, you know, I sort of have a thing for sorting.
There’s a Bash script that’s been floating around called screnum that renumbers windows within a session so that they’re sequential, but I could never really get it to work reliably; it had problems with windows that have the same name and was really slow. So I ported it to Python (because in my opinion, Bash scripts get gross once they get more complicated than short lists of straight-line commands), fixed the same-window-name bug, and dramatically sped up the window sorting routine.
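The heart of the renumbering is small once the window list is in hand. The sketch below is not the code from screnum.py; it only shows one way to compute the target numbering, with a made-up function name, and it leaves out the part that talks to screen.

```python
def renumber_plan(windows, first=1):
    """Compute (old_number, new_number) pairs for a list of (number, name) windows.

    Windows are sorted by name; windows that share a name keep their
    relative order because the old number is used as a tie-breaker.
    """
    ordered = sorted(windows, key=lambda window: (window[1], window[0]))
    return [
        (old_number, new_number)
        for new_number, (old_number, _name) in enumerate(ordered, start=first)
    ]


windows = [(1, "vim"), (3, "logs"), (4, "vim"), (7, "build")]
print(renumber_plan(windows))
```

Applying the plan takes some care, since a target number may still be occupied by a window that has not moved yet; swapping windows or parking them on unused high numbers first are two ways around that.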
I posted the script, screnum.py, to GitHub. If you’re interested in using it or have ideas for improving it, have at it.
Posted: July 22nd, 2011 | Author: Alex | Filed under: Computers, Ranting | 1 Comment »
I’m going to go out on a limb and say that all existing methods for dealing with passwords are awful.
Just look at the options you have. You can pick an easy-to-remember password, but then it’s easy to crack. You can pick a complicated password, but then it’s difficult to remember. You can use the same password everywhere, but if a thief grabs one password, then they’ve got the keys to the kingdom. You can use a different password everywhere, but then you either have to remember 100 different passwords or come up with some easy-to-remember password generating algorithm for remembering them all. Either of these approaches requires weakening the password somewhat. In short, passwords that are hard to guess are hard to remember.
Then there are password managers. Give them one master password, and they’ll generate a ton of cryptographically strong passwords for you. The only problem is that they present a large target for hackers, and they only need to guess one password – the master – to have the keys to the kingdom.
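The generating part is the easy bit. A minimal sketch using Python's secrets module, which draws from the operating system's cryptographic random source; the alphabet and default length are arbitrary choices for illustration, not what any particular password manager does:

```python
import secrets
import string

# Letters, digits and punctuation; some sites will reject part of this set.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

A password like this is only as safe as wherever it ends up being stored, which is exactly the single-target problem described above.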
Two-factor authentication makes things a little better. In addition to providing a password, you have to provide information generated by something that you have. Google’s two-factor authentication system uses an application on your phone, for example. The problem there is that if you find yourself without your second authentication factor – say it gets stolen or broken – you’re in serious trouble, because you need that second factor to authenticate. The same thing goes for your master password in a password manager. If you lose it, you lose access to everything.
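To make "information generated by something that you have" concrete: phone-based code generators of this kind are, to my knowledge, generally built on the time-based one-time password scheme standardized in RFC 6238. The sketch below follows that RFC using only the standard library. It is an illustration of the scheme, not Google's code, and the shared secret is a placeholder.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at_time=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238, HMAC-SHA1 variant).

    secret is the shared key as bytes; the phone and the server both
    hold a copy and derive the same code from the current time.
    """
    if at_time is None:
        at_time = time.time()
    counter = int(at_time) // step  # number of time steps since the epoch
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F      # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


# Placeholder secret for illustration only.
print(totp(b"not-a-real-secret"))
```

The code depends only on the secret and the clock, which is why losing the device that holds the secret locks you out just as effectively as it locks out an attacker.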
What makes it even worse is that a lot of websites completely fail at password security. They store your passwords in plaintext and e-mail them to you (in plaintext) when you forget them. They don’t accept symbols or capital letters in their passwords. They impose a maximum password length limit. And unless they support some form of external authentication like OpenID, you are at their mercy.
Let’s not leave key loggers out here. If you’ve got a piece of malware on your system logging your keystrokes, it doesn’t matter how secure your password is; if you enter it by typing it, the key logger will see it. The more complicated your password is, the easier it will be for scripts to extract.
So, the whole notion of having passwords seems fundamentally inadequate. Of course, nothing’s going to be perfect – even biometric methods like fingerprint scans and voice recognition aren’t completely foolproof. The problem really comes down to verifying that you are who you say you are. And if it’s so easy to make and get fake IDs in the physical world, why should faking your identity in the digital world be any harder?