Sort benchmark records!

Posted: July 2nd, 2011 | Filed under: Computers

My thesis work focuses on highly-efficient large-scale data processing. My group and I wrote a system called TritonSort that does large-scale sorting as a proof-of-concept for some of the techniques and technologies for developing more efficient data processing systems. In 2010, we broke two world sorting records with it. This year, we improved our own results in those two records by almost 50% and broke an additional three records.

Needless to say, we’re pretty excited about these results, but there’s still a lot of work left to be done. We’re currently working on gaining experience with and improving the general-purpose version of TritonSort; hopefully we’ll be able to publish some more details on that work sometime in the next few months. If you’re interested in learning more about the project, check out tritonsort.eng.ucsd.edu.


Software I Use Daily: Evernote

Posted: June 21st, 2011 | Filed under: Useful Software

Few tools have proven more useful in my day-to-day life than Evernote has. Evernote’s design is pretty simple; you can make notes that can include pictures, sound or documents as attachments, search through your notes, bundle them up into folders or tag them with tags. Notes get synchronized between any device that runs an Evernote client. Any image that gets included in a note also gets passed through OCR so that any words that appear in the image are indexed and searchable as well.

I probably refer to or write a note in Evernote at least half a dozen times per day. Every time I feel like I’m going to need to look up a piece of information more than once, it goes into Evernote. If I’m trying to figure something out, any information I find on that topic goes into Evernote. As I refine my understanding about something, I’ll turn that raw info dump into something more compact and digestible. The ability to jump back to previous versions of notes is really helpful for this refining phase.

What I use Evernote for most is daily research logs. I really wish that I had started writing daily research logs years ago. They’ve been immensely helpful for organizing my thoughts and preventing me from doing redundant work. They also provide an easy-to-read archive of the work I’ve been doing, which makes preparing for status report meetings a lot easier.

I’ve been really pleased with Evernote overall, especially since they gave their mobile app a much-needed UI redesign. If you’re looking for a place to dump all the stuff that won’t fit in your brain, Evernote is definitely worth a look.


Computer Science Goes to Hollywood

Posted: June 11th, 2011 | Filed under: Computers, Opinions (Uninformed)

Update: Looks like Matt Welsh posted on the same topic while I was writing this. Check out his perspective; it’s a good read.

I just read an article in the New York Times titled, “Computer Studies Made Cool, on Film and Now on Campus” talking about the recent upswing in enrollment in computer science that, the article claims, is due in part to Hollywood’s favorable portrayal of the tech world in films like The Social Network. They also say that computer science programs are shifting away from a curriculum focused on theory toward one that integrates more applications and implementation in an effort to boost the major’s appeal and keep enrollment trending upward.

The fact that computer science enrollment is on the upswing is good; the US (and the world in general) needs more computer scientists. One missing piece of the puzzle that isn’t explored in great detail in the article, however, is these new students’ motivations for pursuing a degree in computer science. One thing that makes me anxious is the segment of new students who are becoming computer scientists to “become the next Mark Zuckerberg”. Spun one way, this means “create something that millions of people use every day”, which is great. Computer science is most exciting, in my opinion, because it enables you to create great things out of virtually nothing. Spun another way, this means “make billions of dollars and have a movie made about you that’s scored by that guy from Nine Inch Nails”. This is disconcerting. Becoming a computer scientist to become the next Mark Zuckerberg is like playing basketball to become the next Michael Jordan. It will happen to a handful of people, and takes a lot of hard work, dedication and luck, but the vast majority won’t even come close. If you’re not playing basketball mostly for the love of the game, you’re setting yourself up for a lifetime of frustration and disappointment. Same thing goes for CS.

I’m also not convinced that “banishing the perception of the computer scientist as a geek typing code in a basement” is entirely a good idea. I hate to break it to you, but most of computer science is geeks typing code in basements. Well, not necessarily in basements. Facebook was created and continues to be created by geeks typing code in office buildings. Apple was created by geeks typing code (and soldering stuff together) in a garage. Name a technology company whose products you use every day, and it started with a bunch of hackers cranking out code in a dumpy little room somewhere. This is like trying to banish the perception of the blacksmith as a big sweaty guy who spends most of his time in front of a furnace making things out of hot iron – sorry folks, but that’s just the way it is. Blacksmiths don’t just wave their arms and cause awesome swords to appear all day.

That’s not to say that our public image couldn’t use a bit of improvement. In the past, computer scientists were portrayed in film and television as a bunch of pale, asocial man-children with an over-fondness for Mountain Dew and Cheetos. If we’re getting away from that stereotype, then that’s fantastic, but by and large I don’t think we are. I withhold judgment on The Social Network (I haven’t seen it yet, but it’s in my Netflix queue), but most of what I’ve seen has just shifted to portraying us as a bunch of ludicrously wealthy, pale, asocial man-children.

It’s true that the theoretical fundamentals of computer science are hard to make exciting without applications. However, if we swing in the other direction, over-emphasizing application at the expense of fundamentals just to enhance our appeal, we run the risk of over-narrowing the major’s focus and making it less useful and less educational. If that happens, I’m worried about the quality and quantity of computer scientists (who actually like what they do) that we’ll produce. I don’t think it will be a problem at places like UW, USC and Stanford, but the reality is that most people don’t go to those places.

I originally pursued computer science because I liked messing around with computers and I thought I wanted to make video games. If I had taken a major in game design in college, I might be working on the next Madden at Electronic Arts and hating my life. The breadth of my undergraduate education exposed me to areas of the field that I didn’t know existed. Now I sort things really fast every day, and I’m really enjoying myself. And I spend most of my time writing code, although I don’t do it in a basement.


Airfoil, the Whole-House Music Streaming Killer App

Posted: May 31st, 2011 | Filed under: Useful Software

iTunes has had the ability to stream music to a remote set of speakers for a while now. At first it was just devices like the Airport Express, but now they’ve struck agreements with a bunch of different companies that make speaker docks and A/V units so that iTunes can stream to them too. Being able to play music out of a remote pair of speakers is great, but there are a few key limitations to AirPlay that have always left me less than sold on the idea.

First, AirPlay is all about iTunes streaming to something. Let’s face it, the only reason most people use iTunes is because it’s the application for synchronizing music to iPods. It works fine as long as your music library isn’t enormous, but it’s far from the most feature-rich music player out there and its support for things like Internet radio hasn’t improved much in 7 years. If you want to stream Pandora or Last.fm to remote speakers, AirPlay just doesn’t fit the bill.

Second, the device set that recognizes AirPlay is still fairly limited and decidedly non-free. If you want an Airport Express, you’ll be paying $100 for what is essentially a wireless access point with an audio out jack. This has always rubbed me the wrong way. I already have a computer with an audio out jack and a network connection, why can’t I just stream to that? For that matter, why can’t my computers stream audio to each other?

Enter AirFoil.

Attach AirFoil to an application (pretty much any application), and it captures that application’s audio and sends it to one or more sets of speakers. AirFoil can stream to any AirPlay device as well as anything running the companion AirFoil Speakers application. The AirFoil server application runs on OS X or Windows, and there are versions of AirFoil Speakers for OS X, Windows, Linux and iOS (meaning it works on iPhones, iPads and iPod Touches too). The AirFoil server keeps all the audio streams to the various speakers magically in sync. AirFoil uses Bonjour service notification messages to find and advertise speakers, so AirFoil can see any speakers on the local network without the need for configuring anything.

When I want to listen to music from my iTunes library in my living room, I just fire up AirFoil on my desktop in the bedroom and stream through my media center PC in the living room. If I want an additional set of speakers in the kitchen (because hey, why not?) I can hook a pair of speakers to my phone and run AirFoil Speakers on it. This is something that would have cost me hundreds of dollars in additional, purpose-bought hardware to do without software like this. I’m extremely impressed by AirFoil; if any of this sounds remotely intriguing to you, I’d really recommend giving it a try.


Scripting best practices

Posted: May 15th, 2011 | Filed under: Computers

According to sloccount, TritonSort has almost 9,000 lines of scripts. 95% of those lines are Python; the remaining 5% are bash and awk scripts. They do everything from setting up our testbed’s resources to monitoring experiments and computing statistics over results. Throughout the process of writing, re-writing and iterating over all those scripts, I’ve distilled a few hard-won lessons about what works and what doesn’t work when it comes to writing them.

A lot of this is going to be Python-specific, since that’s what most of my scripts are written in. However, I think this advice can be applied pretty readily to your favorite scripting language.

A giant, snarly Bash script is almost never the answer. Tools like grep, sed and awk are extremely powerful, and I do most of my ad-hoc text analysis by chaining these tools together with pipes. Unfortunately, anything more complicated than a for loop in Bash tends to get messy really quickly. Also, scripts like this that snarf in unstructured text tend to be rather brittle; if the format of your input data changes over time, your scripts tend to break in interesting ways.

Treat your scripts like libraries. It’s almost never a good idea to stick everything your script is doing in global scope. Instead, make the actual body of the script a function and write a few lines of main() boilerplate that takes in options and arguments and calls that function. Once the main body of your script is a function, you can just import that function somewhere else when you want to compose scripts together, which will make your life a lot easier down the road.
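As a minimal sketch of that layout (the function name, arguments and the line-counting task are made up for illustration, not taken from TritonSort), the body of the script lives in an importable function and main() only handles argument parsing:

```python
import argparse


def count_matching_lines(lines, needle):
    """Return how many of the given lines contain `needle`."""
    return sum(1 for line in lines if needle in line)


def main(argv=None):
    """Parse command-line arguments and call the library-style function."""
    parser = argparse.ArgumentParser(
        description="Count the lines in a file that contain a string.")
    parser.add_argument("needle", help="substring to look for")
    parser.add_argument("path", help="file to scan")
    args = parser.parse_args(argv)

    with open(args.path) as handle:
        print(count_matching_lines(handle, args.needle))
    return 0


# At the bottom of the script file, the usual guard runs main() only when the
# file is executed directly, not when it is imported:
#
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

Another script can then import count_matching_lines directly and skip the command-line layer entirely.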

Script functions should return (at least) semi-structured data. If your script produces unstructured text, at some point you’re going to have to parse it. That can get messy really fast. If you want a script’s results to be human-readable, make the script function return some data structure and have main() print it. Better yet, have a second function in the script that prints a readable version of the data structure the script function spits out, or make the data structure a class that overrides __repr__ or __str__.
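Here is one way that could look in present-day Python, sketched with a hypothetical RunSummary type (the class, its fields and summarize_runs are illustrative names, not part of any real script of ours). The function returns structured data, and the human-readable form lives in __str__:

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical result type for a script that summarizes run times."""
    runs: int
    mean_seconds: float
    max_seconds: float

    def __str__(self):
        # Readable form for people; callers still get the raw fields.
        return (f"{self.runs} runs, mean {self.mean_seconds:.2f}s, "
                f"max {self.max_seconds:.2f}s")


def summarize_runs(durations):
    """Return a RunSummary for a non-empty list of run times in seconds."""
    return RunSummary(
        runs=len(durations),
        mean_seconds=sum(durations) / len(durations),
        max_seconds=max(durations),
    )
```

A main() would just print(summarize_runs(...)), while another script can read summary.mean_seconds without parsing any text.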

Make your output portable. If you expect that a program written in another language is going to have to consume a script’s output, it’s a good idea to make that output easy for the consuming program to read. If you’re just dumping out a list of numbers, by all means just dump that list of numbers with one number per line, but for anything more complicated than that you’ll want at least some metadata telling you what all this stuff you’re dumping actually is.

We’ve been starting to use JSON more and more since it’s got reasonably good support across a bunch of languages, is brain-dead simple to parse and is reasonably structured without much of the extra bloat that XML imposes. If you’ve got a really complicated configuration file that needs to be validated, XML might be a better choice, but most of the time you really just want key/value pairs and some limited support for nesting and lists and JSON does that just fine. I’ve also heard that YAML is awesome, but I’ve never used it.
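To make the "list of numbers plus some metadata" case concrete, here is a small sketch using Python's standard json module. The key names ("experiment", "unit", "samples") are assumptions chosen for the example, not a format we actually use:

```python
import json


def results_to_json(experiment, unit, samples):
    """Serialize samples along with metadata describing what they are."""
    payload = {
        "experiment": experiment,  # hypothetical label for the run
        "unit": unit,              # what the numbers measure
        "samples": list(samples),
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

Any language with a JSON parser can read this back and still know what the numbers mean, which a bare one-number-per-line dump can't tell it.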

Document, document, document. I know I’ve been on a bit of a documentation kick lately, but seriously, Future You will thank Present You for telling him what exactly it is that putTheThingInThePlaceWhereStuffGoes.py does, what input format it expects, etc. Along the same lines, don’t use names like that one. Give your scripts descriptive names.

Hopefully this deters you from making some of the same mistakes we did. Happy scripting.


On (Lack of) Documentation

Posted: May 7th, 2011 | Filed under: Computers, Ranting

I am really starting to get irritated with the lack of documentation present in some “production-ready” open source projects. Issues related to lack of documentation have hamstrung me multiple times in the last few months and it’s really starting to get on my nerves.

If you’re writing a library, your documentation is just as important as your code. The simple fact is that your library, regardless of how elegant or fast or awesome it is, is completely useless unless it’s got decent documentation. Decent documentation falls under a number of categories – all of these categories are important.

Thorough, up-to-date user-facing documentation: This means tutorials and example code, but it also means things like wikis that can change as users start to expose common traps and pitfalls. The documentation should change as the code changes, which means it should be auto-generated whenever possible.

Helpful exceptions and assertions: Don’t just assert(b == "foo"); actually attach a meaningful message to the assertion so that I know why the assertion was made and what it means if it failed. If you can give me a permalink to a page telling me what I’m doing wrong, so much the better. And don’t just give me “b isn’t foo. Something’s wrong.” That doesn’t give me any information. Similarly with exceptions: throwing a GenericException without any accompanying message or stack trace makes me want to punch you (seriously, I’ve seen this happen many times and it’s really irritating).

Also, please give me a stack trace. If a failed assertion doesn’t give me a stack trace and indistinguishable copies of the same assertion appear in 20 different places, the only way I’m going to know which assertion just failed is to hook a debugger to the program and try to reproduce the error. That sucks, and sometimes it isn’t even possible (if the problem is non-deterministic or the situation that causes it to happen is rare).
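For what it's worth, here is roughly what I mean, sketched in Python. The function names, the ConfigError class and the messages are all hypothetical. The assertion says what was expected and what was actually seen, and the exception carries a message plus the original error as its cause:

```python
def check_state(b):
    # The message states the expectation, the observed value, and a hint.
    assert b == "foo", (
        f"expected b == 'foo' after initialization, got {b!r}; "
        "was the (hypothetical) init step skipped?"
    )


class ConfigError(Exception):
    """Hypothetical exception type that is always raised with a message."""


def load_port(config):
    """Return the configured port as an int, or raise a descriptive error."""
    try:
        return int(config["port"])
    except KeyError as err:
        # `raise ... from err` keeps the original exception attached as the
        # cause, so the traceback shows both failures.
        raise ConfigError("config is missing the required 'port' key") from err
```

Python prints a traceback for an uncaught AssertionError by default, so the "which of the 20 copies failed" problem mostly goes away there; in languages that don't, the message is the only clue you get.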

In-code documentation: I don’t necessarily believe that you should have a line of comments for every line of code you write; requirements that rigid lead to a lot of “This line adds 2 and 7 together” comments that just make the code harder to read. If the code gets messy, write some inline comments explaining at a high level what the code is supposed to do. Your users will thank you and when Future You looks at the code that Past You wrote, he might have a chance of understanding what it was Past You was thinking.

First, be helpful: Many things about the usage of a library may seem perfectly obvious to you because you wrote the library. To a new user, some things may not be so clear. So many times in mailing lists and message boards I see threads that look like this:

User: “Here’s a code block; it’s throwing some random error. Anyone know why?”

Developer (in Comic Book Guy voice): “I do not understand why you users are so stupid. Clearly you must initialize the host key container before initializing the SSL session but after initializing the session transport. Worst. Users. Ever.”

Or this:

User: “Here’s a code block; it’s throwing some random error. Anyone know why?”

Developers: *years of silence*

This is a great way to lose existing users and discourage new ones from using your library.

Commit messages are a part of internal documentation: Documentation is just as useful for other library developers as it is for users. Commit messages are a great deal more important for developers than they are for users, but they’re part of your documentation nonetheless. I heard a great quote relating to this in a post on source control by Troy Hunt: “Write every commit message like the next person who reads it is an axe-wielding maniac who knows where you live”.

tl;dr: Your documentation will never be perfect. It will probably never even be great, unless you’ve got people dedicated to working on documentation. Despite this, small improvements can make a big difference. Popular libraries become popular because they effectively solve a problem that a lot of users have and because it’s easier for users to use that library than it is to solve the problem themselves. Good library design and talented programmers make the first part happen; the second part can’t happen without good documentation.


App of the Moment: Bowtie

Posted: May 4th, 2011 | Filed under: Useful Software

I synchronize my music to my iPhone and carry it around with me. I listen to it in the car, at work, in the grocery store, while doing laundry … it probably gets a good 6-8 hours a day of use. At work, I’m faced with what I thought was a very obscure problem. I want to be able to control music playback on my iPhone from my computer.

It’s not that my phone isn’t sitting next to me on my desk when I’m working. I could just reach over, double-tap the phone button and press the screen to switch tracks. That requires my hands leaving the keyboard, though, and reaching over to switch tracks has probably cost me hours of accrued time debt over the years (sort of like how they say that you spend entire days of your life in total tying your shoes).

It’s irritating enough that I can’t share the music library on my phone with the local network; that would solve the problem right there. Unfortunately, Apple hasn’t seen a reason to implement that feature. I could also synchronize my iTunes library between my desktop and laptop, but unfortunately Apple hasn’t made that automatic and painless enough yet, and I’ve tried various other techniques (rsync, third-party apps, hosting the library on networked storage, you name it) without success.

For the longest time, I figured that I’d just have to deal with it (horrible, I know). Yesterday, the blog One Thing Well pointed me at an application called Bowtie.

The desktop version of Bowtie gives you basic control of iTunes (play/pause, next and previous tracks) with keyboard shortcuts and will show you the currently-playing track in a customizable little desktop widget. That isn’t very unique by itself; there are dozens of iTunes remote control apps of various maturity and feature-richness out there, and they’ve been around for years. Where Bowtie distinguishes itself is in the $0.99 companion app for the iPhone. Pair the iPhone application and the desktop application together, and you can control the iPhone’s music playback using the same keyboard shortcuts you use to control iTunes.

Pairing requires that the phone and the computer can see each other on the network (I’m not sure of the implementation details, but it probably relies on Bonjour). I use a wired connection at my desk (because the building’s wireless network is flaky), so I set up network sharing and connected my iPhone to the shared network and that has worked flawlessly so far.

Overall I’ve been really happy with it. If you’re running into the same first-world problem that I am, it’s more than worth the $0.99.


Software I Use Daily: Mendeley

Posted: April 17th, 2011 | Filed under: Useful Software

Almost two and a half years ago, I wrote a post here about my efforts to transfer my piles of research papers into digital form. At the time, I was running a combination of Referencer and Beagle, using Referencer to keep things organized and Beagle to make it all searchable. Unfortunately, this solution didn’t work out as well as I’d hoped. The main reason for this was the problem of manual cross-platform synchronization for both the papers themselves and all the various metadata associated with them. I didn’t want to waste time figuring out how best to keep everything synchronized between my desktop and laptop (one of the reasons I use my laptop exclusively for day-to-day development now), so I lived with that solution for about a year.

At some point in late 2009, I was introduced to Mendeley by some friends of mine in the CSE department. It’s like they took my wish list for a paper management system and implemented it. It’s fantastic. Here’s why:

Written by researchers, for researchers. It’s very clear that this application was written by people who have had to deal with academic papers a great deal. It strategically attacks so many pain points associated with dealing with large paper volumes that I can’t help but think that the entire design process was guided by researchers that were fed up with the current state of affairs.

It’s cross-platform. It works for OS X, Windows and Linux. And when I say “works”, that doesn’t mean “the Linux version barely works and the OS X version has a wonky UI”, which is true of a lot of cross-platform software in my experience.

Flexible organization. Like organizing with folders? It’s got folders. Like using tags instead? It’s got tags. Want to use both? Sure, go crazy.

Free, effortless synchronization. You can synchronize up to 500 MB of papers (both metadata and data) to Mendeley’s servers for free. For $5/month, that increases to 3.5 GB. I’m currently at 365 MB and I’m storing 520 papers, so those 500 MB of space will go a long way. In my experience, synchronization between Mendeley instances “just works”, even across platforms.

Embedded notes and annotations. This is the killer feature for me. There’s nothing too complex here; just highlights, the ability to stick a note at a point in the text, and a dedicated notes region per-paper with basic formatting. The key here is that those notes synchronize across platforms and are actually readable everywhere.

It’s social (groan). It seems that in the new bubble, every piece of software you write has to have some social aspect to it. Thankfully, Mendeley manages to do this in a reasonably well-scoped and tasteful way. Your papers’ bibliographic information is sent to Mendeley, and they use that information to better recommend metadata for new papers that other users import. You can also share papers with other Mendeley users through “shared collections” (limited to 10 people in the free version), which is really useful for study groups and research teams that have to refer to the same pieces of literature. You can also track how many people are reading papers that you wrote and stroke your inner narcissist.

Mendeley is one of those applications I wish I had known about years ago. If you’re looking for a solution to your paper management problem, I encourage you to give it a shot.


Kindle: First Impressions

Posted: March 21st, 2011 | Filed under: Computers, Opinions (Uninformed)

I decided to buy a Kindle last week, for a few reasons. I really wanted to start getting into e-books; they’re cheaper than hardcover for new releases now, I don’t have to wait for them to get delivered and they don’t take up space. Although I probably would have preferred an iPad if money were no object, I’m not really willing to spend $500 on a tablet when I already have a laptop.

I bought the WiFi-only model (didn’t really see myself needing the 3G version) and have been fiddling around with it for a couple of days now. My first impressions are pretty favorable.

I’m really surprised at how fast the e-ink display refreshes. I’d played around with an earlier-generation Kindle and a Sony e-book reader for all of about a minute years ago, and was really turned off by the refresh speed on the display. No such problems with the latest-gen Kindle, at least when it comes to reading and menu navigation; there are times when I find myself getting ahead of it, but most of that is off of my typical operating path.

The Amazon marketing hype on the display isn’t too far off; it really does look a lot like paper and is pretty easy to read without a light on (although I’m trying to save my eyes by not reading in dim light these days). I haven’t tried it in direct sunlight yet.

The quality of e-books on the Kindle varies depending on what you’re reading. Some publishers didn’t really put a lot of effort into annotating things like chapters in their books, which makes navigation a challenge; Kindle’s navigation works by jumping to “locations” rather than pages, so you often have to search for the location corresponding to a page rather than the page itself if you don’t have a bookmark handy. I’ve only had this problem for the freebies on the Kindle store; the books I’ve actually paid for have pretty well-groomed metadata.

The fact that the Kindle doesn’t support the EPUB standard and instead uses its own DRMed format is irritating, certainly, but I feel like I can live with it, especially since converting EPUB to their proprietary format is supposed to be fairly straightforward. I’m pretty confident that eventually they’ll do the same thing the iTunes Music Store did and drop DRM entirely, or at least support EPUB natively with a software update.

I’m pretty happy with the Kindle so far. Does anyone else out there have one of these things? Any tips and tricks I should know about?


Backups Revisited Part 2

Posted: February 21st, 2011 | Filed under: Advice (Unsolicited), Computers

In this post, I’ll focus on the practical side of backups.

Last time, I asserted that in order for a backup to really be a backup, your data has to be automatically replicated on two different drives using two separate filesystems on two different computers that are geographically separated, and one of those backups needs to be able to go back in time by at least 24 hours.
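If it helps to see those criteria as a checklist, here is a rough Python sketch. The Copy type and its field names are a made-up model, and it assumes the list describes every place the data is replicated to; how you count the original is up to you:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One place the data lives (hypothetical model, illustrative fields)."""
    drive: str
    filesystem: str
    computer: str
    site: str
    automatic: bool
    history_hours: int


def unmet_criteria(copies):
    """Return the criteria that this set of copies fails to satisfy."""
    problems = []
    # Drives, filesystems, computers and locations must each differ.
    for field in ("drive", "filesystem", "computer", "site"):
        if len({getattr(copy, field) for copy in copies}) < 2:
            problems.append(f"need at least two distinct values for {field}")
    if not all(copy.automatic for copy in copies):
        problems.append("every copy must be updated automatically")
    if not any(copy.history_hours >= 24 for copy in copies):
        problems.append("at least one copy must reach back 24 hours")
    return problems
```

An empty list means the setup passes; anything else tells you which property a given solution is missing.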

Satisfying all of these criteria at once usually isn’t free, but it doesn’t have to be hard, and you’re probably closer to a workable solution than you think.

In this post, I’ll examine a few possible solutions and point out some non-obvious ones. This isn’t meant to be comprehensive, but rather serves to give a general flavor of the state of backup solutions.

Built-In Solutions

OS X’s Time Machine fits some of our criteria for backups, but falls short in others. You can back up to two different drives, the filesystems are distinct, and you’re able to move the Time Machine drive back in time if needed. Backing up to other computers with Time Machine is possible, but it’s unsupported and not very reliable (at least in my experience).

Using a network-enabled USB drive or a Time Capsule is essentially equivalent to backing up to another computer (they’re practically little computers themselves), but that costs a good deal of additional money. Unless you really know what you’re doing and are willing to take the time to make it work (and keep it working), making remote backups work with the Time Capsule is not really feasible.

Although I’ve never used it personally, Windows 7’s Backup and Restore feature appears to be feature-for-feature equivalent to Time Machine, but without Apple’s high-gloss glittery front-end. If you have a Professional or above license, it can back up to network shares, which is an improvement over Time Machine but requires you to pay more for the OS itself, which is kind of a drag.

You can use rsync by itself on pretty much any platform or with any one of a plethora of (usually OS-specific) front-ends. rsync can push files to pretty much anywhere and it supports incremental backups, so you could definitely satisfy all your backup demands with rsync, although it would take a little bit of work to get everything set up.
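One common way to get the "go back in time" property out of rsync is its --link-dest option, which hard-links unchanged files against a previous snapshot so each dated snapshot only costs the space of what changed. The sketch below is one possible wrapper, not a recommendation of a particular setup. It assumes dest_root is an absolute path on a locally mounted drive, and the directory layout and function names are made up:

```python
import subprocess


def rsync_snapshot_command(source, dest_root, snapshot, previous=None):
    """Build an rsync command line for one dated snapshot of `source`."""
    command = ["rsync", "--archive"]
    if previous is not None:
        # Files unchanged since `previous` become hard links into it.
        command.append(f"--link-dest={dest_root}/{previous}")
    # The trailing slash on the source copies its contents, not the
    # directory itself.
    command += [f"{source}/", f"{dest_root}/{snapshot}/"]
    return command


def run_snapshot(source, dest_root, snapshot, previous=None):
    """Run the snapshot; raises CalledProcessError if rsync fails."""
    subprocess.run(
        rsync_snapshot_command(source, dest_root, snapshot, previous),
        check=True,
    )
```

Run from cron with a date-stamped snapshot name, something like this covers the automatic and time-travel criteria; pushing to a remote machine over SSH needs a bit more care with how the --link-dest path is written.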

Sneakernet

If you’ve got a USB drive and are willing to lug it back and forth, there’s a relatively inexpensive way to come pretty close to an optimal backup solution. If you leave your USB drive at work, take it home and do backups every Monday night and bring the drive back to work on Tuesday, you’ve got your bases mostly covered. The problem here, of course, is that you have to remember to take the drive home with you, your backup granularity is kind of coarse (if your drive dies, you lose at most a week’s worth of stuff), and there’s a small window of vulnerability when your USB drive is at home. You get geographic distance for free, though.

Enter the Cloud

There are several companies that have recently started to offer so-called “cloud backup” services that provide you with some amount of storage space to which you can back up. Notable companies in this space include Mozy, Backblaze and CrashPlan. With cloud backup services, you easily satisfy all of our desirable backup properties simultaneously (unless you happen to live next to one of their data centers, of course), but it will usually cost you and doing the initial backup over the wide-area Internet may take weeks or months. Most services will ship you an external hard drive to which you can do your initial backup, but you have to eat the cost of a hard drive (~$100-150) for the privilege of writing to the drive and mailing it right back.

In my opinion, the standout favorite contender in this space is CrashPlan for one simple reason – they allow two computers running CrashPlan to back up an unlimited amount of data to each other for free. So if you and your friend both want to run backups, you can back up to each other.

Unexpected Surprises

If you care about your photos, you’ll want them backed up. If you share your photos on a site like Facebook or Flickr, you’re most of the way to an ideal backup of those photos. The only major drawbacks here are that restoring your photos isn’t trivial (you have to re-download them, although there are applications that will help automate that process) and you might incur a loss in quality when the site scales your image down. If you don’t mind those things though, these are great inexpensive ways to back up.

“But what about our time travel requirement” you might ask? If you’re editing photos, you might care about reverting to a previous edit. Most of the time though, you take pictures, upload them and never modify them again. Static data like pictures or music, where individual items never change but the set of items is expected to grow larger, is easier to back up because as long as you never delete anything the time travel requirement isn’t necessary.
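The append-only idea is simple enough to sketch in a few lines of Python. This is an illustration under the assumption that files in the collection really are never edited in place; the function name is made up. It copies anything the backup doesn't have yet and never overwrites or deletes:

```python
import shutil
from pathlib import Path


def copy_new_files(source_dir, backup_dir):
    """Copy files missing from backup_dir; never overwrite or delete."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        target = backup_dir / path.relative_to(source_dir)
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

Because existing files in the backup are left alone, an accidental edit or deletion on the source side can't propagate, which is what stands in for the time travel requirement here.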

My Setup

I have three computers that I care about – my desktop, my laptop and my home theatre PC. My desktop has an OS X partition and a Windows 7 partition and my laptop runs Debian in a VM, so I need to back up five filesystems in total. The HTPC has external storage drives that hold movies and music.

I admit that I break my own rules a bit – the external media drive on the HTPC is a RAID 1 with no other backups. I know, scary right?

Every system runs CrashPlan, even the Linux VM on my laptop. All systems back up to two places. The first is an old external drive attached to the HTPC. The second is a workstation under my desk at UCSD. Since I had an extra drive lying around, my desktop’s OS X partition also runs a Time Machine backup on a second internal drive.

That about covers it. Next week, something not related to backups!