Choosing a Distributed File System
Posted by matijs 30/06/2011 at 18h15
It’s happening, like it happens to all of us: My hard disk is getting full, and although the free space would have seemed like an ocean just a decade ago, now it’s a worryingly small pool of tiny little gigabytes. I could try freeing up some by tediously going through all the photos I never bothered to cull before, but with GB-sized videos being added on a regular basis, that isn’t a long-term solution. Where long term is anything that will tide me over to my next laptop.
But, what if I could offload some of those files to some other storage medium? I’m not really that fond of external hard disks, but perhaps a file server? Great! You mount some remote directory, and it’s like it’s right there on your machine.
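For concreteness, here’s roughly what that would look like with a plain network mount. This is just a sketch: the host name <code>fileserver</code> and the paths are made up, and it assumes an NFS export (or SSH access for the sshfs variant), which is exactly the kind of setup that falls over when the laptop leaves the house.

```shell
# Classic NFS mount: assumes "fileserver" exports /srv/media
# (hypothetical names for illustration).
sudo mount -t nfs fileserver:/srv/media /mnt/media

# Or sshfs, which needs nothing on the server beyond SSH:
sshfs user@fileserver:/srv/media ~/media
```

Once mounted, the files appear as ordinary local paths, which is the appeal. It's also the problem: unmount the network, and the files are simply gone.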
There’s only one problem with that: My computer is a laptop, and as such, it gets carried around. Not a lot, but still. I won’t be able to choose beforehand which files I want to access (again, too tedious). So, what I really want is good offline behaviour.
So, what are the options? After some poking around on Wikipedia, it seems I apparently want a Distributed Fault-Tolerant File System. Look, it says right there:
for […] offline (disconnected) operation.
Yay! What follows is a full evening of reading about Lustre, MooseFS, Tahoe-LAFS, 9P, Ceph, AFS and Coda. The situation is not uplifting:
The only system that actually promises offline operation is Coda. It is derived from AFS, for which this feature was conceived as early as 1997. Unfortunately, not much has happened with Coda in more than a year. This wouldn’t be a problem if it were rock-solid, but there’s a bug that has been open for three years describing how loss of network connectivity during a write can cause both the original file and its replacement to be lost.
The next best thing (feature-wise, at least) is OpenAFS. Another descendant of AFS, it seems more solid. In 2008, offline operation was a Google Summer of Code project. This has been integrated into the main code base, but is disabled by default. It also seems to need explicit commands to go offline and online, which is not ideal.
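As far as I can tell from the GSoC work, toggling disconnected mode looks something like the following. I haven’t run this myself, so treat the exact subcommands and flags as my best reading of the documentation rather than gospel:

```shell
# Tell the OpenAFS cache manager to operate disconnected
# (reads served from the local cache, writes logged for replay):
fs discon -m full

# Later, reconnect and replay the logged writes to the server:
fs recon
```

Having to remember to run these before closing the lid is precisely the “not ideal” part: what I want is a file system that notices the network going away on its own, the way Coda claims to.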
All the other options really don’t seem to provide offline operation at all. Everyone seems busy developing different flavors of massive petabyte-size storage systems for clusters of machines linked through rock-solid gigabit-per-second or faster networks. Offline operation is clearly not a use case there.
One honorable mention goes to InterMezzo, a descendant of Coda. It seems to have supported offline operation, but managed to become obsolete before its parent, because its developers are now working on Lustre, yet another multi-petabyte high-performance cluster file system.
After all that, where do we stand? There is basically no production-ready solution for my needs, so I guess for now I’ll have to resort to getting rid of crappy files, removing installed debug packages, or shrinking my hardly-used Mac OS X partition.