ZFS is production ready

Background

In July of 2008, I was tasked with building a system to back up thousands of Linux-based servers. Previous systems using Amanda and Bacula had failed, principally because they required a full-time backup administrator to maintain. My job was to build a backup system that required very little maintenance, scaled well, and made restoring data straightforward.

I initially deployed BackupPC, which features data deduplication and would likely have reduced our storage needs by more than 60%. I deployed it on two SuperMicro systems, each equipped with dual quad-core CPUs, 16 GB of RAM, and twenty-four 1 TB disks. I built one system with OpenSolaris and the other with FreeBSD. After testing, we deployed both on FreeBSD.

BackupPC ended up being inadequate, so I wrote my own backup system on top of rsnapshot. It generates rsnapshot config files and then drives multiple concurrent rsnapshot processes on each backup server, pumping data to the backup disks as fast as they’ll take it. I hacked up rsnapshot for better error handling and reporting, and I log exactly how much data each remote system has, as well as how much is transferred during each backup.
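The post doesn't include the author's code, so the following is only a hypothetical sketch of the general approach: render one rsnapshot config per host, then run several `rsnapshot -c FILE daily` processes in parallel. The function names, the paths (`/backup`, `/usr/local/bin/rsync`, the lockfile location), the `rootfs/` destination, and the worker count are all assumptions. It also assumes an rsnapshot release that accepts `retain` (older releases used `interval`).

```python
"""Hypothetical sketch: one rsnapshot config per host, run concurrently.

Names, paths, and worker count are placeholders, not the author's setup.
"""

import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def render_config(host: str, snapshot_root: str, retain: int = 7) -> str:
    """Return rsnapshot config text for a single remote host."""
    fields = [
        ("config_version", "1.2"),
        # A per-host snapshot_root keeps parallel runs from rotating
        # each other's daily.N directories.
        ("snapshot_root", f"{snapshot_root}/{host}/"),
        ("cmd_rsync", "/usr/local/bin/rsync"),
        ("cmd_ssh", "/usr/bin/ssh"),
        # A per-host lockfile lets several rsnapshot processes run at once.
        ("lockfile", f"/var/run/rsnapshot-{host}.pid"),
        ("retain", "daily", str(retain)),
        ("backup", f"root@{host}:/", "rootfs/"),
    ]
    # rsnapshot requires TAB characters (not spaces) between fields.
    return "\n".join("\t".join(line) for line in fields) + "\n"


def run_one(config_path: Path, level: str = "daily") -> int:
    """Run one rsnapshot level; 0 = ok, 1 = fatal error, 2 = warnings."""
    result = subprocess.run(["rsnapshot", "-c", str(config_path), level])
    return result.returncode


def run_all(hosts, config_dir: Path, snapshot_root: str,
            workers: int = 4, runner=run_one) -> dict:
    """Write every host's config, then run up to `workers` backups at once."""
    config_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for host in hosts:
        path = config_dir / f"{host}.conf"
        path.write_text(render_config(host, snapshot_root))
        paths[host] = path
    # Threads suffice: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {h: pool.submit(runner, p) for h, p in paths.items()}
        return {h: f.result() for h, f in futures.items()}
```

Collecting each host's exit code in one dictionary gives a single place to hang the kind of per-host error reporting the post describes.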

About ZFS

The main reason we deployed on ZFS was file system compression. After testing several settings, I settled on compression=gzip; I noticed no difference in system performance between the compression settings. The backup system has been in production ever since and has needed very little attention.
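To get a rough feel for what compression=gzip might yield before committing a dataset to it, a sample of backup data can be run through zlib the way ZFS roughly treats it. This is an approximation under stated assumptions, not ZFS itself: it assumes the default 128 KiB recordsize, compresses each record independently at zlib level 6 (compression=gzip is equivalent to gzip-6), keeps a record uncompressed unless compression saves at least one eighth of it, and ignores sector rounding and metadata. The function names are hypothetical.

```python
"""Rough, approximate estimate of a compression=gzip ratio for some data."""

import zlib

RECORD_SIZE = 128 * 1024  # assumed default ZFS recordsize


def stored_size(record: bytes, level: int = 6) -> int:
    """Approximate bytes a record would occupy after compression."""
    compressed = zlib.compress(record, level)
    # Keep the compressed copy only if it fits in 7/8 of the original.
    if len(compressed) <= len(record) - (len(record) >> 3):
        return len(compressed)
    return len(record)


def estimate_ratio(data: bytes, record_size: int = RECORD_SIZE) -> float:
    """Logical size divided by estimated physical size."""
    logical = len(data)
    physical = sum(
        stored_size(data[i:i + record_size])
        for i in range(0, logical, record_size)
    )
    return logical / physical if physical else 1.0
```

Incompressible input such as random bytes comes out at exactly 1.0 here, because every record fails the one-eighth test and is kept as is. On a live pool, `zfs get compressratio` reports the real figure.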

When first deployed, each backup server needed manual tuning just to keep it down to one crash a day. The many concurrent rsync processes created a workload that stressed the ZFS memory pools. Working with the lead FreeBSD ZFS developer improved the situation, and my systems then crashed only once a week. When ZFS v13 was merged into FreeBSD 8-CURRENT, memory management improved and the crashes dropped to once a month.

Even during the months of frequent crashes on ZFS, I never lost any data, and there was no need to fsck the disks after a crash. My confidence in ZFS grew enough that when I upsized the disks in my home file server, I switched from gmirror (tried and true) to ZFS mirrors. I back up my public server to my home file server, and saw the same occasional rsync-induced crashes there. Around the time of the FreeBSD 8.0 beta releases, I updated it and the crashes ceased. So I updated the backup servers too, and they have been stable ever since.

I have since added another backup server and currently store 58 terabytes of data in over a billion files. My compression ratio averages 2.25, more than doubling the effective capacity of the disks we purchased. After FreeBSD 8.0 was released, I upgraded all the backup servers, and I could scarcely be more pleased.
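A compression ratio is logical (uncompressed) size divided by the space actually used on disk, so at 2.25 each physical terabyte holds about 2.25 TB of backups. When several servers are involved, one reasonable way to get a single average is total logical size over total physical size, which weights each server by how much it stores. The sketch below assumes that definition; the `Server` type and function names are hypothetical.

```python
"""Hypothetical sketch: combine per-server compression figures."""

from dataclasses import dataclass


@dataclass
class Server:
    name: str
    logical_tb: float   # data size before compression
    physical_tb: float  # space actually consumed on disk

    @property
    def ratio(self) -> float:
        return self.logical_tb / self.physical_tb


def fleet_ratio(servers: list[Server]) -> float:
    """Total logical over total physical (not a mean of per-server ratios)."""
    return (sum(s.logical_tb for s in servers)
            / sum(s.physical_tb for s in servers))


def effective_capacity(physical_tb: float, ratio: float) -> float:
    """With ratio r, each physical TB holds about r TB of original data."""
    return physical_tb * ratio
```

For example, `effective_capacity(10.0, 2.25)` gives 22.5, i.e. 10 TB of physical space holds roughly 22.5 TB of backups at the ratio reported above.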

And then I learned that deduplication is coming to ZFS. I can’t wait to test it.
