Date: Fri, 1 Sep 2000 16:59:53 -0400 (EDT)
From: Matt Simerson <matt@cadillac.mi.us>
To: BSDI users mailing list <bsdi-users@bsdi.org>
Subject: Hard drive performance test results
Background: I'm being tasked with building a _really big_ mail server, something that will handle an incredible amount of mail, easily serve 50,000 users, and scale up to around a million. The stuff geeks' dreams are made of. :-) I've been looking at high-speed storage options, including Fibre Channel SCSI implementations, our good friends at NetApp (still a strong contender), and a build-it-myself distributed storage environment. Each solution has its limits and presents its own set of problems, and I'm bashing on them to see how far and hard I can push each one.
Once I find the limits of each subsystem, I plan to have them operating with normal peaks pushing around 50% of their hard limits, giving me some safety margin for when things go bump in the night. Setting the system load at an arbitrarily low number like 50% will also give me some cushion for capacity planning and will save me no end of heartache when it does come time to start scaling this beast.
So, the first thing I did was grab a 5U box (Micron 3400): dual PIII 650s, 1GB of RAM, an integrated Adaptec U2W SCSI controller, a Mylex AcceleRAID 250 RAID controller, and IBM Ultrastar 18.2GB drives. I set up a RAID 5 array and threw FreeBSD 4.1-stable on it. <hang in there, it gets very BSDI-relevant later> The box performed well, but it just didn't seem as zippy as I expected. I've always used DPT RAID cards, and I was surprised that the machine didn't feel better.
I can get different controllers, but I need some justification for it, so I installed Bonnie and got some raw performance numbers that proved, beyond any measure of doubt, that the configuration we had wasn't performing as well as I needed. I tried a few configurations, changing the stripe size (the maximum size of 64 delivered the best performance) and trying RAID 1 and RAID 0, to see if it was the drives, the card, or what. It turns out that the card is just slow. So, what to do? Find a solution that meets my needs. I don't have any DPT RAID cards at my immediate disposal that FreeBSD supports, but I did find a Mylex ExtremeRAID 1100 and threw that in.
Having a better card made a substantial difference, but it still wasn't all I'd been hoping for. I don't think it's too much to ask to find a SCSI RAID controller that can do raw writes to a 5-drive array at speeds approaching 20MB/sec and random reads/writes (normal use) in the 10MB/sec neighborhood. Of course, we're talking non-cached raw performance here, so when the system gets pressed into service we'll gain a nice boost from drive caching. I'm going to great pains to ensure that any system caching gets negated in my tests.
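For anyone who wants to reproduce the numbers: the main trick to negating the system cache is simply running Bonnie with a test file a couple of times the size of RAM, so the data can't hide in the buffer cache. Roughly like this (the scratch directory and label are just placeholders, and these are the classic Bonnie flags, so check your local man page):

    # 2GB test file on a 1GB box, written to the array under test
    # -d scratch directory, -s file size in MB, -m label for the results table
    bonnie -d /raid/scratch -s 2048 -m micron3400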
Anyway, I've been looking for an excuse to use BSDI, and the lackluster performance of this RAID card was the perfect one: BSDI plus a DPT RAID controller. It just so happened that we have a half dozen DPT 1564 Ultra 160 controllers sitting around. BSDI doesn't claim to support them (per anything documented), but I shoved one in a box, popped the 4.1 CD in, and it recognized the card and installed BSDI just fine. Unlike FreeBSD, BSDI had to be told about my second CPU; once that was done, I ran some tests. I couldn't believe how poor the numbers were, so I resorted to RTFM and STFW to find some knobs to twiddle.
Per the 4.1 manual, if you put a RAID controller in, make sure to manually enable tagged queueing (unlike FreeBSD, BSDI won't figure it out by itself) and change the concurrency to a reasonable value. Of course, you'll also learn that you need to change the rotational delay to get any benefit from the other knobs. Wahoo, I tweaked all the knobs the manual suggests and ran the tests again. Instead of screaming along, it's still crawling like a slug on barbiturates. Weird! Hey, what the heck, it's a RAID card with lots of cache and battery backup, so let's enable soft updates. Cool, enabled soft updates, and still no major change. Tried RAID 1 as well, with little beyond the expected differences. OK, let's resort to just hanging a non-RAID drive off the onboard Adaptec controller and testing that. Much better, but still nothing like the numbers FreeBSD was giving me on the same hardware.
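For reference, the filesystem half of that knob-twiddling is just the standard 4.4BSD tunefs stuff; the tagged queueing and concurrency settings are BSDI-specific and live where the 4.1 manual says they do. Shown FreeBSD-style here with a made-up device name, just to illustrate what I mean:

    # zero out the rotational delay (the sensible value for controllers
    # that do their own caching); run this with the filesystem unmounted
    tunefs -d 0 /dev/da0s1e

    # turn on soft updates for the same filesystem
    tunefs -n enable /dev/da0s1e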
Since BSDI recognized every SCSI drive (DPT or otherwise) through the sd framework, I read the "man sd" page and assumed that the overhead lives in that driver. I didn't have an ATA drive to test with at the time, but somehow I suspect the numbers wouldn't have changed much. Anyone got some ideas or suggestions? I've got several other BSDI systems around, and all the performance tests I ran on them were quite similar, so I know it's not a hardware-specific issue. BSDI is incurring a lot of overhead on the SCSI bus for some reason. It might be possible to tweak it to perform well, but it's not documented, and therefore I'm certain 95% of BSDI users run the SCSI subsystem as shipped.
Now, for the shocker. I've always scoffed at IDE/UDMA/ATA drives. I didn't like Apple's decision to put one in my G3, which I rectified by dropping in a SCSI card (which I needed anyway for my scanner, Zip, and CD-R). I read a couple of good reviews of the IBM GXP75 ATA/100 drive, so I decided to give it a shot. Wow, that sucker is fast! It might even beat out the Cheetah once I get an ATA/100 card to test with. The numbers on the chart (as indicated) are on an ATA/33 bus. It's also the quietest drive I've ever used and generates almost no heat; it barely gets warm to the touch. I ordered two of them, one for my G3 and one for my FreeBSD machine at home.
Anyway, after testing the heck out of a plethora of really fast drives (Cheetah 10Ks, Barracudas, IBMs), I have a beautiful matrix of drives and their performance. You can view it at:
I've color coded the table rows to help identify configurations that are on systems that should deliver comparable performance. I've tried very hard to make my tests as accurate as possible.
Things I've learned:
Cheetahs are wicked fast.
So is IBM's GXP75 drive. IDE can do that!? (ATA/100)
Low-end Mylex cards suck better than a Kirby.
BSDI's disk I/O is not impressive. It's not even good.
UDMA drives are impressive in terms of performance.
Unlike their older siblings, the new Cheetah 10k's are quiet and cool.
Promise ATA/66 RAID controller = $100
The Promise RAID controller doesn't do RAID 5.
Adaptec ATA RAID controller = $300 <gotta get one :-)>
Mylex RAID BIOS and Adaptec BIOS don't play nice together.
Soft updates make little difference on fast drives.
FreeBSD has excellent disk drivers. What it supports, it supports well.
I'm still in the process of testing and will be throwing up some more results soon. Since I'm running qmail on a dual-CPU system, I'll have cycles to burn, so I'm going to try vinum's software RAID performance on FreeBSD just to see what kind of numbers I get. I wouldn't be surprised to find that it's faster than the el cheapo Mylex (DAC960) card.
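If you want to play along, a vinum stripe set for benchmarking is only a few lines of config. Something along these lines should do it (the disk names and sizes are made up for illustration; vinum(8) has the full syntax):

    # /etc/vinum.conf -- a two-disk striped volume for testing
    drive d1 device /dev/da1s1e
    drive d2 device /dev/da2s1e
    volume raid0
      plex org striped 256k
        sd length 512m drive d1
        sd length 512m drive d2

    # create it, newfs it (-v tells newfs it's a vinum volume), and mount
    vinum create /etc/vinum.conf
    newfs -v /dev/vinum/raid0
    mount /dev/vinum/raid0 /mnt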
I hope a few people find this information useful. I also hope I'll learn how to make BSDI's disk I/O perform reasonably. Once I finish this round of testing, I'll put some really fast drives in a couple of boxes, run a strand of 100Base between them, and start hammering on BSDI's and FreeBSD's NFS performance. There are LOTS of fun knobs to twiddle there.
Matt
````````````````````````````````````````````````````````````````````
Matt Simerson http://matt.cadillac.mi.us/
Unix Systems Engineer http://www.hostpro.com/