Handy domestic app

A year or two ago, I found GroceryIQ, a nifty shopping list application for the iPhone. It has an enormous built-in catalog of grocery items, and it lets me add custom items and custom stores. So, I can walk into REI, and it’ll show just the items I’m looking for.

The only feature I wished for that it didn’t have was syncing the list on Jen’s iPhone with the list on mine. A few months back, I poked around and, lo and behold, the feature had been added.

Words can scarcely describe how wonderful this feature is. Any time Jen adds an item to her list, I get a push message that notifies me. And vice versa. When I open the app, the item is in my list.

Because the database is so large, items normally have the brand, size, and quantity as well. So when I buy something, I know I’m getting exactly the right product and quantity.

ZFS is production ready

Background

In July of 2008, I was tasked with building a system to back up thousands of Linux-based servers. Previous systems using Amanda and Bacula had failed, principally because they required a full-time backup administrator to maintain. My job was to build a backup system that required very little maintenance, scaled well, and made restoring data straightforward and easy.

I initially deployed BackupPC, which features data deduplication and would likely have reduced our storage needs by more than 60%. I deployed on two SuperMicro systems, each equipped with dual quad-core CPUs, 16GB RAM, and 24 one-terabyte disks. I built out one system with OpenSolaris and the other with FreeBSD. After testing, we deployed both with FreeBSD.

BackupPC ended up being inadequate, so I wrote my own backup system on top of rsnapshot. My backup system generates rsnapshot config files and then drives multiple concurrent rsnapshot processes on each of the backup servers, pumping data to the backup disks as fast as they’ll take it. I hacked up rsnapshot for better error handling and reporting. I log exactly how much data each remote system has, as well as how much is transferred during each backup.
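For readers who want a picture of that generate-then-drive pattern, here is a minimal Python sketch. It is not the actual backup system: the function names, the `/backup` root, and the per-host directory layout are all hypothetical, and a working rsnapshot config needs more directives (such as `cmd_rsync` and `cmd_ssh`) than are shown. Older rsnapshot releases spell `retain` as `interval`.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def render_config(host, snapshot_root, paths, retain_daily=7):
    """Return partial rsnapshot config text for one host.

    rsnapshot requires TAB characters between fields, hence the \t.
    """
    lines = [
        "config_version\t1.2",
        # Hypothetical layout: one snapshot root per host, so that
        # concurrent runs never rotate the same directories.
        f"snapshot_root\t{snapshot_root}/{host}/",
        f"retain\tdaily\t{retain_daily}",
    ]
    for path in paths:
        # backup <TAB> source <TAB> destination (relative to snapshot_root)
        lines.append(f"backup\troot@{host}:{path}\t{host}/")
    return "\n".join(lines) + "\n"


def rsnapshot_command(config_path, level="daily"):
    """Build the command line for one rsnapshot run."""
    return ["rsnapshot", "-c", config_path, level]


def run_all(config_paths, workers=4, runner=subprocess.call):
    """Run one rsnapshot per config file, at most `workers` at a time.

    Returns a dict mapping each config path to its exit code.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = pool.map(lambda p: runner(rsnapshot_command(p)), config_paths)
        return dict(zip(config_paths, codes))
```

The `runner` parameter exists only so the scheduling logic can be exercised without rsnapshot installed. The worker count is the knob that decides how hard the backup disks get pushed.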

About ZFS

The main reason we deployed on ZFS was file system compression. After testing several settings, I settled on compression=gzip. I noticed no difference in system performance between compression settings. The backup system has been in production ever since and has needed very little attention.
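As a small illustration of that setting, the sketch below builds the two `zfs` commands involved: one to enable gzip compression on a dataset and one to read back the `compressratio` property. The dataset name `tank/backups` is made up, and the helper names are hypothetical. The functions only build argument lists, so nothing here touches a real pool.

```python
def set_compression_cmd(dataset, algorithm="gzip"):
    """Command that sets the compression property on a ZFS dataset."""
    return ["zfs", "set", f"compression={algorithm}", dataset]


def get_compressratio_cmd(dataset):
    """Command that prints only the dataset's compressratio value.

    -H drops the header line; -o value prints just the value column.
    """
    return ["zfs", "get", "-H", "-o", "value", "compressratio", dataset]


def parse_compressratio(text):
    """Turn zfs output such as '2.25x' into the float 2.25."""
    return float(text.strip().rstrip("x"))


# Example with a hypothetical dataset name:
print(" ".join(set_compression_cmd("tank/backups")))
print(" ".join(get_compressratio_cmd("tank/backups")))
```

Compression applies only to blocks written after the property is set, so it is best set before the first backup lands.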

When initially deployed, each backup server required manual tweaks so that they would only crash once a day. The multiple concurrent rsync processes created a workload that stressed the ZFS memory pools. Working with the lead FreeBSD ZFS developer helped the situation and my systems only crashed once a week. When ZFS v13 was merged into FreeBSD 8-current, memory management improved and my systems only crashed once a month.

Even during the months of using ZFS with frequent crashes, I never lost any data. And there’s no need to fsck the disks after crashes. My confidence in ZFS grew enough that when I upsized the disks in my home file server, I switched from gmirror (tried and true) to ZFS mirrors. I back up my public server to my home file server and saw the same occasional rsync-induced crashes. About the time the FreeBSD 8.0 betas were released, I updated and the crashes ceased. So I updated these backup servers and they too have been stable ever since.

I have added another server to the pool and currently store 58 terabytes of data and over a billion files. My compression ratio averages 2.25, more than doubling the effective capacity of the disks we purchased. After FreeBSD 8 was released, I upgraded all the backup servers and could scarcely be more pleased.
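The arithmetic behind a compression ratio is simple enough to sketch. The post does not say whether the 58 terabytes is measured before or after compression, so the example below treats that as an open question and shows both readings. The function names are hypothetical.

```python
def on_disk_tb(logical_tb, ratio):
    """Disk space consumed by `logical_tb` of data at a given ratio."""
    return logical_tb / ratio


def logical_tb(on_disk_tb_used, ratio):
    """Uncompressed size of data that occupies `on_disk_tb_used` on disk."""
    return on_disk_tb_used * ratio


RATIO = 2.25

# Reading 1: 58 TB is the uncompressed size of the backed-up data.
print(f"on disk: {on_disk_tb(58, RATIO):.1f} TB")

# Reading 2: 58 TB is what the data occupies on disk.
print(f"uncompressed: {logical_tb(58, RATIO):.1f} TB")
```

Either way, a ratio of 2.25 means each terabyte of disk holds 2.25 terabytes of backups.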

And then I learned that deduplication is coming to ZFS. I can’t wait to test it.

ssh bruteforce attacks become sophisticated

SSH scans and bruteforce attacks have been common since my first SSH-enabled server was attacked in 1996. Back then, attacks were so rare that monitoring logins and manually adding attackers’ IPs to /etc/hosts.allow (TCP Wrappers) was sufficient to keep systems secure.

In the mid-2000s, the rise of botnets resulted in distributed bruteforce attacks, in which dozens of IPs (bots) would attempt to bruteforce my SSH daemons. I wrote a shell script that collected the IPs and added them to the TCP Wrappers deny list. A while later, denyhosts was released and I started using it instead of my shell script.
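The original shell script is not shown here, but the idea is easy to sketch in Python: scan the auth log for failed SSH password attempts, count them per IP, and emit a TCP Wrappers rule for each repeat offender. The regex assumes the common OpenSSH "Failed password for ... from ..." log wording. The threshold and function names are hypothetical, and the `sshd : IP : deny` form assumes the hosts_options(5) extended syntax that FreeBSD uses in /etc/hosts.allow.

```python
import re
from collections import Counter

# Matches lines such as:
#   sshd[101]: Failed password for invalid user admin from 203.0.113.5 port 4000 ssh2
FAILED = re.compile(
    r"sshd\[\d+\]: Failed password for (?:(?:invalid|illegal) user )?(\S+) from (\S+)"
)


def failed_attempts(log_lines):
    """Yield (username, ip) for each failed SSH password attempt."""
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            yield match.group(1), match.group(2)


def deny_rules(log_lines, threshold=3):
    """Return TCP Wrappers deny rules for IPs with >= threshold failures."""
    counts = Counter(ip for _, ip in failed_attempts(log_lines))
    return [
        f"sshd : {ip} : deny"
        for ip, failures in sorted(counts.items())
        if failures >= threshold
    ]
```

A real deployment would also want a whitelist, so that a few typos from a legitimate user do not lock them out.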

Since installing denyhosts, I only monitor logins and scan the nightly security reports. In the past few years, the frequency of attacks has slowly risen, but occasionally there are significant changes in attack frequency and duration. The last significant escalation I can recall was in the months leading up to McColo being shut down. Immediately after their shutdown, I noticed a dramatic reduction in bruteforce SSH attacks.

It was during that time of increased activity that I wrote Sentry. Like my original shell script and denyhosts, it adds attacking IPs to the TCP Wrappers deny list. Sentry also adds their IPs to my PF firewall. Sentry works much the same as denyhosts, except that when someone attacks one of my IPs, they get blacklisted on all of them. The number of attacks that made it into my security logs dropped accordingly.

Months later, after McColo was shut down, the distributed attacks all but ceased. Since then, attacks have remained sporadic, perhaps 10 a week. In the last couple of weeks, the number of attacks spiked. I’m seeing dozens of new IPs getting blacklisted each day. Unlike previous attacks, the usernames the attackers are using are not being duplicated, which means the command & control network behind this latest round of attacks is more intelligent than most.
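One way to put a number on "usernames are not being duplicated" is to measure how many usernames were tried from more than one IP. Bots working independently from the same dictionary tend to repeat each other, while bots splitting a dictionary between them do not. The metric and the function name below are illustrative choices, not something taken from Sentry.

```python
from collections import defaultdict


def username_reuse(attempts):
    """Fraction of usernames that were tried from more than one IP.

    `attempts` is an iterable of (username, ip) pairs. A value near 1.0
    suggests bots repeating the same list; a value near 0.0 suggests the
    usernames are being divided up among the attacking IPs.
    """
    ips_by_user = defaultdict(set)
    for username, ip in attempts:
        ips_by_user[username].add(ip)
    if not ips_by_user:
        return 0.0
    shared = sum(1 for ips in ips_by_user.values() if len(ips) > 1)
    return shared / len(ips_by_user)


# Two bots repeating the same usernames versus three bots splitting them.
repeating = [("root", "203.0.113.5"), ("root", "198.51.100.7"),
             ("admin", "203.0.113.5"), ("admin", "198.51.100.7")]
splitting = [("root", "203.0.113.5"), ("admin", "198.51.100.7"),
             ("test", "192.0.2.9")]
print(username_reuse(repeating), username_reuse(splitting))
```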

less paper, no regrets, part 3

To get a single document into Paperless while accomplishing the goals listed in part 2 requires several steps:

  1. Scan documents to PDF files
  2. Process PDFs through OCR engine
  3. Import PDFs into Paperless.app

The ScanSnap comes with ScanSnap Manager. SSM allows you to create scan profiles, which specify scan quality, destination (file, OCR app, Paperless.app, etc), and simplex versus duplex (one or both sides of the paper). There are 4 quality choices: Normal (150dpi, 18ppm), Better (200dpi, 12ppm), Best (300dpi, 6ppm), and Excellent (600dpi, 0.6ppm). Excellent is very slow. I use it only for scanning photos.
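To make those pages-per-minute figures concrete, the sketch below estimates how long a stack takes at each quality setting. The speeds come from the list above. The 600-page stack is an arbitrary example, and the estimate ignores time spent loading the feeder.

```python
# Rated speeds from ScanSnap Manager's quality settings, in pages per minute.
PAGES_PER_MINUTE = {"Normal": 18, "Better": 12, "Best": 6, "Excellent": 0.6}


def scan_minutes(pages, quality):
    """Estimated minutes of pure scanning time for `pages` at `quality`."""
    return pages / PAGES_PER_MINUTE[quality]


for quality in PAGES_PER_MINUTE:
    print(f"{quality:9s} {scan_minutes(600, quality):7.1f} min")
```

At Best, 600 pages is about 100 minutes of feeding. At Excellent, the same stack would take the better part of a day.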

I ran a few OCR tests to determine what settings would result in the highest degree of OCR accuracy. Adobe suggests scanning in Black & White at 300 or 600 dpi. The tips I found on Abbyy’s site suggested that the higher quality the scan, the better the results. After experimenting, I reached the following conclusions:

  • Acrobat Pro does a good job on high quality text documents
  • Acrobat does a poor job on halftone text (e.g., receipts printed on thermal printers, dot-matrix, faded documents)
  • On documents that Acrobat does poorly at, it does no better if the document is scanned in B&W versus color.
  • When scanning in B&W, faded receipts are often illegible. Scan in color instead.
  • Abbyy FineReader does a great job on almost any legible scan
  • The difference between 600dpi and 300dpi is not significant

Factor in that disk is cheap, my time is not, and I’ll never be able to scan these documents again, and the settings I use for all documents are: Best, Duplex, and Color. Color is easy to desaturate to greyscale, the ScanSnap will omit the back side of the page if blank, and I can downsample images later if I need to. With Best quality, it takes about the same amount of time to scan a stack of documents as it takes me to sort, organize, and jog a fresh stack for the document feeder.

ScanSnap Manager will feed new scans directly to an application. So, I tested by configuring it to scan the document directly to FineReader for OCR processing. While that may work well for day-to-day SOHO needs, it is too slow when staring down a mountain of papers. For that, raw speed is required and nothing will get you through step 1 faster than saving to files.

There is one more setting to deal with. This dialog box describes the choice:

If I have a stack of 50 sheets in the document feeder, do I want them to end up as 50 PDF files, or one PDF with 50 pages? If the 50 pages are each individual receipts, then I want them each as individual files. When I have more than one page in a document, such as a phone bill or American Express cardholder statements, a multipage PDF is perfect. The ScanSnap can’t really know my intent so I must tell it. I do so by creating two profiles. The first, called ‘Standard,’ produces a multipage PDF: it takes everything I drop in the hopper and outputs a single PDF. My second profile is named ‘One file per page,’ and it does just that.

Next time, “the shortest path between 8 file drawers of paper and 10,000 PDFs”

less paper, and no regrets, part 1

Last year, our office bought a document scanner. Unlike every other scanner I had purchased or used, this thing was tiny. Its footprint on a desk is smaller than a piece of paper. It is designed specifically for turning pieces of paper into PDF documents. I had to try it. It scanned both sides of my paper in 3 seconds. One pass, both sides!

Such a gadget is very exciting because I have a lot of paper. I’m not a compulsive hoarder, but I do keep financial records longer than the minimum 7 years. Combine that with the document retention required for our business and before long we had two 4-drawer file cabinets of documents. And a couple desk drawers. And the pile on Jen’s desk. And the pile in my hutch.

The prospect of making all that paper disappear helped me get over the resistance I had to parting with $400. So I purchased the ScanSnap S510M (since replaced by the S1500M). While waiting for it to arrive, I started thinking about how I was going to organize the thousands of PDF files that would soon be residing on my hard drive.

I had nightmares of the days before iTunes when I had to painstakingly tag all my music by hand, and then organize the music files into directories so I had a slight chance of finding what I was looking for. I needed an iTunes equivalent for PDF documents. Google led me to ReceiptWallet, which has since become Mariner Paperless. It promised to be iTunes for documents, so I bought it. Instantly.

The ScanSnap comes with several software packages: Adobe Acrobat Pro 8, CardIris, Abbyy FineReader OCR, and the ScanSnap drivers. With a large bucket of tools in place, it was time to do some planning.

To be continued…

My favorite iPhone/iTouch Apps

Grocery IQ ($0.99) – shopping lists

– organizes by store & aisle, includes a large DB of items, and you can email the list to your SO if they’re going to be stopping by the store. My only wish for this app is that it would sync shopping lists between phones.

The Weather Channel (free) – An excellent weather app, with hourly, 36 hour, and 10 day forecasts, doppler radar, etc.

AIM ($2.99) – Instant Messages with push

– Instead of IMs being sent via SMS to my phone while I’m ‘mobile’, they are sent via push (which is completely free). If you want a multi-protocol client, have a look at beejive ($9.95). I prefer being ‘offline’ from my other IM accounts (jabber, MSN, facebook, etc) when I’m away from my computer.

Facebook (free)

– if you use facebook, this app is a must-have. Upload photos and status updates from your phone. This app provides a second reason to have your phone in your hands while sitting with Uncle John. The first being, keeping your phone as far from the bowl as possible. 😉

Remote (free) – remote control for iTunes and Apple TV.

PasswordWallet – sync your passwords between your Mac and iPhone. I find this app is essential since I use one time passwords everywhere. I wouldn’t be able to use many iPhone apps without this one.

WordPress (free) – write blog posts and upload photos.

Amazon.com (free) – Use it to check prices while at the store. Place orders. Buy stuff. Because it’s an app, it loads pages and performs searches faster than using amazon.com in Safari. It’s fast enough that it’s actually useful while you’re at the store. Or just buying something that you remembered while lying in bed.

E*Trade Mobile Pro (free) – useful app if you have an E*Trade account.

Skype (free) – Place skype calls via WiFi.

Motion-X ($3) – a full featured GPS application.

– Uses the iPhone’s built-in GPS and compass for navigation. Caches map data, which is extremely useful. I’ve taken tracks while out fishing and also used it while geocaching. Just make sure to have a spare battery pack available.

Lose It (free) – weight loss app

– set some weight goals (mine is lose 1 lb per week). Each day, enter the food you eat and any exercise you do. Step onto the scales and record your weight. Makes calorie counting fast and fun.

TextFree ($6) – free unlimited SMS messages on iPhone and iPod Touch

Trapster (free) – Get alerts sent to your phone as you approach speed traps, red light cameras, and live police patrols.


iDisk (free) – Access to your .mac iDisk. Another handy way to get files to/from your iPhone.

Air Sharing ($5) – Launch this app and you can mount your iPhone on your Mac or PC as a remote disk (WebDAV). Drag and drop files to it.

Wikipanion (free) – Wikipedia interface. Faster than using Safari.

HPA, Host Protected Area

I’m a big fan of technology that helps users. HPA could be one of those “helpful” technologies. HPA is a “feature” of some motherboards whereby they steal hard disk space (typically the last few megs of your disk) and use it for backing up the system BIOS, a recovery partition, etc.
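The mechanics are simple: the disk is told to report fewer sectors than it physically has, and the hidden tail end becomes the HPA. A small sketch of the arithmetic follows. The sector counts are made-up example numbers for a 1TB disk, not readings from my drives, and the function name is hypothetical.

```python
def hpa_bytes(native_sectors, visible_sectors, sector_size=512):
    """Bytes hidden from the OS by a Host Protected Area (0 if none).

    native_sectors:  sectors the disk physically has
    visible_sectors: sectors the disk reports once the HPA is set
    """
    if visible_sectors > native_sectors:
        raise ValueError("visible sector count cannot exceed native count")
    return (native_sectors - visible_sectors) * sector_size


# Hypothetical example: 2,113 sectors clipped off the end of the disk.
hidden = hpa_bytes(1953525168, 1953523055)
print(f"{hidden} bytes hidden ({hidden / 2**20:.2f} MiB)")
```

Only about a megabyte goes missing in this example, but anything a filesystem or volume manager had stored in those trailing sectors is no longer reachable, and may be overwritten.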

I just purchased a GIGABYTE GA-EP45-UD3P motherboard, RAM, and CPU to drop into my file server. Today I assembled the trio, stripped my old mobo out and dropped this new one in.

The machine booted up but there was a little problem. Two of my disks (members of a ZFS mirror) were corrupted!? That effectively destroyed one of my filesystems, which made me very unhappy.

A few Google searches later and I learned all about HPA. This nasty little surprise was tucked away in Advanced BIOS Features -> Dual BIOS Recover Source = HPA (pages 49 and 51 in the manual). My BIOS version doesn’t have this option, but I found accounts online of older versions that do. It seems that changing that setting didn’t actually work (i.e., disable HPA), so Gigabyte removed the feature. They have left me no way of disabling this destructive feature.

After hours of fiddling, I have worked around it by: a) moving the disks off the first two SATA connectors, b) rebooting onto the HDD GURU Magic Boot ISO, c) removing the HPA partition from both disks, and d) rebooting into FreeBSD. Finally, my ZFS mirror was back, but with only one disk, because the motherboard had helpfully restored the HPA on the first disk.

If you’re using this mobo and migrating disks to it, I’d suggest installing a sacrificial disk on the first SATA controller. That will appease the HPA demon and let you successfully migrate your RAID volumes to it. I’m stuck moving the data off the disk I recovered. Then I’ll recreate the array on the disks with the HPA partition and all will be well.

Is it worth upgrading to the iPhone 3GS?

Short answer: yes.

Longer answer: Absolutely!

Hillbilly answer: You betcha!

Beancounter answer: I purchased my first iPhone 4GB (2G) in Oct 2007 for $300, direct from Apple. Today, I sold it on Craigslist for $225. I purchased my 16GB iPhone 3G for $300 in July 2008. That phone is about to get sold on Craigslist as well, for about $375. My cost to own for both iPhones is $0. I expect to sell my 32GB iPhone 3GS next year, for more than I paid. It’s an unbeatable deal.
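For fellow beancounters, the sums can be checked in a few lines of Python. The purchase and sale prices are the ones quoted above, and the $375 is the expected sale price rather than a completed sale. The function name is hypothetical.

```python
def net_cost(transactions):
    """Total cost of ownership: sum of (price paid - price sold)."""
    return sum(paid - sold for paid, sold in transactions)


iphones = [
    (300, 225),  # original iPhone: bought Oct 2007, sold on Craigslist
    (300, 375),  # iPhone 3G: bought July 2008, expected sale price
]
print(net_cost(iphones))
```

The $75 lost on the first phone is cancelled by the $75 gained on the second, which is where the $0 comes from. It leaves out the monthly service plan, of course.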

Geek answer: The combination of a faster processor and more RAM makes a huge difference. I would bet the RAM is contributing more than the faster CPU. A good analogy would be using OS X with 1GB of RAM and then upgrading to 2GB (just enough). With the memory pressure relieved, nearly everything is more responsive.

The previous iPhones lagged when switching apps, [re]loading web pages, and especially when taking and saving photos. All those little pauses are gone. Switching back and forth between apps is nearly instantaneous. That alone is worth upgrading for. Seriously.

But I upgraded for the better camera. The previous iPhone camera was quite poor. The 3GS camera is not yet good, but certainly better.