less paper, no regrets, part 2

As with all systems, the results reflect the [lack of] planning that went into them. Since my goals went beyond just getting rid of all the paper, the next step was defining exactly what I wanted. In a nutshell, I wanted all my paper documents to be more accessible (i.e., faster to find on the computer than in the file cabinet), easily backed up, and securely stored. I also wanted a ‘system’ in place that makes it easy to prevent paper from accumulating in the future. The following feature list embodies my goals:

1. Tagging: Tag documents with metadata about them. Examples:
2009, Receipt, Gas, Shell
2009, Receipt, Climbing, Trekking Poles, REI
Statements, Investing, Vanguard, 401k

2. Custom Fields: Places to store specific types of data, such as dates, prices, expense category, and payment method.

3. OCR: The files in the cabinet are orderly and it takes mere seconds to put my hand within an inch of the right document. But it might take 10 minutes to search through that file folder to find the document I’m after. Once scanned, each document is a PDF among hundreds of thousands of PDFs. OCR is the key to being able to find documents faster on the computer than in the file cabinet.

4. Spotlight searchable: Spotlight is the search technology built into my Mac. It can index and search most document formats, including PDF. To be useful, the OCR results must be searchable via Spotlight.

5. Aggregation of numeric data: Perform summary math on the contents of custom fields. For example, when I select a group of receipts, automatically sum their prices.

6. Backups: Make it easy to use standard backup tools to keep the documents safe.

7. Security: It must be easy to keep all the data reasonably secure. Fortunately, this is easy to accomplish on the Mac by creating an encrypted sparseimage (a disk image that grows as files are added) and storing the document library on it.

8. Open & Future-Ready: The file format of all the documents must be an industry standard with multiple vendors supporting it. PDF is one such standard. In addition, once the documents are “archived,” I want the ability to manipulate them with external apps. For example, I may want to re-run the OCR against all my documents in a couple of years, when the technology has further improved.
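
To make features 1, 2, and 5 a little more concrete, here is a rough Python sketch of how tags, custom fields, and summing might fit together. It is only an illustration, not how any particular document manager works: the `Document` class, the `with_tags` and `total` helpers, the file names, and the prices are all made up for the example. Only the tags come from the list above.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Document:
    """Hypothetical record for one scanned PDF."""
    path: str
    tags: set[str]
    # Custom fields; here just numeric ones such as "price".
    fields: dict[str, Decimal] = field(default_factory=dict)


def with_tags(docs, *wanted):
    """Return the documents that carry every one of the wanted tags."""
    wanted_set = set(wanted)
    return [d for d in docs if wanted_set <= d.tags]


def total(docs, field_name):
    """Sum one custom field across documents, skipping those without it."""
    return sum(
        (d.fields[field_name] for d in docs if field_name in d.fields),
        Decimal("0"),
    )


# File names and prices are invented for illustration.
library = [
    Document("shell.pdf", {"2009", "Receipt", "Gas", "Shell"},
             {"price": Decimal("41.20")}),
    Document("rei.pdf", {"2009", "Receipt", "Climbing", "Trekking Poles", "REI"},
             {"price": Decimal("89.95")}),
    Document("vanguard.pdf", {"Statements", "Investing", "Vanguard", "401k"}),
]

receipts_2009 = with_tags(library, "2009", "Receipt")
print(total(receipts_2009, "price"))  # prints 131.15
```

Using `Decimal` rather than floats keeps the receipt totals exact to the cent, which matters once the sums feed into expense reports.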

Part 3 will explore the workflow used to achieve my goals.

One thought on “less paper, no regrets, part 2”

  1. #5 sounds pretty bold… I know the high end custom systems banks and data services use can do things like that, but they are probably built around templates that know what columns to match against. Miss having you down here Matt!
