
Building a Scanning Studio from Scratch

I’ve been asked by a small museum / special library to recommend the equipment a start-up digitization studio would need on a budget of approximately $3,200. The studio would need to be capable of digitizing photos, and possibly some text, post-processing the scans, hosting the images on an existing web server, and then preserving the master scans. It’s an interesting project, and one that can be accomplished within a tight budget. By putting the recommendation online, I’m hoping that others will find it and make further recommendations.

Here are the criteria for the digitization studio:

  1. Budget of $3,200
  2. Manageable by a staff of basically one, with minimal to no technical support
  3. Capable of digitizing photos & text according to the best practices in the BCR Digital Imaging document
  4. Capable of long-term preservation of master files
  5. Capable of public access to the resulting web-ready scans

Here are some recommendations using the lowest prices (found via Google Product Search) as of this writing:


Computer-Baseline: $600

What’s amazing to me is that a baseline studio-ready PC can be had for as little as $600, maybe less. While it’s easy to spend more, as long as it has a minimum of 2GB of memory and a 100GB+ hard drive, you’re good to go. Upgrade options here include a larger monitor and more memory. Macs will be more expensive, but can be a good solution.

Scanner-Baseline: $735

PlusTek OpticBook 4600: I’ve used the OpticBook 3600 previously and was very happy with it. It offers a scanning area that is good enough for 8 1/2″ x 11″ photos while also being configured to drape books over the side in a way that won’t break the binding and still allows you to get deep into the gutter of the page.

Upgrade option – Epson 10000XL Photo: $2,500. This scanner has excellent dynamic range, which for photos is paramount but comes at a price. Other options include a digital camera, but if you go in this direction you’ll need to worry about lighting and dealing with RAW files. For a start-up studio that may be more trouble than it’s worth.

Post-Processing and Quality Control

Image-processing baseline: $176

Photoshop CS3 is probably your best option. The latest version is CS4, but for a start-up studio the earlier version is going to work fine. This software will be used to run the scanner for capturing master images and to create derivative files that are web-ready. Price: $150

Alternately, the open-source software GIMP is an excellent solution. It will do almost everything that Photoshop can do. The only downside is that if you’re just learning image manipulation, most of the tutorials available are for Photoshop. Price: Free

Tutorials: Ben Willmore’s Photoshop CS3 Studio Techniques is an invaluable resource. It will walk you through everything you need to know to make web-ready derivatives: straightening, cropping, adjusting levels, sharpening, resizing, etc. Price: $26

File Name Editing: Bulk Rename Utility is a good tool for renaming an entire directory’s worth of files at once. Price: Free
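
For anyone who would rather script this step, here is a minimal sketch in Python of the same idea: renaming every file in a directory to a sequential, zero-padded pattern. It is not how Bulk Rename Utility works internally; the function names (plan_renames, apply_renames), the prefix, and the numbering width are all made up for illustration. Building the plan first lets you look over the new names before anything on disk changes.

```python
from pathlib import Path


def plan_renames(directory, prefix, start=1, width=4):
    """Pair each file in `directory` (sorted by name) with a new sequential name.

    Nothing is renamed here; the result is a list of (old_path, new_path) tuples.
    The naming pattern prefix_0001.ext is an assumption, not a standard.
    """
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    plan = []
    for number, old in enumerate(files, start=start):
        new_name = f"{prefix}_{number:0{width}d}{old.suffix.lower()}"
        plan.append((old, old.with_name(new_name)))
    return plan


def apply_renames(plan):
    """Carry out a rename plan, refusing to overwrite any existing file."""
    # Check every target before touching anything, so a conflict leaves
    # the directory exactly as it was.
    for old, new in plan:
        if new != old and new.exists():
            raise FileExistsError(f"{new} already exists; nothing was renamed")
    for old, new in plan:
        old.rename(new)
```

Calling plan_renames("scans", "photo") on a hypothetical folder holding a.tif and b.tif would pair them with photo_0001.tif and photo_0002.tif; nothing is renamed until that plan is passed to apply_renames.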

Image Quality Control: I find IrfanView to be an excellent, lightweight, and fast image viewer. This is crucial if you need to browse through a couple thousand images quickly and easily. It can also do some simple image editing, like cropping. Price: Free

Optical Character Recognition: OCR will give you the capability to convert images of text to actual text. This will allow you to create PDFs as well as text files of the scans. Most scanners come with the Sprint version of ABBYY FineReader, which is fine for starting out. Upgrading to the Pro version is a good idea once the studio expands its capabilities. Price: Free ($179 for an upgrade to Pro)

Public Image Presentation

Hosting Baseline: Free (as in “free puppy”)

Greenstone is open-source hosting software designed with the small institution in mind. As long as you have a web server to place files on, you’re good to go. Alternately, you can go with CONTENTdm, which has either a hosted option or a version that you host yourself. Fees for this can be quite steep, so moving to this later may be an option. One caveat: if technical support is not available to you, getting content you’ve placed on Greenstone into CONTENTdm is not trivial. Plan on rebuilding your collections from scratch with any migration. Price: Free

Long-Term Preservation

Preservation Baseline: $749

This is about ensuring that the master files created in a digitization studio can be kept indefinitely, at least on a bit level. Migration, emulation, etc. are another issue entirely. (Most institutions are taking the attitude that we’ll burn that bridge when we come to it.) For a small institution without a network, what’s needed is redundant storage that’s easy to maintain. Data Robotics has an elegant solution in its Drobo product. It accepts cheap and easily purchased hard drives which it monitors for you to guard against bit rot (which is inevitable, eventually). If a drive goes bad, you simply replace it and Drobo manages the reformatting and transfer for you. With this setup two copies would be kept: one on the scanning computer, and one on the Drobo. If it’s also possible to keep another copy on a network, that would help spread the risk.
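
Whatever storage you choose, it is worth being able to confirm for yourself that the two copies still match bit for bit. The sketch below is one way to do that in Python with SHA-256 checksums. It is not part of the Drobo’s software, and the function names and folder layout are hypothetical. It assumes both copies use the same relative file paths.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to spare memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(primary_root, backup_root):
    """Compare two copies of the master files.

    Returns (missing, mismatched): files absent from the backup, and files
    present in both places whose contents differ.
    """
    primary = build_manifest(primary_root)
    backup = build_manifest(backup_root)
    missing = sorted(set(primary) - set(backup))
    mismatched = sorted(
        name for name in primary.keys() & backup.keys()
        if primary[name] != backup[name]
    )
    return missing, mismatched
```

Run on a schedule against the folder on the scanning computer and the folder on the external storage, two empty lists mean the copies agree. A mismatch only tells you that the copies differ, not which one is damaged, so saving a manifest at scan time gives you a known-good reference to compare against.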

Grand Total: $2,260

That’s $940 below budget, which will give you room to maneuver. Upgrading from here, I’d recommend getting as large a monitor as possible, upgrading the computer’s RAM to 4GB, and possibly finding a scanner somewhere between the OpticBook and the Epson.
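
The arithmetic behind that total, and behind any upgrade you are weighing, is easy to check with a few lines of Python. The prices are the ones quoted above; the dictionary layout and function names are just one way to arrange them.

```python
BUDGET = 3200  # dollars, from the criteria list

# Baseline prices as quoted in each section of this post.
baseline = {
    "Computer": 600,
    "Scanner (PlusTek OpticBook 4600)": 735,
    "Image processing (Photoshop CS3 + tutorial book)": 176,  # 150 + 26
    "Hosting (Greenstone)": 0,
    "Preservation": 749,
}


def total_cost(items):
    """Sum the prices of all line items."""
    return sum(items.values())


def remaining(items, budget=BUDGET):
    """Money left over (negative means over budget)."""
    return budget - total_cost(items)


# What happens if the baseline scanner is swapped for the Epson upgrade?
with_epson = dict(baseline)
del with_epson["Scanner (PlusTek OpticBook 4600)"]
with_epson["Scanner (Epson 10000XL Photo)"] = 2500

print(total_cost(baseline), remaining(baseline))      # 2260 940
print(total_cost(with_epson), remaining(with_epson))  # 4025 -825
```

With these figures the Epson on its own would push the total to $4,025, which is $825 over budget, so a scanner priced between the two is the more realistic upgrade within this budget.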

Comments and suggestions are welcome.


  1. Update: I asked OCLC if they had any feedback on CONTENTdm licenses for smaller institutions and below is what they offered. Thanks to Michelle for allowing me to post this:

    As we discussed, OCLC now offers the power of CONTENTdm at an affordable price. As requested by smaller Institutions, departments within larger organizations, and groups wanting a scalable path to get started, the Quick Start option would be ideal to get started easily and includes:
    • A hosted environment providing secure and cost-effective systems support
    • Full OCLC support
    • No hosting setup fee
    • All future releases of the CONTENTdm software
    • Participation in a highly active user community (listserv and user groups meetings)
    • Full access to the online User Support Center (with step-by-step tutorials for collection building)
    • Metadata harvesting to WorldCat

    The price of the CONTENTdm Quick Start subscription is per year. The price includes 3 acquisition stations all with JPEG2000 with an item limit of 3,000 and storage of 10 GBs. If more space is needed at either level, an upgrade to a Level I license would be the next step where we would then discount.

    To review:
    Option A: Quick Start: New Annual Subscription – 3,000 item limit, 3 Project Client stations with JPEG2000. Upgrade credit of 3,000 given when the limits of this package are outgrown
    Option B: Level I License Purchase, Hosting Service, and ongoing maintenance estimates – 10,000 item limit with 50 Project Client stations all with JPEG2000.
    For even more information, I encourage you to go to our CONTENTdm Web site. This would be a great option to consider for a department implementation while you work on funding and presenting to your system.
    Please let me know if you have any questions, or how I can be of assistance in your digital management plans.

    Michelle Phipps
    OCLC Inc.
    Inside Digital Services Consultant
    800-848-5878 x4301
    614-718-7145 fax

    Posted on 11-Dec-08 at 2:40 pm | Permalink
  2. So many of the standards, best practices and tech recommendations are geared toward large institutions with big budgets. I have found that small (usually public) libraries often don’t know how to get started on a small scale. This should be a good resource for those types of organizations. Thanks.

    Posted on 27-Jul-09 at 2:40 pm | Permalink
