BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

Digitization Guidelines


Table of Contents

Scanning Best Practices
What to Scan
Foldouts, tissue, and inserts
Cropping etc.
Other file processing
Formats & Technical Specs
Filenaming Conventions
Internet Archive's Digitization Standards
Hopefully the information on this page will answer your questions about the digitization best practices and digital imaging standards recommended by BHL partners. If it doesn't, please contact Keri Thompson. - thompsonkeri, Dec 4, 2014

Scanning Best Practices

If you are scanning material to upload to Internet Archive via Macaw, or want to know if previously scanned material will be compatible with other digitized items at BHL, the following is for you.
BHL strives to provide a “faithful rendering of the underlying source document”, including completeness, image quality (tonality and color), and the ability to reproduce pages in their correct (original) order, such that a legible printed facsimile could be produced at the same size as the original. [FADGI Still Image Guidelines p.51]

--> If possible, follow the FADGI standards for digitization. If this isn't possible, below are the minimum standards you should follow to be compatible with other material in BHL.

What to Scan

Scan every page, including covers and blank pages. If there are more than 10 blank pages in a row (e.g. "filler") you may stop scanning after the 10th page and resume scanning at the next page with content, or the back end papers, whichever comes first.
The first image scanned can be of the background with a colorbar or other calibration instrument - this is useful to keep the page "hand" (recto/verso) correct. Alternatively, a colorbar can be placed as the last image in the sequence.
Create one image per page, unless the content on the page spans the gutter, as in a notebook or scrapbook, a two-page spread illustration, or a foldout. Field Notes are conventionally scanned as two-page spreads whether or not the text spans the gutter.
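The blank-page rule above can be sketched in a few lines of Python. This is only an illustration, not a BHL or Macaw tool: the function name pages_to_scan, the is_blank list (one True/False flag per physical page) and the back_endpapers_start index are all hypothetical, and deciding whether a page is blank is left to the operator.

```python
from typing import List, Optional


def pages_to_scan(
    is_blank: List[bool],
    back_endpapers_start: Optional[int] = None,
    max_blank_run: int = 10,
) -> List[int]:
    """Return the indices of the pages to scan.

    Within a run of consecutive blank pages, only the first
    `max_blank_run` are kept. Scanning resumes at the next page with
    content, or at the back endpapers (if their start index is given),
    whichever comes first.
    """
    keep = []
    run = 0  # length of the current run of blank pages
    for i, blank in enumerate(is_blank):
        in_endpapers = (
            back_endpapers_start is not None and i >= back_endpapers_start
        )
        if blank and not in_endpapers:
            run += 1
            if run <= max_blank_run:
                keep.append(i)
        else:
            # Content page or endpaper: always scanned, and the run resets.
            run = 0
            keep.append(i)
    return keep
```

For example, a cover followed by 12 blank pages and one page of content would yield the cover, the first 10 blanks, and the content page; the 11th and 12th blanks are skipped.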

Foldouts, tissue, and inserts

Foldouts and two-page spreads should be handled in the following way to preserve the page order (right, left, right, left): spreads should have a blank page inserted either before or after, unless another two-page spread follows immediately and so preserves the right/left order. For foldouts, which are typically found on the recto (right side) of the page, the convention is to show the folded-up foldout, then a blank “filler” page, then the unfolded foldout, then the verso of the folded (or unfolded - up to you) foldout. The goal is to show all the information contained on the pages and preserve the page order.
Tissue should not be scanned unless it contains information, e.g., overlaid text. In this case, the page should be scanned with the tissue over the underlying image, then scanned again with the tissue rolled back, then (to preserve page order) a blank "filler" page should be scanned.
Inserts - tipped in (attached) inserts should be scanned as if they were a standard page. For inserts that are not tipped in, it is at the discretion of the scanning institution whether or not to scan them. Obviously, inserts relevant to the text should be scanned (bookmarks...maybe not.)
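As a rough illustration of the image order described above for foldouts and spreads, here is a small Python sketch. The function names and the text labels are hypothetical (they are not Macaw page types), and the sketch arbitrarily places the filler page after a spread, although the guideline allows it before or after.

```python
from typing import List


def foldout_sequence(label: str, verso_state: str = "folded") -> List[str]:
    """Image order for a foldout on a recto: folded, filler, unfolded, verso."""
    if verso_state not in ("folded", "unfolded"):
        raise ValueError("verso_state must be 'folded' or 'unfolded'")
    return [
        f"{label} (folded)",
        "blank filler",
        f"{label} (unfolded)",
        f"{label} verso ({verso_state})",
    ]


def spread_sequence(label: str, followed_by_spread: bool = False) -> List[str]:
    """Image order for a two-page spread.

    A filler page is added unless another spread follows immediately,
    in which case the two spreads together keep the right/left order.
    """
    sequence = [f"{label} (two-page spread)"]
    if not followed_by_spread:
        sequence.append("blank filler")
    return sequence
```

A foldout always contributes four images and a lone spread two, so the recto/verso alternation of the pages that follow is left undisturbed.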

Cropping etc.

The images can either be cropped just inside the edges of the page (as Internet Archive does) or just outside the edges of the page to show the entire page has been digitized. Do *not* include excess background, colorbars or other calibration devices as part of the page image.
All images should be de-skewed (rotated) to align the text on the page perpendicular to the length of the page, so that OCR can be done efficiently. (This is optional for manuscript material, which should only be de-skewed to maximize legibility.)
For incunabula and other unique texts where the text-block and page shape *really* do not line up, exercise your judgement as to whether, and how much, the image should be de-skewed.

Other file processing

This is at the discretion of the scanning institution. In general, BHL is interested in a faithful rendering of the original, and legibility such that the text can be OCR'd with as much accuracy as possible. There should be no need to do color-correction in your software or other touch-ups if your cameras are regularly calibrated.

Formats & Technical Specs

Filenaming Conventions

All files should be named using a unique identifier and a "counter" number to keep the images in the same order as they were in the original. Most BHL libraries use the unique identifier of the book (e.g., barcode or catalog record number) followed by an underscore, a four-digit counter, and then the file extension. If your material does not have unique identifiers, use a portion of the title or author with the year of publication to create a unique identifier. (The example below uses Internet Archive-generated filenames.)
ResearchOnMollu1972Fi_0001.tif
ResearchOnMollu1972Fi_0002.tif
ResearchOnMollu1972Fi_0003.tif
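
The naming pattern above (identifier, underscore, four-digit counter, extension) could be generated and checked with a short Python sketch like the one below. The helper names image_filename and parse_image_filename are hypothetical, and the sketch assumes a zero-padded counter between 0000 and 9999; it is not part of Macaw or any Internet Archive tooling.

```python
import re
from typing import Tuple

# identifier, underscore, exactly four digits, dot, extension
FILENAME_PATTERN = re.compile(
    r"(?P<identifier>.+)_(?P<counter>\d{4})\.(?P<extension>[A-Za-z0-9]+)"
)


def image_filename(identifier: str, counter: int, extension: str = "tif") -> str:
    """Build a filename such as ResearchOnMollu1972Fi_0001.tif."""
    if not identifier:
        raise ValueError("identifier must not be empty")
    if not 0 <= counter <= 9999:
        raise ValueError("counter must fit in four digits")
    return f"{identifier}_{counter:04d}.{extension}"


def parse_image_filename(filename: str) -> Tuple[str, int, str]:
    """Split a filename into (identifier, counter, extension)."""
    match = FILENAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"not in identifier_NNNN.ext form: {filename!r}")
    return (
        match.group("identifier"),
        int(match.group("counter")),
        match.group("extension"),
    )
```

Because the counter is zero-padded to a fixed width, sorting the filenames alphabetically keeps the images in their original page order.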


Internet Archive's Digitization Standards

Coming soon.