QA Policy
Scope/Purpose of QA:
The purpose of performing Quality Assurance (QA) testing on scans is to ensure a consistent level of scan and metadata quality across the various scanning centers and BHL partner libraries, and to minimize the loss of intellectual content in the works scanned. As such, it is imperative that all BHL partner libraries perform baseline QA on their scans, regardless of scanning vendor or source, using the procedures outlined below. The primary consideration when performing QA on scans is whether the digital object created will support the access and data mining needs of the BHL portal, EOL, and other human and machine users of the materials. To be clear, the goal of digitizing for BHL is not digital preservation or the creation of true facsimiles of works. For this reason, QA will not address scan quality issues related to the user's experience of the item, such as color variation on pages, unless they affect the ability to access or data mine the intellectual content of the work. For our purposes, the determination of what constitutes intellectual content for each item is at the discretion of each institution, based on the guidelines outlined in this policy.
Procedural Recommendations:
QA should always be done with the original object in hand - this is particularly important for items with odd pagination or unpaginated plates. It is most efficient to insert QA into the libraries' workflow immediately after receipt of scanned items. Ideally, procedures will go something like this:
- Upon receipt of a cart/shipment from the scanning center, confirm that the items on the manifest match the items in the shipment. (All libraries using IA for scanning should be receiving both an electronic (.xls) and a printed shipping invoice/manifest.)
- Identify the number of items that need to be QA'd using the statistical sampling chart (see documentation...)
- For each item that is QA'd:
  - Check metadata by comparing the _meta.xml file on archive.org to the metadata supplied with the original (usually from the library's ILS), and determine the extent of metadata issues, if any. Metadata issues are considered minor errors, but will need to be corrected either by IA or by the library in the Portal.
  - Conduct a page-by-page assessment of the actual item against the flip book.
  - Note all errors, determine the extent of each error, and record both major and minor errors for each item.
  - Potential major errors, such as page blurring, low contrast, etc., will be checked against the OCR, using the full-text view as the standard, not the PDF, to determine whether the scanning error has affected the OCR. Any error which clearly affects the OCR is a major error. See the examples below to help determine whether an error is major or minor.
  - Potential major errors in non-text images (i.e., plates) should be verified by comparing the jp2 to the item in hand. Only obvious color/line variations should count as a major error.
- Return to the statistical sampling chart and determine whether the cart passes or fails based on the number of major and/or minor errors (see documentation).
  - Cart Passes = return any items that had major errors to the scanning center with the next shipment, noting errors with slips and explaining where errors were found and how to correct them in an email to the scanning center.
  - Cart Fails (at 10% of sample size, i.e. 2% of total cart) = consequences TBD:
    - notify scanning center of QA failure
    - send contents of carts back...?! (shipment issues)
    - invoice issues need to be addressed for rescan of items
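The manifest check and the metadata comparison in the steps above can be partly scripted. The following is a minimal sketch, not an established BHL or IA tool. It assumes the manifest and shipment are available as lists of item identifiers, that the `_meta.xml` file has been downloaded and consists of simple child elements under a single root, and that the library's ILS record has been exported as a dictionary. The function names and the fields compared (`title`, `creator`, `date`) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def manifest_discrepancies(manifest_ids, shipment_ids):
    """Return (missing, unexpected) item identifiers.

    missing: listed on the manifest but not found in the shipment.
    unexpected: found in the shipment but not listed on the manifest.
    """
    manifest, shipment = set(manifest_ids), set(shipment_ids)
    return sorted(manifest - shipment), sorted(shipment - manifest)


def normalize(value):
    """Collapse whitespace and case so trivial differences are not flagged."""
    return " ".join((value or "").split()).casefold()


def metadata_mismatches(meta_xml_text, ils_record,
                        fields=("title", "creator", "date")):
    """Compare selected fields of a _meta.xml document against an ILS record.

    Returns a dict of field -> (value in _meta.xml, value in ILS) for every
    field whose normalized values differ. The field list is an assumption.
    """
    root = ET.fromstring(meta_xml_text)
    mismatches = {}
    for field in fields:
        scanned = root.findtext(field)    # None if the element is absent
        expected = ils_record.get(field)  # None if the ILS export lacks it
        if normalize(scanned) != normalize(expected):
            mismatches[field] = (scanned, expected)
    return mismatches
```

Any mismatch reported this way is only a prompt for a person to look at the record; under this policy, metadata issues are recorded as minor errors and still need to be corrected by IA or by the library in the Portal.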
Common Errors and examples:
| Issue | Minor Examples | Major Examples: affects intellectual content | Notes |
|---|---|---|---|
| Missing page(s) | blank pgs which misalign page location within the item, ex. versos changed to rectos; missing tissue that affects page order but text not affected | tip-ins; page MIA; misalignment of centerfolds; tissue obscures content in scan | adverts and non-meaty content determination will be up to library/subject expert |
| Cropped text | | several letters that make OCR on word or phrase impossible | |
| Contrast / white balance issues or blurry scans | does not compromise OCR, but difficult to read on screen | readable, but compromises significant parts of OCR text, or compromises OCR of taxonomic information or key words on page; unreadable / illegible to average person | when minor color/contrast problems are detected, alert scanning center and request that the cameras be recalibrated |
| Foldouts | orientation does not match item, but does not compromise OCR or view-ability / readability | orientation is way off, upside down / backwards; if color variation is so bad your sp. is now a sp. nov. | |
| Skew | | | |
| Gutter span | gutter and portion of next page visible | content that spans gutter not addressed as a foldout, content is cropped or otherwise unclear | BHLers need to indicate gutter spans as foldouts if necessary |
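The distinction drawn in the table, where an error is major when it affects intellectual content or OCR and minor otherwise, lends itself to a simple record format for the errors noted during page-by-page review. The sketch below is one possible way to log findings and tally them per cart. The `Finding` class, its field names, and the `tally` function are hypothetical and not part of any BHL or IA system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Finding:
    item_id: str           # identifier of the scanned item (format is an assumption)
    issue: str             # e.g. "missing page", "cropped text", "gutter span"
    affects_content: bool  # True if intellectual content or OCR is compromised
    note: str = ""         # where the error was found, for the slip or email

    @property
    def severity(self):
        # Per the table: errors that affect intellectual content are major.
        return "major" if self.affects_content else "minor"


def tally(findings):
    """Count major and minor findings across a cart's sample."""
    counts = Counter(f.severity for f in findings)
    return {"major": counts["major"], "minor": counts["minor"]}
```

Whether a given problem affects intellectual content remains a judgment made by the library with the item in hand; the code only records that decision.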
Other Issues
- Calibration of Scribes, foldout stations -- what is foldout methodology?
- Resources and timing (SIL @ ~10 hours/100 books)
- QA should be done cart-by-cart rather than shipment by shipment in order to manage sampling.
- Major reject vs. minor reject ==> cart fail = 2 major errors OR 3 minor errors
- Minor metadata errors are portal edits not IA issues; major errors are IA edits
- methodology for fold-outs? calibration, workflow, etc. WE NEED DOCUMENTATION esp b/c @ $2/pg.
- gut vs. darken: Always, always gut
- In the future, we expect corrective service from IA when errors are discovered that were not caught in the QA process
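The cart-level rule and the staffing figure noted above can be expressed as a small calculation. This is a sketch under stated assumptions: it takes the thresholds from the note above (2 major errors OR 3 minor errors), treats them as "at or above" thresholds, and scales the SIL figure of roughly 10 hours per 100 books linearly. The function and constant names are hypothetical, and the statistical sampling documentation takes precedence wherever it differs.

```python
MAJOR_FAIL_THRESHOLD = 2  # from the note above: cart fails at 2 major errors
MINOR_FAIL_THRESHOLD = 3  # ... or at 3 minor errors
HOURS_PER_100_BOOKS = 10  # approximate SIL figure; an estimate, not a measurement


def cart_fails(major_errors, minor_errors):
    """Return True if the sampled cart fails QA under the assumed thresholds."""
    return (major_errors >= MAJOR_FAIL_THRESHOLD
            or minor_errors >= MINOR_FAIL_THRESHOLD)


def estimated_qa_hours(books_to_check):
    """Rough staff time for QA, assuming time scales linearly with book count."""
    return books_to_check * HOURS_PER_100_BOOKS / 100
```

For example, under these assumptions a cart with 1 major and 2 minor errors passes, and checking 50 books would take about 5 hours.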
Culled notes
- Advertisements = missing pages, but if there are issues with adverts then it is OK to go ahead; IA needs to know to contact us with problems, not simply to ignore adverts ad hoc ==> minor fail
- No missing intellectual content! Cropped text, tip-ins (pages/notes added in), margins, skew ==> stitching opportunity; whether these are a major or minor fail is up to our discretion, NOT IA's
- Page order, content out of order ==> major fail vs. misalignment, physical sequence: image viewer error ==> minor fail