BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

NHM Internet Archive Work Documen

Workflow (QA process) in use by IA at NHM London 20090203

Each day, the scanning centers review a set of books from the previous business day. The number of books to QA depends on the total number of books in the set.
books in set    9-15    16-25    26-50    51-90    91-150    151-280
number to QA    3       5        8        13       20        32
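The table above can be read as a lookup from set size to sample size. The following is a minimal Python sketch of that lookup; the function name `books_to_qa` is hypothetical, and because the table gives no figure for sets smaller than 9 or larger than 280 books, the sketch raises an error for those sizes rather than guessing.

```python
# Sample-size table from above: (smallest set, largest set, books to QA).
QA_SAMPLE_TABLE = [
    (9, 15, 3),
    (16, 25, 5),
    (26, 50, 8),
    (51, 90, 13),
    (91, 150, 20),
    (151, 280, 32),
]


def books_to_qa(books_in_set):
    """Return how many books to QA for a set of the given size."""
    for low, high, sample in QA_SAMPLE_TABLE:
        if low <= books_in_set <= high:
            return sample
    # The table does not cover this size, so do not guess.
    raise ValueError(f"no sample size listed for a set of {books_in_set} books")


print(books_to_qa(120))  # 20
```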

In general, QA should include either 2 books per scribe or 2 books per scanner. When new scanners are hired, QA should look at more books done by them (more experienced scanners can be excluded).
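One way to draw such a sample is sketched below. This is an illustration only: the record keys `bookid` and `scanner`, the function name `pick_sample`, and the sample data are assumptions, not Meta-manager's actual export format. The `extra_for` argument lets QA look at more books from a new scanner, or set the count to 0 to exclude an experienced one.

```python
import random
from collections import defaultdict


def pick_sample(books, per_scanner=2, extra_for=None, seed=None):
    """Pick up to `per_scanner` books from each scanner.

    `books` is a list of dicts with the hypothetical keys "bookid" and "scanner".
    `extra_for` maps a scanner name to a different sample size for that scanner.
    """
    extra_for = extra_for or {}
    rng = random.Random(seed)

    # Group book ids by the scanner who produced them.
    by_scanner = defaultdict(list)
    for book in books:
        by_scanner[book["scanner"]].append(book["bookid"])

    sample = {}
    for scanner, bookids in by_scanner.items():
        wanted = extra_for.get(scanner, per_scanner)
        # Never ask for more books than the scanner actually produced.
        sample[scanner] = rng.sample(bookids, min(wanted, len(bookids)))
    return sample


# Invented records for illustration.
books = [
    {"bookid": "book0001", "scanner": "scanner_a"},
    {"bookid": "book0002", "scanner": "scanner_a"},
    {"bookid": "book0003", "scanner": "scanner_a"},
    {"bookid": "book0004", "scanner": "scanner_new"},
    {"bookid": "book0005", "scanner": "scanner_new"},
    {"bookid": "book0006", "scanner": "scanner_new"},
    {"bookid": "book0007", "scanner": "scanner_new"},
]
sample = pick_sample(books, extra_for={"scanner_new": 3}, seed=1)
for scanner, ids in sorted(sample.items()):
    print(scanner, ids)
```

The same grouping works per scribe instead of per scanner by swapping the key used for grouping.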
Procedure
1. Open the Meta-manager URL in a Firefox browser window.
UOFT: http://www.us.archive.org/metamgr?uoft
RICH: http://www.us.archive.org/metamgr?rich
IALA: http://www.us.archive.org/metamgr?la
LOND: http://www.us.archive.org/metamgr?lond
NYC: http://www.us.archive.org/metamgr?nyc
ILL: http://www.us.archive.org/metamgr?ill
BOSTON: http://www.us.archive.org/metamgr?boston
WASHINGTONDC/SMITHSONIAN: http://www.us.archive.org/metamgr?washingtonDC
CAPITOLHILL/LC: http://www.us.archive.org/metamgr?capitolhill
MARYLAND/JHU: http://www.us.archive.org/metamgr?maryland
CHAPELHILL/UNC: http://www.us.archive.org/metamgr?chapelhill
INDIANA/FORTWAYNE: http://www.us.archive.org/metamgr?indiana
NJ/PRINCETON: http://www.us.archive.org/metamgr?nj
RALEIGH/NCSU: http://www.us.archive.org/metamgr?raleigh

2. We inspect all the fields that our financial partners see in their sponsor-view of Meta-manager, plus a few others required for IA, so the following fields should be selected. (Using the links above will automatically load these fields into your view of Meta-manager.)
3. Filter the result set for the date you want to check:
In the "scandate" filter box, type the date as a YYYYMMDD string followed by the * wildcard. For example, to filter for October 13, 2006, type 20061013*.
Click the "filter" button.


4. You should now see a result set of ~75 - 250 books, depending on how productive the scanners were that day. ;)
Click the "show all" link to view the results in one table.
5. Scan this table for obvious errors, anomalies, and gaps in the metadata.
Correct what you can in Item Manager using the Modify_XML tool, and mark that book with the appropriate error code.
6. Check the posscopystatus fields. Make sure that every book is marked NOT_IN_COPYRIGHT (unless there is some unusual circumstance where UNCLEAR or IN_COPYRIGHT books have been allowed). If the copyright status is UNCLEAR or does not appear, please report this to BooksQA.
7. Sort on the scanner or operator columns (by clicking on the column title) to choose a sample set, e.g., two books from each scribe or two books from each scanner.
8. Click on a bookid URL to launch the book's details page.
9. Verify that all the access formats (DjVu, PDF, flip book) are available.
10. Note the posscopystatus and date values that are displayed on the details page, as well as title and author.
11. Open the flip book and verify that the copyright information matches the report on the details page, and that the bibliographic data matches the book.
12. Check every page of the flip book, looking for cropped text, washed out text, blurred text, missing pages, double-scanned spreads, and any other problems that affect readability. (Make a note of all errors found so they can be coded into Meta-manager.)
13. Return to the book's details page. Click on BookView and open the WebBook.
14. Verify that the following pages have been asserted. These will be indicated by words in black type next to the blue hyperlink page number.
e.g., 0007 Title page.
15. Use error code numbers for books (see below) to record errors in Meta-manager.
16. If you come across a book that would be a good spotlight / display book (prominent author, lots of illustrations, perfectly scanned) mark it with code 198.
17. Create a QA report. From the Meta-manager page:
- remember that you can use * as a wildcard
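The scandate filter from step 3 and the wildcard note above can be sketched in Python as follows. The helper name `scandate_filter` and the sample scandate values are invented for illustration, and `fnmatchcase` from the standard library only approximates how Meta-manager treats `*`, which this page does not specify.

```python
from datetime import date
from fnmatch import fnmatchcase


def scandate_filter(day):
    """Build the filter string for one day, e.g. 20061013* for October 13, 2006."""
    return day.strftime("%Y%m%d") + "*"


pattern = scandate_filter(date(2006, 10, 13))
print(pattern)  # 20061013*

# Invented scandate values; assumed here to begin with YYYYMMDD.
scandates = ["20061013142530", "20061013091200", "20061014080000"]
matches = [s for s in scandates if fnmatchcase(s, pattern)]
print(matches)  # ['20061013142530', '20061013091200']
```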

Codes
see CurateCodeHistory.
DARK CODES.
-- Formats --
Books with errors 101-110 can be darked in the scanning centers.
101 Test book
102 DjVu is missing or corrupt
103 PDF is missing or corrupt
104 Flip book is missing or corrupt
105 Text file is missing or corrupt (gutenberg)
106 Orphan bookstub that was never scanned
107 Yearbook scanned & downloaded by DDO
108
109 Item's condition makes it unscannable
FREEZE CODES.
-- Uploading or piping problems --
110 Truncated file(s)
111 Book uploaded from scribe before it was completed (incomplete scan)
112 Missing file(s)
113 Cr2.tar file is malformed
114 Cameras assigned incorrectly
115 - 119 not used
-- Metadata --
120
121
123 Possibly not in public domain. Can be darked by QA, loaders, or coordinators.
124 Removed by request of copyright holder or library. Can be darked by QA, loaders, or coordinators.
125 - 129 not used
-- Images --
130 Cropped text
131 Blurred page(s)
132 Missing page(s)
133 Front/Back cover missing
134 White streak in scan that obscures text
135 Book was scanned twice; this copy to be darkened
136 Text is washed out or overly dark -- This should be used when the lighting is so bad that it affects human readability and/or OCR-ability. Books with this code will be darked.
137 Evidence of scanner (fingers/shadows/etc) visible on page
138 Glass not centered in gutter; text is distorted or cropped
139 Foldout scanned folded up, i.e., as a normal page
140 Book and metadata do not match -- Books with this problem should be fixed immediately in the scanning center (i.e., "post-Biblio'd", or "post-Metafetched" for you old-timers). The 140 code should only be used when something prevents this from being done right away, i.e., as a flag to fix the problem later.
141 Call Number is missing or incorrect*
142 Tissue pages marked incorrectly
143 Anomaly in image format is under investigation
144 Left/right pages are reversed
145 - 149 not used
INFORMATIONAL CODES
These elements do not prevent a book from being approved, but are helpful in improving the process.
150 Bibliographic data missing
151 Bookplate or watermark missing or corrupt*
152 Copyright evidence was reported incorrectly*
153 Bibliographic record from library is truncated
154 Possible error in bibliographic record from library
155 Foreign language character encoding is incorrect
156 - 159 not used
160 Light/dark pages (intermittent)
161 Light/dark pages (throughout)
162 Pages skewed
163 Color cards show in access formats
164 White cards show in access formats
165 n/a
166 Image of cradle is visible at front or back
167 Different crop-box sizes in same spread
168 Bad crop at page edges/gutter
169 Duplicate page spreads scanned
170 Page types not marked or marked incorrectly
171 Title page not marked b/c book does not have title page
172 Scan factors not marked or noted
173 - 194 not used
195 This book will be rescanned -- it should not be darked. Use with FREEZE.
197 This book was checked out and gutted
198 This would be a good display book
199 Approved with no problems noted

201 Google quality problems
2009 Un-dark in 2009
2010 Un-dark in 2010
2011 Un-dark in 2011
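For scripted reports, a code can be mapped back to the heading it is listed under. The sketch below is an assumption-laden illustration: the function name `code_group` is hypothetical, and the ranges simply follow the headings above (110 is treated as a FREEZE code because it is listed under that heading, even though the note under DARK CODES mentions 101-110). Codes outside 101-199, such as 201 and the un-dark codes, are reported as "OTHER".

```python
def code_group(code):
    """Return the heading a QA code is listed under in the list above."""
    if 101 <= code <= 109:
        return "DARK"
    if 110 <= code <= 149:
        return "FREEZE"
    if 150 <= code <= 199:
        return "INFORMATIONAL"
    return "OTHER"  # e.g. 201 and the 2009-2011 un-dark codes


# Codes recorded for one hypothetical book.
for code in [130, 160, 198]:
    print(code, code_group(code))
# 130 FREEZE
# 160 INFORMATIONAL
# 198 INFORMATIONAL
```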

Publication dates
When entering copyright / publication dates into Biblio, follow these general guidelines.
e.g., "Copyright 1894, 1897" -- use 1897.
phrase: "Entered according to Act of Congress..."
present) from the title page.
Examples of possible scenarios:
1. the word copyright is present, but no date:
wording in posscopystatus)
2. no "copyright" word or symbol and no date:
wording in posscopystatus)
3. no "copyright" word or symbol and publication date of 1901 on title page:
Reprints
My understanding of the rules regarding reprints is this:
If the book is a simple reprint with no edits or additions, it is ok to scan. These are usually books that were deteriorated or only available on microfilm and reproduced on paper for preservation purposes by the library.
If the book was reprinted after 1923 with additional material (e.g., foreword, illustrations), it is not ok to scan. This will usually be noted on the copyright page. e.g.,
'Copyright 1919. Illustrations copyright 1956 by E.H. Shepard'
'Reprinted 1956 with additional material by A.J. Fowler'
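The two reprint cases above can be written as a small decision function. This is only a sketch of the rules as stated here, not legal guidance: the function name `reprint_ok_to_scan` and its parameters are hypothetical, and the case of additional material added in 1923 or earlier is not covered by the text, so the sketch returns `None` for it.

```python
def reprint_ok_to_scan(reprint_year, has_additional_material):
    """Apply the two reprint cases described above.

    Returns True (ok to scan), False (not ok to scan), or None when the
    text above does not cover the case.
    """
    if not has_additional_material:
        return True   # simple reprint with no edits or additions
    if reprint_year > 1923:
        return False  # reprinted after 1923 with additional material
    return None       # additional material, 1923 or earlier: not covered above


print(reprint_ok_to_scan(1956, True))   # False
print(reprint_ok_to_scan(1956, False))  # True
```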