QA Summit Meeting Notes
Attendees: Bianca Lipscomb, Grace Duke, Matthew Bolin, Erin Thomas, Keri Thompson, John Mignault, Joe deVeer, Don Wheeler, Matt Person, Diane Rielinger, Kevin Nolan
Questions for Robert: QA rejection issues in latest revision? Metadata issues? Approve portion of cart? (Robert Miller was not available for the call @ 2:30).
Scanning Update
NYBG: IA has asked us not to send any books; fold-outs are eating up a lot of the money; foldouts deemed too big, but whole book is NOT rejected (but at other places it is!); option to photograph foldouts in-house to be explored soon (to send out url)
MB: AMNH finished with funds, not sending anything; final bills maxed out b/c of foldouts
MCZ: continuing to send about 2 carts/3weeks
MBLWHOI: sending about 6-7 carts/6weeks, 1 more load at the end of July, then finished, some QA; fold-outs are marked by flags on scribe, then 1 person on each of the day/night shifts is dedicated to doing only fold-outs
SIL: fold-outs are only being done in-house, b/c LC wasn’t happy with Fedscan foldouts
Discussion Points
- NEED TO REACH CONSENSUS ON REJECT CRITERIA
- Stitching in foldouts
- Foldout size ==> foldout rejects
- Volume size
- Quarantine system: sending images via Flickr for consult
- How much weight being put on books, how careful are people @ IA handling the books? Some books coming back damaged (more so than if they were just being read)
- Need to give IA consistent feedback from each of us BHLers b/c only doing QA at SIL
- What determines scope of QA? Did they miss pages? Is someone’s hand on page? Scan unreadable? Expanding scope to include book damage?
- Other BHLers now doing catch-up on QA ==> failed carts sent back? ==> serious consequences for the whole project. For example, SIL has met with Fedscan and has delivered suggestions that Robert Miller has not listened to. For other BHLers, distance to the scanning center is a bigger issue than it is for SIL and Fedscan, b/c their scanning locations are not as close to the home institutions.
- SIL is having most problems with missing pages; mythical page counting system is not working as advertised – paginated pages are clearly being missed, not just non-numeric pages; how do you miss one page if there are 2 cameras taking images of 2-page spread – could be in the process of deleting tissue pages; “assert” to mark sections where pagination starts and stops…add to glossary
- Bound-withs being handled differently at different places, assertion or different sections of items
- Costs based on images, pagination shouldn’t interfere with costs; JM’s guess is that it’s an issue in assigning metadata to file names re: things that do not have continuous pagination ==> we have found errors where one page is missing, we feel it’s an assertion-related or tissue-page-related issue…can you explain?
- MBLWHOI: numbering system alpha-based to indicate bound-withs
- QA stats question to SIL: How many shipments checked? How many sent back, how many ok? Sent back = 5 full carts (250 books / cart) out of 1/wk since August of last year; some evidence of clumping -- bad times esp. last October, November, February
- IA Contract says 2% error rate
- SIL gets very few rejects b/c of due diligence pre-sending to IA (just as NYBG had been doing pre-fund run out), rejects based on size discrepancy – sometimes Fedscan rejects but in-house Scribe can do it
- Does the individual scanner have the discretion to make decisions about rejections? Sometimes things are rejected for tight-binding (gutter=binder’s crack; ¼ inch is IA criteria) when they really shouldn’t be or haven’t been in the past?
- SIL methodology: Checking only one format, 1) checking flip-book, if problem 2) then checking PDF, if problem 3) then looking at OCR
- Process for marking problems with QA?; time at which they can’t insert – shouldn’t be, but anecdotal evidence from NYBG that there is.
- What about documentation for QA problems? Emails @ SIL being kept to document communications w/ Fedscan ==> broader documentation issues need to be addressed
- What if we severed our relationship w/ IA? What about doing our own insertions to correct missing pages, etc.? ==> Ingest issue; making portal editable re: images; need more technical detail about stitching process – can we do image portal edits? If Brewster falls off the boat are we entitled to make changes on the IA side?
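The SIL methodology noted above (flip-book first, then PDF, then OCR, moving on only when a problem is found) can be sketched as a short escalation routine. This is an illustrative sketch in Python, not SIL's actual tooling: the function name `escalate`, the stage labels, and the stub checks are all hypothetical stand-ins for a person's visual review.

```python
from typing import Callable, List, Tuple

# A check takes an item identifier and returns True if a problem is found.
# These callables are hypothetical stand-ins for a reviewer's judgment.
Check = Callable[[str], bool]


def escalate(item_id: str, stages: List[Tuple[str, Check]]) -> List[str]:
    """Run checks in order; stop at the first format that shows no problem.

    Returns the names of the formats in which a problem was found.
    """
    problems: List[str] = []
    for name, has_problem in stages:
        if not has_problem(item_id):
            break  # no problem in this format, so later formats are skipped
        problems.append(name)
    return problems


# Example with made-up results: the flip-book looks wrong, the PDF is fine.
stages = [
    ("flip-book", lambda item: True),
    ("PDF", lambda item: False),
    ("OCR", lambda item: True),  # never reached in this example
]
print(escalate("example-item", stages))
```

An empty result means the first format looked fine and nothing further was inspected, which is the trade-off of checking only one format by default.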
QA Round Robin
- NYBG: ad-hoc QA on patron complaint basis ==> can we have a feedback system where patrons flag pages that need QA
- Why is Enum&Chron out of sequence exactly? Where in the process is there a glitch? AMNH notices it through Wonderfetch process, result of catalog items out of sequence
- Sometimes magnified pages are not the same as the derivs
- NYBG: was looking for an intern for QA but nobody worked out
- MCZ: using portal to do QA, not looking at OCR
- MBLWHOI: now in 2nd round of QA, last one in March of 2008, this time = 100% QA for 76 books, all formats! What do you reject? What don’t you reject? Why are we doing this? Archival purposes? OCR? Visually read on screen? PDFs un-readable? Found 9/76 that should have been sent back
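To put the MBLWHOI result (9 of 76 books that should have been sent back) next to the 2% error rate cited from the IA contract, the arithmetic can be sketched as below. These notes do not say whether the contract's 2% is counted per book, per page, or per cart, so the per-book comparison here is an assumption and is illustrative only; the function name `failure_rate` is hypothetical.

```python
from fractions import Fraction

# "IA Contract says 2% error rate" (unit of measurement not stated in the notes)
CONTRACT_ERROR_RATE = Fraction(2, 100)


def failure_rate(failed: int, checked: int) -> Fraction:
    """Return the share of checked items that failed QA, as an exact fraction."""
    if checked <= 0:
        raise ValueError("checked must be positive")
    if not 0 <= failed <= checked:
        raise ValueError("failed must be between 0 and checked")
    return Fraction(failed, checked)


# MBLWHOI second round: 100% QA of 76 books, 9 should have been sent back.
rate = failure_rate(9, 76)
print(f"{float(rate):.1%}")        # 11.8%
print(rate > CONTRACT_ERROR_RATE)  # True, under the per-book assumption
```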
Overall Issues
- Purposes of Portal = Access and Data Mining, not preservation
- What does access mean? E-copy or hard copy?
- Data Mining, how far do we make assumptions on technology, or how much can we decide that OCR is so bad that the book just needs to be rescanned? – sometimes depends on scanner (the person on the Scribe)
- Black & white issues = missing pages
- Grey issues = corners aren’t OCRing b/c pages aren’t flat, camera lens is off; if part of page is not getting OCRed = scanning error = candidate for being sent back
- For non-text images the JP2s are the thing to look at
- How are the flip-book images generated? As derivatives of JP2s
Call with Robert Miller:
Calibration:
- if a camera gets swapped, if we observe a camera drifting = light, dark drifting on verso and title page, or if we notice a problem from QA
- Suggestion from LOC that cameras be calibrated 1/day for each machine
Discretion of scanner:
- Single-scribe center -- Stefaan has full discretion for example, if gutters too tight or condition of the book should be no brainer
- Multi-scribe scanning center = scanner ==> site coordinator as final veto; the loader could also reject; collaborative process among all employees @ scanning center
Digital Document Process: if cart fails QA at scanning center is it possible that a portion is approved? OR What is the process when a cart fails QA @ a scanning center?
- If problems can be identified to a single source…
- If not obvious…if coming from multiple areas…would there be a different process…
- Was it random or systemic via scanners or equipment or software?
- Site coordinator is supposed to do evaluation to determine level of problem ==>
Page assertion / capture [um, someone is going to need to explain this to me]:
- Scanner will insert 1st page number they see…
- Person at scribe, performing work on screen, why wouldn’t they see that the page didn’t correspond? If single plate ==> shame on operator, if multiple pages…QA?
- Image counter : operator, algorithm software…
- Paginator tool thrown out if books are a beast; no page numbers at all
- Page count is not tied to billing
Stitching process:
- We can all send scans to be stitched, upload to website, send the id to the site coordinator along with the book, 100-500MB images can be uploaded
- Statute of limitations on stitching? 30 days…”but we want to get it right”; “I want to be a good citizen”; “we end up doing a fair amount of pro bono”
Wrap-up and Action Items
- QA demonstration
- Problems we find
- QA within workflow
- Assess Reject criteria
- Develop outline for consortium-wide BHL QA policy
- Decisions re: criteria
- When/How to mark certain sections of an item for clarity at scanning center (everybody’s doing it! MCZ puts note in packing list)
- Addressing special treatment issues -- MP: covers example, sometimes plates; scanners say: “thanks for the flags”
- And everything else we’ve talked about earlier today – Bianca to organize notes and itemize decisions that need to be made
- NISO standards?
DR: does anyone have problems with books coming back, missing or from other places?
Day 2
Shipments
Returned differently for different BHLers
- MBLWHOI 1 out is 1 back in, books in order of how they were sent, don’t send another shipment until previous is returned, send 6-8 carts at a time, no identifier tag
- MCZ same as MBLWHOI but not as many carts I don’t think, identifier tag?
- NYBG does not get books back in order, don’t send another shipment until previous is returned, required that cart is in approx. call no. order, identifier flag
- AMNH does not get books back in order, don’t send another shipment until previous is returned, required that cart is in approx. call no. order, identifier flag
- SIL shipments sent on weekly basis, do NOT get back what is sent and it is returned in no particular order, things flagged to death, identifier flags + scanned flag
| Boston | NJ | Fedscan |
| --- | --- | --- |
| Manifests for returns come back the day the carts come back | Manifests for returns are not coming back consistently | Manifests for returns come back the day the carts come back |
| Few flags: rejects | Some flags: identifiers, rejects | Lots of flags: identifiers, rejects, scanned, else? |
| Wonderfetch links associated with item numbers | Wonderfetch…? | Wonderfetch…? |
Selected QA notes:
- Call when books come back damaged, NJ sounds like they need some serious damage control
- 13.6% view ok for overall, if doubts then zoom
- Zoom on light pages to check and check OCR
- If potential image issues are reflective of the actual text, then ok
- Finish page-by-page QA first, then check OCR and other issues
- Need to check all pages, regardless of fails encountered b/c of opportunity for page insertion
- Use full text to check OCR
Items to Table:
- Making this clear to users, esp. regarding issue of color plates!
- Bin see also cart
- What’s up with invoicing? (TG) timing MCZ gets it monthly
- Define alignment vs. order
- Rejects consensus
Action items:
- Keri to provide QA sampling chart
- Matthew to add explanation of sampling chart
- Everyone to edit wiki page
- Martin to submit policy to IC
- Bianca to post meeting notes
- Keri to discuss HR sharing w/ MK