This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018.

StaffNotes3Call040208

April 2 at 2:00 pm EASTERN TIME Zone
Conference call attendance: Joe (Harvard); Chris, Doug (MoBot); Suzanne, Keri, Erin (Smithsonian); Diane, Matthew, Jen, John F. (MBL/WHOI); John M., Lisa (NYBot); surprise guest: Tom G.
Recorders: Suzanne and Joe
Abbreviations:
• NYBot = New York Botanical Garden
• MoBot = Missouri Botanical Garden
• MBL/WHOI = Marine Biological Laboratory and Woods Hole Oceanographic Institution
• IA = Internet Archive
• BHL = Biodiversity Heritage Library (duh)
• BPL = Boston Public Library (host to an IA scanning center)
• NYPL = New York Public Library (host to an IA scanning center)
• BLC = Boston Library Consortium
• FedScan = IA scanning center located at Library of Congress

I. Next call to include Bernard. Agenda is building even as we speak here: AgendaStaffCall4
Friday, April 11 at 10 a.m. Eastern

II. Do we need to think about a face to face meeting?
Agreed that we should think about what we would need to meet face to face about before jumping into too much scheduling and planning. We will all keep a list of things going. Ideas included:

• Agenda that is practical
• Wonderfetch demo or Webinar
• Commiserate (with refreshments)
• Collection development/selection from practical point of view
o Potential for better choosing availability and logical groupings
o Potential for understanding impact on individual library’s workflow, current processes
o Discussion of future work after materials are scanned
• Cataloging technicalities
• Potential meeting in conjunction with other reasons to gather
o BHL Institutional Council / All hands on deck meeting discussion with track events

III. Reports from meetings people attended
• Chris reported that the architecture meeting was good. The point of the meeting was to affirm that the portal is a good version one and then to plan Fedora for version two. Notes and action items are on the Wiki.
• The new garage band is now called The Chris Freeland Project and a mirror site for the BHL will be at ChrisFreeland.org
• Matthew reported that Chris and Martin gave a good talk to BLC about how a project like BHL gets put together. Everyone enjoyed the talk while sauna-ing in the 95 degrees. There is a Slideshare site for the presentation, and coverage on an MIT blog was reported. Diane gave a cheer for Chris’ clear answer to the question of “what would you do over or what did you not account for in initial planning of this project”: the amount of staff time was much more than expected.

IV. Status of BHL projects
NYBot John M. reported:
• There had been pressure from IA and NYPL to get more books there. While John was in Boston, NYBot got 220 books shipped. Stacy, the IA manager at NYPL, then asked about the next shipment because she was concerned about meeting the turnaround time. Rush rush…wait wait. The next shipment should be Tuesday.
• No foldouts sent yet.
• The strategy is to work from shelf to bib list and make bigger pulls. On target at 150 to 200 each week.
• Only sending serials. Don Wheeler is working on serial bidding. They are still in the “A”s. NYBot bids before scanning and has more bids than processed.
o Action Item: Don will be asked to join next call.
• Matthew suggested checking Botanicus to see whether MoBot has already scanned a title. But the level of work already done before pulling and scanning makes another search tough.
• Botany Libraries need to talk about deduping. Logistics still need to be worked out on how and when and who. John suggested offline calls among botany libraries.

MoBot Doug reported:
• Local scanning starting up again with new hire.
• Michelle is working on the serial bid list, with about 40% done.
• [Suzanne: Unclear on the monographic deduping system from MoBot]
• Chris, Doug and John F. will need to touch base to get the workflow down for what they have scanned and what they intend to scan, either in one large batch process or spaced over time.

Harvard Joe reported:
• Moving along with almost 2 months of shipping every 2 weeks. About 150 items on about 2 carts each shipment.
• The last shipment, which comes back this Friday, had lots of foldouts, and they got it done in two weeks. Before, there had been only 5 titles with foldouts and it was slow, but it seems to have gotten to a nice flow.
• Will check links to the images to look at quality.
• Boston seems to be operational with foldouts.
• Diane reported that while visiting BPL she took a look at foldouts and quality. There are resolution issues on IA regarding pixels per inch. Foldouts are flagged with yellow slip bookmarks. There is just one foldout station and one full-time foldout scanner, which seemed to work. Processing one foldout takes significant time, including smoothing the paper [?]. It is not fast, but they are working it in.
• Diane and Joe talked together about Wonderfetch™ and worked together on the template, plugging in the packing list and putting together the URL. The next shipment will go with Wonderfetch™ and the regular packing list together. There was a little bit of a problem moving from Excel to Open Office spreadsheets. Syntax stuff needs a little bit more review.

MBL/WHOI Diane reported:
• Resuming shipping on Wednesday with 2 ½ carts that have to be rescanned, and resuming the usual monos and serials. The plan is to Wonderfetch™ the whole load. They have done a test load with mixed results.
• Shipping will be 6 carts every 3 weeks, spaced out now that the scanning center is doing more work for the BLC members. Shipments will also include foldouts.
• Rescans are needed for a variety of problems: PDF conversion problems, some missed or lost files, glass reflections, or fingers on pages. There is still concern that not all the problems have been caught.
• At the scribe they can still rescan pages. But once files are sent to California you can’t substitute pages.
• If the metadata is reversed (assigned to the wrong scan) the whole book needs to be rescanned. Suzanne frowned.
• Not many individual skipped pages have been found; PDF conversion has been the issue. The page counting done at the scanning stage is good.
• Perplexed as to how files are just missing.

Smithsonian Keri reported:
• We have the components for foldouts and hope to get it going soon. Our scanner is finished with his rescans of the materials caught by IA in their new review. SMI has still not done a QA process.
• The FedScan center is about to test shipping about 20 books, with a temporary shipping solution. Homeland Security is scared of our books.

V. WonderFetch™ and other OCA scanning centers workflow
IA loves it. Diane reports they are thrilled! The question is whether other OCA users are in the loop, especially our partners in scanning for BHL (Betsy K. at Univ. of Ill.).
Other partnering issues of concern: MoBot will be using U of Ill. to scan overlapping holdings for BHL. John F. has done some monographic deduping of U of Ill., but there is no system in place. They are not doing serial bids. No one knows what is going on with Field regarding monographic deduping and serial bidding.

• Action item: Keri is going to touch base with Betsy re Wonderfetch™
• Action item: Doug will contact Betsy to talk about scanning MoBot stuff in June
• Action item: Suzanne will get in touch with Field for updates

VI. Monographic DeDuping
John F. and Doug will talk about workflow to get MoBot information.
Open Office was discussed. Currently all BHL working sites have to work in Excel anyway, so there is no need to accept Open Office as a format now, but it will be kept as an option for future development.

VII. Quality Review
• Brief summary of Quality Control discussions between MBLWHOI Library and IA - REVISED 4/2/08 QC brief revised.doc
Quick summation: MBL did a QC review and IA did a review. There were face-to-face meetings and calls between library staff and IA to work out what was going on. IA is changing its workflow so that missing books, reversed metadata, and other things can be checked and fixed at the center before files are shipped to California. Each item is reviewed (scanned/rejected/dark/correct metadata, and all files) while they still have the book.

100% upload checks before sending files to California and books back to library.

PDF generation problems are thought to be caused by JP2 derivation and conversion to a new version of ABBYY. Error messages say something about token type and drawing something or other. Some PDFs have a few blank pages, but not all pages are blank.

IA thinks it is fixed and that the time frame was identified: books scanned during December and January. The result is either a rescan or a rederive. 79 books from MBL. Harvard was asked for 20 books to rescan. No one knows if London has been contacted or whether its material from this problem time period has been reviewed. MBL reports that of the 1000 items scanned during the December and January months, they have only had to send back 80 books. Other problems found were solved without rescanning. Jen reported that Robert will be going through all the sites to review these issues. She suggests that everyone should check their files and be pro-active about QC.

With issues like fingers and hands or pages not lying flat, IA is working on retraining scanners and trying to match skill sets and abilities to various tasks. They are also going to review the criteria for scanning and possibly have a higher reject number.

What was initially believed to be a resolution problem (unclear images) is more related to the derivation from JP2s. PDFs and DJVU were bad, but the JP2s were good for detailed illustrations.

Foldout images are going to be reviewed because there could be a resolution issue with these. IA is getting a document out that should help us know proper expectations with different formats and resolutions, etc.

Chris’s brilliant BHL Portal presents the JP2 when magnifying an image, thus providing good resolution for illustrations. It is when downloading PDFs that we lose quality.

Matthew reported that the whole experience of working with IA intensely on the quality issues has brought a bit clearer the IA workflow and process. It is all still a work in progress and it seems to be getting better with this kind of intense scrutiny and working out the details.

IA is not planning a 100% review of the 4,000 items scanned. Looking forward, the new workflow changes should catch many of these problems… we hope. It was also reported that IA has some new staffing for quality review. Julie is now on board and Marcus has left the company. The backlog of QA is being dealt with. It was also reported that the quality review by IA does NOT find cropping or too-tight binding issues, word loss due to cropping, or odd center folds that were scanned and should not have been.

Keri reported that the Smithsonian scanner has been doing Smithsonian quality review and possibly some review of other sites’ scanning (due to some slow scanning issues being resolved). IA’s scanner is doing some rescans of Smithsonian materials. Scanners at the scribes look on screen and do the page counting etc. They are looking at a preview, not the actual image. Scanners cannot do real-time review due to loading and shipping of files.

• Action Item: Diane will let us know when the IA guide becomes available and the outcome of her first foldout review

NYBot has received another book with a torn page (one damaged book per shipment). John M. will be contacting Stacy at NYPL to find out what is going on. It is possibly a scanner training issue, but it needs to be reported so IA staff can look into why this is happening.

• Action item: John M will report back to group the outcome of the NYPL IA staff response and actions regarding damaging of books

VIII. Serial Bid list updates

Matthew reported that some of us are only bidding our own record and others have been bidding on related members. With the move to the permanent server, a way to merge these kinds of records will be in place. The move has happened but isn’t quite ready for prime time.
• Action item: Suzanne will inform Bernard that we all are waiting for his wonderful serial mashup database update on the next call.

IX. Permissions and workflow

Erin is working with Tom and getting the workflow a bit better. Contracts are getting collected and there should be something that everyone will be able to see and review fairly soon. Martin and Tom will discuss the communication of permissions and the workflow to the collections group.

It was suggested that when permissions are received, a proactive push notification could tell the BHL staff that a title is now “allowed”, which could help in finding a location for scanning.

X. OCLC Collection Analysis

Doug reported that all the data seems to be in the OCLC Analysis tool. Jen reported that MBL has had data in the OCLC tool for over a year as part of the BLC but the staff have not really worked with it much. Doug has reviewed it a little bit for BHL. Of course the quality of the results depends on the data OCLC received, but the comparisons were expected to be better. The amount of uniqueness for library holdings seems off. It was expected that OCLC’s deduping would be more on target than our attempts. Jen reported that when she was using the tools she could see some obvious duplication that OCLC had as unique. Until the tool is examined more closely, it is still unclear how this will be used for selection.
• Action item: Doug will be looking at his local OCLC provider for possible training opportunities with the Collection Analysis tool

XI. Workflow for linking to scans in OPACs
Lisa started the conversation on the Biodiversity list group, asking who was starting a workflow for recording scanned items in the local ILS. It is very time consuming to do this by hand, and there has to be either a better way or another way of processing this material back to the library.

MBL’s “catalogeroo” Maggie Rioux (a term of affection) has automated a process for creating URLs in the catalog. They are using Voyager. It is a batch process. The script auto-adds holdings records for the BHL “e-versions” and adds an 856 to the bib records. They have not run it on any records yet, since they hit the quality issues. MBLWHOI is relying on a database created from pick lists to create a list of items to ingest with URLs.
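
As a rough illustration of the batch idea only (this is not MBL's Voyager script, which was not posted at the time of the call), the sketch below turns pick-list rows into 856 field text keyed by bib record. The `ScannedItem` fields, the sample identifier, the note wording, and the plain-text field layout are all assumptions made for the example; a real load would go through the ILS's own import tools.

```python
from dataclasses import dataclass

# Base address for Internet Archive item pages.
IA_DETAILS = "http://www.archive.org/details/"


@dataclass
class ScannedItem:
    """One pick-list row (hypothetical layout for this sketch)."""
    bib_id: str          # local ILS bib record number
    title: str
    ia_identifier: str   # identifier assigned to the scan by IA


def make_856(item: ScannedItem, base_url: str = IA_DETAILS) -> str:
    """Return 856 field text for one item.

    Indicators "41": access by HTTP, link is a version of the resource
    described by the (print) bib record.
    """
    url = base_url + item.ia_identifier
    return f"856 41 $u {url} $z Digitized copy available online"


def build_batch(pick_list: list[ScannedItem]) -> dict[str, str]:
    """Map each bib record number to the 856 text to be added to it."""
    return {item.bib_id: make_856(item) for item in pick_list}


if __name__ == "__main__":
    sample = [ScannedItem("12345", "Example title", "exampleitem00test")]
    for bib_id, field in build_batch(sample).items():
        print(bib_id, field)
```

Keeping the URL building in one small function means the same pick list could later be pointed at a different base address without touching the rest of the batch.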

• Action Item: MBL will look at the procedures of what they are doing. Some is ILS specific and some not. They hope to be able to post their workflow and scripts to the wiki.

Originally members were looking to add the link to the IA site since the BHL portal was still under development. Now the question is whether we should ingest URLs for both IA and BHL or only one. If you download from BHL you are still fetching IA material. Patrons are usually looking for the PDFs when they come to the library’s ILS. JP2s are too huge to serve up for downloading.

The current process gets titles and URLs from the pick list returned from the scanners. How would this workflow work with Wonderfetch™?

Using the IA URL as a placeholder, there is potential to globally replace in the ILS with the BHL portal address. Or add wording with each link explaining the difference in addresses.
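
A minimal sketch of that global replace, assuming each library can get a lookup table from IA identifier to portal address. The table contents and the `portal.example.org` address below are placeholders, not real BHL URLs, and the function works on plain field text rather than on any particular ILS's record format.

```python
import re

# Matches Internet Archive item page URLs and captures the item identifier.
IA_URL = re.compile(r"https?://(?:www\.)?archive\.org/details/([A-Za-z0-9._-]+)")


def swap_to_portal(field_text: str, portal_urls: dict[str, str]) -> str:
    """Replace IA item URLs in field_text with portal URLs where known.

    portal_urls maps IA identifier -> portal address (hypothetical table).
    URLs whose identifier is not in the table are left unchanged, so the
    IA link keeps working as the placeholder.
    """
    def replace(match: re.Match) -> str:
        return portal_urls.get(match.group(1), match.group(0))

    return IA_URL.sub(replace, field_text)


if __name__ == "__main__":
    field = "856 41 $u http://www.archive.org/details/exampleitem00test $z Digitized copy"
    table = {"exampleitem00test": "http://portal.example.org/item/1"}
    print(swap_to_portal(field, table))
```

Leaving unmatched links untouched means the replace can be rerun as more titles appear in the portal.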

MoBot is doing separate records for electronic resources. Doug is rethinking this as digitization increases, considering how this form of cataloging will scale as well as patrons’ expectations when searching for information. Separate records in OCLC help outside libraries get a record pointing to the scans of the materials when they do not own the hard copy. Is this worth it? Is there a way for OCLC to do some harvesting of BHL to create e-version records?

MBL has a hand-created screen that goes to the journal/series page with the links to the volumes. Having BHL serve up that navigational page might be a better solution.

Lisa is looking for examples of the automated process of creating the 856 URL. A possible query from the BHL portal of “give me all the NYBot from this date” might be the way to go.

Chris suggested that we consider the query that he uses to ingest IA records to the BHL portal.

• Action Item: Let’s continue the discussion next time, with folks reporting back on local library policies.

XII. Surprise visit from Tom G., who was going to use the conference call number after us. All other topics will be saved for the next call.

• Action item: Suzanne will talk to Tom about the use of the number and “booking” the use of it.