BHLStaffNotes041108
Notes from April 11, 2008 Conference Call
Attending: MoBot: Michelle, Doug and Chris; NYBot: John M. and Don; AMNH: Tom B. and Matt B.; MBL/WHOI: Diane, Matt P., Jen, and John F. ; Harvard: Joe; NH London: Bernard; Smithsonian: Suzanne and Erin. Note takers: Suzanne and Jen
1. Round Robin Updates:
NYBot
- John M. narrowed down the quality issues needing clarification: two damaged books, one with two torn pages and one with a torn corner. All were intact at the time of scanning. All pieces and parts have been returned. John M. drafted a statement to Susan (boss) and Statcy (IA Pod Manager at NYPL), cc’ing Robert Miller.
- Action Item: John M. will report back results
- About 130 pieces will be sent next week. Might send 200 the following week.
- Access to Biblio or Metamanager is lacking. Extremely frustrating.
- Considering changing search identifier to the OCLC number
- Diane reported that no one has access to Biblio, and Chris reported that Metamanager is being phased out. IA plans on having the web interface be the interface to all data and access.
- John M. is working on a script to fetch data out of Millennium and create the WonderFetch URL.
- Don's question to the group: How are sites recording and keeping track of rejected works, especially those rejected by IA staff?
- MBL/WHOI:
- Diane is keeping a master list of all volumes returned - and a list of all rejects for future gap filling
- Monographs: keeping a list. Titles are loaded into the monographic dedup tool before shipment; when the shipment comes back, the list is taken down and replaced with what was actually scanned, with rejected volumes removed.
- Matt P. reported that it takes a lot of effort to work with the IA site to figure out what has and hasn't been scanned.
- London:
- Bernard: launched an in-house scanning management system, with WonderFetch being used starting next week.
- Their system can flag items rejected by IA staff, with the reason.
- It is similar to the serial bid list (an extension of it).
- Things that cannot be pulled are recorded at the title level.
- Items are tracked internally from shelf to return.
- Harvard:
- Joe: rejects are recorded in the item's ExLibris record. A circulation note has a place for "hold" or "rejected", and another note field holds an internal note where they put the reason why.
- Harvard can pull a report on what IA rejected and Harvard rejected.
- They keep track of what has been done with spreadsheets and packing lists, importing everything scanned into a FileMaker database and deleting what was rejected by IA.
Harvard:
- Joe: the foldouts and the links to IA have been sent, and they have been able to view the PDFs and the DjVu files only. These works have not been put into the BHL portal yet. They have compared the foldouts with the images and can tell there is loss of detail. The JP2s have not yet been examined for quality; they hope to do so in the next week or so.
- WonderFetch is being worked on with Diane’s help. Harvard sent one shipment with WonderFetch links, and IA sent back some examples of how the metadata appeared. The current problem is with the date information: WonderFetch pulled the date from the bib record and put it in the IA date field, so the title bar shows the journal title with the date range of the whole title rather than the volume year. We would like the date to come from the item on the packing list, not the title-level date in the bib record. Keri, Diane and Joe are tweaking WonderFetch. More testing will follow with the next items sent for scanning. Diane: this is not a scanning-station-specific problem; everyone will have the title date listed in the title bar. WonderFetch needs to look at the item-level date field.
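The date fix described above could look roughly like the following. This is only a sketch, not WonderFetch's actual code: the function name and the dictionary keys (`date`, `year`) are hypothetical stand-ins for whatever the bib record and packing list really carry.

```python
def choose_ia_date(bib_record, packing_item):
    """Pick the date to pass to IA for one scanned item.

    Prefers the item-level (volume) year from the packing list and
    falls back to the title-level bib date only when the item has none.
    Both dictionary keys are assumptions made for illustration.
    """
    item_year = (packing_item.get("year") or "").strip()
    if item_year:
        return item_year
    return (bib_record.get("date") or "").strip()


# A serial whose bib record covers the whole run, with one volume on the packing list.
bib = {"title": "Example Journal", "date": "1850-1900"}
print(choose_ia_date(bib, {"volume": "v.12", "year": "1862"}))  # item-level year wins
print(choose_ia_date(bib, {"volume": "v.13", "year": ""}))      # falls back to bib date
```

Falling back to the bib date keeps monographs working, since their title-level and item-level dates are normally the same.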
MBL/WHOI:
- Diane: 496 items sent Wednesday of this week. Schedule now about 500 items every 2 or 3 weeks with a mixture of serials and monographs with lots of rescans being shipped.
- WonderFetch was used for the last load, but they now need to hold off until the date field is correct.
- Some foldouts went with each shipment, though not a complete load, along with due diligence copyright statements.
- Asked about time allotments: An extra full-time / evening job for Diane and Matt! Basically, Matt and Diane are full time on BHL except for about 5 to 6 hours per week, with the balance changing across the phases of the project. This phase now includes foldouts, WonderFetch and due diligence.
- A lot of the local work and usual patron service is not happening
- Chris felt that it was a lot to ask a site to give up one to two full-time staff. He will query the Institutional Council about possible solutions. Currently, BHL has been underspending on scanning because of the slow turnaround.
- Action Item: Chris will report to Tom G. about the staff time issues
- Smithsonian will be able to hire but it has been hard with our HR steps to get the jobs posted
- MBLWHOI has booked over 3,000 hours since July. General staff (excluding Cathy’s time) account for 2,500 hours spread over more than 13 staff members, with some spending as little as 10 or 12 hours (due diligence searching) and others as much as 1,000 hours. This maintained shipments of 400 to 500 items every 2 weeks.
MoBot:
- Doug: as of Monday, fully staffed with 6 imaging technicians, though two are full time on the herbarium specimens. The hiring process took a while, but they are happy with the staffing level now.
- Emphasis now will be on the selection flow and the comparison of things that are already bid on or scanned by fellow BHLers.
- Chris has getting data out on his to-do list. (The DNS server died and it had an effect on everyone's time. Things are better now!) Chris is working with a programmer on the queries.
- Michelle reported that about 75% of the serials already scanned by MoBot are now bid on in the serial bid list. No bids have been made yet on things that they intend to scan. She hopes to be finished with the retro-bidding in a week or so.
- Action Item: MoBot will report back when they have finished the retro-bidding on serials.
- Action Item: MoBot will report back when they have uploaded the already-scanned monographic titles to the deduping tool.
Smithsonian:
- A test batch of about 20 volumes was sent to LC’s FedScan, but nothing has been scanned at that facility yet.
- Our local foldout machine was missing a piece (or two)
- Our new firewall is up but things are still insane.
AMNH:
- Tom heard on Monday that Brewster signed their contract, but they are still waiting to get it back. They have started to work on the logistics of shipping material to NYPL. IA received a sample pick list containing the metadata it had requested.
- Tom B. reported that they are ready to ship about 570 monographs as soon as they get the contract back, signed by Brewster.
NH London:
- 1,085 volumes scanned with a single Scribe, which they have had for about a year.
- Have been scanning foldouts for the last 2 or 3 weeks. They were missing pieces of the foldout machine for 6 weeks.
- Have not looked at quality of images from foldouts yet.
- The Serial Mash-up / Bid list is now on a production server with some extra functionality. You can add bids on what you intend to scan and then mark what was scanned. We need to think about the logistics and workflow of how we all want to do this.
- Merging functionality is coming. Bernard is working with Matt P. and it should be ready to test soon.
- The scanning management system has been started with the first trolley of material this week. Nathan, the IA scanner, will be using WonderFetch.
- London is taking files back for archival purposes and digital preservation, downloading the JP2s in packets. These are not the full JP2s but the edited, cropped versions. They seem to be pretty good. They FTP lists of identifiers from Metamanager to retrieve the files, then put the images in a storage area with an off-site tape backup.
- A 1½-shift schedule is starting with a new IA scanner coming on board. Scanning will now run from 9 a.m. to 6 p.m. They should process more items, 40 to 60 a week.
- QA problems, especially the PDF problems reported by MBL/WHOI: London will try doing some new scans.
- Diane had to send 80 items to rescan because they couldn’t derive the pdfs properly.
- Smithsonian had to rescan as well.
- IA has failed the QA process for that window of time
- Chris reported that IA had a staffing problem. They have brought someone else on to do QA. They do not check 100%, only about 2% of a given amount. Diane: and IA is not opening every file format when doing the QA checks.
- Open question: did IA check 100% of PDFs for Dec and Jan for MBL/WHOI only, or for everything BHL?
- Action item: Bernard will contact IA about resolving the PDF errors for the known time period.
2. Serial Bid List updates
A round of applause was given for the 988 bids recorded on the bid list.
See above for Bernard’s updates on the new server and functionality coming our way soon.
- Action item: Bernard and Matt P. will let us all know when we should begin to play with the new serial bid list functions including merging of obvious duplicate titles.
3. Workflow of linking scans to local records
NYBot: Lisa and John M. have been working on linking the OPAC to BHL for some materials. They would like to figure out a query to IA or BHL to pull the serials.
Group discussion on RSS feed vs. reporting feature. Most preferred the report, since updates to local systems will most likely be done in batch processes. Reports should be able to be grouped by what has already been “seen” vs. “newbies”, or “Deltas” as John M. calls them.
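One way to picture the "seen" vs. "Deltas" grouping is the sketch below. It assumes a site keeps a list of the identifiers it has already loaded locally; the function name and the identifier strings are made up for illustration and do not describe any existing BHL or IA report.

```python
def split_report(available_ids, seen_ids):
    """Group report rows into already-seen items and new ones ("Deltas").

    available_ids: identifiers currently offered in the report.
    seen_ids: identifiers the local system has already processed.
    Order of available_ids is preserved within each group.
    """
    seen = set(seen_ids)
    return {
        "seen": [i for i in available_ids if i in seen],
        "deltas": [i for i in available_ids if i not in seen],
    }


# Hypothetical identifiers, for illustration only.
report = split_report(
    ["item-001", "item-002", "item-003"],
    ["item-001"],
)
print(report["deltas"])  # only these need to go into the next batch update
```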
Chris reported that there is about a 30 day lag time from IA posting to BHL ingest because IA has the ability to pull things off during that period for whatever reason. BHL does not want to pull until we are sure of its “stability”. No one seemed worried about this delay if we know that it is there. Ingesting BHL metadata will be more accurate than what IA can serve up. IA’s information is in various files. BHL can provide a more accurate pairing of metadata to volumes.
Bernard reported that these records are also going into the European EDIT project.
OAI is another option – is it worth exploring for folks? Can we harvest?
Smithsonian can’t. We are handcuffed by our systems office.
- Action item: Others? OAI harvesters?
- Action item: Fields and Formats checklist to Chris from BHL Staff
Other non-BHL people are asking for what is in BHL. Advice is needed on what to give them and how. An annual or quarterly official list to be published? On demand? Or a checklist of formal BHL offerings? Tom might need a quarterly report for reporting purposes.
Are BHL members taking things other than what you give to scan? Currently MBLWHOI is only taking what they have scanned and linking in their OPAC. NH London is storing copies of JP2 (cropped) of what they have scanned.
Title identification numbers vs. item identification numbers were discussed. It is common for multiple BHL sites to participate in the scanning of one monographic series or serial, for example when filling in for rejected volumes, missing volumes, etc.
- Action Item: BHL Staff who are working with III might want to talk offline about a systematic way to do some of this work.
- Action Item: All need to review what Maggie put together for MBLWHOI on the wiki. Some of it is custom to their system; other parts are most likely reusable no matter what the local system.
BHL is becoming a library aggregate and potentially could be the local system for this topic range of information. Is it worth maintaining this information in the local OPAC? Matt P. and Doug both feel this is something worth keeping in mind, discussing with local libraries, and considering when thinking about policies.
- Action Item: Suzanne will talk to Tom G. about OCLC’s interest in harvesting metadata out of BHL and creating records for the digital editions in WorldCat.
MBL's success with connecting BHL to OPAC (Diane)
Go to our catalog (www.mblwhoilibrary.org) and search by keyword for "biodiversitylibrary" and see how we set up the monographs (over 1000 now in)
Multivolume sets - search title "Natural History of British Shells" and view intermediate page
Procedures
Adding URL Hotlinks for BHL.doc,
Adding BHL Hotlinks Appendix A.doc,
Adding BHL Hotlinks Appendix B.doc
4. WonderFetch(tm) (Diane)
BHL members need to pass both title (bib) and item level identifiers - for portal functionality. Will this work for people? Which fields to use for which id's?
Do we need to schedule a Wonderfetch (tm) call for next week?
- Action Item: Schedule a WonderFetch™ call
Diane has posted a WonderFetch™ template but suggests waiting for version 1.1. (oh so Microsoft wanna be!) There is a date field that seems to be causing some issues and concerns. Keri, Diane, and Joe will be working on this next week.
- Action item: Report back on the date issue and next steps for WonderFetch at the to-be-scheduled phone call.
More discussion on the title id and its use in WonderFetch™. BHL lobbied hard to get this information added to IA. Each BHL library has a link field to find in their local system. Frankenserials are a problem to aggregate.
Standard identifiers (ISSN, ISBN, LCCN and OCLC) are still a problem: older titles don’t have them, and different libraries use different OCLC numbers or do not pass the OCLC number (at all, or in different fields?).
Bernard’s serial bid list will begin to let merging happen. There is a potential connection.
Currently, Chris is working on matching by the MARC leader, with a second pass on 245$a. The leader is not unique, and it works only if the data is from the same institution.
Bottom line: BHL needs a librarian! A metadata curator. A BHL eresources guru. A dedupifier.
Local identifiers need to match up with your system; they will be used in the URL for BHL. WonderFetch is stuffing in the title-level and item-level identifiers so that the IA id, the BHL id and the local system record can be cross-referenced and found.
5. Monographic DeDuping (John F.)
Developer Ryan is about to do the next enhancement and overall new code for the deduping. The system is pulling 8 core fields to ingest and is ignoring local data. OCLC number, title, volume and call number are reviewed. We need to standardize the field names so time isn’t wasted on mismatching things like “year” vs “chronology” and “Vol.”, “v.” and “volume”.
- Action Item: John F. will send out an email to the group asking for help on the standardized form of the fields.
- Action Item: Everyone give feed back to John F. if the system seems to be overly burdensome or difficult.
- Action Item: Suzanne will get in touch with Field Museum to see what's going on.
- Action Item: MoBot and NYBot will wait until field names are standardized before sending already scanned titles to the deduper.
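Until the field names are agreed, the mismatches mentioned above ("year" vs "chronology"; "Vol.", "v." and "volume") could be absorbed with a small alias table. This is a sketch only: the canonical names chosen here ("year", "volume") are an assumption, not a decision of the group, and the alias table is not part of the deduping tool.

```python
# Hypothetical alias table: maps variant column names to one canonical name.
FIELD_ALIASES = {
    "year": "year",
    "chronology": "year",
    "vol": "volume",
    "v": "volume",
    "volume": "volume",
}


def normalize_field_name(name):
    """Lower-case a field name, drop a trailing period, and apply aliases."""
    key = name.strip().lower().rstrip(".")
    return FIELD_ALIASES.get(key, key)


def normalize_record(record):
    """Return a copy of one uploaded row with canonical field names."""
    return {normalize_field_name(k): v for k, v in record.items()}


print(normalize_record({"Vol.": "3", "chronology": "1862", "OCLC": "123"}))
```

Unknown field names pass through lower-cased, so a site's extra local columns are not lost.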
6. Quality Review (Diane and Jen)
Question from Martin K. about the percentage of acceptable error rate: do we have a suggestion on what we think is acceptable and the "kinds of errors" we are talking about?
John M. finds this difficult to address. With PDFs there are times when you can’t get quality for various reasons. Do we want a blanket quality standard or a standard by format?
How should we create BHL standards? MBLWHOI discovered some of the more outrageous errors and also discovered what turned out to be legitimate, unavoidable issues.
Diane suggests that this is a much longer talk. Fingers and hands in images: some think this is horrible, others feel that the data can still be OCR'd, so who cares. Erin frowned. The IA staff were appalled because they couldn’t even figure out how that could happen. Some are so bad that IA should offer us free rescans and free cookies! Some of the other errors are a bit more gray.
Which can be overlooked and which cannot? Be reasonable, etc.
Don clearly articulated that BHL should have a Quality group that comes up with basic standards that we all agree to and present to IA as a unified front and part of our overall contracts. The standards should be reasonable, doable and agreed upon by IA. Bernard seconds this idea (as long as we stay realistic understanding that the goal is to keep the 10 cents per page). Perfection is not the goal.
Chris brought up that we need to be clear about what we need as the best quality for our BHL users, and about where we can make our own derivatives and not worry about IA’s derivatives.
John M. pointed out that the underlying idea is that we are not doing preservation digitization. The main issue is whether the error hinders access to the book.
Matt B. requests a reasonable baseline standard that avoids the overly picky. Personal baselines might differ for each person and be too difficult to maintain, so we need consensus.
- Action Item: All are to think about this and come up with suggestions on how we should move forward.
7. Permissions and workflow (Erin)
Erin is still working with Tom to get the information freed from Tom’s email. Tom is still working through his backlog of email and finding more relevant information. New agreements will help with the workflow being hammered out. The collections group has been notified.
8. Collection management issues (Doug)
Doug and Connie will be talking about the OCLC Collection Analysis. There are some wild numbers and estimates. Tom is doing some creative mathematics to come up with volume and page counts. In the future, this group might be called on for feedback or something.
9. Status of a face to face meeting agenda items:
Quality control document as a face-to-face item
10. Next call – scheduling
The group consensus was that the calls could be more frequent with focused agenda items and shorter times with general calls interspersed.
Wonderfetch™.
Local systems issues and data returns with Maggie joining
AM Schedule works best and Bernard can call our conference number.
- Action item: Suzanne will start to schedule a specific topic call soon.
- Action item: Suzanne will start a separate schedule for a general call soon.