BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

bhlstaffcallfeb202014


Dial 1-877-860-3058 and enter the passcode 961479
Lead: Diana Duncan
Notetaker: Carolyn Sheffield

Agenda:

Pagination - (Jackie)

Gemini - (Jackie)

OCR of non-standard fonts by IA - (Matthew Bolin)

Question cross-posted from the last Collections Committee call (JJ et al.)
What do we do if a user asks BHL to digitize something that is available externally because they are unhappy with the external experience (whether due to quality or to the lack of BHL tools)? Among the expressed questions/opinions:

Round Robin updates - all

Regrets
Joe deVeer
David Iggulden

Minutes
Attending
Diana Duncan, Carolyn Sheffield, Randy Smith, Cathy Buckwalter, Becky Morin, Martin Kalfatovic, Tomoko Steen, Trish Rose-Sandler, Mike Lichtenberg, Matt Person, William Ulate, Matthew Bolin, Marty Schlabach, Alison Harding, JJ Ford, Don Wheeler, Jackie Chapman, Daria Wingreen-Mason, John Mignault, Connie Rinaldo

Pagination standards (Jackie Chapman)
Do we have any best practices or existing documentation on how to apply pagination tools?
JJ – There is documentation that Joe Cardin at MCZ put together for the MCZ workflow.
Jackie – That would be great to put together for all of BHL to use!
Diana – We had one out on the wiki. I'm not sure if it's an older document.
I believe it documents more of the how, and not so much when you should use the different features.
Marty – When are you doing the pagination, and with what? With Macaw beforehand, or in the admin environment?
Jackie – This is for the pagination features in the admin dashboard.

Update on pagination documentation available on the wiki:
files/BHL-MCZ+Pagination+Guide.doc
files/Pagination+How+To.doc
Pagination
Pagination+ST+Plan
BHL+Cookbook

Gemini (Jackie Chapman)
It’s been really quiet.
Closing these issues out is important for grants and quarterly reports, so please get in there and respond to any issues you're on.
If you can't scan or address an issue that was assigned to you, please reassign it.
Also, it's part of our Staff Charge, so please be active in the issue-tracking system!

Follow-up from Jackie: Reminder that the scanning spreadsheet is available here:
https://docs.google.com/spreadsheet/ccc?key=0Ak0hDkSQMhfDdF9SOGZyeGdFOUFwUm5ZVzkycEE4cXc&usp=sharing#gid=0
Please update the Scanning Frequency column as needed.

OCR (Matthew Bolin)
For OCR of non-standard fonts and texts, I thought IA said the output would be improved from the current gibberish.
In particular, I'm wondering about Fraktur font.
Characters are being misidentified, and not just diacritics.
Martin – Will check with IA on the status of Fraktur.
OCR engines are usually turned on by the metadata in the volume, so there might need to be an indicator in the metadata. MK will also follow up on that.
Becky – Would we be able to replace OCR with transcriptions?
MK – that is something that we would like to experiment with.
The problem with corrected OCR is making sure that the character maps are correct; this is less relevant with full transcriptions.
William – Another issue: we could possibly write a file for some of the collection but not all of it. That would need to be done in the portal rather than in IA, and it is problematic for files we don't have rights to modify.
Transcriptions will only be for content we own and not for harvested content. In other words, it needs to be member-provided content.
MK--We don't have edit rights in IA.

Collections Committee Question (JJ Ford)
For external content now being linked through BHL, should we digitize something if it is available externally? Or let it stand on its own?
Jackie – how would this fit into the Gemini workflow?
We get requests for things in Hathi or Google. We don't have a formal prioritization process for things available elsewhere. There is value added because of names. Google isn't one of our trusted repositories; for one thing, their scans don't have foldouts.
Additionally, some stuff that Google scanned for us we’re re-doing because the quality wasn’t very good.
If it’s from a trusted repository, though, we may not have to re-do.
MK – A good idea, because a lot of the IA content scanned before the foldout machines won't have foldouts. Right now the policy is to re-scan if requested, and that is the right policy.
Tomoko – LC has some scanned content we would like to add to BHL, as well as some great unpublished materials. This might be a good Collections call issue.
Connie – Macaw would be the mechanism.

Round Robin Updates (All)
ANSP – Continuing to scan; waiting for a shipment to come back and then sending another one out (2–3 items).
AMNH – Shipment of 11 boxes went out to Princeton at the end of Jan.; 1 box showed up late. The box was damaged, but the books were not. Getting lawyer boxes to avoid further issues.
CAS – Long-awaited rescans are just about done; we QA'd the entire batch. When that's done, a small batch of new scanning in response to Gemini requests will go out by the end of Feb/beginning of March. CAS is making the jump to a new ILS. Macaw is up and running for supplemental field scans and working great.
The Connecting Content grant is wrapping up. We asked for a little extra time and are waiting for the exact deadline. Just this week we started working with programmers and developers on the portable app, which brings in content from all the different areas. We might be looking for testers once it's ready. It is a new app built through iNaturalist, but not branded as such.
Cornell – The main thing we're focusing on is digitization for the IMLS grant with MOBOT. Meeting with NYBG to coordinate that process, and working with ML to set up a subcollection of seed/nursery catalogs already in BHL. The primary source is NAL: 4,500 items in IA, a portion of which have been ingested. An intern at Cornell is helping to identify catalogs, and that work will be coordinated with NYBG and NAL to minimize duplication. Current focus is grapes!
HUH-BOT – Our next shipment is going out. Working with Macaw and coming up with a standardized list of fields to enter before ingest. The copyright field was questionable, but Becky answered those questions. New licensing might be a Collections call topic.
Harvard MCZ – Joe is working with Macaw; it takes a couple of hours to load one of the field notebooks. The plan is to work on items scanned by Harvard that we haven't been able to put in BHL yet, about 200 items. Joe is working on transcription for the gaming grant and is interested in the SI TC. We continue to send things to be scanned, beyond Gemini requests. We've been working on a reclassification project and through it have discovered more items to scan.
LC – We just sent items out for digitizing. We identified a lot of items, but the quality of the books was not good, so about 16 are being processed now. There is a really long set of series with a few missing; we will ask the collections people how to handle it.
MBL/WHOI – working on Gemini requests
MOBOT – Finishing up the Engelmann correspondence, with about another 3 months to go, then moving into QA. Blog posts: one on Digging into Data came out on the 28th, and another on BHL linkouts. Research sprint: Mining EOL & BHL, with a couple of suggestions on how to improve the API. Pro-iBiosphere: how to mark up content to extract knowledge for creating an open biodiversity knowledge management system. WU also attended for the Digging Into Data PIs, looking at tools to help mine content in BHL.
NHM – Mostly responding to Gemini requests. A project to scan official press is starting soon. I'll be out for a couple of months, March–May; Jane Smith will be the contact in my absence.
NYBG – Putting a shipment out early next week. I'm looking at Gemini. JM is downloading files for the first upload with Macaw.
Trish – WebWise: gaming speaker with Dartmouth; we're going to send our RFP to them. Managed to have a 2.5-hour kickoff meeting. Looking at the transcription tools that Joe has been reviewing.
SIL – Scanning in-house publications (Contributions series) as normal. Working on the ANSP shipment that came in. Items from the Field Museum arrived yesterday.
The balance on the pan-BHL funds is $6,921.89.
SWC grant – $14k to support interns and work with Jiri.
Singapore and WUSTL have joined as new members; Jeffrey Trzeciak will be the WUSTL rep. As many of you may know, Chris Freeland is also there. We will work out how they want to participate. Singapore is both a global node and a member of BHL Classic. They're very eager to start scanning and have purchased three machines specifically for BHL and other work. They also have a tech person who will be working with us on IT matters.
For those of you who are official reps, the official Council meeting is on Mar 10 and 11. Staff: please forward any questions or concerns to your rep so they can bring them to the meeting.
Lastly, we’ll also have a BHL Technical Meeting occurring at MOBOT on Apr 2-3.
Field – A volunteer is still working on pagination.