August 20, 2013
Tech Review meeting 8/20/13
Present: Bianca Crowley, William Ulate, Mike Lichtenberg, Trish Rose-Sandler
Art of Life
What is the issue with not being able to access the images on the cluster? When run on IA, the book reader can extract the image for a given page. The book reader is not running on the cluster and may never have run there. Mike L thinks it used to run on the cluster but is broken now. Can we extract images from the JP2 zip file? The book reader is downloadable, but we don’t need the book reader so much as the image addressing scheme it uses. The images are on the cluster but inside zip files. IA has a way to extract images and convert them to JPG.
2 options for getting images: 1) get the book reader working again on the cluster, or 2) extract the images ourselves from the zip file and then convert them from JP2 to JPEG.
Option 1 - Not sure how easy this would be, as it’s more than just downloading the book reader. We would need to get the script from IA that controls the image addressing scheme and change it to point to the image locations on the cluster. (Phil had worked with IA in the past to get this working, maybe with a man named Raj??) Mike thinks if we went with #1 then Anthony would be responsible. It would also benefit us by letting us serve images from the cluster when IA isn’t available.
Option 2 - Mike says it’s fairly easy to extract images from a zip file, but the conversion could be more complicated and would need Kakadu software. If we went with #2 then Kyle would be responsible, although he would still need to work with Anthony to make sure the cluster had the appropriate software for conversion.
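For reference, the extraction half of Option 2 could look roughly like the sketch below. It only pulls the raw JP2 bytes for a page out of a zip archive using the Python standard library. The function names are hypothetical, and it assumes the page images carry a .jp2 extension and zero-padded names that sort into page order. The JP2 to JPEG conversion is not shown, since that depends on the software installed on the cluster.

```python
import zipfile


def _jp2_names(archive):
    """Return the .jp2 member names of an open ZipFile, sorted by name."""
    # Assumption: zero-padded file names, so name order equals page order.
    return sorted(
        name for name in archive.namelist()
        if name.lower().endswith(".jp2")
    )


def list_jp2_pages(zip_source):
    """List JP2 members of a zip given as a path or a file-like object."""
    with zipfile.ZipFile(zip_source) as archive:
        return _jp2_names(archive)


def read_jp2_page(zip_source, page_index):
    """Return the raw bytes of the JP2 image at a zero-based page index."""
    with zipfile.ZipFile(zip_source) as archive:
        pages = _jp2_names(archive)
        # Raises IndexError if the archive has no such page.
        return archive.read(pages[page_index])
```

The bytes returned by read_jp2_page would then be handed to whatever converter is chosen for the JPEG step.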
Scheduling a face to face meeting in fall – does tech team have any dates that won’t work for them?
Working on presentation for Digital HPS conference in early Sept.
BHL Link outs document – no comments other than Martin’s this past week. Still time to refine it in Aug. Bianca did some editing but it is still 10 pages. On Thursday’s call, staff were asked to review it if they are interested. Bianca will send out a reminder to staff to review.
Send William Update meeting documents – TDWG, DHPS, MLA
Question about who is an expert in running OCR software – Mike Bloomberg maybe, William will ask tag team if any of them consider themselves an expert. At IA who is expert? Mike hasn’t dealt with anyone at IA who does the OCR.
Copyright metadata – Bianca says we need to redo it from scratch and need a due date. Who is asking? BHL Europe doesn’t have the most current copyright statements for our records. We need the ids that need to be updated and need to know where to get the latest copyright statements. Send to Jiri. Bianca says we may need to do this more than once since the metadata will change. IA often doesn’t complete the updates, so we periodically have to go back and check. How often should we recommend that BHL Europe update our metadata? Not sure, but it would be good to establish a schedule. Maybe start with a yearly review and then do it more often if needed. BHL China – Joel has disks for the second group that we will be sending to China soon. Could we send them to the cluster to get the latest metadata? Yes, Mike says the cluster seems to be updating regularly. Bianca will wait to hear from BHL Europe on their timeline if they have one.
Deduping algorithm – determined that automatic clustering should only be done on those titles that match at a rating of “1” (exact match). Otherwise it would result in false positives and be risky. Anything with a score from .5 to .99 would be treated only as potentially related.
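The threshold rule above can be written down as a small sketch. The function name and the returned labels are made up for illustration, and the handling of scores below .5 is an assumption, since the notes only cover .5 and up.

```python
def classify_match(score):
    """Map a title match score in [0, 1] to a deduping action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score == 1.0:
        # Exact match: the only case that is clustered automatically.
        return "auto_cluster"
    if score >= 0.5:
        # Flagged as potentially related, never merged automatically.
        return "potentially_related"
    # Assumption: anything under .5 is treated as unrelated.
    return "unrelated"
```

For example, classify_match(0.75) flags a pair as potentially related rather than merging it.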
TDWG accepted our workshop, which will be Monday 4-5:30pm. Trish, Martin, Jiri, Connie, William, and Lucie will attend from BHL. Conference theme - Virtual communities for biodiversity science
Global BHL meeting will be held in Australia last week of January. We might have a tech meeting but still to be determined.