Aug 18 2014

Tech review meeting 8/19/14
Present: William, Trish, Mike

BHL
Migration of CB content to BHL is nearly complete; only JEANH and AMNH remain. JSTOR is still an open question because they have not responded to Trish about how to filter biodiversity content from IA.

Art of Life
Mike L. discovered more problems with the algorithm exports from SIL; these appear to be isolated to SIL. They have been fixed, and Mike confirms the exports now look fine.
Kyle has incorporated the code changes at IA and IMA, and processing has restarted. It is unclear whether he has reset the flags to reprocess the missing items; Trish to check with Kyle.
MOBOT access to servers at IA – Mike L. now has access and has confirmed that the processes are running.
Macaw and FromThePage issues appear to have been resolved by removing the IPS from the firewall. Two more volunteers have been added. We need to recruit more – Trish to follow up with the library schools that Richard Hulser recommended.
No error messages have been reported on either application.
Pushing SI algorithm outputs to Macaw – how many? 1.3 million. When should we push them over? Macaw uses PostgreSQL, so the database should handle the volume, but the server might be a bottleneck. Trish to ask Joel and Mike W. whether the import size would be a problem.
Conversation with big data researcher Kalev Leetaru – the only date everyone is free is July 28th, 12-1pm CST. Trish will confirm the date and time with everyone.
What could he contribute? Could he identify the colors in the images? Could he help us with bulk uploads to Flickr and with extracting data back out? We should incorporate our conversations with him into our final report.

Purposeful Gaming
OCR output from MOBOT – update from Mike B.? OCR was generated for 9 books: 3 with good OCR, 3 fair, and 3 bad. The output is in the Botanicus "do not delete" folder. Tesseract was also downloaded and tried out. To Mike L., the Tesseract output looks easier to parse, and its coordinates are closer to ABBYY's than PrimeOCR's are, so we will probably use it. It takes about 1 hour per book. We need to use the images from IA so that both OCR outputs come from the same images.

Mining Biodiversity
Urgent task list
Incorporating AddThis and Google Scholar metadata tags into BHL –
AddThis is a tool that could be added to all BHL pages so that users can share and reference BHL content and build discussions around it. The problem is that AddThis tracks a lot of personal information and sells it to the highest bidder, which goes against the BHL privacy policy. Then again, SI and the government have been using it for a long time, so maybe this is not an issue?
Google Scholar metadata tags – we need to answer: what is the benefit to BHL or its users of adding the citation info to the page source? How is this different from schema.org, and why do we need to add the same citation info repeatedly in different encodings (COinS, schema.org, Google Scholar metadata tags)?
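To make the duplication question concrete, here is a minimal sketch that renders one citation record in all three encodings. The record and its values are hypothetical, only a few fields are shown, and this is not how BHL currently generates its pages. The tag and key names used (citation_title, citation_author, citation_publication_date; the Z39.88-2004 KEV book keys rft.btitle, rft.au, rft.date; schema.org Book with name, author, datePublished) are the standard ones, but a real deployment would need to check each encoding's full field list.

```python
import html
import json
from urllib.parse import urlencode

# Hypothetical citation record for one BHL title (illustrative values only).
record = {
    "title": "Flora of Example County",
    "authors": ["Smith, Jane", "Doe, John"],
    "year": "1890",
}


def google_scholar_tags(rec):
    """Highwire-style <meta> tags of the kind Google Scholar reads from page source."""
    lines = [f'<meta name="citation_title" content="{html.escape(rec["title"])}">']
    for author in rec["authors"]:
        lines.append(f'<meta name="citation_author" content="{html.escape(author)}">')
    lines.append(f'<meta name="citation_publication_date" content="{html.escape(rec["year"])}">')
    return "\n".join(lines)


def coins_span(rec):
    """COinS: an OpenURL ContextObject (Z39.88-2004) stored in the title of span.Z3988."""
    params = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.btitle", rec["title"]),
        ("rft.date", rec["year"]),
    ]
    params += [("rft.au", a) for a in rec["authors"]]
    # urlencode joins pairs with "&", which must be escaped inside an HTML attribute.
    title = html.escape(urlencode(params), quote=True)
    return f'<span class="Z3988" title="{title}"></span>'


def schema_org_jsonld(rec):
    """schema.org Book description embedded as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": rec["title"],
        "author": [{"@type": "Person", "name": a} for a in rec["authors"]],
        "datePublished": rec["year"],
    }
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"


if __name__ == "__main__":
    # The same three facts (title, authors, year) come out three different ways.
    for render in (google_scholar_tags, coins_span, schema_org_jsonld):
        print(render(record))
        print()
```

All three outputs carry identical information, so if all three were adopted they could be generated from a single citation record rather than maintained separately.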