This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

April 27 2015

BHL Tech meeting 4/27/15
William, Mike, Trish

BHL general
Joel talked to SIL technology folks about getting server space for moving content over, but more details still need to be worked out. Still on track for Sept launch.
Full text search plan - William still has to send out a detailed plan with new timeline to Bianca, Trish, and Mike
Discovery tools group – looking at what they need to get our content. Adam Chandler is leading. Looking at NISO KBART practice – another output format
GBIF approached BHL about an experiment to take specimen locations from the literature and data-mine them. They are taking our data and working with it.

Art of Life
Science Gossip – collections that were on pause due to metadata issues are unpaused but not live. Trish will suggest a call this week with Zooniverse for next steps.
Mike sent Briana a file containing results of all 3 image finding algorithms (Briana, ABBYY, Contrast). Briana reported the following after a manual review of the results:
Here's an overview of the results. The definition of 'illustration' here is from a quick visual skim of the pages by me. I may have gotten a little cross-eyed, call it a 1% margin of error! Comparing to the data coming out of the Science Gossip project will give slightly different results, as I didn't try to tell the difference between a proper illustration and a woodcut, etc.
Of the ~8000 pages, all but two illustrations are caught by either 'abbyy' or 'briana,' or both. 'abbyy' misses illustrations on ~20 pages, 'briana' misses ~30. Because they're using different methods to find the illustrations, there's very little overlap in those misses. Contrast is a very distant third in all this.
I've attached the two missed illustrations. The first one was actually picked up by 'contrast.' I know I could tweak my algorithm to detect it; it's just a hair narrower than allowed. The second one, I understand why all three missed it!
algorithm(s), page count, illustration count
abbyy briana contrast, 737, ~all
abbyy briana, 1253, ~all
abbyy contrast, 13, 10
abbyy only, 73, 20
briana contrast, 66, 5
briana only, 288, 16
contrast only, 1042, 1
none, 4481, 1
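The miss counts quoted above can be rechecked from the table. This is a minimal sketch, not part of the original review: the rows are transcribed by hand, "~all" is assumed to mean the full page count for that row, and the function names are made up for illustration.

```python
# Rows transcribed from the table above:
# (algorithms that flagged the page, page count, pages with an illustration).
# ASSUMPTION: "~all" is treated as the full page count for that row.
ROWS = [
    ({"abbyy", "briana", "contrast"}, 737, 737),
    ({"abbyy", "briana"}, 1253, 1253),
    ({"abbyy", "contrast"}, 13, 10),
    ({"abbyy"}, 73, 20),
    ({"briana", "contrast"}, 66, 5),
    ({"briana"}, 288, 16),
    ({"contrast"}, 1042, 1),
    (set(), 4481, 1),
]


def missed_by(algorithm):
    """Illustrated pages in rows where `algorithm` did not flag the page."""
    return sum(illus for algos, _, illus in ROWS if algorithm not in algos)


def missed_by_all(algorithms):
    """Illustrated pages that none of the given algorithms flagged."""
    wanted = set(algorithms)
    return sum(illus for algos, _, illus in ROWS if not (algos & wanted))


total_pages = sum(pages for _, pages, _ in ROWS)

print("total pages:", total_pages)
print("missed by abbyy:", missed_by("abbyy"))
print("missed by briana:", missed_by("briana"))
print("missed by both:", missed_by_all(["abbyy", "briana"]))
```

Under that assumption the totals line up with the summary: just under 8000 pages, roughly 20 misses for 'abbyy', roughly 30 for 'briana', and two illustrations missed by both.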

If we want to filter the current stuff in Zooniverse, Jim would need to do that. If we want to filter future uploaded material, Mike would have to change his code for how he grabs the images – currently he just specifies a book and says grab everything.
Joel reported we have now finished processing all 44 million pages in BHL! Mike reports he has just started processing a lot of MOBOT content which has a lot of pages that are out of sync with IA. He has been pushing those issues over to Mike L to resolve. It’s a significant pile to resolve, but there is no urgency and it can be done over time.
Art of Life officially ends this week. Trish will begin pulling together stats for the final report, but it is not due until the end of July.
We need stats on the number of images tagged in Flickr and Zooniverse. Mike pulled the Flickr data from the BHL stream 1 year ago but hasn’t updated it since then. He will start looking into pulling the current data from both the BHL and IA streams in Flickr. Querying the IA stream will be more complicated due to the number of images and the query limitations set by Flickr (he will probably need to set up multiple keys, as Joel and Kalev did). Zooniverse has shared data with us via their website but has not given us export files yet. We need to start talking with them about this.
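One way the multiple-keys idea for the IA stream could look is a round-robin over keys while paging through results. This is only a sketch under stated assumptions: `fetch_all_pages` and `fetch_page` are hypothetical names, `fetch_page` stands in for whatever request Mike's code actually makes, and no real Flickr call is shown.

```python
from itertools import cycle


def fetch_all_pages(api_keys, fetch_page, max_pages):
    """Call fetch_page(api_key, page) for pages 1..max_pages,
    rotating through api_keys, and stop early on an empty page."""
    keys = cycle(api_keys)
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(next(keys), page)
        if not batch:
            break
        results.extend(batch)
    return results


def fake_fetch(api_key, page):
    # Stand-in for a real request: two fake photo ids per page, nothing after page 3.
    return [] if page > 3 else [f"photo-{page}-a", f"photo-{page}-b"]


photos = fetch_all_pages(["key-1", "key-2"], fake_fetch, max_pages=10)
print(len(photos))
```

Spreading pages across keys keeps each key's request count lower. Whether that is enough for the IA stream depends on the limits Flickr actually applies, which these notes do not specify.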

Purposeful Gaming
Desmond meeting – has been working on binarization. Trish will check in with him again this week.
Tiltfactor – William emailed Max about talking with him this week about the error message. William was using IE 9. IE 11 on Windows. What progress have they made on Windows? Probably won’t have IE 11.
F2F meeting – date set for May 12th. Trish will start working on the agenda. Trish will check and see who would like to do a dinner the night before.

Mining Biodiversity
Trish finished annotations and Jen will continue doing adjudication
Met on Sat with Corrected OCR group – will add new sources of lexicon
Altmetrics – bug on their end and Mike L hasn’t heard back from them

Other
Trish out next week for CCLA workshop at Univ of Mary May 6-8th
IMLS accepted the 2 pg proposal and asked us to submit a full proposal by June 1st. MOBOT needs a Letter of Commitment from each partner institution, resumes for key staff, IDC rate agreements, and job descriptions for those positions that will be filled with grant funds.