July 27 2015
7/20/15 Tech meeting
Trish, Bianca, Martin, William
BHL general
Servers are more stable now, but further ingest is suspended until Mike comes back. The PDF Generator is running. If they present problems, Mike left instructions on how to move while he’s away.
Bianca will set up a new email, techteam@biodiversitylibrary.org, and transfer the OAI email as well. She will let us know the password at the next meeting.
Stop words list – did Mike get that? Bianca still has to do it.
Discovery Tools Working Group – should be meeting this week.
IA contact – with Robert Miller gone, who do we contact? His tasks are now split, but there is no overall coordinator.
All the BHL metadata corrections to the IA Book Images on Flickr are completed. Martin is not sure whether IA is going to upload the full 14 million images that Kalev found with his algorithm, but he will ask.
Mining Biodiversity –
William and Trish met with Riza last week to review the latest set of triples and relationships. Because some had the wrong entity types, we decided to remove those types for now and just have the volunteers focus on whether the relationships are valid. A new set will be available this week.
Waiting until August to have the intern work on DISCUS problems.
API issues with grabbing OCR pages were resolved: they will get them from Manchester instead.
Art of Life
Trish is finishing up the NEH report, which is due Friday.
Some interesting stats are being compiled. Spent time with Mike L on Friday looking over stats related to the success of the algorithms. Determined the ABBYY algorithm was 76% accurate and Contrast only about 30%. This was much lower than what we originally thought: we predicted about 87% accuracy based on the small gold standard set we provided to IMA during development. Good to know, as we can rule out use of Contrast in the future for predicting where pages might be. Also good to know we can’t just rely on the ABBYY algorithm and will need some human verification.
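For reference, a minimal Python sketch of how an accuracy figure like the ones above could be computed by scoring an algorithm's page predictions against a human-verified gold standard. The function name, page IDs, and labels are all made up for illustration; this is not Art of Life data or the evaluation code actually used.

```python
def accuracy(predicted, gold):
    """Fraction of gold-standard pages whose predicted label matches the human label."""
    if not gold:
        raise ValueError("gold standard is empty")
    # A page missing from `predicted` counts as a miss.
    correct = sum(1 for page, label in gold.items() if predicted.get(page) == label)
    return correct / len(gold)


# Hypothetical labels (True = page flagged by a human reviewer); not project data.
gold = {"p1": True, "p2": False, "p3": True, "p4": False}
abbyy = {"p1": True, "p2": False, "p3": False, "p4": False}

print(f"{accuracy(abbyy, gold):.0%}")
```

A small gold standard set can overstate accuracy, which is one possible reason an estimate made during development might not hold on the full collection.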
Purposeful Gaming
Nothing new to report.
Rod Page posted feedback on games last week.
Other
TDWG: who’s going? Martin, Carolyn, William
Two more BHL-related proposals were submitted on Friday for TDWG: 1) Engaging the citizen scientist in content enhancement for the Biodiversity Heritage Library; 2) Unlocking knowledge in biodiversity legacy literature through automatic semantic metadata extraction.
Martin will ask Gail if there are other crowdsourcing or citizen scientist talks where they could fit better. Otherwise, we can fit them into the BHL African symposium.