BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

Aug 4 2014

Tech review meeting 8/4/14
Present: William, Trish, Mike

BHL
Pending to do for William
JJ’s questions about the OCLC platform (WorldCat)
How do MARC fields map to BHL fields?
Serv14 still having issues – folks couldn’t get to PDFs this weekend
GNRD not working – Mike turned it off until it is fixed
OCLC numbers displaying in BHL - FAST subject identifiers are now being added by OCLC and showing up in BHL. We don’t want these to display, but we would want to store them. Mike says the issue is that storing them in production requires changes to the data model for subjects. They are in the ingest database, so we can retrieve them in the future if we want to, but changing the data model would be a lot of work. For now we will not bring them into production or change the data model for subjects.
Brazilians – want to use a system from Canada that queries BHL. This would require changes to the API – Mike L would need to do more investigation to see how long it would take.

Art of Life
Joel resolved the problems with numbering and missing pages in the export and updated the code. He is getting back on track with processing and should catch back up with the total processed at SIL in 1.5–2 weeks. Exports from Joel look fine from Mike L’s perspective.
*Trish check in with Guarav about his project and how he might be able to help us extract

Kyle has to incorporate code changes at IA and IMA. Needs to reset flags to reprocess missing
MOBOT access to servers at IA – Mike L. sent an email to Jake asking for access to the server but can’t get a response from him.

Macaw – continuing to have error messages when saving. Not sure if Mike W is any closer to a resolution. This has really set back our ability to classify – we have volunteers who can’t work and more requests to join the volunteer team. Mike W is out all week in training. The same saving problems are happening in other versions of Macaw and with FromThePage. Might be a DNS issue.
*Trish send email to Joel about Friday’s experience with auto save vs. the save button, to see which HTTP operations happen with each of the 2 methods. CC Mike L and William
3 versions of Macaw
- Macaw at GitHub – used for institutions who want to install locally
- Macaw in Cloud (hosted at Smithsonian)
- Art of Life Macaw (hosted at MOBOT)

Purposeful Gaming
Meeting with Tiltfactor on Thursday – is 9am CST OK? Yes

Max sent tasks for next meeting


Sample data – Trish will forward to Tiltfactor what Joe and Mike put together for the manuscript
OCR output 1
Generated by ABBYY FineReader – format is djvu.xml (contains both the coordinates of each word and the recognized value of the word)
OCR output 2
Generated by either PRIME OCR or Tesseract
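To make the djvu.xml description concrete for the Tiltfactor sample data, here is a minimal Python sketch that pulls each word and its coordinates out of a DjVu XML page. The sample markup is hand-written for illustration (not real BHL output), and it assumes words appear as WORD elements with a comma-separated coords attribute; the order and number of coordinate values should be checked against actual files before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the DjVu XML layout (illustrative, not real BHL output).
SAMPLE = """<OBJECT>
  <HIDDENTEXT>
    <PAGECOLUMN><REGION><PARAGRAPH><LINE>
      <WORD coords="120,340,260,300">Quercus</WORD>
      <WORD coords="275,340,390,300">alba</WORD>
    </LINE></PARAGRAPH></REGION></PAGECOLUMN>
  </HIDDENTEXT>
</OBJECT>"""


def extract_words(xml_text):
    """Return (word, coords) pairs for every WORD element, in document order.

    Coordinates are kept as raw integers; their exact meaning (which corner
    comes first) is an assumption to verify against real djvu.xml files.
    """
    root = ET.fromstring(xml_text)
    words = []
    for word in root.iter("WORD"):
        raw = word.get("coords", "")
        coords = tuple(int(v) for v in raw.split(",") if v.strip())
        words.append(((word.text or "").strip(), coords))
    return words


for text, box in extract_words(SAMPLE):
    print(text, box)
```

Keeping the word value and its box together is what lets a game show players the page image region alongside the OCR guess.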
Meeting with Mike Blomberg today – outcome? Mike B. will investigate the outputs from PRIME OCR.

We need to decide how many book pages of OCR output we want Mike B to generate. Do we want books from all time periods? Do we want to include multiple languages?

Mike can mock up a JSON file with the info we are thinking of providing to them and see if that will suffice for the sample OCR output
We need to decide if we want to know how many users verified a value or if we set a minimum number of people we want to verify a value before they send us the data back.

Backend output – what format do we desire? JSON or XML works fine for us. Should Mike’s JSON mockup include what info we want back? E.g. number of users that verified a value
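One way to frame the two options above (report how many users verified a value vs. require a minimum number of verifiers) is a mockup where each returned value carries its verifier count and a threshold is applied on top. This is a hypothetical Python sketch: the field names, the threshold of 3, and the records are made up for illustration and are not an agreed format with Tiltfactor.

```python
import json

# Hypothetical threshold; the meeting had not settled on a number.
MIN_VERIFIERS = 3


def accepted_values(records, min_verifiers=MIN_VERIFIERS):
    """Keep only values confirmed by at least `min_verifiers` players.

    Each kept record still carries its verifier_count, so BHL can see how
    many users verified a value even after the threshold is applied.
    """
    return [r for r in records if r["verifier_count"] >= min_verifiers]


# Made-up records as a game backend might return them (hypothetical fields).
records = [
    {"page_id": "p1", "coords": [120, 340, 260, 300],
     "ocr_value": "Qucrcus", "verified_value": "Quercus", "verifier_count": 5},
    {"page_id": "p1", "coords": [275, 340, 390, 300],
     "ocr_value": "a1ba", "verified_value": "alba", "verifier_count": 1},
]

print(json.dumps(accepted_values(records), indent=2))
```

Sending back the count and filtering on our side keeps the threshold adjustable later; having Tiltfactor apply the minimum instead would make the payload smaller but fix the threshold in their system.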

Mining Biodiversity
Dalhousie will provide automated OCR corrections by the end of the summer. They will provide us with both corrected text and a service. Google NGrams (Evangelos)