TechCall_04jun2018
LEADS Project - Welcome to Gretchen Stahlman, introductions, and brief overview of the project
Susan shared an overview of NYBG's NEH grant for digitizing the correspondence of John Torrey, sometimes referred to as the "Father of American Botany." Working with Ben Brumfield on transcription; would be a good point of contact for geotagging.
Status update on API v.3
Will resume once feedback is received.
Recent feedback on full text search
One bug, affecting advanced searches with collections selected, was fixed last week.
The remaining items are more significant changes.
We'll dedicate a future call to them, prioritized based on Martin's input.
Next steps for Transcription incorporation
- Requirements: https://docs.google.com/document/d/1tXfdwu71fOtZ8dlLMDFjR5ksof7KLO9mOEbWLZc-2ME/edit?usp=sharing
- Ignore markup. Leave it visible in the UI. We can revisit later if we need to.
- Change pageocr to pagetext and provide a page redirect. Notify Rod Page and others who might be using it, because client code is not always set up to follow redirects. When Mike talked to Rod at TDWG, it turned out we had broken something for exactly that reason.
- Decided that text can't be edited within BHL. A contributor might upload just 3 pages out of 100 for an item. FTP uploads correspond to a work, not to individual pages.
- Individual page logs: check in with Harvard and ALA to see whether they plan to make changes at the page level. There are some similarities with batch article definition: we would just know the account name and the date the changes were applied. That is helpful because it gives us a chance to speak with them. Carolyn to reach out to Joe and Nicole.
- Purposeful Gaming output: is it a one-time thing? Move it into the proper format for BHL. Joel can write a script to move it over and log the fact that we've replaced the pages. For Harvard, Joe probably considers the ALA output the master copy. Were these individual pages? Maybe 3,000 pages.
- Flag items that receive page inserts, to prevent regenerated OCR from overwriting a transcription or whatever text is there. We would never want to overlay anything that was a transcribed manuscript; it might be good to move such items to a queue to have them eyeballed. Harvard Botany and NYBG both have manuscript items that require re-organizing the pages. Under what conditions do we re-ingest OCR from IA? Let's not overwrite existing OCR/text without manual intervention; from now on that might need to involve Mike. Can we just denote the source of the text? Decision: don't add a flag; just denote the source of the text. Agreement that this is sufficient.
- If someone uploads text to the wrong item ... we could have rudimentary error checking that compares the pages in BHL with the pages in the transcription file and issues a warning, keeping in mind that we may not be uploading all of the pages. It could also help just to display the title. We can re-grab the OCR file from IA to replace any text uploaded by mistake.
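A minimal sketch of the two transcription decisions above (denote the text source instead of flagging items; warn on pages that don't match the item), assuming a simplified in-memory page model. `Page`, `text_source`, `check_upload`, and `apply_reingested_ocr` are hypothetical names, not existing BHL code. The overwrite rule is one possible reading of the discussion: OCR-sourced text may be refreshed from IA automatically, while text from any other source (e.g. a transcription) is only replaced with a manual override.

```python
from dataclasses import dataclass

# Hypothetical source labels; the call only decided to "denote the source of the text".
OCR = "ocr"                      # text taken from the IA OCR file
TRANSCRIPTION = "transcription"  # text uploaded from a transcription project


@dataclass
class Page:
    page_id: int
    text: str
    text_source: str = OCR


def check_upload(item_page_ids: set[int], uploaded_page_ids: set[int]) -> list[str]:
    """Rudimentary check: warn about uploaded pages that don't belong to the item.

    Partial uploads (e.g. 3 of 100 pages) are allowed, so pages missing
    from the upload are not reported.
    """
    unknown = sorted(uploaded_page_ids - item_page_ids)
    return [f"Page {pid} is not part of this item" for pid in unknown]


def apply_reingested_ocr(pages: dict[int, Page], new_ocr: dict[int, str],
                         manual_override: bool = False) -> list[int]:
    """Replace page text with re-ingested OCR and return the IDs of skipped pages.

    Pages whose text did not come from OCR are skipped unless a person has
    approved the overwrite (manual_override=True), e.g. to undo a mistaken upload.
    """
    skipped = []
    for pid, text in new_ocr.items():
        page = pages.get(pid)
        if page is None:
            continue  # re-ingested page not present in this item
        if page.text_source != OCR and not manual_override:
            skipped.append(pid)  # candidate for a review queue instead
            continue
        page.text = text
        page.text_source = OCR
    return skipped
```

Storing the source on each page, rather than a per-item flag, lets a partially transcribed item keep its transcribed pages while the remaining pages still receive regenerated OCR.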
Review and prioritization of action items from 5/21 Tech Call:
- Joel will follow up with Dima to see where things stand with the work referenced in the thread he shared with Carolyn and Mike.
- Joel and Mike to define when things will be reviewed, to manage potential changes from pushed updates and app server reboots.
- Susan will review the three Gemini issues related to OCR and respond to Rod.
- Susan contacted Rod; Rod opened a new ticket saying it was wrong. Susan will review it.
- 60922: Mike will create an indicator at the item level in the admin dashboard to prevent ingest of article metadata from external sources such as BioStor. Involves database changes, changes to the Admin site, and BioStor processes.
- 60527: Mike will look into exporting DOIs via the API. A few database procedures need to be updated; shouldn't be too much work now.
- 61182: Mike will look into whether segment types "BookItem", "Journal" and "Unknown" are being submitted, for example, by BioStor.
- Update: Mike has asked Rod about the Genre/Type values that BioStor can send. In theory, it could include "any valid OpenURL Genre value", but in practice things like "BookItem", "Journal", "Preprint", "Proceeding", and "Conference" do not seem to be in use.
- Update 2: Rod's response... "Currently all I use are: article - article, book - whole book, chapter - part of a book, letter - used for "articles" that are individual letters in a scanned volume of correspondence"
- Decision: "Unknown" is potentially still of use, so keep "Unknown". Remove "BookItem" and "Journal". Note: these seem to make sense at the item level, not the segment level.
- 60710: Susan will talk to Diane Rielinger about using page prefix to add MM-DD info.
- 61157: Mike will look into date issues in the MODS exports and make sure no changes would impact DPLA harvests. Everyone to read through and discuss on the next call.
- Susan will consult with the Cataloging Group on the status of the author name work. (They have a call in about two weeks; the Tech Team will revisit after that call.)
- 58613 and 61034: Carolyn will clarify with Bianca what is needed for Supplement
- 60989 and 60752: Carolyn will clarify with Bianca what is needed for resequencing. Fairly complicated to implement.
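One way the 61182 decision could show up in segment ingest is sketched below, assuming incoming genres are normalized before a segment is stored. The accepted set is the four genres Rod reports BioStor currently sends; the fallback to "Unknown" for anything else (including the removed "BookItem" and "Journal") is an assumption, since such values could instead be rejected or moved to the item level. `normalize_segment_genre` is a hypothetical name, not existing BHL code.

```python
from typing import Optional

# Genres Rod reports BioStor currently sends (see Update 2 above).
BIOSTOR_GENRES = {"article", "book", "chapter", "letter"}


def normalize_segment_genre(raw: Optional[str]) -> str:
    """Map an incoming OpenURL genre to a segment type.

    Assumption: anything outside the known set, including "BookItem" and
    "Journal" (removed as segment types), falls back to "Unknown", which
    the call decided to keep.
    """
    if raw is None:
        return "Unknown"
    value = raw.strip().lower()
    return value if value in BIOSTOR_GENRES else "Unknown"
```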
New Gemini Issue for discussion (time permitting)
- 58083: For moving wall titles, include link-outs to more current issues of a title that we do not provide access to in BHL.