TechCall_18July2016
Action Items
- Bianca will send an email that says these are the field names
- Joel will update Macaw and then Bianca can work on PartnerMetaApp and with Keri on Picklist
- Mike will integrate changes for ingest
- Susan will talk to Joe DeVeer about whether all of this will work when the scanning institution is using TT Scribe
- Bianca will update our documentation and make sure BHL Staff are aware they can start passing data through those fields when we're ready.
- Would be good to test passing data to IA via Partner MetaApp, Macaw, and TableTop Scribe before announcing to staff
Follow up Chat clarification (Bianca, Joel, Mike)
Mike, Joel and I had a quick chat this morning about the topic of codes vs. using full content provider names below and we have decided AGAINST using codes after all. In other words, the decision is to recommend that BHL Staff enter the full content provider name for the “rights-holder” and/or “scanning-institution” fields. From the digitization workflow side, it will be easier for Staff to choose (more on this later) the correct name than to decipher/recall the correct code.
As discussed, Joel will use BHL’s APIs to integrate the list of BHL Content Providers into Macaw for Staff to select upon metadata entry for Macaw items.
Joel said he would try setting up similar functionality (via VBA) in an Excel spreadsheet template for the Internet Archive Partner Meta App.
The Table Top IA Scribe may still be an outlier however…
For new content providers, namely new “rights-holder” field entries, BHL Staff will be able to enter a new name (as in a name not yet available in the Content Provider list) and it will be assigned as “Unknown” in BHL. A new Gemini issue will be created requesting improvements to the way “Unknowns” are reconciled.
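The fallback rule for new rights-holder names can be sketched as below. This is only an illustration of the rule as described in this chat, not the actual ingest code: the function name, the exact-match comparison, and the sample provider names are all assumptions.

```python
def resolve_rights_holder(entered_name, known_providers):
    """Return the name BHL would record for a rights-holder entry.

    Assumption: matching is an exact string comparison after trimming
    whitespace. The real ingest process may match differently.
    """
    name = entered_name.strip()
    if name in known_providers:
        return name
    # Names not yet in the Content Provider list are recorded as "Unknown"
    # and would need to be reconciled later.
    return "Unknown"


if __name__ == "__main__":
    # Hypothetical provider list, for illustration only.
    providers = {"Example Library", "Example Museum"}
    print(resolve_rights_holder("Example Library", providers))
    print(resolve_rights_holder("Brand New Society", providers))
```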
Agenda and Minutes
- Contributor Changes
- Debrief on communication schedule for completed changes
- Discuss next steps for implementing changes in Macaw and Partner MetaApp (Bianca will join)
EABL expectation is that after Admin Dash, corresponding changes would be made in Macaw
This is the priority for Macaw changes that are in the works.
Advantage of using short codes for institutions in Partner MetaApp.
Rights holder field will be used more frequently, including others outside of EABL team.
Bianca will update our documentation and make sure BHL Staff are aware they can start passing data through those fields when we're ready.
Would be good to do a test first.
PartnerMetaApp contributor changes will be consistent with how it's configured in Macaw
Don't use underscore
Bianca will send an email that says these are the field names
Joel will update Macaw and then Bianca can work on PartnerMetaApp and with Keri on Picklist
Mike will integrate changes for ingest
EABL has content for testing
Test via IA after that
The other interface is TT Scribe; Susan will talk to Joe DeVeer about whether all of this will work when the scanning institution is using TT Scribe
David Kohn is planning to resume work in August and September. Martin has been in touch to schedule a call with David and will be in touch if anything further is needed.
Martin, Carolyn and Bianca spoke with Tom Orrell (ITIS) on July 15. Notes from the call:
- Global Names is made up of a loose network of partners with no point person currently in charge. (Tom represents ITIS, one of the GNA partners.)
- Communications and maintenance around GNA services are not clear but Tom is motivated to clarify this among partners going forward
- Tom appreciates the feedback BHL receives from users as these demonstrate clear use cases he can present to the GNA partners as a call for strategies to improve communications and service maintenance
- Tom will consult with colleagues about providing a contact for BHL which we can use to submit feedback about scientific names issues – this does not mean the issues will be resolved quickly but at least they will be documented for follow up
- Tom will be given access to BHL’s Gemini and a Scientific Names workspace to keep tabs on the feedback BHL receives
- Future meetings will be scheduled to continue discussions
- Gemini 57792: Identifying articles for BioStor processing
- The EABL workflow for defining articles in BHL is a work in progress. Current thinking on the workflow is:
  - Patrick Randall at MCZ contacts the publisher and asks for (1) the names of databases/services that index the publication and (2) all available article metadata in a structured format. Required fields are:
    - Article title
    - Author names: last name first, followed by a comma, followed by first name and middle initial. When multiple authors are credited, the names must appear in the correct order.
    - Journal name, aka container name
    - Year of publication
    - Journal volume
    - Journal issue, if any
    - Page range
  - Patrick includes all of the above information in a Gemini ticket. Gemini tickets are one per publication or possibly one per publisher. (One per publisher makes sense when a publisher has only 2 publications.)
  - The EABL team plans to use either EndNote or Zotero, both of which support storing citations in the cloud and collaboration by members of a team. If we use EndNote, we'll define one shared library and multiple groups, one group for each publication being articlized.
  - A member of the EABL team primes the EN group (or Zotero equivalent) with article metadata received from the publisher, if any.
  - A member of the team adds citations available from the following sources:
    - Web of Science
    - PubMed
    - Scopus
    - Other citation services identified by the publisher
    - Google Scholar
  - If the articles in a publication contain DOIs, an EABL team member downloads article-level PDF files to his/her desktop. The citation management software searches each PDF for a DOI, uses the DOIs to pull article-level metadata from CrossRef and defines an article/item in Zotero or EndNote.
  - The EABL team may or may not add additional article metadata to the reference management software in a manual or semi-manual way. If the publication is born-digital, an EABL team member can cut-and-paste the article title and author names from the PDF file and use this information to create a new citation entry in the reference management software. In the case of a scanned print publication, the team member can cut-and-paste needed metadata from the OCR text.
  - After all article-level metadata has been gathered for a publication, QA is performed. We look for empty required elements, non-standard forms of the volume and issue numbers...
  - When QA is complete, a team member exports all of the citations for a publication in RIS format, opens a GitHub issue for Rod Page and then attaches the RIS file to the GitHub issue.
- The EABL team decided on the 7/14/16 EABL call to define articles for the publications for which we acquired permission and also for publications uploaded to BHL by other means.
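The required-field QA check and the RIS export described in the workflow above can be sketched as follows. This is a minimal illustration, not the team's tooling: in practice EndNote or Zotero would do the export. The dictionary keys, function names, and sample record are assumptions. The RIS tags used (TY, TI, AU, JO, PY, VL, IS, SP, EP, ER) are standard RIS tags.

```python
# Required fields from the workflow; "issue" is optional ("if any").
REQUIRED_FIELDS = ("title", "authors", "journal", "year", "volume", "pages")


def missing_fields(article):
    """Return the required fields that are empty or absent (QA step)."""
    return [field for field in REQUIRED_FIELDS if not article.get(field)]


def to_ris(article):
    """Render one article as an RIS record.

    Assumes authors are already formatted "Lastname, Firstname M." and
    listed in the correct order, and that pages look like "10-20".
    """
    start, _, end = article["pages"].partition("-")
    lines = ["TY  - JOUR", f"TI  - {article['title']}"]
    lines += [f"AU  - {author}" for author in article["authors"]]
    lines.append(f"JO  - {article['journal']}")
    lines.append(f"PY  - {article['year']}")
    lines.append(f"VL  - {article['volume']}")
    if article.get("issue"):
        lines.append(f"IS  - {article['issue']}")
    lines.append(f"SP  - {start.strip()}")
    if end:
        lines.append(f"EP  - {end.strip()}")
    lines.append("ER  - ")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up record, for illustration only.
    sample = {
        "title": "An example article",
        "authors": ["Smith, John A.", "Doe, Jane"],
        "journal": "Example Journal",
        "year": "1901",
        "volume": "3",
        "issue": "",
        "pages": "10-20",
    }
    problems = missing_fields(sample)
    print(to_ris(sample) if not problems else f"Missing: {problems}")
```

Checking for non-standard volume and issue forms is left out here because the minutes do not define what counts as standard.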
- EABL team discussed exploring how much time is required to define articles manually. With that information, we could make informed decision on the feasibility. Same for gap filling; benchmarking would be useful for determining cost of process. Anything that can be done in batches should be done in batches. Often a person can just cut and paste much of the data though OCR for title is not guaranteed to be accurate, so that would be part of the QA.
- How do we deal with duplicates of articles from Rod? We try to group duplicates but don't de-duplicate per se. Rod also is able to largely avoid duplicates across content he submits, though not with content submitted by others. Some reference management software also has dedupe capabilities; identifies what's in common and what's different.
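Grouping likely duplicates without deleting any of them can be sketched as below. The matching key (normalized title plus year and volume) is an assumption chosen for illustration. It is not how BHL, BioStor, or any particular reference manager identifies duplicates.

```python
import re
from collections import defaultdict


def match_key(article):
    """Build a crude comparison key: lowercased title with punctuation
    collapsed to spaces, plus year and volume (assumed matching rule)."""
    title = re.sub(r"[^a-z0-9]+", " ", article["title"].lower()).strip()
    return (title, str(article.get("year", "")), str(article.get("volume", "")))


def group_duplicates(articles):
    """Return lists of articles that share a key. Nothing is removed;
    a person still reviews what is common and what differs."""
    groups = defaultdict(list)
    for article in articles:
        groups[match_key(article)].append(article)
    return [group for group in groups.values() if len(group) > 1]
```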
- From an export, attempted to load 180,000 articles into EndNote; the import failed at 160,000, presumably because of the quantity of data.
- Recent Gemini ticket about being able to export from BHL to Mendeley but not to Zotero. Susan may want to clarify whether trying to export the metadata or the PDF at the article level.
- Updates/Questions for the group
- BHL Move to SIL - some vulnerabilities found and are being addressed. Stay tuned for a launch date.