BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018.

TechCall_02may2016

Dial: 1-877-820-7829
Enter the passcode: 407326

Agenda
  1. Discuss Contributor field changes (see below), All
  2. Updates on ElasticSearch and/or web migration, Joel
  3. Any other updates / discussion for the group

Contributor Notes
To review, we currently have a data model that looks like this…


Item
  ItemID
  <other fields>
  InstitutionCode
  <other fields>

Institution
  InstitutionCode
  InstitutionName
  <other fields>


This allows a single Institution (Contributor) to be associated with each Item.


The proposed changes to the model look like this…


Item
  ItemID
  <other fields>

ItemInstitution
  ItemInstitutionID
  ItemID
  InstitutionCode
  InstitutionRoleID

Institution
  InstitutionCode
  InstitutionName
  <other fields>

InstitutionRole
  InstitutionRoleID
  InstitutionRoleName


This model allows for multiple Institutions to be associated with an item, each with a different Role. Institution Roles would be things like “Contributor”, “Rights Holder”, and “Scanning Institution”. If a single Institution played multiple roles, it would be associated with the Item once for each Role.
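As a rough illustration of the proposed model, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the proposal above; the column types, the constraints, and the sample institutions (INST_A, INST_B) are assumptions made only for this example and do not reflect the actual BHL database.

```python
import sqlite3

# Table and column names follow the proposal; types and constraints are assumed.
SCHEMA = """
CREATE TABLE Item (
    ItemID INTEGER PRIMARY KEY
);
CREATE TABLE Institution (
    InstitutionCode TEXT PRIMARY KEY,
    InstitutionName TEXT NOT NULL
);
CREATE TABLE InstitutionRole (
    InstitutionRoleID INTEGER PRIMARY KEY,
    InstitutionRoleName TEXT NOT NULL UNIQUE
);
CREATE TABLE ItemInstitution (
    ItemInstitutionID INTEGER PRIMARY KEY,
    ItemID INTEGER NOT NULL REFERENCES Item(ItemID),
    InstitutionCode TEXT NOT NULL REFERENCES Institution(InstitutionCode),
    InstitutionRoleID INTEGER NOT NULL REFERENCES InstitutionRole(InstitutionRoleID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample data: one Item and two made-up Institutions.
conn.execute("INSERT INTO Item (ItemID) VALUES (1)")
conn.executemany(
    "INSERT INTO Institution (InstitutionCode, InstitutionName) VALUES (?, ?)",
    [("INST_A", "Example Library A"), ("INST_B", "Example Library B")],
)
conn.executemany(
    "INSERT INTO InstitutionRole (InstitutionRoleID, InstitutionRoleName) VALUES (?, ?)",
    [(1, "Contributor"), (2, "Rights Holder"), (3, "Scanning Institution")],
)

# INST_A plays two Roles for Item 1, so it gets one row per Role.
conn.executemany(
    "INSERT INTO ItemInstitution (ItemID, InstitutionCode, InstitutionRoleID) VALUES (?, ?, ?)",
    [(1, "INST_A", 1), (1, "INST_A", 3), (1, "INST_B", 2)],
)

# List every Institution and Role associated with Item 1.
rows = conn.execute(
    """
    SELECT i.InstitutionName, r.InstitutionRoleName
    FROM ItemInstitution ii
    JOIN Institution i ON i.InstitutionCode = ii.InstitutionCode
    JOIN InstitutionRole r ON r.InstitutionRoleID = ii.InstitutionRoleID
    WHERE ii.ItemID = ?
    ORDER BY ii.ItemInstitutionID
    """,
    (1,),
).fetchall()

for name, role in rows:
    print(f"{name}: {role}")
```

Note that nothing in this sketch prevents two Institutions from holding the same Role for one Item.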




After digging into the code updates that will be needed, I have come up with a few questions…


1) Should we duplicate these changes for the Segment table, so that multiple Institutions can be associated with a single Segment?
2) There is a relationship between the Title and Institution tables that is not entirely useful at best, and misleading at worst. If multiple Contributors provide volumes for a single serial, the Contributor of the first volume added is made the Contributor of the Title. While technically true, this can at times obscure the other Contributor (or Contributors) to that serial. Should we give serious thought to simply dropping the Title-level Contributor field?
3) The proposed data model changes will allow more than one Institution associated with an Item to have the same Role (for example, an Item could have two Contributors). What are the business rules? Is it valid to have two (or more) Institutions in the same Role for an Item?
   a. If this IS valid, then reporting by Contributor becomes complicated, as Items can be counted more than once. Note that most BHL reporting is by Contributor.
   b. If this IS valid, I propose adding a “Primary” flag to the ItemInstitution table. The associated business rule would be this: for any given Item, only one Institution in each Role can be marked as the “Primary” for that Role. For example, if there are two Institutions in the Contributor Role, then only one of them can be the Primary.
      i. The Institution marked as the “Primary” would be used for reporting and any other cases where a single Institution must be selected.
      ii. There is a precedent for this idea. Specifically, the relationship between Items and Titles… an Item may be associated with more than one Title, and one and only one Title must be the Primary Title for the Item.
      iii. Consider this scenario as an example (thanks to Susan and Trish for this): BHL-AU gives BHL their article metadata for a journal. BHL gives it to Rod Page for inclusion in BioStor. BHL ingests the metadata from BioStor. It seems to make sense that both BHL-AU and BioStor be recognized as Contributors in the public UI. However, for reporting I believe we would want to recognize the original source of the metadata, BHL-AU. By making BHL-AU the Primary Contributor, we could do that.
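To make the “Primary” rule and its effect on reporting concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the ItemInstitution class, its field names, both helper functions, and the use of "BHL-AU" and "BioStor" as institution codes are hypothetical, and the check only flags more than one Primary per Item and Role (whether exactly one should be required is left open).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemInstitution:
    """One Item/Institution/Role association (hypothetical field names)."""
    item_id: int
    institution_code: str
    role: str
    is_primary: bool = False


def primary_violations(links):
    """Return (item_id, role) pairs that have more than one Primary."""
    primaries = Counter(
        (link.item_id, link.role) for link in links if link.is_primary
    )
    return sorted(key for key, count in primaries.items() if count > 1)


def items_per_contributor(links, primary_only):
    """Count Items per Institution in the Contributor Role."""
    counts = Counter()
    for link in links:
        if link.role == "Contributor" and (link.is_primary or not primary_only):
            counts[link.institution_code] += 1
    return dict(counts)


# Item 1 mirrors the scenario above: two Contributors, with BHL-AU as Primary.
# Item 2 is an invented second Item with a single Contributor.
links = [
    ItemInstitution(1, "BHL-AU", "Contributor", is_primary=True),
    ItemInstitution(1, "BioStor", "Contributor"),
    ItemInstitution(2, "BioStor", "Contributor", is_primary=True),
]

all_counts = items_per_contributor(links, primary_only=False)
primary_counts = items_per_contributor(links, primary_only=True)

print(primary_violations(links))  # empty list when the rule holds
print(all_counts, sum(all_counts.values()))
print(primary_counts, sum(primary_counts.values()))
```

Counting every Contributor row yields three for only two Items, which is the double counting described in (a). Counting Primary rows only yields one per Item.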


I am open to other suggestions for ways to handle the “multiple Institutions in the same Role” scenario. What I have proposed here is actually an improvement over my first idea, so there might be an even better way to handle it.

Meeting Notes
Case studies to help decide whether the Contributor changes should also be made at the Segment level:
NYBG handles monographic serials as monographs
SIL treats them as serials
NYBG has also begun adding them to BHL as segments to make them discoverable that way, too
Additional metadata could address the question of who, then, is the contributor of articles that SIL scans

Museum Victoria is contributing metadata about thousands of articles for which they are the rights holder. They will send the metadata (possibly as csv) to Rod to run through BioStor and send to BHL. These are for items that have already been scanned and at item level are listed as contributed by those institutions that held them on their shelves. Museum Vic would like to be acknowledged for contributing the segments.

What if there were two roles,
Contributed by (e.g., Museum Victoria) and
Segment Processed by (e.g., BioStor)?

Or, what if we simply list multiples for the Contributed by at the segment level?
e.g., Museum Victoria; BioStor

We could also indicate who is the Primary and who is the Secondary Contributor. This would help with reporting.
MRK: Let’s go ahead and duplicate these changes at segment level to keep it parallel

On EABL, Susan and Trish have been exchanging emails with Rod about defining articles for permissions titles. For one title in particular, they asked Rod why it wasn’t done yet; the reason was that his process is to focus on content that is important to him. He’s open to suggestions and offered to have us leave lists on GitHub of titles we’re interested in.

If we were to send these kinds of lists to Rod via GitHub, it would give us an opportunity to prioritize what we want worked on.
MRK would like to pursue this as an ongoing strategy because otherwise we're relying on ISSNs or titles or Web of Science. To do this, we will need to identify the source of metadata at the segment level.

Going back to the discussion of contributor roles, would separating them out into different kinds of contributors (e.g., metadata contributor vs content contributor vs segmentation or similar) make reporting easier?
Probably. It simplifies the query and the UI.

BioStor is the primary source of segments. SciELO also contributed a significant number.
Will it always just be metadata at the segment level?
In CiteBank things were different but that doesn’t necessarily mean they will be that way again any time in the near future.

How can we ingest articles directly?
We have been more focused on how we can articleize issues

Bianca – maintains definitions for BHL Metadata, documents all of the fields.

For new fields, Susan will work with Bianca and see whether we can move forward, possibly without needing to involve the Collections Committee

Title level metadata and Contributor Institution.
Currently, there might be multiple contributors for a series with 100 volumes, yet only the contributor of volume 1 is listed at the Title level.
When merging titles, this gets even messier.

MRK thinks it’s outlived its usefulness and would like to make an executive decision soon. One scenario is that we drop Contributor at the Title level.

Martin and Mike have not heard back from Dima on names; Mike will follow up

Susan – update on Diacritics issues
NYBG, AMNH, and Cornell are fixed. Exchanging emails with Keri and Jackie about SIL’s ongoing diacritics issues. It turns out these are only happening with IA scanning from the Scribe.
The NYBG, AMNH, and Cornell issues were not from Scribe machines; caused by ILS upgrades
Cornell was using the scanning center for the first time, so those were basically startup issues
The others were upgrade issues.
Keri and Jackie will be looking into this for SI diacritics issues
Susan's focus has been on making sure things go in smoothly going forward