TechCall_23oct2017

Agenda & Notes

TDWG report (Martin, Carolyn, Mike)
1. BHL Symposium: about 60 attendees. Martin, Carolyn, Katie, Mariah, Dima, and Tom presented on one day. Lots of interest, great questions. Mike participated in another session the following day and gave a very clear presentation on BHL's data exports and APIs, which was also very well received. Susan will share the links to watch the recorded sessions (Susan emailed the link to the group).
Technical development and communications flow (Carolyn): For discussion, we'll review a DRAFT RACI matrix (i.e., who's Responsible, who's Accountable, who needs to be Consulted and who needs to be Informed) for Version 1.5 Work Packages identified at the Tech Meeting
https://docs.google.com/spreadsheets/d/1Efl9Ju4bjAvYwt47_Wm2tS2ZHY6LHPIxqNN0B1CZ9E0/edit?usp=sharing
Batch Articles (Mike) - status update. Any challenges or anything needed to move forward?
1. Nothing needed at this point. Should be pretty close at end of this week for testing.
2. Susan - volunteered. Has been preparing spreadsheets for ingest into the tool, using the column headings in the requirements doc, which seem a little chatty. Is there a template with column headings? Mike will send over what he has. Certain titles are in the production database but not the beta database, so she shifted focus to Annals of Carnegie Museum and an Arnold Arboretum publication; for both, we have significant amounts of article metadata from the publisher. Deliberately exercising all features in the code: diacritics, page numbers vs. page IDs. Can either run it herself or make the spreadsheets available to Mike, or to anyone else who wants to test.
Full Text (Mike and Joel) - UI feedback from Martin; next steps. Any challenges or anything needed to move forward?
1. Nothing new to report since Tech Meeting.
Faceted search and metadata creation for images (Ari and Joel) - From Ari: One of the things I'd like to suggest to BHL is to provide a "beta search" that provides best guesses for scientific names of illustrations. When we'd talked about this in June, Mike explained this as hard but doable. I'd like to flesh out this idea further. In your mind, what kind of time and resources would be involved? How would this be different from the current search result model and upcoming full text search model? Do you see any benefits or drawbacks to this and why? An alternative would be to allow crowdsourced correction of these results rather than messy results, or a combination of the two (so corrections on an ongoing basis.) Any opinions on the above would be very useful.
1. Verify if we have page type as a filter for scientific name searches in the full text requirements (CAS to check). Following the call, CAS checked the requirements (https://docs.google.com/document/d/1uEQcZPfGSUjuW_xSvVZQNXQ4lkLSZ3C-cgEHCTa8blM/edit?usp=sharing) and found the following mentions:

Facets (p.3)
Fields to facet on are (in this order?)
Type (Book, Journal, Field Books, Article, etc)
Author
Date of Publication
Contributor
Subject
Language

Within-book Searching (p.4)
The results will include links to the page of the book that contain the search term along with a text snippet with the search term highlighted. The results will also include facets for Page Type and Scientific Name.

INDEX 2: Pages (p.7-8)
Includes the text of each page.
Names are attributes of pages.
Enables drilling into a given book/segment to find pages matching search term.
Enables searches for a name on any page in any book.
Enables faceting pages by scientific name or page type.
Search term highlight within the search results
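
To make the page-index excerpt above concrete, here is a minimal sketch of how a page-level index with names as page attributes could support faceting by page type and scientific name. This is an illustration only, not the BHL implementation: the `Page` record, its field names, and the helper functions are all hypothetical, and a real system would use a search engine rather than an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Page:
    """One record in a hypothetical page-level index (field names are assumptions)."""
    page_id: int
    text: str
    page_type: str                               # e.g. "Text", "Illustration"
    names: list = field(default_factory=list)    # scientific names found on the page


def search_pages(pages, term):
    """Return pages whose text contains the term (case-insensitive substring match)."""
    needle = term.lower()
    return [p for p in pages if needle in p.text.lower()]


def facet_counts(pages):
    """Count pages per page type and per scientific name for a result set."""
    by_type = Counter(p.page_type for p in pages)
    # set() so a name repeated on one page is counted once for that page
    by_name = Counter(name for p in pages for name in set(p.names))
    return by_type, by_name


def filter_by_page_type(pages, page_type):
    """Narrow a result set to a single page type, as a facet click would."""
    return [p for p in pages if p.page_type == page_type]
```

With an index like this, the question raised above (page type as a filter on scientific name searches) amounts to calling `filter_by_page_type(search_pages(pages, name), "Illustration")` on the matching pages.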

A request came in to add "suppl." to the piece types in Macaw for page-level metadata. These are sent to IA in the scandata.xml <piece> info. A) Should we? B) How will BHL react?
1. Issue, part, piece. Request to add suppl. as an option.
2. CAS - check with Bianca and cataloging group.
3. There are 3 hard-coded possibilities in the paginator: issue, number, part. Prefix and pick the value. Don't think it would be a huge deal to add one to the drop-down. Is it displayed anywhere? Don't think so, but it might be used to resolve things. Not sure if it's used at all.
4. Joel going to look for an example of it.
5. Susan - I think we export the value, only the numeric portion in a citation. Cascading subdivisions - what you label it is not as important as a cascading subdivision.
6. Mike - it may not be as simple as adding to drop-down. Not sure how data harvester pulls from IA.
7. CAS - send around the request from Chloe (sent)
HTTPS - wondering if we have a regression test bucket, i.e., a full set of tests that need to run when certain changes happen. Going forward we'll want to think about impact. A regression test bucket should have as many tests automated as possible. Usually you design those as part of the development process. We could start by identifying what we found as failures this time, put them some place where people can see them, then automate everything that can be automated. If we had this, we could recruit additional people to run the tests. A Test Plan document is something we can start with.
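
As one example of the kind of check that could go into an automated regression bucket, here is a rough sketch that scans a page's HTML for resources still loaded over plain http://, which is a typical failure after an HTTPS switch. It uses only the Python standard library. The function names and the set of tags checked are assumptions for illustration, not a list of what actually failed.

```python
from html.parser import HTMLParser

# (tag, attribute) pairs that make the browser load a resource.
# Plain <a href> links are deliberately left out: they navigate, they do not load.
RESOURCE_ATTRS = {
    ("img", "src"),
    ("script", "src"),
    ("link", "href"),
    ("iframe", "src"),
}


class InsecureResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for resources referenced over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        for name, value in attrs:
            if (tag, name) in RESOURCE_ATTRS and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_resources(html):
    """Return a list of (tag, url) for every http:// resource in the HTML."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


def run_regression(pages):
    """pages maps a label to its HTML; returns only the labels that have failures."""
    failures = {}
    for label, html in pages.items():
        found = find_insecure_resources(html)
        if found:
            failures[label] = found
    return failures
```

A check like this could be run over saved copies of key pages after each deployment, with each failure found this time added as a new test case.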
Author Names - added BHL Cataloging Group to RACI to be consulted on authorities for author names. Also, OpenRefine has reconciliation capabilities: you can take a column (like in a spreadsheet), run code, and compare its values with an external data source, which could be VIAF. Might be good to run against BHL author names. One strategy to consider as we're building out the tasks and requirements for this work package.
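
To illustrate the reconciliation idea (this is not OpenRefine's or VIAF's API, just the general technique), the sketch below compares a column of author names against a small authority list using normalization plus fuzzy matching from the Python standard library. The authority list, the `cutoff` value, and the function names are assumptions; matches should be treated as candidates for human review, not as confirmed identities.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation so near-identical forms compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(cleaned.lower().split())


def reconcile(names, authority, cutoff=0.85):
    """Map each input name to its closest authority form, or None if nothing is close.

    cutoff is a similarity ratio between 0 and 1; 0.85 is an arbitrary starting point.
    """
    index = {normalize(a): a for a in authority}
    keys = list(index)
    result = {}
    for name in names:
        matches = difflib.get_close_matches(normalize(name), keys, n=1, cutoff=cutoff)
        result[name] = index[matches[0]] if matches else None
    return result


# Hypothetical authority list standing in for an external source such as VIAF.
AUTHORITY = ["Linné, Carl von", "Darwin, Charles", "Gray, Asa"]

if __name__ == "__main__":
    for raw, matched in reconcile(["Linne, Carl von", "Darwin, Charls"], AUTHORITY).items():
        print(raw, "->", matched)
```

Names that come back as `None` would be the ones to route to the Cataloging Group for a decision.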