TechCall_23apr2018
Agenda
Full Text Search
Update on last week's work (including agenda items from Mike's email):
- Ingest issues: the new indexing processes needed enhanced change auditing so that we can re-index when anything about a book changes. Someone did a round of collection maintenance, which flagged those items as changed. The process that uses METS files reads the same audit tables, so it saw those changes as well and went ahead and uploaded changes to the items, which slowed down the ingest process. Either the auditing needs to change (a big job), or we could reconsider what ingest does: when an item in IA changes, ingest downloads the files for that item so that we have the latest. We could potentially skip that step. What are the ramifications of stopping it? To size pages properly, the book viewer is believed to look at locally stored copies before looking at IA, so if pages were inserted at IA it could pull the wrong dimensions. Let's continue as is, with the understanding that this sometimes happens. Reviewing auditing is a nice-to-do in the future. For now, monitor as we go forward.
Items without MARC records: about 2,000 items are out there without MARC records. Couple of years... They are also a drag on the ingest process. Secretariat is in conversations with Anne-Lise and Adriana. About 1,300 are EABL field notes from a vessel (Dan Moore?) from North Carolina; those may have been changed so that they no longer cause trouble. Is that correct? CLIR: about 550 sitting there. IMLS: could be EABL. South Africa: about 400 right now. Susan L had started looking at adding MARC records for the South Africa items. It would have been fairly easy to add MARC records for Anne-Lise. Will reconvene and possibly take up that offer.
- OCR sync - complete.
- I have not been able to work on APIs at all this week. How important is it that the updated APIs be in place for the launch of the new-and-improved search? Five or six of about forty total API methods are affected. Historically, about 3% of all API requests are for one of those Search methods. Nothing will break if we leave it as-is; those methods just will not be taking advantage of Elasticsearch.
MRK: We can go live without those being updated.
- I have identified some new Admin functionality that would be useful in managing the new search indexing processes. Most likely this will be sysadmin-level functionality, only visible to a select few. This was not on our original schedule, and IMO does not need to be ready for launch. Still, I'd like to have time to build these tools before moving on to the next big task.
Probably post-launch. Tie up a few loose ends.
- Update to the discussion about the hack to merge subfields in MARC 600 and 611 fields. Internal parens in a subject cause a bad URL failure. See BHLFEED-61226. We are generating invalid URLs; the parens need to be escaped. Should be an easy fix. Revisit following the launch of full text. Impacts the NEH grant. Can be moved to the top of the list for as soon as we are live with full text.
Last Monday, the Cataloging Group started merging records to take care of the authority problem. We really need to consult with Mike to make sure that they merge in such a way that names are matched successfully.
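For the paren-escaping fix noted above, a minimal sketch of percent-encoding a subject heading before it is placed in a URL. This is an illustration only: the notes do not say how the site builds its URLs, so the base URL, the function name `subject_url`, and the sample subject are all hypothetical.

```python
from urllib.parse import quote, unquote


def subject_url(base_url: str, subject: str) -> str:
    """Build a link to a subject page with the subject percent-encoded.

    Hypothetical helper: the real URL scheme is not given in the notes.
    safe="" means every reserved character, including '/', is escaped.
    """
    return f"{base_url}/{quote(subject, safe='')}"


# Sample subject with internal parentheses (made up for illustration).
url = subject_url("https://example.org/subject", "Smith, John (Botanist)")
print(url)

# Decoding the last path segment recovers the original subject text.
print(unquote(url.rsplit("/", 1)[1]))
```

Encoding the whole subject in one place, right where the URL is assembled, avoids having to special-case parens separately from other reserved characters.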
Work planned for this week
- Observing the indexing processes and making necessary tweaks to them. When an item is unpublished in BHL and needs to be deleted from the Elasticsearch index, the delete is very slow because it involves deleting the item's pages from the index, too.
- APIs
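On the slow item deletes mentioned under the indexing tweaks above: one option that might be worth testing is Elasticsearch's delete-by-query endpoint, which removes every document matching a query in a single request instead of one delete per page. The sketch below only builds the request and sends nothing. It assumes, hypothetically, that page documents carry an `item_id` field and live in an index called `bhl-pages`; neither name comes from these notes.

```python
import json
from typing import Tuple


def delete_item_request(index: str, item_id: int) -> Tuple[str, str]:
    """Return (path, JSON body) for a single delete-by-query request.

    Assumption: every document belonging to the item (including its
    pages) has an `item_id` field. That field name is hypothetical.
    """
    path = f"/{index}/_delete_by_query"
    body = {"query": {"term": {"item_id": item_id}}}
    return path, json.dumps(body)


# Index name and item ID are placeholders for illustration.
path, body = delete_item_request("bhl-pages", 12345)
print("POST", path)
print(body)
```

Whether this is actually faster than the current approach would need to be measured against the real index.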
Any other topics
Beta site: the data is old; this was noticed when trying to pull up archival material using Correspondence. Mike will look into it this week.