TechCall_23Jul2018
Agenda
For quick reference, BHL Tech Workplan available here:
https://docs.google.com/spreadsheets/d/1Efl9Ju4bjAvYwt47_Wm2tS2ZHY6LHPIxqNN0B1CZ9E0/edit?usp=sharing
Quick check-in
Active Tech Topics:
Full text search
Update on Advanced Search changes
Next step is to add a new index for true catalog searches, returning one hit per title rather than per volume.
- Re-indexing to enable fielded searching / metadata only searching
- UI change to enable toggling between full text and metadata
- Anything needed to continue moving forward? No.
Update on API v.3
Dependent on the index changes
Link to the old interface
Once full text refinements near completion, Tech Team to check in to set date for removing link (aiming for around 6 month mark after the May 7 launch)
Transcriptions
- Update on request for sample documents
- Harvard's are good to go with what Katie sent. Deadline end of next week
- NYBG: Susan L and Mike may need to discuss further. FTP always exports every page in digital object; no support for picking and choosing pages.
- SIA: pending?
- Anything needed from Tech group at this point?
- What sort of modifications to transcription tool export files are acceptable?
- Mike's notes:
- MCZ did more than just clean up markup from the text and normalize the line endings... the format of the file exported from DigiVol was changed (BHL identifiers added, columns renamed, some columns removed).
- Presumably this was done based on the outcome of the meeting in late November at which the cleanup of transcription exports was discussed. The minutes/notes of that meeting are a bit incomplete/disorganized, so it is difficult to say for sure.
- The issue is that if the BHL import process is based on the files provided by MCZ, and then someone else wants to use DigiVol to transcribe items and submit them to BHL, then they have to follow the exact same process for transforming the DigiVol export files. At that point BHL is not really accepting DigiVol exports, but files based on DigiVol exports.
- Therefore, does BHL want contributors of transcriptions to change the format of the export files, or does BHL want the text cleaned up without otherwise changing the files?
- UPDATE: Ricc added data to his Smithsonian Transcription Center data files as well (a column for BHL identifiers).
- Perhaps the policy should be that data can be added to files, so long as the existing file columns/elements/structure is left intact. “Extra” data can be ignored by an ingest process, but removed or renamed data is a problem.
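A minimal sketch of how an ingest process could apply the policy suggested above: added columns are ignored, while removed or renamed columns cause a rejection. The column names and the CSV layout here are assumptions for illustration only; the actual DigiVol and Transcription Center export formats are not described in these notes.

```python
import csv
import io

# Hypothetical required columns; stand-ins for whatever the unmodified
# export of a given transcription tool actually contains.
REQUIRED_COLUMNS = {"page_id", "transcription"}


def check_export_columns(header, required=REQUIRED_COLUMNS):
    """Return (missing, extra) column names for an export file header."""
    present = set(header)
    missing = sorted(required - present)
    extra = sorted(present - required)
    return missing, extra


def read_export(text, required=REQUIRED_COLUMNS):
    """Parse export text, rejecting files with removed or renamed columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing, _extra = check_export_columns(reader.fieldnames or [], required)
    if missing:
        # A removed or renamed column shows up here as a missing column.
        raise ValueError(f"export is missing required columns: {missing}")
    # Keep only the required columns; any added columns are ignored.
    return [{name: row[name] for name in required} for row in reader]
```

Under these assumptions, a file with an added BHL identifier column would still load, while a file whose `page_id` column had been renamed would be rejected with an error naming the missing column.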
To revisit in future: options for batch ingest.
Follow up on action items from prior calls
- 61034: What will be required to add Piece Type: Suppl. to Macaw and Admin Dash? In Macaw, it's a simple enough task to add. A lot of this falls to Mike. Not difficult but does touch a lot of things. Carolyn and Mike to touch base in next week or two on whether this will be completed before or after work on Transcriptions.
- Mike looking into filter issues - Susan opened issue 79504 (See comment in transcription document). Filters seemed not to be working in Chrome. Susan will also check in Firefox and MS browsers. Mike sees the problem in Chrome now. Doesn't look like it's a browser issue. Possibly the log files had incomplete data because it was entered while the code was still evolving... might be wishful thinking. Will review once transcriptions underway.
- For June 25 or later: Susan will consult with Cataloging Group on status of author name work. (They have a call in about two weeks from now; Tech Team will revisit after that call). Group met about a week and a half ago. Still needs to be ironed out. Meeting next week. Note: WikiData group are submitting lists of discrepancies between OCLC and LC numbers; actively scrubbing the data. Meeting scheduled for Thursday of this week. Susan will share the notes from
- Carolyn updated Gemini ticket 61157 to loop Bianca back in on date issues with MODS exports (for when Carolyn returns, after 7/16).
- To revisit after Transcriptions: Moving Walls
- (58083) - links to content on publisher/rights holder website that is more recent than what would be available in BHL
Decision: (from discussion with Martin) To be worked on after Transcriptions, before DOI assignments.
- 60922: Mike will create an indicator at the item level in the admin dash to prevent ingest of article metadata from external sources such as BioStor. Involves database changes, changes to Admin site, and BioStor processes. For revisiting after transcriptions.
- Searching by identifiers: Susan and Joel to discuss offline on search needs for identifiers such as TL-2 and Soulsby numbers
- FYI for now: RBG Kew and others - PDF emails are getting delayed or marked as spam. Working on identifying the cause of the problem. Initial indication is that the problem is on the receiving end. SI is working on making things look less like spam, e.g., digital signatures. Might help alleviate the problem once it rolls out.
- RSS feed for material added to BHL over the weekend; only 39 things added. Harvest is running long this week. Also a handful of errors in Macaw.
- Macaw - Problem that digitization tech reported, review complete, just hangs. Just letting it sit so Joel has a test case. Joel is going to try to replicate the issue and identify error.
- Gemini ticket of someone asking when API would be updated.
Google has enabled mobile-first indexing for
http://www.biodiversitylibrary.org/ ... How are we going to respond to it?
MEETING NOTES
Full Text UI
Need more feedback on the Full Text search changes - Joel will review after the call today
API V3 is done but not documented.
Transcriptions
We can't accept modified files from the various software tools. We just want the content straight out of the software. XML from FromThePage is good. Others not so much.
Harvard is not good to go. Mike will contact them. SIA may have added things but they may be okay.
PDF emails
In the past, we had reports of users not getting the notification email about the PDF being ready. There have been no reports since the last spate of them.
Long-running IA Ingest
The long-running ingest finished on Friday, and the regular ingest over the weekend was finished by Sunday morning.
Gemini ticket of someone asking when API would be updated
Joel will respond to this person
PDFs with no images
64 items in Macaw are hanging and they have only been partially loaded in
Mobile-first Indexing
Google announced that they have switched to using mobile-first indexing. This does not change our ranking and we don't need to panic, but with 1/5 of our users on some sort of mobile device, we should probably think about how to move to mobile.