TechCall_16aug2018
Action Items
- Mike: email response to Susan's API feedback
- All: provide feedback on the API discussion, following Mike's email, to help keep on track with wrapping up version 3
- Joel: formulate a response to Dima to reiterate our request of what's needed on our end
- Carolyn: Regarding 61034, determine if we want Supplement or Suppl.
- Susan: Share the notes from last Cataloging Group meeting
- Carolyn: Check in with Bianca regarding Gemini ticket 61157
Agenda
For quick reference, BHL Tech Workplan available here:
https://docs.google.com/spreadsheets/d/1Efl9Ju4bjAvYwt47_Wm2tS2ZHY6LHPIxqNN0B1CZ9E0/edit?usp=sharing
Active Tech Topics:
Full text search
Update on Advanced Search changes
- Re-indexed to enable fielded searching / metadata only searching
- UI changed to enable toggling between full text and metadata
- Responded to input from Grace
Update on API v.3
- Status
- Anything needed from Tech Team?
- Policy discussion based on Mike's email
- Target date for notifying API users of change - too soon to set date.
API work could be funded through a Collections as Data grant; these are not very large grants (cohort-based; max of $80k). Go ahead and get Version 3 out sooner rather than later. All: respond as soon as possible to help Mike wrap up.
Link to the old interface
Once full text refinements near completion, Tech Team will check in to set a date for removing the link (aiming for around the 6 month mark after the May 7 launch, i.e., Oct 7)
General
- Did the last ingest run long? Hasn't run long since those two weekends in July. Last week's run finished a day earlier.
- In late July, there were 64 items hanging in Macaw that had only been partially loaded. Have those been completed? We believe all is set here.
Global Names
- Update from Dima and any next steps on our end?
Dima is still working on name finding. We don't see this as a short-term priority while he continues development.
We'll first need to look at backwards compatibility, or compatibility with BHL in general.
MRK: have a conversation sooner rather than later to keep reiterating our request. Joel will formulate a response and handle the interaction with Dima.
Transcriptions
- Status update - nothing to report
- Anything needed from Tech group at this point?
- For reference, notes from prior discussions on transcriptions:
We can't accept modified files from the various transcription tools; we just want the content straight out of the software. XML from FromThePage is good. Others, not so much.
- Mike's notes:
- Harvard is not good to go.
- MCZ did more than just clean up markup from the text and normalize the line endings... the format of the file exported from DigiVol was changed (BHL identifiers added, columns renamed, some columns removed).
- Presumably this was done based on the outcome of the meeting in late November at which the cleanup of transcription exports was discussed. The minutes/notes of that meeting are a bit incomplete/disorganized, so it is difficult to say for sure.
- The issue is that if the BHL import process is based on the files provided by MCZ, and then someone else wants to use DigiVol to transcribe items and submit them to BHL, then they have to follow the exact same process for transforming the DigiVol export files. At that point BHL is not really accepting DigiVol exports, but files based on DigiVol exports.
- Therefore, does BHL want contributors of transcriptions to change the format of the export files, or does BHL want the text cleaned up without otherwise changing the files?
- UPDATE: Ricc added data to his Smithsonian Transcription Center data files as well (a column for BHL identifiers).
- Perhaps the policy should be that data can be added to files, so long as the existing file columns/elements/structure is left intact. “Extra” data can be ignored by an ingest process, but removed or renamed data is a problem.
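The proposed policy (added data is fine, removed or renamed data is not) amounts to a simple header check an ingest process could run. A minimal sketch, assuming exports arrive as CSV with a header row and that the reference column list comes from an unmodified tool export; the function name and all column names below are hypothetical, not actual DigiVol or Transcription Center fields.

```python
import csv
import io


def check_export_columns(expected_columns, submitted_csv_text):
    """Hypothetical check comparing a submitted CSV header to a reference export.

    expected_columns: column names from an unmodified tool export (assumed).
    submitted_csv_text: the contributor's CSV file contents as a string.
    Returns (missing, extra) as sorted lists:
      missing -> columns removed or renamed (a problem for ingest)
      extra   -> columns added by the contributor (can be ignored by ingest)
    """
    reader = csv.reader(io.StringIO(submitted_csv_text))
    header = next(reader, [])  # empty file -> empty header
    submitted = set(header)
    expected = set(expected_columns)
    missing = sorted(expected - submitted)
    extra = sorted(submitted - expected)
    return missing, extra


# Usage with made-up column names:
expected = ["taskID", "externalIdentifier", "transcription"]
added_column = "taskID,externalIdentifier,transcription,bhlPageID\n1,abc,text,123\n"
renamed_column = "taskID,pageIdentifier,transcription\n1,abc,text\n"
print(check_export_columns(expected, added_column))    # ([], ['bhlPageID'])
print(check_export_columns(expected, renamed_column))  # (['externalIdentifier'], ['pageIdentifier'])
```

Under this sketch a file is acceptable when `missing` is empty; anything in `extra` (such as an added BHL identifier column) would simply be skipped by the ingest.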
To revisit in future: options for batch ingest of transcriptions.
Follow up on action items from prior calls
- 61034: What will be required to add Piece Type: Suppl. to Macaw and Admin Dash? In Macaw, it's a simple enough task to add. A lot of this falls to Mike; not difficult, but it does touch a lot of things. Carolyn and Mike to touch base in the next week or two on whether to complete this before or after work on Transcriptions. Carolyn: Suppl. spelled out or abbreviated? Susan: probably has to be abbreviated in the public UI; there's just not enough space. Mike: doesn't show up in the UI.
- Mike looking into filter issues: Susan opened issue 79504 (see comment in transcription document). Filters seemed not to be working in Chrome. Susan will also check in Firefox and MS browsers. Mike sees the problem in Chrome now; it doesn't look like a browser issue. Possibly the log files have incomplete data because it was entered while the code was still evolving, but that might be wishful thinking. Will review once transcriptions are underway.
- For June 25 or later: Susan will consult with Cataloging Group on status of author name work. (They have a call in about two weeks from now; Tech Team will revisit after that call). Group met about a week and a half ago. Still needs to be ironed out. Meeting next week. Note: WikiData group are submitting lists of discrepancies between OCLC and LC numbers; actively scrubbing the data. Meeting scheduled for Thursday of this week.
Susan will share the notes from the last meeting. Nothing actionable at the moment.
- Carolyn updated Gemini ticket 61157 to loop Bianca back in on date issues with MODS exports (for when Carolyn returns, after 7/16).
Statistics
MRK - from NDSR, looking at Tableau for displaying statistics in a more approachable, public-facing way. How accessible is content behind Admin Dash for statistical type things?
Some reports can be exported as CSV, but there is no API access to them at this point. So currently someone would have to go in manually and grab the CSV reports.
About
Last accessibility scan: hoping to have it up on Monday. We are not planning to link to it from the website yet.
- To revisit after Transcriptions: Moving Walls
- (58083) - links to content on publisher/rights holder website that is more recent than what would be available in BHL
Decision (from discussion with Martin): to be worked on after Transcriptions, before DOI assignments.
- 60922: Mike will create an indicator at the item level in the Admin Dash to prevent ingest of article metadata from external sources such as BioStor. Involves database changes, changes to the Admin site, and BioStor processes. For revisiting after Transcriptions.
- Searching by identifiers: Susan and Joel to discuss offline on search needs for identifiers such as TL-2 and Soulsby numbers
- FYI for now: RBG Kew and others: PDF emails are getting delayed or marked as spam. Working on identifying the cause of the problem; initial indication is that the problem is on the receiving end. SI is working on making things look less like spam, e.g., digital signatures, which might help alleviate the problem once it rolls out.
- RSS feed for material added to BHL over the weekend; only 39 things added. Harvest is running long this week. Also a handful of errors in Macaw.
- Macaw: problem reported by a digitization tech; after review is complete, the item just hangs. We're leaving it as-is so Joel has a test case. Joel is going to try to replicate the issue and identify the error.
- Mobile-first Indexing: Google announced that they have switched to using mobile-first indexing. This does not change our ranking and we don't need to panic, but with 1/5 of our users on some sort of mobile device, we should probably think about how to move to a mobile-first approach.