TechCall_22Aug2018
Agenda
For quick reference, BHL Tech Workplan available here:
https://docs.google.com/spreadsheets/d/1Efl9Ju4bjAvYwt47_Wm2tS2ZHY6LHPIxqNN0B1CZ9E0/edit?usp=sharing
Active Tech Topics:
Full text search
Update on Advanced Search changes
- Additional feedback from BHL Ad hoc call: Gemini 79766
- Full Text - radio buttons for exact phrase vs all terms. Disregard others.
Update on API v.3
- Status
- Beta version went out. Susan began testing and sent around some test feedback. We don't return rights holders for parts. Many of the parts are in copyright, so returning that would be important.
- The API is returning Wonderfetch IDs; we filter these out in the public interface, but they are coming back through the API.
- For Joel - wasn't able to run Python against the beta because of an invalid security certificate. The security certificate is in process. Joel will see if there's a workaround in Python.
- Anything needed from Tech Team?
- Policy discussion based on Mike's email
- Goal is to release sooner rather than later. Are we ready to set a target date for notifying API users of the change? We can push out to production and no one will know; we can announce when it fits our schedule. Carolyn to coordinate with Grace. Aiming for around September 10.
Link to the old interface
Once full text refinements near completion, Tech Team check in to set date for removing link (aiming for around 6 month mark after the May 7 launch, i.e., Oct 7)
Wikis
Any updates on public and private wiki migrations?
Updated the schedule for private wiki; preview-able private wiki in early September.
About BHL coming along.
Metadata Model
Index Data was selected. We'll be scheduling an in-person focus group which we'd like you all to attend. There will be some travel support; more details coming soon. In the meantime, we'll be sharing the current metadata model and NDSR report. Mike, could you point me to the most up-to-date version of the metadata model?
GitHub
Would be good to look at Gemini tickets that have to do with the data model. Specifically CLIR issues.
Transcriptions
- Status update, if any
- Anything needed from Tech group at this point?
For reference, notes from prior discussions on transcriptions:
- Mike's notes:
- Harvard is not good to go.
- MCZ did more than just clean up markup from the text and normalize the line endings... the format of the file exported from DigiVol was changed (BHL identifiers added, columns renamed, some columns removed).
- Presumably this was done based on the outcome of the meeting in late November at which the cleanup of transcription exports was discussed. The minutes/notes of that meeting are a bit incomplete/disorganized, so it is difficult to say for sure.
- The issue is that if the BHL import process is based on the files provided by MCZ, and then someone else wants to use DigiVol to transcribe items and submit them to BHL, then they have to follow the exact same process for transforming the DigiVol export files. At that point BHL is not really accepting DigiVol exports, but files based on DigiVol exports.
- Therefore, does BHL want contributors of transcriptions to change the format of the export files, or does BHL want the text cleaned up without otherwise changing the files?
- UPDATE: Ricc added data to his Smithsonian Transcription Center data files as well (a column for BHL identifiers).
- Perhaps the policy should be that data can be added to files, so long as the existing file columns/elements/structure is left intact. “Extra” data can be ignored by an ingest process, but removed or renamed data is a problem.
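Illustration of the proposed policy (a minimal sketch only, not BHL's actual ingest code). The column names below are hypothetical, since the real DigiVol export columns are not listed in these notes. The idea shown: extra columns are ignored, while a removed or renamed column causes the file to be rejected.

```python
import csv
import io

# Hypothetical column names, for illustration only; the actual
# DigiVol export columns are not recorded in these notes.
REQUIRED_COLUMNS = ["taskID", "externalIdentifier", "transcription"]


def check_columns(header, required=REQUIRED_COLUMNS):
    """Return (missing, extra) column names relative to the required set."""
    present = set(header)
    missing = [c for c in required if c not in present]
    extra = [c for c in header if c not in required]
    return missing, extra


def read_rows(text, required=REQUIRED_COLUMNS):
    """Parse CSV text, keeping only required columns.

    Extra columns (e.g. an added BHL identifier) are silently ignored;
    a missing or renamed required column raises ValueError.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing, _extra = check_columns(reader.fieldnames or [], required)
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return [{c: row[c] for c in required} for row in reader]
```

Under this sketch, a file with an added BHL identifier column would still be accepted, while a file in which a column was renamed would be flagged before any rows are ingested.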
To revisit in future: options for batch ingest of transcriptions.
Follow up on action items from prior calls
- Mike: email response to Susan's API feedback - complete
- All: provide feedback on API discussion, following Mike's email, to help keep the wrap-up of this version on track
- Joel: formulate a response to Dima to reiterate our request of what's needed on our end
- Carolyn: Regarding 61034, determine if we want Supplement or Suppl.
- Susan: Share the notes from last Cataloging Group meeting; another meeting tomorrow
- Carolyn: Check in with Bianca regarding Gemini ticket 61157
For more details (if needed):
- 61034: What will be required to add Piece Type: Suppl. to Macaw and Admin Dash? In Macaw, it's a simple enough task to add. A lot of this falls to Mike. Not difficult but does touch a lot of things. Carolyn and Mike to touch base in the next week or two on whether to complete this before or after work on Transcriptions. Carolyn: Suppl. spelled out or abbreviated? Susan - probably has to be abbreviated in public UI, just not enough space. Mike: Doesn't show up in UI.
- Mike looking into filter issues - Susan opened issue 79504 (see comment in transcription document). Filters seemed not to be working in Chrome. Susan will also check in Firefox and MS browsers. Mike sees the problem in Chrome now. Doesn't look like it's a browser issue. Possibly the log files had incomplete data because it was entered while the code was still evolving... might be wishful thinking. Will review once transcriptions underway.
- For June 25 or later: Susan will consult with Cataloging Group on status of author name work. (They have a call in about two weeks from now; Tech Team will revisit after that call). Group met about a week and a half ago. Still needs to be ironed out. Meeting next week. Note: WikiData group are submitting lists of discrepancies between OCLC and LC numbers; actively scrubbing the data. Meeting scheduled for Thursday of this week.
Susan will share the notes from last meeting. Nothing actionable at the moment.
- Carolyn updated Gemini ticket 61157 to loop Bianca back in on date issues with MODS exports (for when Carolyn returns, after 7/16).
To revisit after Transcriptions:
- Moving Walls (58083) - links to content on publisher/rights holder website that is more recent than what would be available in BHL
- Decision: (from discussion with Martin) To be worked on after Transcriptions, before DOI assignments.
- 60922 : Mike will create an indicator at the item level in the admin dash to prevent ingest of article metadata from external sources such as BioStor. Involves database changes, changes to Admin site, and BioStor processes. For revisiting after transcriptions.
- Searching by identifiers: Susan and Joel to discuss offline on search needs for identifiers such as TL-2 and Soulsby numbers
- FYI for now: RBG Kew and others - PDF emails are getting delayed or marked as spam. Working on identifying the cause of the problem. Initial indication is that the problem is on the receiving end. SI is working on making things look less like spam, e.g., digital signatures. Might help alleviate the problem once it rolls out.
- RSS feed for material added to BHL over the weekend; only 39 things added. Harvest is running long this week. Also a handful of errors in Macaw.
- Macaw - A digitization tech reported a problem: at "review complete," the item just hangs. Just letting it sit so Joel has a test case. Joel is going to try to replicate the issue and identify the error.
- Mobile-first Indexing - Google announced that they have switched to using mobile-first indexing. This does not change our ranking and we don't need to panic, but with 1/5 of our users on some sort of mobile device, we should probably think about how to move to mobile.