TechCall_09Jul2018
Agenda
For quick reference, BHL Tech Workplan available here:
https://docs.google.com/spreadsheets/d/1Efl9Ju4bjAvYwt47_Wm2tS2ZHY6LHPIxqNN0B1CZ9E0/edit?usp=sharing
Full text search
Update on Advanced Search changes
Next step is to add a new index for true catalog searches, returning one hit per title rather than per volume.
- Re-indexing to enable fielded searching / metadata-only searching
- UI change to enable toggling between full text and metadata (see the sketch after this list)
- Anything needed to continue moving forward? No.
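For reference, a minimal sketch of the query side of these changes, under stated assumptions: the engine and schema weren't specified on the call, so this assumes an Elasticsearch-style index with a placeholder endpoint and made-up field names (title_id, ocr_text, title, author, subject), not the actual BHL setup. It only illustrates the two ideas above: a metadata-only (fielded) query versus a full-text query, and collapsing results to one hit per title rather than per volume.

import requests  # sketch only; endpoint and field names below are placeholders

SEARCH_URL = "http://localhost:9200/bhl-items/_search"  # not the real BHL index

def build_query(term: str, full_text: bool) -> dict:
    # Metadata-only searching restricts matching to catalog fields; the UI toggle
    # for full text simply adds the OCR text field to the same query.
    fields = ["title^3", "author^2", "subject"]
    if full_text:
        fields.append("ocr_text")
    return {
        "query": {"multi_match": {"query": term, "fields": fields}},
        # One hit per title rather than per volume: collapse on the title identifier.
        "collapse": {"field": "title_id"},
        "size": 25,
    }

if __name__ == "__main__":
    resp = requests.post(SEARCH_URL, json=build_query("orchidaceae", full_text=False))
    print(resp.json())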
Update on API v.3
Dependent on the index changes
Moving the link to the old interface to the page footer
Report out on Action Item from 7/2: Joel and Carolyn to revisit with Martin on potentially removing the link altogether and, if so, at what time. Having the link perform as intended would require a lot of search-page-specific code in the header and footer, which normally function independently of the page being viewed; the header and footer are intended to be a frame, so that particular link would have to be dynamically built. It is therefore very much tied to the search functionality itself.
We can remove the link around the 6-month mark; we'll check in again closer to wrap-up of the Advanced Search changes.
Transcriptions
- Update on request for sample documents
- Harvard's are good to go with what Katie sent. Deadline end of next week
- NYBG: Susan L and Mike may need to discuss further. FTP always exports every page in a digital object; there is no support for picking and choosing pages.
- SIA: pending?
- Anything needed from Tech group at this point?
Ampersands and semicolons - what gets stored? The exports are XML formats, so an XML tool will be used to interpret the entities correctly.
XHTML might be a better format for Mike; it might make it easier to pick the text out. Not yet a final decision, but leaning that way.
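A minimal sketch of the entity point above, using Python's standard XML parser: in XML/XHTML exports an ampersand is stored as the entity reference &amp;, and the parser resolves it on read, so nothing unusual needs to be stored. The sample markup is invented for illustration and is not an actual export.

import xml.etree.ElementTree as ET

sample = '<p>Bulletin of the Torrey Botanical Club &amp; Torreya; vol. 45</p>'
elem = ET.fromstring(sample)
print(elem.text)  # -> Bulletin of the Torrey Botanical Club & Torreya; vol. 45

# XHTML is well-formed XML, so picking the text out is a matter of walking the
# parsed tree rather than scraping tag soup.
xhtml = ('<div xmlns="http://www.w3.org/1999/xhtml">'
         '<p>Page 1 text &amp; more</p><p>Page 2 text</p></div>')
root = ET.fromstring(xhtml)
for p in root.iter("{http://www.w3.org/1999/xhtml}p"):
    print(p.text)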
Realizing that every single item will need to be processed separately. In St. Louis, we believe Ricc had provided a spreadsheet with each row being a BHL item. On re-reading the specs, there does not seem to be batch capability; that's the way it's written now.
400 items from NYBG for Torrey, at 30 minutes each, works out to roughly 200 hours of processing, so about 6 weeks.
If we want to do batches, we would need item identifiers in the files. There would be other issues, too.
FTP gives the ability to export a single item or the entire corpus, so those are the two options. For the corpus, everything comes in one monster file.
To revisit: options for batch ingest.
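Purely to illustrate why batch ingest would need item identifiers in the files: the corpus export format was not specified on the call, so this sketch assumes a single XML file whose top-level elements each carry a hypothetical bhl_item_id attribute. Without such an identifier there is no reliable way to split the one monster file back into per-item transcriptions.

import xml.etree.ElementTree as ET
from collections import defaultdict

def split_corpus(corpus_path: str) -> dict:
    """Group the top-level elements of a corpus export by their item identifier."""
    tree = ET.parse(corpus_path)
    items = defaultdict(list)
    for child in tree.getroot():
        item_id = child.get("bhl_item_id")  # assumed attribute, not a real spec
        if item_id is None:
            # No identifier means there is no way to attach this piece to a BHL item.
            continue
        items[item_id].append(child)
    return items

# Usage (hypothetical file name):
# per_item = split_corpus("corpus_export.xml")
# print(len(per_item), "items recovered from the corpus file")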
Follow up on action items from prior calls
- 61034: What will be required to add Piece Type: Suppl. to Macaw and the Admin Dash? In Macaw, it's a simple enough task to add. A lot of this falls to Mike; it's not difficult, but it does touch a lot of things. Carolyn and Mike to touch base in the next week or two on whether to complete it before or after the work on Transcriptions.
- Mike looking into filter issues - Susan opened issue 79504 (see comment in the transcription document). Filters seemed not to be working in Chrome. Susan will also check in Firefox and MS browsers. Mike now sees the problem in Chrome as well; it doesn't look like a browser issue. Possibly the log files had incomplete data because entries were made while the code was still evolving, though that might be wishful thinking. Will review once transcriptions are underway.
- For June 25 or later: Susan will consult with the Cataloging Group on the status of the author name work. (They have a call about two weeks from now; the Tech Team will revisit after that call.) The group met about a week and a half ago; things still need to be ironed out, and they meet again next week. Note: the WikiData group is submitting lists of discrepancies between OCLC and LC numbers and is actively scrubbing the data. A meeting is scheduled for Thursday of this week; Susan will share the notes from that meeting.
- Carolyn updated Gemini ticket 61157 to loop Bianca back in on date issues with MODS exports (for when Carolyn returns, after 7/16).
- To revisit after Transcriptions: Moving Walls
- (58083) - links to content on publisher/rights holder website that is more recent than what would be available in BHL
Decision (from discussion with Martin): to be worked on after Transcriptions, before DOI assignments.
- 60922: Mike will create an indicator at the item level in the Admin Dash to prevent ingest of article metadata from external sources such as BioStor. Involves database changes, changes to the Admin site, and BioStor processes. For revisiting after Transcriptions.
- Searching by identifiers: Susan and Joel to discuss offline the search needs for identifiers such as TL-2 and Soulsby numbers.
- FYI for now: RBG Kew and others - PDF emails are getting delayed or marked as spam. Working on identifying the cause of the problem; the initial indication is that the problem is on the receiving end. SI is working on making the messages look less like spam, e.g., adding digital signatures, which might help alleviate the problem once it rolls out.
- RSS feed for material added to BHL over the weekend showed only 39 things added. The harvest is running long this week. Also a handful of errors in Macaw.
- Macaw - A digitization tech reported a problem: review is complete, but the item just hangs. Letting it sit so Joel has a test case; Joel is going to try to replicate the issue and identify the error.
- Gemini ticket from someone asking when the API would be updated.