TechCall_Nov14

Agenda

A couple of surprises came up while I was working to define articles in BHL as part of the EABL project. To summarize the problems:

In many cases, Rod Page’s code can’t find articles when the article start page lacks a page number in the page-level metadata.
Much of the content in which we want to define articles is being digitized at the Smithsonian.
Many articles start on pages that are implicitly numbered but lack explicit page numbers.
The Smithsonian can’t enter implicit page numbers using the IA Scribe software.
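
To illustrate what "implicitly numbered" means here: a missing number can often be inferred from the explicit numbers on neighboring pages. The sketch below is a hypothetical illustration in Python only; it is not Rod Page's code, and the function name `infer_page_numbers` and the list-of-numbers input are assumptions, not BHL's actual page metadata model. It fills a gap only when the nearest numbered pages before and after agree on the value.

```python
from typing import List, Optional


def infer_page_numbers(explicit: List[Optional[int]]) -> List[Optional[int]]:
    """Fill in implied page numbers where neighboring pages make them unambiguous.

    `explicit` holds one entry per page in scan order: the printed page
    number, or None when the page carries no explicit number.
    (Hypothetical input shape, assumed for illustration.)
    """
    known = [(i, n) for i, n in enumerate(explicit) if n is not None]
    result = list(explicit)

    for i, n in enumerate(explicit):
        if n is not None:
            continue  # already explicitly numbered

        # Nearest explicitly numbered page before and after this one.
        prev = max((k for k in known if k[0] < i), key=lambda k: k[0], default=None)
        nxt = min((k for k in known if k[0] > i), key=lambda k: k[0], default=None)

        candidates = set()
        if prev is not None:
            candidates.add(prev[1] + (i - prev[0]))
        if nxt is not None:
            candidates.add(nxt[1] - (nxt[0] - i))

        # Only accept the value when the neighbors agree and it is a valid page number.
        if len(candidates) == 1:
            value = candidates.pop()
            if value >= 1:
                result[i] = value

    return result
```

Pages where the neighbors disagree (for example, around plates or a change in numbering sequence) are left unnumbered rather than guessed.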

Macaw

Carolyn -- issues with selecting files for upload; Joel is working on updates, will be testing over next week or two, and then should be launched soon.
Uploading large items to Macaw (Trish):

Trish is trying to figure out a way to reduce the need to move large files between different locations in order to upload them to BHL via Macaw. Right now Mariah is grabbing files from websites, as well as FTPing them from publishers' sites, into a Dropbox folder. Trish would like to grab these files directly via the upload functionality in Macaw, but Macaw only allows grabbing files stored in a local folder. This requires Trish to download each file from Dropbox, one by one, to her local machine and then upload it to Macaw. This is very tedious when dealing with hundreds of files. Is there a way we can make this process more efficient?

Mariah and Trish are both uploading via Macaw. Mariah has been putting page images in Dropbox, but Trish can't grab them from Dropbox within Macaw, so she has to download them to her computer first. The files were so big that they were taking up all the Dropbox space. The files in Dropbox are behind a password, but a link to the Dropbox folder can be opened without logging in.

Right now, Macaw requires image files to be locally stored to grab.

In the future, if there were a definitive URL for the files, that could work in theory. PHP is friendly to that.

Let's continue exploring to see if we can give Macaw a URL to a zip file. Dropbox might allow downloading a bunch of files as a zipped folder, assuming we don't need to enter a username and password. Joel will look into it.
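
A rough sketch of the "give it a URL to a zip file" idea, written in Python purely for illustration (Macaw itself is PHP, and this is not Macaw code). The function name `fetch_zip_images`, the list of image extensions, and the destination folder are all assumptions. It also assumes the URL can be fetched without a username and password, which is the open question above.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Assumed set of page-image extensions; adjust to whatever the uploader accepts.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".jp2"}


def fetch_zip_images(url, dest_dir):
    """Download a zip file from `url` and extract its image files into `dest_dir`.

    Returns the sorted list of extracted file paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []

    # Stream the download to a temporary file so large zips are not held in memory.
    with tempfile.TemporaryFile() as tmp:
        with urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        tmp.seek(0)

        with zipfile.ZipFile(tmp) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                # Keep only the file name, so entries cannot write outside dest.
                name = Path(info.filename).name
                if Path(name).suffix.lower() not in IMAGE_EXTENSIONS:
                    continue
                target = dest / name
                with archive.open(info) as src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
                saved.append(target)

    return sorted(saved)
```

Once the images land in a local folder, the existing "grab from local folder" step could pick them up unchanged. Note that flattening to file names means two entries with the same name in different subfolders would overwrite each other.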

Also adding a PDF to uploader.

Full Text
Need to pin down how we want the search to look.

In order to model the index appropriately, we need to have some sense of how we want search to work.

Joel proposes keeping the tabs the way they are

Maybe add a full text tab

Maybe remove the Subject tab

Facets
How do field books fit into this?
Illustrations - facet, a page type

Do we show results at title level, or at volume level?

Business need - admin dashboard, for language

About Our Collections - we could add something like this, and include things like number of items in different languages
Could be dynamic and update itself

HathiTrust
Appears to be item-level result sets
When you click through, there's a 'within book search' that's pre-filled with your search text
If we follow that model, two indexes - one for items, then another for showing which page(s) the term occurs on

So one search over the OCR, and another search within the book, if we want to go with that model.
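
To make the two-index model concrete, here is a toy in-memory sketch in Python. It is not a proposal for the actual implementation; a real deployment would presumably sit on a proper search server. The class name `TwoLevelIndex`, the method names, and the item/page identifiers are made up for illustration. The item-level index answers "which items mention these terms?", and the page-level index answers "which pages of this item contain them?".

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TwoLevelIndex:
    def __init__(self):
        # Index 1: term -> set of item ids whose OCR contains the term.
        self.items_by_term = defaultdict(set)
        # Index 2: (item id, term) -> set of page sequence numbers.
        self.pages_by_item_term = defaultdict(set)

    def add_page(self, item_id, page_seq, ocr_text):
        """Index one page of OCR text into both indexes."""
        for term in set(tokenize(ocr_text)):
            self.items_by_term[term].add(item_id)
            self.pages_by_item_term[(item_id, term)].add(page_seq)

    def search_items(self, query):
        """Item-level results: items containing every query term somewhere."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = [self.items_by_term.get(t, set()) for t in terms]
        return sorted(set.intersection(*matches))

    def search_within(self, item_id, query):
        """Within-book results: pages of one item containing any query term."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = [self.pages_by_item_term.get((item_id, t), set()) for t in terms]
        return sorted(set.union(*matches))
```

In the HathiTrust-style flow, the first search would call something like `search_items`, and clicking through to a book would run `search_within` pre-filled with the same search text.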

HathiTrust
Search Amphibian (full text)

"Illinois Landowners guide to amphibian conservation"
On first page of book, there's a search box on top of that page

Joel to write up more formally, draw up some wireframes and will send out in next couple of days.

Joel will send Mike specs on the server he wants to order: 4 TB on the spinning disks to keep the OCR.

From DLF, Hydra in a Box - to consider for BHL version 2