TechCall_10apr2017

Agenda

Collections Visualization (Alicia Esquivel)
There is a researcher, Charles Julien, who has done work on LCSH tree modeling for browsing and searching collections. He worked with approximately 131,000 bibliographic records from the science and engineering library at McGill University. He matched the subject strings in the bibliographic collection to authority records. He then restructured the matching subjects into a tree structure by creating a single root node, getting rid of cycles, and only allowing each node to have one parent node.
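A minimal sketch of the restructuring described above (single root, no cycles, one parent per node). This is not Julien's actual code. It assumes the matched authority data can be reduced to a mapping from each heading to its broader headings, and it keeps the first parent reached in a breadth-first walk, which is only one of several possible rules. The names `ROOT` and `build_tree` and the sample headings are made up for illustration.

```python
from collections import defaultdict, deque

ROOT = "ROOT"  # synthetic root node (name chosen for this sketch)


def build_tree(broader):
    """Turn a heading -> [broader headings] mapping into a child -> parent tree."""
    narrower = defaultdict(list)
    nodes = set(broader)
    for child, parents in broader.items():
        for p in parents:
            narrower[p].append(child)
            nodes.add(p)

    parent = {}
    queue = deque()

    def walk():
        # Breadth-first: the first parent to reach a node keeps it,
        # so every node ends up with one parent and no cycle can form.
        while queue:
            node = queue.popleft()
            for child in sorted(narrower[node]):
                if child not in parent:
                    parent[child] = node
                    queue.append(child)

    # Headings with no broader term hang directly off the synthetic root.
    for top in sorted(n for n in nodes if not broader.get(n)):
        parent[top] = ROOT
        queue.append(top)
    walk()

    # Anything still unvisited sits in a cycle with no top term;
    # break the cycle by attaching one member to the root.
    for n in sorted(nodes):
        if n not in parent:
            parent[n] = ROOT
            queue.append(n)
            walk()
    return parent


sample = {
    "Botany": ["Biology"],
    "Zoology": ["Biology"],
    "Ethnobotany": ["Botany", "Anthropology"],  # two parents, only one is kept
    "A": ["B"],  # A and B form a cycle
    "B": ["A"],
}
tree = build_tree(sample)
```

The resulting `tree` can be fed to any tree-view or treemap tool, with record counts per heading added as node weights.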

I am wondering if this method could be applied to the BHL collection as a visualization of collection coverage. It would not be a perfect representation of the collection, but could be useful or interesting for finding collection gaps.
See also the correspondence forwarded April 3.

Created some things in C#,
Pretty much all BHL written in C#

How would we access the MARC XML records with the hierarchical ?

When records move to production, things get merged around. It can be very hard to put the original MARC back together.

Could be useful for some analysis. Wouldn't necessarily exactly map. So if we were to show it to public?

Does import database include hierarchical?
Sort of. What's in the MARCXML isn't exactly what's in MARC

BHL has always broken them up.
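To make the split-versus-whole question concrete, here is a rough sketch of reading 6XX subject fields from a MARCXML record with Python's standard library. It keeps the subfields of each heading together and in order, so the full string can be rebuilt or the parts used separately. The sample record is invented, the choice of subfield codes to keep is an assumption, and this is not how the BHL import code (which is C#) does it.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # standard MARCXML namespace
KEEP_CODES = set("abcdqvxyz")  # assumption: the subfields that make up the heading

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="600" ind1="1" ind2="0">
    <subfield code="a">Darwin, Charles,</subfield>
    <subfield code="d">1809-1882.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Botany</subfield>
    <subfield code="z">Brazil</subfield>
    <subfield code="v">Periodicals.</subfield>
  </datafield>
</record>"""


def subject_headings(marcxml):
    """Return a list of (tag, [subfield values in order]) for every 6XX field."""
    root = ET.fromstring(marcxml)
    headings = []
    for field in root.iter("{%s}datafield" % MARC_NS):
        tag = field.get("tag", "")
        if not tag.startswith("6"):
            continue
        parts = [
            sf.text.strip()
            for sf in field.findall("{%s}subfield" % MARC_NS)
            if sf.get("code") in KEEP_CODES and sf.text
        ]
        if parts:
            headings.append((tag, parts))
    return headings


headings = subject_headings(SAMPLE)
# Hierarchical form: subdivisions joined in their original order.
strings = ["--".join(parts) for _, parts in headings]
```

Keeping the ordered parts per field preserves the name-plus-dates grouping and the subdivision order, while still allowing each part to be indexed on its own.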

A couple of people at NYBG are concerned that incoming subject had a personal name and life dates, grouped together in MARC, BHL split apart. Goes back to Botanicus

This has occasionally come up over the years.

The DPLA metadata guide rationalizes splitting subjects: with subjects split, search was better. It removed the dependency on order.

Call numbers or subject headings?
Haven't been looking at call numbers. Others on Collections Committee have before
Are you looking at monographs or serials? Both

Do we always have a call number? Depends on what's in the MARC

Would it be worth doing without worrying about the hierarchies?

Not in tree view, but similar
Would have to use a different tool

Brief abstract - sounded like there was an option to lop it off, until you do find a match

Then just a list of keywords and not a lot to make a visualization of content.

We do have OCLC numbers, could be a way to pull subject headings using that?

There are also a bunch of Gemini tickets about ingesting so much botany content, it's difficult to identify the zoology ones. Folks are requesting that we separate in buckets for RSS feed.
Would we do that using subject string or the call numbers? Either
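If call numbers were used, one possible sketch is to bucket on the Library of Congress Classification letters (QK is Botany, QL is Zoology). The function name, the "Other" bucket, and the sample call numbers are assumptions for illustration; records without a usable call number would still need the subject-string route.

```python
import re

# LCC class letters -> feed bucket. Only the two classes discussed are mapped.
LCC_BUCKETS = {"QK": "Botany", "QL": "Zoology"}


def rss_bucket(call_number):
    """Pick a feed bucket from the leading class letters of an LC call number."""
    match = re.match(r"\s*([A-Za-z]{1,3})", call_number or "")
    if not match:
        return "Other"
    return LCC_BUCKETS.get(match.group(1).upper(), "Other")


buckets = [rss_bucket(c) for c in ["QK495.O64 B7", "ql737 .C2", "QH45 .A1", ""]]
```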

Pictures of prototype

What are some of the goals?
Focus future digitization efforts
What's already in BHL?
Trying to also assess what's not in BHL

Want to create visualizations based on geographic, temporal, taxonomic coverage.

There are tools for full text visualizations.

Thinking about how to use what we have in subject headings to create visualizations.

Mostly a tool for BHL staff, that gives us the option of using what we have in import database. Recognizing it's not going to be a perfect map.

Would be interested in working with that.

Have a conversation with researcher - is it something we could adopt?
How much of Mike's time would be required?
5 years old, hard to know how much would be involved

If Mike could see ahead of time, might be useful
Mike, Alicia, Susan Lynch,

Do BHL staff have any access to those tables?
There are more work tables than are publicly available. Not self-service; go through Mike to get those.

Trish - let's just say we can figure the tool out; if we can get the visualizations,
Do we take what we find and compare to collections committee scope diagram

Maybe the density in some areas vs others.

Won't sufficiently show what we don't have necessarily.
Maybe just looking at things that barely or don't show up?

In order to know how much coverage we have, we need to know how many publications exist...
Another part of my project is looking at statistical probability models to get an idea of the size of the literature
Might be possible to do smaller sample sizes
Not start with all of biodiversity but to take smaller steps
Put together, might be better able to tell us what we have and what is out there.

Ecology, capture-recapture
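For reference, the simplest capture-recapture model from ecology is the Lincoln-Petersen estimator: if two independent samples of sizes n1 and n2 share m items, the population is estimated as n1 * n2 / m. A sketch of how that might carry over to two title lists follows. Treating two bibliographic sources as independent samples of one closed literature is a strong assumption, and the identifiers below are made up.

```python
def lincoln_petersen(list_a, list_b):
    """Estimate total population size from two overlapping samples."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between the two lists; estimate is undefined")
    return len(a) * len(b) / overlap


# Made-up identifiers: 4 titles in each list, 2 in common.
estimate = lincoln_petersen({"t1", "t2", "t3", "t4"}, {"t3", "t4", "t5", "t6"})
```

Running this on a narrow subject area first fits the idea of smaller sample sizes rather than all of biodiversity at once.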

One idea that's come up with the Collections Committee is comparing BHL bibliographic records to those in HathiTrust
We have a collegial relationship with them; might be willing to share this kind of information with us

DPLA (Susan Lynch)
'I discovered that the thumbnail images that are included in the OAI feed used by DPLA don’t match the images that are displayed on the right side of the BHL bib page. For some of the EABL titles, the image we provide to DPLA is not the eye-catching image that we display on the BHL bib page.'
See also Susan's email thread of 4/7/17

We're in the final stages of getting the next DPLA harvest
When last harvested, there were no thumbnail images whatsoever
On the left side, BHL items just had a plain rectangle there
With recent mapping and re-harvest, the thumbnails were one thing to address
Thumbnails more important in a resource like DPLA
Assumed thumbnails would be the same as are shown in BHL

Even though the Title Page page type controls the image in BHL, it has no effect on the image in DPLA

Bunch of EABL titles, had a bunch of colorful titles, Amphibian & Reptile

Column in item table; separate column in search index table. So the one shown in BHL is what we should be sending and exposing in our APIs; the other one is not so useful
Searches for Title Page, if doesn't find, uses the cover
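The selection rule just described (Title Page first, otherwise the cover) could be sketched like this. The page structure and field names are hypothetical, the real BHL code is C#, and returning nothing when neither page type exists is an assumption of this sketch.

```python
def choose_thumbnail(pages):
    """Return the first Title Page if there is one, else the first Cover, else None."""
    for wanted in ("Title Page", "Cover"):
        for page in pages:
            if wanted in page.get("page_types", ()):
                return page
    return None


# Hypothetical pages for one item.
pages = [
    {"id": 1, "page_types": ["Cover"]},
    {"id": 2, "page_types": ["Text"]},
    {"id": 3, "page_types": ["Title Page"]},
]
thumb = choose_thumbnail(pages)
```

Using the same function for both the BHL display and the OAI feed would keep the two images from drifting apart.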

We could live with the way it works currently
When partner libraries are doing pagination, they like to have control over the image used.
For them to have explicit control, to go in admin dash and explicitly mark as thumbnail in BHL and DPLA
It would be enough if people (those at partner libraries doing pagination) understood how BHL makes the selection

Do we want to do this for the future?
It would be in code responsible for OAI feed. It would take the image for the Search Catalog.
Ran this by DPLA, if we made the changes, would it show up?

Every time they re-harvest, it's a full replacement not just updating changes
Might be every 2 or 3 months or so

Are there any requirements on dimensions of images for thumbnails? Not that we're aware of

Do we rely on something to do resizing? We request them from IA; we request an approximate size
Dimensions from images in search catalog would be equivalent to those in Item Table? They are the same? Will they look the same? Coming from same source

Server for Full Text
Quote from Dell, clearing up existing order.

Macaw Work on the way (brief update)
Moving into cycle of development. Tickets are managed in SIL's Gemini (not all are in BHL)

Copy Specific Information
Article/Segment Metadata
Further Error Checking
Other Loose ends

More space has been requested, waiting to hear back.

Why can't we assign articles in BHL without BioStor?
There can be a place for article definition in Macaw. Also a need for bulk definition without going through BioStor

How it might work.

In Macaw, it would be for new stuff going in. We can't require it because you don't always know if it's there

Susan will talk to Mike and Trish offline

Mike suggested to Marty at Cornell, and then it took on more momentum
So let's talk about it

Harvard Botany, using article metadata to define individual letters in correspondence

End of January, allowed users to multi-select pages and generate PDFs
Mike checked back and it looks like we'll be ok.
Has bounced back to level we were at before.

In the past, Macaw was sending description fields to IA and IA was also pulling descriptions. It's fixed!
Corrected and eliminated the dupes. Smithsonian local notes was one of the things being duplicated.