HathiTrust
Collections Committee Notes
Jan 25 2017
Attendees:
me, Connie Rinaldo, Kelli Trei
Mike Furlough - Exec Director HT
Heather Christenson - Fed docs & collections (broader collections)
Lizanne Payne - Shared Print Program Officer
Sandra McIntyre - Director of Marketing?
HT folks felt BHL Collections CMTE inquiry more related to operational analysis vs. collections CMTE related analysis
HT wants to know: what are we (BHL) facing?
collections survey report put out in late 2015; took time to get analysis completed. Most useful outcome = Board wants to focus on books & book-like objects; based on where they are now, best to focus on where they have resources/strengths
gap-filling is key
fed docs collection = best ex. of targeted collection development; for all other HT collections, priority #1 is mass digitization
HT now looking to take stock
Collections CMTE going through personnel changes, they have new chair now
metadata: strategy, use & quality
curation: corrections process & workflow, improve quality
collection analysis: 1) Lizanne looking at what holdings members have over HT collection
2) Heather: more qualitative in terms of what gaps are in the fed docs collection?
focus on book-like: HT has stopped short of archival materials b/c of lack of MARC records, they do no editing of MARC
Fed Docs Project
what do we have? don’t have? want to get?
HT has lots of depository libraries for Fed docs; over 760,000 docs!
developing U.S. Fed Gov Doc registry for the past 2.5 years
goal for registry is to define universe of gov documents published
collected records from 40+ libraries
now working on deduping and consolidating @ work level (book, piece level)
currently have 5.3 million records to deal with; deduping and consolidating match is FUZZY at best
hoping to map registry against HathiTrust collection to assess gaps that need to be filled
analyzed HathiTrust collection based on identifiers (SuDoc), provenance, usage
goal to identify key titles for comprehensiveness
operational goal now to explore processes to help fill gaps
based on string matching, name authority ==> have produced picklists for some partners to direct their scanning
running reports based on registry and partner holdings data
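
The fuzzy work-level consolidation and gap picklist described above could look roughly like the sketch below. This is not HT's actual pipeline: the record fields, the 0.9 similarity threshold, the function names, and the sample SuDoc numbers are all made up for illustration. It only uses Python's standard `difflib` for string similarity.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())


def is_same_work(a, b, threshold=0.9):
    """Fuzzy title match; the threshold is an assumed value, not HT's."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def dedupe(records):
    """Group records whose titles fuzzily match into work-level clusters."""
    works = []
    for rec in records:
        for work in works:
            if is_same_work(rec["title"], work[0]["title"]):
                work.append(rec)
                break
        else:
            works.append([rec])
    return works


def find_gaps(works, digitized_sudocs):
    """Return titles of works with no SuDoc in the digitized set."""
    gaps = []
    for work in works:
        sudocs = {r["sudoc"] for r in work if r.get("sudoc")}
        if not sudocs & digitized_sudocs:
            gaps.append(work[0]["title"])
    return gaps


# Hypothetical registry records; SuDoc values are invented examples.
registry = [
    {"title": "Report of the Secretary of Agriculture, 1901", "sudoc": "A 1.1:901"},
    {"title": "Report of the secretary of agriculture 1901.", "sudoc": None},
    {"title": "Bulletin of the Bureau of Fisheries", "sudoc": "C 6.3:28"},
]
works = dedupe(registry)
picklist = find_gaps(works, digitized_sudocs={"A 1.1:901"})
print(len(works), picklist)
```

Pairwise comparison like this would not scale to 5.3 million records without some blocking step (e.g. grouping by agency or year first); the sketch only shows the matching idea.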
Shared Print Program
(Lizanne heads the program)
Focus on monographs in 2 phases:
- short term = quick launch - library management issue to achieve retention commitments for things in HT
- long term = substantive collection analysis
All HT library partners provide print holdings data
HT matches against their digital holdings
Members pay based on dedupe calculation...OCLC number based = "broad brush approach"
using estimate reports to help members prioritize
make it easy for members to match holdings against broad criteria
very rudimentary, not bibliographic metadata or subject metadata based
if no OCLC number then no match opportunity
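
A minimal sketch of the OCLC-number-based "broad brush" match described above, assuming each print holdings record carries a local ID and a raw OCLC string. The field names, prefixes handled, and sample data are hypothetical, not HT's submission format. It also shows why a record without an OCLC number has no match opportunity.

```python
import re


def normalize_oclc(raw):
    """Strip prefixes such as (OCoLC) or ocm and leading zeros.

    Returns None when the value contains no digits.
    """
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)
    return digits.lstrip("0") or None


def match_holdings(print_holdings, digital_oclc):
    """Split print holdings into matched, unmatched, and unmatchable IDs."""
    matched, unmatched, unmatchable = [], [], []
    for item in print_holdings:
        oclc = normalize_oclc(item.get("oclc"))
        if oclc is None:
            unmatchable.append(item["local_id"])  # no OCLC number, no match
        elif oclc in digital_oclc:
            matched.append(item["local_id"])
        else:
            unmatched.append(item["local_id"])
    return matched, unmatched, unmatchable


# Hypothetical member holdings and digital-collection OCLC numbers.
holdings = [
    {"local_id": "b1", "oclc": "(OCoLC)00012345"},
    {"local_id": "b2", "oclc": "ocm00099999"},
    {"local_id": "b3", "oclc": ""},
]
matched, unmatched, unmatchable = match_holdings(holdings, digital_oclc={"12345"})
print(matched, unmatched, unmatchable)
```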
HT focus is on mass digitization - minimal touch, ingest @ high rates, few thousand/day
operations based on this mass digi model so haven't addressed more careful curation activities as part of operations (as BHL has)
What role can HT play? but also commercial market
GreenGlass tool? (by Sustainable Collection Services, a company bought by OCLC)
Informed HT that BHL working on ideas for tools via NDSR 1) collection analysis 2) best practices & tools - what's out there that BHL could use?
2 different types of analysis for mono print holdings w/ MARC data
A) across institutions
B) performing deep analysis within a single institution
compare holdings to WorldCat
*biodiversity collections heavily serial focused*
HT Members said Fed docs were a high priority so invested in registry creation
enum/chron issue w/ serials "licks us every time"
HT curious to know: who else has these problems? asking these kinds of questions?
CRL have done a couple of projects on shared print = Agriculture & Law
*metadata issues for how you compare collections* - don't we know it?!
WorldCat does not enforce conformity - EU libs careful about merging digital libraries
Kelli suggests TRAIL as an example
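
To illustrate the enum/chron issue with serials noted above: the same volume can be described in many free-text ways, so holdings cannot be compared until the strings are parsed into comparable parts. The sketch below is an assumption about how one might start, with hypothetical patterns that cover only a few common forms; real enum/chron data is far messier.

```python
import re

# Hypothetical patterns for a few common enumeration/chronology forms.
PATTERNS = {
    "volume": re.compile(r"\bv(?:ol)?\.?\s*(\d+)", re.IGNORECASE),
    "number": re.compile(r"\bno?\.?\s*(\d+)", re.IGNORECASE),
    "year": re.compile(r"\b((?:17|18|19|20)\d{2})\b"),
}


def parse_enum_chron(text):
    """Extract volume, number, and year as ints (None when not found)."""
    parsed = {}
    for key, pattern in PATTERNS.items():
        found = pattern.search(text)
        parsed[key] = int(found.group(1)) if found else None
    return parsed


a = parse_enum_chron("v.12 no.3 (1998)")
b = parse_enum_chron("Vol. 12, No. 3, 1998")
print(a == b)
```

Two differently written strings for the same issue parse to the same values here, which is the precondition for comparing serial holdings across institutions.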
HT to follow up in 2-3 months re: 1) metadata strategy & policy 2) collections committee 3) staff retreat
HT would like to hear about progress of NDSR residents as well