Repository Needs June 06 Draft

RepositoryNeedsQuestions.doc

For the June 12 meeting, discussion needs to focus in part on the specs for the Metadata Repository, building on what we have learned from the three libraries' extraction to OCLC.

This document attempts to collect all the various points raised in conversations and documentation that might be relevant to the Metadata Repository. This is a working draft! Please do edit it for clarity and fill in any omitted information.

Thanks,

Suzanne

BHL Metadata Repository Needs and Questions Document
Metadata repository first stab by OCLC
Participating Libraries: MBOT, NH London, SIL.

Our first and foremost goal was to see if the data could be merged together in some way that worked. OCLC volunteered to take on this task and all three libraries have sent some data to them.

Goal one was achieved! Next will be to add 5 more libraries.

· We need to give some specific guidelines to the remaining 5 libraries about how we want their data.
· We need to have a clear picture of what we will be doing with their data.

Approaches to this first phase:
1) What does this data show about our collections (analysis)?
2) How can we use this data for BHL?
3) We need to decide on terminology for things like “item”, “item-level”, “volume”, etc.

Analysis of the data:

· Raw totals
- o Number of title-level records contributed to date (including duplicates)
- o Number of item-level records, as far as you can tell, even if we know the data doesn’t cover a whole library.
- o Can you identify analyzed works? That is, descriptions at the series or serial level AND separate records for the unique titles within these “sets”?
· Did the format received work?
- o What else needs to be outlined before the next 5 libraries send their records?
- o Do we have a list of the “kinds of” data that were common to all three data sets and that we want to make sure we get from the next 5?
· Sorts
- o Sort on topics (how can this be done?)
  - § Call number
  - § Subject
  - § Shelf indications from some libraries
  - § ???
- o Sort on locations (can we keep “ownerships” with each title)
  - § Title level
  - § Holdings level (One library has almost entire run – missing items located at second library)
- o Sort on date to indicate what is out of copyright and ready to roll for scanning and what needs to be held off.
- o Sort by some other criteria that would indicate priority of scanning
  - § Associations that will allow us to scan
  - § Institutions that will allow us to scan
  - § Titles located in Neave and Sherborn
  - § Other?
· Serials
- o What do the holdings from the three libraries look like?
- o What do we want to request from the next 5 libraries?
- o What percentage of titles had holdings?
- o What percentage had some kind of holdings?
- o What percentage had no holdings data?
· Item level information storage and fetching (needed from scanning stations in current workflow model)
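The holdings percentages and the date sort asked about above could be explored with something like the following. This is only a rough sketch: the record layout, the field names ("title", "year", "holdings"), and the sample data are all made up for illustration and are not the format OCLC received. The cutoff year is a placeholder parameter, not a rights determination, and the "detailed" versus "some" split is just one guess at what the two holdings questions mean.

```python
from collections import Counter

# Hypothetical sample records; field names are assumptions for illustration.
records = [
    {"title": "Serial A", "year": 1880, "holdings": "v.1-v.20 (1880-1900)"},
    {"title": "Serial B", "year": 1930, "holdings": "some vols."},
    {"title": "Serial C", "year": None, "holdings": ""},
    {"title": "Serial D", "year": 1905, "holdings": None},
]


def holdings_status(record):
    """Classify a record's holdings statement as 'detailed', 'some', or 'none'."""
    text = (record.get("holdings") or "").strip()
    if not text:
        return "none"
    # Assumption: a statement containing digits names specific volumes or years.
    return "detailed" if any(ch.isdigit() for ch in text) else "some"


def holdings_percentages(records):
    """Return the percentage of records in each holdings category."""
    if not records:
        return {}
    counts = Counter(holdings_status(r) for r in records)
    return {k: 100.0 * counts[k] / len(records) for k in ("detailed", "some", "none")}


def split_by_cutoff(records, cutoff_year):
    """Split into (ready, hold): dated before the cutoff versus everything else.

    Records with no usable year go to 'hold' so a person can review them.
    """
    ready = sorted(
        (r for r in records if r.get("year") is not None and r["year"] < cutoff_year),
        key=lambda r: r["year"],
    )
    hold = [r for r in records if r.get("year") is None or r["year"] >= cutoff_year]
    return ready, hold


percentages = holdings_percentages(records)
ready, hold = split_by_cutoff(records, cutoff_year=1923)  # placeholder cutoff
```

Sending undated records to the "hold" group keeps anything uncertain out of the scanning queue until someone has looked at it.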

Things we’d like to know:

· Duplication algorithm
- o How is this working? What fields are considered? How much exactness is needed for a match? To what level are we “sure”?
- o Is this running also against WorldCat? How much duplication is being found there?
- o Grouping of “almost matches” or “like” items?
- o FRBR applicable?
· Cleanup after the computer work, and all that it entails: from assignment of the work, to criteria for the work, and even what work actually needs to be done.
· Preferred record qualifications and qualities? And everything that implies.
· If incorporated into the giant WorldCat, how will a unique BHL label be applied for filtering or searching?
· What topics were included by one library but not the others? How will we inform the other 5 libraries of topics we would like them to send?
· What formats were included by one library but not the others? How will we inform the other 5 libraries of formats we would like them to send?
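To make the duplication questions above more concrete, here is a minimal sketch of one way a match could be decided. It is NOT a description of what OCLC is actually running; the fields considered (title and year), the normalization rules, and the 0.85 threshold are all assumptions chosen for illustration. It separates exact matches on a normalized key from “almost matches” that would need a human look.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_key(record):
    """Hypothetical match key: normalized title plus year."""
    return (normalize(record.get("title")), str(record.get("year") or ""))


def similarity(a, b):
    """Similarity of two normalized titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a.get("title")), normalize(b.get("title"))).ratio()


def classify_pair(a, b, threshold=0.85):
    """Return 'match', 'almost', or 'different' for a pair of records."""
    if match_key(a) == match_key(b):
        return "match"
    if similarity(a, b) >= threshold:
        return "almost"  # candidate for human review
    return "different"


rec1 = {"title": "Annals of Botany.", "year": 1887}
rec2 = {"title": "ANNALS OF BOTANY", "year": 1887}
rec3 = {"title": "Annals of Botanie", "year": 1887}
rec4 = {"title": "Journal of Zoology", "year": 1887}

results = [classify_pair(rec1, other) for other in (rec2, rec3, rec4)]
```

The threshold is where the “how much exactness is needed” question lives: raising it gives fewer false matches but pushes more true duplicates into the “different” pile.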

Future needs of the data:
Organization of scanning sites:

· BHL will need to use the analysis of the data to possibly prioritize/locate the records that will need to be scanned.
· Keeping ownership is important so that we know whose record is where and which scanning station can be used.
· Is keeping ownership important for other needs?

Attaching metadata to scanned object
Scanning stations will need to connect the digital object to the descriptive metadata.

· Fetching of the data via searches on specific unique identifiers (barcode level on bound items, Z39.50 searched against the metadata repository).
· Can OCLC allow us this kind of granular searching, storing the barcode information in a Z39.50-exposed field?
· Can OCLC deal with 400 fetches a day from 3 scanning stations based on this granular request? (estimated two 8-hour shifts per day)
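As a back-of-the-envelope check on the load question above, the arithmetic can be worked out as below. The estimate can be read two ways (400 fetches a day in total, or 400 at each of the 3 stations), so this sketch computes both. It assumes fetches are spread evenly across the two shifts, which real scanning work probably would not be.

```python
def fetch_rate(fetches_per_day, shift_hours=8, shifts_per_day=2):
    """Return (fetches per hour, seconds between fetches) for an even spread."""
    hours = shift_hours * shifts_per_day
    per_hour = fetches_per_day / hours
    seconds_between = 3600 * hours / fetches_per_day
    return per_hour, seconds_between


# Reading 1: 400 fetches a day across all 3 stations combined.
total_reading = fetch_rate(400)

# Reading 2: 400 fetches a day at each of the 3 stations.
per_station_reading = fetch_rate(400 * 3)
```

Under the first reading that is 25 fetches an hour, or one every 144 seconds; under the second it is 75 an hour, or one every 48 seconds.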

Attaching automatically generated metadata

· from the scanning station into the descriptive metadata
· or coordinating the conjunction of the two into METS
· generate granular metadata through the various parts of a text (sections, divisions, etc.)
- o Incorporate Luratech and CCs?
- o Page turning and citation discovery?

Attaching human edits

· Metadata repository needs to be able to be edited by humans

Updating of local records with data about scanned objects

· 856 tagging to link scanned item with BHL scanned product
· Other data needing to be “returned” to the participating libraries?
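For discussion, an 856 field linking a local record to the scanned item might be assembled along these lines. This is a sketch only: the URL is a made-up placeholder (no BHL address scheme has been decided), the display form is a plain-text rendering rather than real MARC encoding, and the choice of second indicator “1” (version of resource) is an assumption about how libraries would want to treat the scan of a print item.

```python
def build_856(url, note="", materials=""):
    """Return a MARC 856 field in a plain-text display form.

    Indicators used here: first '4' (HTTP access), second '1' (version of
    resource). Subfields: $3 materials specified, $u URI, $z public note.
    """
    parts = ["856 41"]
    if materials:
        parts.append(f"$3 {materials}")
    parts.append(f"$u {url}")
    if note:
        parts.append(f"$z {note}")
    return " ".join(parts)


# Hypothetical URL and note text, for illustration only.
field = build_856(
    "https://example.org/bhl/item/12345",
    note="Scanned copy in BHL",
    materials="v.1",
)
```

The $3 subfield is included because a single serial record may need one link per scanned volume.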

Updating of the BHL metadata repository

· What data needs to be kept “current”?
· How?

Assignment of GUIDs

· Levels requiring GUIDs
· How are the GUIDs formed/formatted?
· Who needs these GUIDs?
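One possible answer to the formation question, offered only as a starting point for discussion: GUIDs could be generated as standard UUIDs. The sketch below shows two options using Python's uuid module, a random one and a deterministic one derived from the level plus a local identifier. The namespace name "bhl.example.org", the level names, and the sample barcode are invented for illustration.

```python
import uuid

# Hypothetical namespace; a real one would need to be agreed on and never changed.
BHL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "bhl.example.org")


def guid_for(level, local_id):
    """Deterministic GUID: the same level and local id always give the same GUID."""
    return uuid.uuid5(BHL_NAMESPACE, f"{level}:{local_id}")


def random_guid():
    """Random GUID: no dependence on local ids, but it must be stored once assigned."""
    return uuid.uuid4()


# Sample barcode is made up.
item_guid = guid_for("item", "39088000123456")
title_guid = guid_for("title", "39088000123456")
```

The deterministic form can be regenerated at any scanning station from the barcode alone, while the random form avoids tying the GUID to a local identifier that might later change.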

Applying other tool sets to the data

· MBL/WHOI

Corpus of all Biodiversity literature in the BHL libraries

· All formats
· All areas of interest
· Even things that will never be digitized

Digital Registries

· What registries do we need to inform of the information we intend to scan?
· Automated as part of the whole metadata repository creation?
· Automated part of selection process from the metadata repository?

Final Look and Feel

· Metadata repository used only for tracking of scanning?
- o “Circulation” to check title out to scanner and return
- o Record issues discovered at scanning station
- o Indication of alternative workflow requirements
- o Other?
· Communication tool for linking together the various parts of the BHL to give a cohesive feel to the users of BHL
· Other uses
- o METS document description of scanned items
  - § What level are we “METS’ing”?
  - § Are we databasing and then only producing METS on the fly?
  - § Are we producing METS and creating database structure for other aspects?
  - § Huh?
- o Structure map information
- o Other?

Harvesting Capabilities

· OAI?
· Other?
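If OAI turns out to be the answer, a harvester would ask the repository for records with an OAI-PMH request such as the one built below. The verb and argument names (ListRecords, metadataPrefix, set, from) come from the OAI-PMH protocol; the base URL is a made-up placeholder, and whether the repository would expose such an endpoint at all is exactly the open question.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date  # e.g. only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
full_harvest = list_records_url("https://example.org/oai")
incremental = list_records_url("https://example.org/oai", from_date="2006-06-12")
```

The "from" argument is what would let participating libraries pull only the records that changed since their last harvest.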

Preservation Issues

· Leave that to Internet Archive?
· Investigate PDF/A?
· Administrative rights data?