OCLCAnalysisConcerns
Issues Surrounding the OCLC Collection Analysis Tools as Applied to BHL
NOTES FROM CALL 3/16/2007 OCLC Analysis call.doc
Quality Review and Improvements to the matching:
- Can you provide in advance a flowchart detailing how the records will be matched? We are not asking for the program code itself, but for something as granular as "take subfield a in the 245 field; if this fails, do such and such." This would help immensely in evaluating the success of the matching process.
- CDF: We didn't ask this specifically, but I'm guessing no. They mentioned that their matching algorithm is very complex, so they might not be able (or willing) to publish it.
- We are primarily interested in matching pre-1923 material. Is the probable lack of numeric identifiers to match on likely to make this prohibitively difficult?
- CDF: OCLC uses more than just ISSN/ISBN, and does many levels of string matching across titles.
- CDF: OCLC uses fixed fields in 008 for date determination.
- What data is hidden from the view that is currently given? It seems to lack some of the control numbers and granularity present in our system records. (The dirt and grime)
- CDF: There's more data than is displayed, but local control numbers are not included in this analysis data set.
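
To make the flowchart request concrete, here is a minimal sketch of the kind of step-by-step matching cascade we have in mind. It is not OCLC's algorithm, which has not been published to us. The record keys (`issn`, `isbn`, `title_245a`, `field_008`) and the rule order are our own assumptions for illustration. The only MARC detail relied on is that Date 1 occupies positions 07-10 of the 008 field.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def date1_from_008(field_008):
    """Return Date 1 (008 positions 07-10) as an int, or None if unusable."""
    date1 = field_008[7:11]
    if len(date1) == 4 and date1.isascii() and date1.isdigit():
        return int(date1)
    return None  # e.g. "18uu" or a truncated field


def match_step(local, master):
    """Return the name of the first rule that matches, or None.

    Hypothetical cascade: standard numbers first, then title plus date.
    """
    # Step 1: standard numbers, only when both records carry one.
    for key in ("issn", "isbn"):
        if local.get(key) and local.get(key) == master.get(key):
            return key
    # Step 2: normalized 245 $a plus Date 1 from the 008 fixed field.
    same_title = normalize_title(local["title_245a"]) == normalize_title(master["title_245a"])
    local_year = date1_from_008(local["field_008"])
    if same_title and local_year is not None and local_year == date1_from_008(master["field_008"]):
        return "title+date"
    return None
```

Recording which rule fired for each match, as `match_step` does, is what would let us audit pre-1923 titles that can only match on title and date.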
Tracking of data after merger for processing in the BHL Portal:
- Can the local control numbers (if specified in a particular field per contributing institution) be included in the merged record, so that we can use the resulting data in a more integrative way in our ILS?
- CDF: Not directly from the analysis dataset itself. The analysis lists OCLC titles with holdings from libraries within the collection "group" - it's not a deduping across our titles but instead a match of our titles to OCLC master records. If any given library's holdings are not up-to-date or complete in OCLC, they will do a free batch load. That library will then need to do a Reclamation Project to pull back OCLC ids into their ILS.
Post analysis: Subjects:
- How are you going to conduct this on records without a standard classification schema or subject headings? At least two (possibly more) libraries are in this position.
- CDF: Classifications & Subject Areas are pulled from call numbers. Call numbers are provided as part of the "Load/Reclaim" process.
- How granular can the subject analysis be to satisfy the queries of subjects (marine biology) or geographic distribution (species in South America) or ???
- CDF: Probably not as granular as EoL would like, and we have to be prepared to say "Our data cannot answer that question."
Post analysis: Member Identification and Holdings:
- Will there be a way to generate a pick list for each member based on the results? Will it be sortable by format, call number, and/or other criteria required by each member to generate such a list?
- Can we specify that item level and call number data is included as part of what is attached to the "matched" core record?
- Can you deal with sites that specify MARC holdings in different ways? E.g., NHM would supply them in "bounced down" MARC fields through the export (i.e., via a single bib record), but other sites may include multiple bib records, each with its own holdings section, where there is more than one holding for the title.
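
As an illustration of the pick list question above, the following is a rough sketch of grouping matched rows by member and sorting each group. The row keys (`member`, `format`, `call_number`) are hypothetical, since we do not yet know what the analysis output will contain. Call numbers are compared as plain strings here, which does not reproduce true shelf order (for example, "QH 9" sorts after "QH 10").

```python
from collections import defaultdict


def build_pick_lists(matched_rows, sort_keys=("format", "call_number")):
    """Group rows by member, then sort each member's rows by sort_keys.

    matched_rows: iterable of dicts with a "member" key (hypothetical layout).
    sort_keys: row keys to sort on, in priority order; missing keys sort first.
    """
    pick_lists = defaultdict(list)
    for row in matched_rows:
        pick_lists[row["member"]].append(row)
    for rows in pick_lists.values():
        # Plain string comparison; not a real call-number collation.
        rows.sort(key=lambda r: tuple(r.get(k, "") for k in sort_keys))
    return dict(pick_lists)
```

Each member could pass its own `sort_keys`, for example `("call_number",)` alone, to get the ordering it needs.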
Post Product:
- MARCXML downloadable in bulk on complete set, subsets, and record by record
- CDF: From analysis we can get tab-delimited text file for a record or selection of records, but it is not MARC/MARCXML.
- Hidden data exposed to BHL Portal for potential tracking of versioning
- Editable records that can be posted and reused by anyone anytime for any reason
- Intent to scan - Will this record the BHL Member's "bid" on planned scanning? Or should records be extracted from this and stored elsewhere as the "intent to scan" and bidded titles?
- CDF: We will have to rehost for "Intent to scan".
- Can the metadata be fed into the framework of the BHL Portal - live interaction? Or does it need to be harvested off OCLC and stored elsewhere to run the BHL Portal?
- CDF: No live interaction. Metadata will have to come from individual library ILS, not OCLC. Sebastian Hammer can help with this.
- Will there be a public list of our holdings for those coming to the data from outside the Biodiversity community?
- CDF: No. Again, place for Sebastian Hammer to help.
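
Since the analysis export is tab-delimited text rather than MARC/MARCXML, one possible stopgap is to wrap exported rows in skeletal MARCXML ourselves. The sketch below assumes a header row with columns named `oclc_number` and `title`; those names are our invention, as we have not seen the actual export layout. The output omits the leader and control fields, so it is not a complete MARC record, and it cannot recover any data the export leaves out.

```python
import csv
import io
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace


def add_datafield(record, tag, code, value):
    """Append one datafield with a single subfield and blank indicators."""
    datafield = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code})
    subfield.text = value


def rows_to_marcxml(tsv_text):
    """Convert tab-delimited rows (hypothetical columns) to skeletal MARCXML."""
    ET.register_namespace("marc", MARC_NS)
    collection = ET.Element(f"{{{MARC_NS}}}collection")
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        record = ET.SubElement(collection, f"{{{MARC_NS}}}record")
        # 035 $a holds the system control number, prefixed for OCLC.
        add_datafield(record, "035", "a", f"(OCoLC){row['oclc_number']}")
        # 245 $a holds the title proper.
        add_datafield(record, "245", "a", row["title"])
    return ET.tostring(collection, encoding="unicode")
```

Because the records for the BHL Portal will have to come from each library's ILS anyway, a conversion like this would only be useful for cross-checking OCLC numbers against local records.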