BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

Ingest Issues for Consideration

See the BHL Beta site for the ingest test

Please type four tildes ("~~~~") in a row to create a signature for comments.

Back to Ingest Analysis Summary

Ingest Decisions

Table of Contents

See the BHL Beta site for the ingest test
Back to Ingest Analysis Summary
Ingest Decisions
Duplication of content in BHL
Dupes in scanning workflow
Contributors
Metadata Quality
Image Quality
Resource Allocation
Scope
Search/Browse
Controlled Vocabulary
Ingest Methodology
10-18-2009 Email exchange between Tom Garnett and Suzanne Pilsk
Duplication of content in BHL

Benefit: Duplicates in the collection give users more opportunities to find what they need: more title/author metadata to search on; marginalia are acceptable; different editions are represented.
Problem: Users may be frustrated by too many duplicates of the same title. How many duplicates are too many?
- dougholland, Oct 8, 2009: I also see rampant duplication as a potential concern for funding agencies.
Solutions
Comments
- Pilsk, Oct 1, 2009: Option: continue to de-dup against our brothers and sisters in BHL, and let duplication from ingest happen.
- Pilsk, Oct 1, 2009: Right now, there are items in IA that we are not de-duplicating against. Moving them into BHL will not be much different, except that those titles will be placed alongside ours.

- Pilsk, Oct 1, 2009: The benefits of new content outweigh the cost of duplication.

- Pilsk, Oct 1, 2009: A new search interface could potentially address users' concerns about seeing multiple copies.

Dupes in scanning workflow

Problem:
  • we will duplicate scanning of content already provided for free
  • current de-duplication tools cannot support identifying that content
Solutions
Comments
- Pilsk, Oct 1, 2009: Continue to de-dup against ourselves in our own scanning workflow.
- Pilsk, Oct 1, 2009: We don't spend funders' money on duplicating our own material.
We know our tools need to be improved, but this is a big problem that will not be solved quickly.
- Pilsk, Oct 1, 2009: Stop de-duplicating altogether.
- Pilsk, Oct 1, 2009: Not my first choice, but we could go back to a more general way of "bidding" on topics.

Questions/Comments
- d_wheeler, Oct 1, 2009: Chris, I believe the issue of duplication and de-duplication that Keri and John are raising concerns BHL libraries selecting materials from our own collections for scanning into BHL. There has to be a mechanism for us to identify materials from outside BHL libraries that have been ingested into BHL, so that we don't scan the same thing again and use up funding allocations without adding new content. The tools currently available to BHL libraries for this work will not serve this purpose, which makes identifying the non-BHL titles extremely cumbersome and time-consuming: essentially a title-by-title search of the BHL portal for every packing list sent to IA.

Short Term Solution

Long Term Solution


Contributors

Benefit: A new pool of contributing members. BHL strengthens its position as a one-stop shop for biodiversity content online.
Problem: How should contributor information be displayed? The current alphabetical display clutters the contributor list, obscuring participating BHL members.
Solutions
Comments
- Pilsk, Oct 1, 2009: Move the drop-down to an "advanced" search.
- Pilsk, Oct 1, 2009: Does this have to wait until we have a new interface? Can it be done sooner rather than later?

Questions/Comments

Short Term Solution

Long Term Solution

Metadata Quality

Benefit: Adding good-quality metadata to the collection increases access for users.
Problem: Poor-quality metadata will further complicate existing metadata synchrony issues.
Solutions
Comments
- Pilsk, Oct 1, 2009: Ingest. Don't let this concern stop the work. Encourage Gemini reporting of metadata concerns.
- Pilsk, Oct 1, 2009: Establish the metadata clean-up workload as Gemini reports are created.

Questions/Comments

Short Term Solution

Long Term Solution

Image Quality

Benefit: Adding new pages grows the BHL collection and brings us closer to our page-count goals. More content becomes available for the name services we provide and for data-mining opportunities.
Problem: Ingested scans of poor quality will only frustrate our users, and bad scans from non-BHL member libraries will be difficult to "fix", if they can be fixed at all.
Solutions
Comments
- Pilsk, Oct 1, 2009: Ingest, and encourage Gemini reporting of missing pages, bad scans, etc.
- Pilsk, Oct 1, 2009: Gemini will help us determine the workload of dealing with these issues as they are reported. Decisions can then be made to take down an entire title, redirect users to ILL services, or offer a potential new BHL service of "digital" ILL. It is hard to throw stones at ingest quality when few of the titles supplied by BHLers are being quality-reviewed.

Questions/Comments

Short Term Solution

Long Term Solution


Resource Allocation

Benefit: Ingested content is free and allows BHL to grow without spending money on digitization.
Problem: Staff resources need to be reassessed for managing the considerations listed above, namely de-duplication, image quality (QA), and metadata quality (portal editing).
Solutions
Comments
- Pilsk, Oct 1, 2009: Doesn't seem to be a real issue.
- Pilsk, Oct 1, 2009: Use Gemini report tracking to establish the true workload.

Questions/Comments

Short Term Solution

Long Term Solution


Scope

Benefit: BHL expands to include a wide range of fields related to biodiversity, expanding the user base.
Problem: Irrelevant content clutters the collection.
Solutions
Comments
- Pilsk, Oct 1, 2009: Possibly not a real problem, given the qualifications imposed on ingest.
- Pilsk, Oct 1, 2009: Search and display may alleviate some of these issues. The Smithsonian is used to having an ILS that covers a wide range of topics. To my knowledge, no user has complained that an "out of scope" record appeared when searching for one topic.

Questions/Comments

Short Term Solution

Long Term Solution

Search/Browse

Benefit: Known-item searches are conducted anyway; users benefit from BHL as is, and improvements will come with the development of CiteBank.
Problem: More content means decreased precision and recall.

Questions/Comments

Short Term Solution

Long Term Solution

Controlled Vocabulary

Benefit: Non-issue; BHL has no controlled vocabulary at present.
Problem: Lack of synchrony with author/title lists makes it difficult for users to find related items.

Questions/Comments

Short Term Solution

Long Term Solution


Ingest Methodology

Benefit: As the collection grows with regularly ingested content, so will our BHL subject headings and call numbers. Do we consider these additions helpful in expanding our scope?
Problem: Or are the subject headings and call numbers added through ingest only contributing to scope "creep"?

Questions/Comments

Short Term Solution

Long Term Solution

10-18-2009 Email exchange between Tom Garnett and Suzanne Pilsk