Collections Debate
A Philosophical Debate on Collection Development
-
Sep 3, 2009 Wiki editing hint (thanks to Bianca): when you insert comments, if you preface the comment with 4 "~" it will automagically insert your name and avatar.
Connie says, well, this didn't work for me. Easier to type my name. -
Sep 3, 2009 four tildes, no quotes space after ...
Which books to select for digitization & ingest (using the California Digital Library as an example)
The context for this discussion was ingests where BHL does not have to pay for the scanning.
BHL-E is having their members choose what content is germane for contribution to BHL-E. The discussion also assumed (rightly) that extensive subject term analysis and classification number analysis have been done based on the subject terms and classification ranges in the current BHL corpus.
BHL has carved out a niche with our "brand." Should we protect that and not 'junk up' our collection? One person pointed out that the majority of our access bypasses searching the Portal and goes straight from Google searches to an item in the Portal. They aren't seeing the "brand." [Henning: Therefore, we have to find a way of attracting those people coming via Google and telling them the real value and functionality of BHL, so that they see BHL as a brand. However, even regular users of BHL use Google first to find a book because of the limited search functionality of the BHL Portal. Improving the functionality of the Portal will help build the brand.] [-
Sep 2, 2009 I don't believe that the BHL "brand" lies in the portal; I'm all in favor of the "best" portal, but feel that it is the content behind the portal that holds the true brand value. Portals as destinations are a losing proposition beyond a very hardcore and limited audience. If the audience is coming to the BHL portal via alternate means (e.g. Google, OCLC, etc.), we need to provide a set of coherent content once they get there (come for item x that you found via Google, stay for item y). This is what will set the BHL content "bucket" apart from "yet another digital library."]
Connie: I agree with Martin here. The brand is the coherent, relevant content. I might add that coherent and relevant will be in the eyes of the user, to a great degree and would argue that the boundaries should not be too strict: see notes below about seed catalogues and serial runs. Boundaries that are too strict might also reduce the value of BHL for citizen scientists and those who link from EOL and may not be specifically and only interested in links to taxonomic names.
But our libraries have broad content that has been developed over time; someone determined the book in question was relevant to add to the collection, even if it is out of scope for a narrow description of taxonomy (Victorian decoration from MOBOT). If a library has horticulture or practical agronomy texts, why should that be a problem for users who want strict taxonomic works? -
Sep 2, 2009 As we've demonstrated with the "found bibliographies", once we move beyond the core taxonomic community of 3-5,000 users, the ancillary materials that BHL participating libraries can provide will make BHL useful for answering many of the questions that we often associate with BHL (e.g. ecology related). For example, seed catalogues would not be deemed "core" taxonomic works (strictly speaking), but as Doug Holland has pointed out, they are often the first citation for hybrids and variations.
-
Sep 3, 2009
We should not assume how the materials we scan into BHL are going to be used, or by whom. (wow, talk about censorship…….) It is our function to provide access to them. Ideas change, methods change, applications change (not the iphone kind……). Here is a quote from an article that John Mignault forwarded to me that speaks to this directly : “Paul Conway from the iSchool at the University of Michigan <
http://www.si.umich.edu/> gave a short presentation at SAA on how experienced researchers feel about our digitization efforts. In general, those he interviewed somewhat mistrust our selection activities. They wonder what was not chosen and why, and who decided it was not worthy. He’s working on some articles that will express these assertions through more scientific means.” (read the blog here
http://hurstassociates.blogspot.com/2009/09/guest-blogger-ben-goldman-on-selection.html ). As for the Agricultural stuff, early agricultural literature included information that, while an applied ‘science’, included stuff that may still prove relevant to the sustainability of biodiversity today. Stuff like chemicals for agriculture, soil composition, varieties of vegetables and breeding techniques. In Ewan’s A Short History of Botany in the United States (New York : Hafner Publishing, 1969) Conway Zirkle writes on plant genetics and cytology: “During the second half of the nineteenth century numerous Agricultural Experiment Stations were established and the genetics of cultivated plants was studied intensively. … Of the nine Americans who attended the 1899 hybrid Conference of the Royal Horticultural Society only four had university connections, and only one of these was in a department of botany. The other three were from horticultural and agricultural departments” (p. 62-63). Read also the section on Liberty Hyde Bailey by George Lawrence (p. 133) “Liberty Hyde Bailey … was the first to elevate American horticulture from the level of a craft to a science. It was he, more than any other person, who made botany the basis of sound horticultural research … [he] was the first American taxonomist to apply the basic principles of taxonomy to the identification and nomenclature of cultivated plants”. He also notes that David Fairchild, trained as a mycologist and a botanist, established and contributed to the plant introduction section of the USDA. We should be careful about how we view applied vs ‘pure’ science. A couple of our scientists discovered a symbiotic relationship between a mushroom and a member of the blueberry family via a mycorrhizal fungus. (
http://www.springerlink.com/content/d33265598109631w/ ) This is not strict taxonomy but the recognition of an ecological and systematic necessity for both organisms to survive. And here is a link to the Strelitzia page of EOL :
http://www.eol.org/pages/44468#BHL69 ; look at the names pulled up by uBio in Belgique horticole, a ‘garden journal’. It’s not for me to say how the information will or can be used.
What about revisiting the books that we've already paid for? Some are certainly out of scope. -
Sep 2, 2009 I think that all of the contributing libraries have been fairly assiduous about not putting items that are egregiously out of scope in the portal. Yes, there are a few, but not too many. There are also individual titles in larger serial runs, but those would be hard to cull. If there is a decision to be strict on the ingest, I would argue that it might be good to excise some of the peripheral titles.
Connie: I would disagree with Martin here. I do not think systematic culling is a good use of our resources. It would be more valuable to improve and deepen the metadata.
What about using TaxonFinder?
Only include a book in BHL if it has a species name, or a relevant weight of names.
- If we had a book about a single species that's not in NameBank, we wouldn't bring in the book. And that could be among the most valuable and important works in the corpus.
- Works whose unusual printing results in poor OCR (like Sp. Pl. & Sys. Nat., others) have few names found, even though they describe thousands of names.
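To make the two objections above concrete, here is a minimal sketch of what a "relevant weight of names" filter might look like. It does not call TaxonFinder or NameBank; it assumes some name-finding step has already produced a list of names for each book. The `ScannedBook` record, the `passes_name_filter` function, and the 0.5 names-per-page threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScannedBook:
    """Hypothetical record for a candidate book (not a BHL or IA schema)."""
    title: str
    page_count: int
    names_found: list[str]  # names a name-finding tool reported from the OCR text


def passes_name_filter(book: ScannedBook, min_names_per_page: float = 0.5) -> bool:
    """Return True if the book has at least `min_names_per_page` names per page.

    The threshold is an assumed value for illustration, not a figure
    from the discussion.
    """
    if book.page_count <= 0:
        return False
    density = len(book.names_found) / book.page_count
    return density >= min_names_per_page


# A flora with good OCR passes easily.
flora = ScannedBook("Regional flora", 100, ["Quercus alba"] * 80)

# A monograph on a single species missing from the name index finds nothing.
monograph = ScannedBook("Monograph on one undescribed species", 200, [])

# An early printed work whose typeface defeats OCR: thousands of names
# described, but only a handful recognised.
early_work = ScannedBook("Early work with unusual printing", 1200, ["Homo sapiens"] * 12)

for book in (flora, monograph, early_work):
    print(book.title, "->", "include" if passes_name_filter(book) else "exclude")
```

Under these assumptions the flora is included and the other two are excluded, which is exactly the failure the bullets describe: the filter measures what the OCR and the name index can see, not what the book contains.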
BHL Audience
- Hardcore taxonomists
- Natural history and botany library patrons
- A user who comes to BHL looking for a specific title doesn't care about content that isn't germane to BHL.
- A user who arrives at BHL via an external source (e.g. Google or Europeana) because of a specific known title search and stays for the less germane content
- History of science and exploration types
- Armchair naturalists, citizen scientists, backyard botanists
- Users "fishing" for illustrations and pictures.
Which contributors to bring in from IA
- Just CDL or all of the content in IA texts? Beta ingest has been for all IA texts. - Sep 2, 2009 I would recommend harvesting from all IA scanning partners. I don't believe that CDL has any lock on "quality" of content (over, say, Toronto). Connie: We should certainly include Toronto but I would be hesitant to bring in all the Google Books that have been downloaded into IA, not just because the metadata is almost non-existent.
- A review of content is needed; possible that you might eliminate an entire library if, for example, the scans were uniformly poor or the metadata regularly inaccurate and misleading. - Sep 2, 2009 Not germane to the IA harvest (from the point of view of scan quality); all IA scanning is roughly of the same quality. Metadata might be a different story, depending on how IA and its scanning partner set up the metadata fetching. Connie: Again, improving and deepening metadata should be a priority.
Duplications
Lots of duplicates across all of IA + BHL
Do we care about duplication?
- There is a cost associated with holding onto books over time, though some felt that, in the scope of storage costs, this would not be a big amount.
- There is a cost associated with duplicating a book that's already scanned. Even those who felt duplicates weren't a big problem for users' access felt that avoiding duplicates when we are paying for scanning was important.
- There is a presentation issue related to display of duplicates. Connie: Yes, we need to find some kind of solution for presentation of duplicates.
- There is also a need to elaborate the different meanings of "duplicate," e.g.
- Two scans of the same printing, the same edition, the same metadata - call these true duplicates.
- Two scans of different printings, the same edition, same or different metadata
- Two scans of the same work but different metadata (we have several of these). [Henning: If only the metadata are different, it is a duplicate in my opinion, but this is one of the difficult issues] Connie agrees.
- Two scans of the same work but different editions. Is it right to refer to these as "duplicates"? [Henning: This is not a duplicate in my opinion] - Sep 2, 2009 I agree with Henning here. This is not a duplicate. Would we only want the first edition of Sys. Nat.? Connie agrees.
- Two scans of the same printing, the same edition, the same metadata but with different marginalia and hand-written annotations, e.g. books from Darwin's or Ernst Haeckel's personal library. [Henning: This is not a duplicate in my opinion]. - Sep 2, 2009 This is the hardest of the duplicates to identify. Smithsonian has many association/annotated copies of works we are NOT scanning because they are appearing as duplicates. Beyond the big famous examples (Darwin, Haeckel) little of this is captured in our metadata unless, for some reason, the book was tagged for being "special collections". Connie agrees and notes that part of the purpose of the IMLS planning grant for special collections digitization (and, we hope, the follow up grant) is to develop better ways of presenting items like these and improving metadata to tease out the relationships and differences.
- Two metadata records referring to different digital objects of the identical physical text. - Sep 2, 2009 I'm not sure exactly what this means
- Two differing metadata records referring to the same digital object. - Sep 2, 2009 This would be a duplicate metadata record (?)
- There are implications for assigning/exposing a 'permanent' GUID/URL to duplicated titles
- What happens if we 'darken' a duplicate, but a user has linked to it
- One approach might be to be "tolerant" or "relaxed" about free, ingested duplicates and then crowdsource the research and analysis of discovering or exposing the relationships among the items. What sort of interface would be required to allow this? - Sep 2, 2009 This strikes me as one of the Open Library agenda items. Connie: In general, I think we should be a bit more relaxed about duplicates but should find ways to make the presentation/finding explicit and natural.
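The different meanings of "duplicate" listed above can be sketched as a small classification, which might be a starting point for any interface that exposes relationships among items. This is only an illustration: the `Scan` record, its fields, and the `Relation` labels are hypothetical, and the sketch assumes that work, edition, printing, and annotation status are already known for each scan, which the comments above note is often not the case. It covers only the scan-to-scan cases, not the two items about metadata records pointing at digital objects.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """Labels for the scan-to-scan cases discussed above (names are illustrative)."""
    DIFFERENT_WORK = "different works"
    DIFFERENT_EDITION = "same work, different editions (not a duplicate)"
    DIFFERENT_PRINTING = "same edition, different printings"
    ANNOTATED_COPY = "same printing, differing annotations (not a duplicate)"
    DIFFERENT_METADATA = "same printing, different metadata (a duplicate)"
    TRUE_DUPLICATE = "same printing, same edition, same metadata"


@dataclass
class Scan:
    """Hypothetical description of one scanned copy."""
    work: str
    edition: str
    printing: str
    annotated: bool = False                       # marginalia or hand-written notes
    metadata: dict = field(default_factory=dict)  # catalogue record as key/value pairs


def classify(a: Scan, b: Scan) -> Relation:
    """Compare two scans from the coarsest distinction to the finest."""
    if a.work != b.work:
        return Relation.DIFFERENT_WORK
    if a.edition != b.edition:
        return Relation.DIFFERENT_EDITION
    if a.printing != b.printing:
        return Relation.DIFFERENT_PRINTING
    if a.annotated or b.annotated:
        return Relation.ANNOTATED_COPY
    if a.metadata != b.metadata:
        return Relation.DIFFERENT_METADATA
    return Relation.TRUE_DUPLICATE


first = Scan("Systema Naturae", "1st", "1735", metadata={"title": "Systema naturae"})
tenth = Scan("Systema Naturae", "10th", "1758", metadata={"title": "Systema naturae"})
recatalogued = Scan("Systema Naturae", "10th", "1758", metadata={"title": "Syst. nat."})
annotated = Scan("Systema Naturae", "10th", "1758", annotated=True,
                 metadata={"title": "Systema naturae"})

print(classify(first, tenth).value)
print(classify(tenth, recatalogued).value)
print(classify(tenth, annotated).value)
```

Ordering the checks from work down to metadata reflects the views recorded above: a different edition or an annotated copy is treated as a distinct item, while a difference in metadata alone still counts as a duplicate.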