BHL Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

Content Enhancement Tasks


See also Gaming page on public wiki

BHL Content Enhancement Tasks


Pagination*

BHL receives page-level metadata through its scanning collaboration with Internet Archive and the institutions that share their scans with BHL. In some cases the metadata contains no page numbers and no indication of page type (e.g. Table of Contents, Text, Cover, Illustration, Map, Blank). Without that information, navigating those books is nearly impossible short of a time-consuming page-by-page review.

Current Contributors: JJ, Gilbert Borrego
Tools used for task: Macaw, Paginator
Benefit: improves end-user searching within the BHL portal
The Intern/Volunteer discussion of 2/29/2012 decided that pagination was the best focus of work; see Intern Volunteer Coordination

Article-ization

BHL already has a UI that lets users select non-contiguous pages from a scanned volume and bundle them into a PDF that is created on the fly and delivered via e-mail notification. This is conceptually similar to the Table of Contents rekeying, but it differs because not all published journals have Tables of Contents, and in historic literature the pages of an article were not always printed together (plates were often printed at the end of an issue).

Current Contributors: any BHL user can generate articles and have them added to Citebank, as long as they supply a Title and Author for the article; Rod Page via uBio
Tools used for task: PDF Generator, uBio
Benefit: gives end users the article-level access they want; connects content in Citebank with content in BHL


Image identification & extraction

BHL has coordinate-based OCR for nearly all of its scanned pages. We'd like to automatically identify the objects within a scanned page that are "visual resources" (e.g. figures, plates, illuminated texts, tables) and then provide a way to rekey their captions or other descriptive information.

Current Contributors: Gilbert Borrego
Tools used for task: Paginator
Benefit: would provide an easier way to browse the illustrations and plates in BHL books and journals than the current, very manual process of paging through the viewer one page at a time. Images could also be extracted from the BHL portal more easily and incorporated into other image-related portals such as Flickr. Flickr is a great way to reach new audiences for BHL, particularly people in non-science disciplines who are heavy users of illustrations.

Adding scientific names & common names

Related to the two tasks above, BHL staff manually identify illustrations and set page types for particular books of interest and upload those images to Flickr, where they can be tagged & indexed by others within the Flickr community. Of particular interest to our users is being able to find illustrations by scientific name and by common name. When we add a "machine tag" with a scientific name to an image in Flickr, that image is then indexed by the Encyclopedia of Life and made available to its large community of users.

Current Contributors: Gilbert Borrego, Flickr users
Tools used for task: Flickr
About the BHL stream and instructions for contributing: http://www.flickr.com/people/biodivlibrary/
BioDivLibrary’s photostream sets: http://www.flickr.com/photos/biodivlibrary/sets/
Benefit: further promotes BHL content; connects BHL content with content from EOL
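The machine tags mentioned above follow Flickr's general `namespace:predicate=value` form. The sketch below shows how a scientific or common name could be formatted as such a tag and parsed back. It assumes the `taxonomy:binomial` and `taxonomy:common` predicates that EOL's Flickr harvesting has used; check EOL's current contributor instructions before relying on them. The helper names are hypothetical and not part of any BHL or Flickr tool.

```python
from typing import Tuple


def binomial_machine_tag(name: str) -> str:
    """Hypothetical helper: format 'Genus species' as a machine tag.

    Assumes the taxonomy:binomial convention used for EOL harvesting.
    """
    parts = name.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'Genus species', got {name!r}")
    genus, epithet = parts
    # Normalise capitalisation: genus capitalised, specific epithet lower case
    value = f"{genus.capitalize()} {epithet.lower()}"
    # Quote the value because it contains a space
    return f'taxonomy:binomial="{value}"'


def common_name_machine_tag(name: str) -> str:
    """Hypothetical helper: assumes a taxonomy:common predicate for common names."""
    value = " ".join(name.split())  # collapse stray whitespace
    return f'taxonomy:common="{value}"'


def parse_machine_tag(tag: str) -> Tuple[str, str, str]:
    """Split a 'namespace:predicate=value' tag; surrounding quotes are dropped."""
    head, sep, value = tag.partition("=")
    if not sep:
        raise ValueError(f"not a machine tag: {tag!r}")
    namespace, sep, predicate = head.partition(":")
    if not sep:
        raise ValueError(f"not a machine tag: {tag!r}")
    return namespace, predicate, value.strip('"')


if __name__ == "__main__":
    tag = binomial_machine_tag("danaus plexippus")
    print(tag)  # taxonomy:binomial="Danaus plexippus"
    print(parse_machine_tag(tag))  # ('taxonomy', 'binomial', 'Danaus plexippus')
```

A tag like this is only a string. Adding it to a photo still happens through Flickr itself, and whether EOL picks the image up depends on EOL's harvesting rules at the time.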

Related

Wikimedia Commons (related to image identification above)

http://commons.wikimedia.org/wiki/Commons:Biodiversity_Heritage_Library

This project was initiated outside BHL by Gaurav Vaidya, a graduate student at the University of Colorado, Boulder, working on biodiversity informatics (http://www.ggvaidya.com/). As of 2/23/12 he had tagged 200 images from BHL, fixed their copyright status, and linked them back to the BHL website (at the item level, not the page level). The project page describes it as follows: "This project hopes to facilitate a partnership between the BHL and the Wikimedia Commons to their mutual benefit. In order to convince the BHL to commit scarce resources to this task, we want to start by showing them the value of the Wikimedia Commons as a facilitator in making their content widely available and as a way to enhance their brand."
The files: http://commons.wikimedia.org/wiki/Category:Files_from_the_Biodiversity_Heritage_Library
Template for tagging BHL files: http://commons.wikimedia.org/wiki/Template:Biodiversity_Heritage_Library

Current Contributors: Gaurav Vaidya
Benefit: further promotes BHL content, puts BHL images into other portals, may drive more traffic back to BHL



Wikipedia

BHL & Wikipedia
Wikipedia links to BHL content
http://en.wikipedia.org/w/index.php?title=Special:LinkSearch&target=http%3A%2F%2F*.biodiversitylibrary.org&limit=500&offset=0
http://linkypedia.info/websites/34/pages/


Engaging Wikipedians
GLAM-wikian: Sarah Stierch, formerly Wikipedian in Residence at the Archives of American Art and now at the Smithsonian Archives (StierchS@SI.EDU, http://en.wikipedia.org/wiki/User:SarahStierch). She is part of the GLAM-wiki initiative: http://en.wikipedia.org/wiki/Wikipedia:GLAM/US

Ways of engaging Wikipedians: editathons, e.g.
http://www.nypl.org/locations/tid/55/node/134716?lref=55%2Fcalendar


Questions


What about folks uploading citations to Citebank? Does this activity fit in with content enhancement?