2009 Architecture Meeting Minutes

Table of Contents

Action Items from 2009 Architecture Meeting Agenda
Monday, March 9
Development Priorities
Scanning Priorities
Content Acquisition
Content Management
Tuesday, March 10
Content Delivery
Hardware Infrastructure
Data Mining
Portal development
Recap & Administrivia
Wednesday, March 11
Name Finding
Nomenclatural Acts
Automated Markup
LifeDesk/BHL Integration
Parking Lot

Project Plan: BHL_DevPriorities_09.pdf (PDF) or BHL_DevPriorities_09.mpp (Microsoft Project)
last updated 3/18/2009

Action Item: Review thumbnails in the PDF Generator
Action Item: Create a Dissemination/Publicity Group
Action Item: Outreach for more effective use of PDF Generator
Action Item: Size files from MoBot in the PDF Generator
Action Item: Articlizing metadata: review capture to see how it is going and to inform development
Action Item: Add more links from Wikipedia.
Action Item: Use Wikipedia as content delivery for every title; highlight specifics with pages.

Development Priorities

Action Item: Send key dates to Chris F. Revisit this priority list with more timelines and milestones.
Action Item: Investigate with Zotero Commons / Dev Roadmap
Action Item: Chris revises list with discussion from meeting
Action Item: Review Richard Pyle’s ZooBank and use of LSIDs
Action Item: OCLC ISBN discussion by July
Action Item: BHL to EOL from species discovery bibliography.


Action Item: Europeana: bringing the citizen into BHL-Europe, and in fact EOL demand. Impact on selections. EOL needs.
Action Item: October European Digital Library meeting. Identify potential participants.
Action Item: eBiosphere potential networking
Action Item: Start a FAQ on the BHL wiki responding to common user complaints/suggestions, with the goal of migrating it to the public-facing BHL-E project wiki. Post known problems and the status of issues to help reduce frustration levels. Explore staffing for promoting what we have and what we know we don’t have or need to fix.
Action Item: BHL-Europe membership in the BHL Wiki space.

Scanning Priorities

Action Item: Use Index Animalium for prioritization. BHL Portal key-value pairs with Title and an identifier (TL-2). Able to fuzzy match.
Action Item: Reconstitute a new Collections Group. Bianca lead. Exec/IC committee will work on formation of subgroup.
Action Item: Look at taxon specific groupings/classification/counts of what content has been represented in BHL.

Content Acquisition

Action Item: EOL’s LifeDesk connection to the Article Repository
Action Item: Policy discussion around copyright/use re: Article Repository
Action Item: Johan Bollen (Los Alamos): use of triple stores and semantic web use of citations.
Action Item: Search across the BHL Portal and the Article Repository. Keep articles articlized from BHL sources into the Article Repository “open”, not locked down to a group.
Action Item: Identify the connection of BHL Portal titles and articles.
Action Item: Gather a test group & use cases to help define functionality; seed groups to encourage community building. ZooBank’s citation needs for handles in Biblio (Richard Pyle)
Action Item: Can Biblio pull the XMP data out of a PDF?
Action Item: All sign up for the Article Repository & provide feedback
Action Item: Name the Article Repository - Captain Kidd Spitball, caveat linktor
Action Item: Small workgroup/task force to examine requirements for deduping and a bid list. Report to Henning. Executive Committee to designate. Review Open Library’s deduping algorithms.
Action Item: Send the Google Docs BHL Digitization Specs link, and any updates created on the wiki, to the group (Bernard?). Review it to see whether we have a lowest common denominator of needs. Develop a strawman for discussion. Deliver to Henning.
Action Item: Survey BHL-Europe as we surveyed ourselves, to find out what their data is like and what they have. Needs to go out before the BHL-Europe meeting. Questionnaire.doc
Action Item: Define METS profile for SIP/scanned content
Action Item: Europeana wants to ingest. Document ways to get data out for repositories that want our data. Static exports have been done over time, but not a feed (OAI, SOAP, REST, etc.).
Action Item: Take a sample from California Digital and dedupe it against what we have already done. Then bring it into our prescanning deduping processes.
Action Item: Review Open Library dedupe algorithm; possibly change existing BHL practice.
Action Item: Identify sources of scan material: who they are and the type of material. Potential contributors by class of donor: BHL-Europe, BHL China, IA partners, Publishers, Back Files, BHL Partner libraries.

Content Management

Action Item: Follow up with Robert Miller at IA about insertion pages (a dummy book with insertions before and after files). See what they do and report what they do.
Action Item: Cathy to find out what Brewster meant by the migration comment: migration of what, to what?
Action Item: Exec. Committee to decide whether we need to work with IA on correcting data & harmonizing data with other locations, i.e., Open Library
Action Item: Mike L. to look at which records might have mistakes because of diacritics. NY Bot skipped the letter, so they are not as easy to find. NY Bot might be able to provide a MARC dump of the date ranges.
Action Item: Review portal editing needs with Mike L and Chris F
Action Item: Notification of record changes. RSS. Tracking, logging of changes.
Action Item: Implications of MARC records in the BHL Portal: pointing two records to the same scan, etc.; describing one set of scanned items as both a monographic series & a serial.
Action Item: Normalize volume information. If you find examples, send them to Chris F. and Mike, who will look at multivolume sets to fill in gaps and get the sequence.
Action Item: Write best practice document for multi-part complex bibliographic items as used by biodiversity scholars.
Action Item: Exec Committee: retrospective portal data cleanup. Expectations?
[from here up, dates updated according to Microsoft Project file on 3/19--B. Lipscomb]

Tuesday, March 10

Content Delivery

Action Item: Suzanne to add the Portal Editing Wiki Page to the action items
Action Item: Enhance the PDF deliverable and the accompanying email to explain the information
Action Item: Skim, an open source PDF reader for Macs. Review needed.
Action Item: Better integration of OCR into the PDF deliverables. Prices, etc.
Action Item: Everyone to gather information on PDF forms and deliverables. Dissemination group to announce the PDF and collect information.
Action Item: Dissemination group to find a way to facilitate gathering feedback from end users on portal development
Action Item: BHL Dev Team review indexing MARC for title-level access
Action Item: BHL Dev Team: Solr keyword index across all OCR text. Is this really needed? Data mining tools might be overkill. Review what the implications are.
Action Item: Investigate techniques for place name searching.
Action Item: Work with IA on IA page types to help identify illustrations.
Action Item: Revisit delivery of thumbnails or small page images for browsing (visual exploration of the BHL Portal). Adobe side board.

Hardware Infrastructure

Action Item: Further discussions between Adrian (BHL-Europe) and the BHL Dev Team on data movement, speed, etc.
Action Item: Chris F. and Adrian to discuss how BHL-Europe builds its first deliverable mirror, and plan the next platform and load balancing. Draft possible immediate solutions, with Darwin and Datanet as the long-term goal.
Action Item: Tom G. to discuss with Datanet its potential use as a dark archive for BHL
Action Item: Phil to look at BitTorrent as an alternative distribution model
Action Item: Using Fedora with a Petabox solution for redundancy of IA content
Action Item: Chris, Adrian, and Henning to talk before the Europeana kickoff (2-3) and the OCR conference with AIT (Austrian) in The Hague, Netherlands, April 6-7 (Adrian and Henning). Determine Chris’ availability.
Action Item: Tom to find out the SI resolution by the end of March
Action Item: Chinese Academy of Sciences discussion continues
Action Item: Cathy to monitor the resource space that is in the Otis Airbase request


Discussed at lunch.
Action Item: Evaluate BHL exports for incorporation into ViTaL/SFX
Action Item: Evaluate ingest of AnimalBase into Drupal/Biblio via OAI
Action Item: Evaluate Falx, which is ingesting Scratchpad bibliographies via OAI; determine shared vision for global bibliography management (including LifeDesks & Scratchpads)

Data Mining

Action Item: Chris M. and Chris F. to provide Amhed with the page identifier evaluation set from last summer’s evaluation of TaxonFinder2J
Action Item: Amhed to report evaluation to Cathy and Chris
Action Item: Drupal module for TaxonFinder2J
Action Item: Verify a stable URL for TaxonFinder2J with Amhed and Ryan
Action Item: Mike: results of the Zea mays count seemed off compared with the search and discover bibliography.
Action Item: Follow up with Patrick L. about development of the EOL API for the name synonymies service
Action Item: TaxonFinder2J against gray names in MoBot’s storage of names
Action Item: Tom G. to follow up with Lee Giles to see what is going on with the latest Penn State request
Action Item: Cathy, Holly, and others to follow up with George Thoma at NLM on markup
Action Item: Tom G. to investigate partners for a research grant on automated article metadata structure: Google Summer of Code, Berkeley I School.
Action Item: Dissemination group to look at inviting use of the PDF articlizer.
Action Item: Chris F. to contact Vince Smith about Google contacts.

Portal development

Action Item: LC flip book. Mike Hand at LC, Joe, John M., Chris F., and Martin to work on the move to a modular approach.
Action Item: IA flip book beta on the BHL Portal beta site for BHL members to look at. Test swap-out ability.
Action Item: Should exports include all pages, or only pages with names? Mike L. Can we give everything that we offer?
Action Item: TDWG literature interest group on specs for a citation and title resolver for biology literature. John M., Chris F., David Remsen.
Action Item: Article Repository: DOIs for articles in the repository. Proof of concept with some high-impact titles. David Shorthouse, Jim Edwards, and the DOI deal. Linnea monographs. Chris F., Tom.
Action Item: Outreach group: social networking/Web 2.0 opportunities. Institutional Council or list. Tom G. to do the discussion list.
Action Item: Search engine optimization. Phil and John M. Planet software (as used by Code4Lib) to aggregate blogs on similar topics.


Action Item: Martin to create a list of all the social networking BHL currently participates in.
Action Item: The BHL strategic plan has communication components. Susan F. to include some of the social networking group.
Action Item: One-page list of ideas for monetizing BHL data, for incorporation in the sustainability plan.
Action Item: Add the Kirtas deal to monetizing in the sustainability plan. Investigate the AbeBooks idea.
Action Item: Make the email list more active; technical list created, with Henning and Adrian on it. Technical committee: form the committee with a specific technical group.

Recap & Administrivia

Action Item: Suzanne to send action items to Tom
Action Item: Tom to send the action items to everyone, notifying those with assignments.
Action Item: Technical committee to decide on follow up next actions and meetings etc.
Action Item: Chris to prepare a one-page executive summary for the IC

Wednesday, March 11

Name Finding

NameFinding notes - Patrick & Paddy
Action Item: BHL to write a high-level description of needs (framed during discussion) and enter it into Jira; determine a date by which we need this
Action Item: Organize meeting around NameFinding (with vernaculars) & NameReconciliation before June 30th

Nomenclatural Acts

Action Item: Review search interface to make it more article-friendly with existing metadata present in BHL
Action Item: Tom to describe Nomenclatural Acts issue in Jira

Automated Markup

Action Item: Chris to pick back up with reCAPTCHA
Action Item: Tom to discuss OCR rekeying with Chinese Academy of Sciences
Action Item: Chris & Mike L, with Patrick, to think about/look at leaving tags for names in OCR
Action Item: Re-evaluate a wiki for rekeying/markup, as OCR is a major barrier

LifeDesk/BHL Integration

LifeDesk/BHL integration notes - David S & Paddy
Action Item: Write a shared-vision document & identify coding needs: where there is overlap, what works, and what needs to be done.


Review discussion from Day 1 & include decisions here
Decision: Volume item and date information can be done by anyone.

Parking Lot

EOL statistics needed (Patrick and BHL): number of species BHL links to. We also need BHL volumes, not titles.
Interface with image server using JPEG 2000 to help users
IA metadata issues and redundancy
ISBN and Metadata for digital manifestation records (Cornell)
Looking at the article repository to see how data is being used and how the cross information
Copyright issues/fair use and safe harbor concepts
NLM DTD wrapper
Images from ABBYY documentation from IA: there, but need to be pulled out.

Redundant issues
Content and PDF delivery
Can’t find articles they want
Files too large
OCR problems
Wiki for every page or wiki for every book for public editing.

TaxonFinder development: tomorrow
