BHL Institutional Council Notes March 21, 2008
Attendees:
- Baione, Tom (AMNH)
- Edwards, Jim (EOL)
- Fraser, Susan (NYBG)
- Freeland, Chris (MOBOT/BHL)
- Arrive: March 17, US Airways 3803, 4:31pm
- Depart: March 21, US Airways 4035, 3:15pm
- Garnett, Tom (BHL)
- Godow, Michael (FMNH)
- Gwinn, Nancy (SIL)
- Higley, Graham (NHM)
- Holland, Doug (MOBOT)
- Norton, Cathy (MBLWHOI)
- Rinaldo, Connie (Harvard)
- Warnement, Judy (Harvard)
ACTION ITEMS
1. Tom Garnett, for BHL, will send a note to Peter Raven regarding the importance of increasing bandwidth.
2. Graham will provide a regular status report for international agreements.
3. BHL needs to identify an appropriate person to suggest for the EOL development group (DPG). (Tom/Institutional Council)
4. Send Executive Council minutes to all of the Institutional Council. (Connie)
5. Add quarterly calls for the Institutional Council following the quarterly report. Tom will set up dates.
6. Tom will set up the current presentation page to show speaking engagements for the longer term.
7. Jim and Tom should review ways to make the BHL/EOL relationship clearer.
- Tom will provide some language for the EOL blog to make the connection between BHL and EOL.
- BHL should link to EOL or identify EOL as a partner.
- The BHL Secretary should respond to BHL-related questions on the EOL forum.
8. Tom will distribute EOL funding reporting form so that money can be attributed to members' contribution.
9. Doug, Cathy, Susan and Connie volunteered to identify the cost of scanning particular bundles of information for development purposes.
10. Tom will ensure BHL representation on upcoming EOL provider meetings.
AGENDA DRAFT
Agenda for the BHL Institutional Council Meeting
March 21, 2008
Museum of Comparative Zoology, Cambridge Massachusetts
Chair, Graham Higley
Secretary: Connie Rinaldo
I. Continental Breakfast 8-9
II. Welcome, Introductions: Connie Rinaldo, Graham Higley 9 – 9:15
III. The State of the BHL Program (Handouts to be distributed prior to meeting): Cathy
Up and running in most scanning centers; Cathy has visited several. About 10,000 volumes scanned (7,000 in the portal), 3.9 million pages. MBL is ahead of the curve because it started early and had 10 scanners. Harvard is now scanning at the same site. Foldouts have become a big issue. Books are pre-screened to be sure they are suitable for scanning. We were waiting for Wonderfetch to be implemented so all libraries can add permission statements (IA does not do this); Wonderfetch has now been implemented. MBL is also checking for copyright renewal; all staff are involved and all items are checked. Hot links go back to IA from the MBL catalog (an automated process into the Ex Libris catalog); we will also want to do this for BHL. At the Smithsonian, we are working with LC and hiring scanners, but there are some issues with government security. LC has a full pod. The IA relationship with the Boston center is good, but they may have reached the end of their technical rope regarding boutique scanning. We need large volumes, foldouts, etc. One option is to scan foldouts and large items at home libraries, but these can only be tipped in during scanning of the book, so all foldouts have to be pre-scanned. The foldout solution starts at 300 dpi and can go as low as 180 dpi. Another issue is data security (hacking, earthquakes). MBL lost 200 items in the hacking incident. Tom has asked for a preservation/backup plan but has not received anything; IA does not appear to have documentation. IA does what it does well, with high quality, but our collections fall outside what IA can do. The IMLS planning grant addresses this, but on a slow track. MBL has been checking every item. We are looking at mirror sites and other possibilities (e.g., Fedora). We might have to allocate some money to a traveling boutique scanner. The workflow is such that items can't be tipped in once the book has been "published"; if anything is missing, the book has to be redone. 16% of quality control is done on site; IA does 2% in San Francisco (metadata for number of pages).
Robert Miller has been very accommodating to the challenges we face.
Operations are underway, but there are problems: some resolvable, some maybe not.
Jim asked a policy question: do these concerns need to go to the EOL Steering Committee? Maybe we are at that point. Now that the architecture meeting has occurred, we can present a better-formed report in June. Jim noted that the June meeting will be discussing sustainability.
We have critical mass for support of BHL.
a. Digitizing/Metadata/Internet Archive Cathy Norton 9:15 – 9:30
b. Portal Development Chris Freeland 9:30 – 9:45
Architecture meeting: reviewed where we are and where we are going. We have the funding for support and money to move to a sustainable platform. The Version 1 release was successful: taxonomic intelligence, ingest from IA, display in a good model. Suggestions for interface and access points included query by language; this can be done and will help with the BHL Europe relationship. EOL/BHL name finding (taxon server) has found 14.7 million name occurrences. We need to figure out how to put this information in front of users: we can store it, but making sense of it for users is the challenge. Next steps: user testing, usability analysis, and forming a technical advisory committee and a user group for feedback. Display of namespace information must be addressed.
BHL has written documentation for custom querying of IA, in consultation with IA developers. Open Library is not very library-friendly.
Network speed: image loading time is an issue for both MOBOT and IA (storage and bandwidth availability). MOBOT bandwidth needs to be increased; there is money, but it is stalled by politics. BHL needs to help MOBOT raise the priority of this.
ACTION: note to Peter Raven regarding importance of increasing bandwidth.
The issue is outbound MOBOT bandwidth but also the link between IA and MOBOT used to get data. There are also speed issues related to IA, though IA maintains that its bandwidth is sufficient.
Storage and speed are limiting factors. We need 12 terabytes for current BHL content. Money is available to build Fedora into the network. Some of it can be used for storage to enable a mirror site/redundant copy, but if we buy today, we may not be able to maximize the money available through BHL for storage, since storage is more expensive now. An architectural issue connected with display speed could be resolved with commercial server software. Images are grabbed from IA, and once MOBOT bandwidth is increased, they will display more quickly.
Moore Foundation: administers the Fedora grant to MOBOT. Open source JPEG 2000 server and automated markup of serials. Moore requested a 3-5 page pre-proposal from MOBOT but would like another institution to take the lead. Tom and Chris have found some partners: the Smithsonian will take the lead on the open source JPEG 2000 server with Fedora. Automated markup is a bigger problem and a bigger grant (millions). Problems to solve: Structural: raw, unstructured OCR doesn't have structural pieces defined. Most digital scientific journals fit the NLM XML DTD, and there are services that work with it. It is vital that BHL-digitized journals can fit into this to increase interoperability. The Smithsonian and IA provided money for a pilot with Penn State. Another component is semantic markup: descriptions of the component parts of taxonomic literature. This is very laborious. We will submit a proposal for structural and semantic automated markup. There is one commercial solution, but it is very expensive; we are looking for open source.
Martin and Chris will attend a meeting on evaluating name finding and will work with Carol Palmer at UI.
OCLC numbers are placed inconsistently in records, so the "find it at a library" link is unreliable.
BHL Library: adding library-like software as an overview layer to make a real library. BHL will be the largest trusted repository for this literature. There are other ways to access the data, but the BHL portal will be the main stop. Other languages are important, especially as we expand beyond scientists.
c. Agreements Tom Garnett 9:45 – 10
Strong desire to find ways to digitize and add post-1923 literature. Thus we need to obtain permissions (monographs 1923-1965 where copyright has not been renewed). For journals, we can ask copyright holders for permission to digitize journal runs. Many of the smaller societies are happy to work with us. 49 permissions signed and one nearly ready (Malacologia); others underway.
Agreement with BioOne for a subset of journals: a moving window of content that will be supplied to us. Not yet signed. There is a standard permission form on the wiki; it can be tweaked a little. JSTOR discussions initiated; Tom will meet with them in NY in mid-April. Some publishers have digitized files they want to give to us, but we have to weigh the effort of importing them against rescanning. Referrals have come from scientists in institutions, so talk it up! Keep Tom involved from the beginning. The serials bid list must be notified immediately. Negotiated permissions are prioritized as best we can.
Graham has initiated conversations with a commercial publisher. Tom has ongoing conversations with national digitization efforts in China, Japan, India and Taiwan.
d. International Developments Graham Higley 10 -10:15
Wiley-Blackwell is interested in working with us and may give access to metadata and OCR. Interesting work with the government of Malaysia: they will not build a library with their natural history museum because they can get it all online. They have given NHM money to do some test scanning, which could lead to more money. The Atlas of Living Australia is interested in working with us and is paying for its own scanning. In Europe, scanning must be done at a national level; no money is available at other levels. The Dutch are doing large-scale scanning. The Bibliothèque nationale de France is scanning in France. The Germans are developing a bid to the German Research Ministry for 5 million euros to scan zoology and botany materials (zoology in this case is entomology). The Germans have allocated 250 million euros for scanning literature over the next 5 years: Polish, Slovakian, Czech and German language materials (Dec 2008 funding). The Botanical Garden in Madrid is doing substantial scanning and is a potential partner. A management team and a technical team need to be developed for Europe, and there is another bid to eContentplus (from Berlin to the EU cultural section responsible for the European Digital Library) for 3-4 million euros to be a "BHL Europe" (money would be available early 2009). How do we connect the international architecture effectively? Name services, etc. The suite of technologies: Fedora, MARCXML, JPEG 2000, TIFF. In Europe there is a stronger drive for open source software. There may be some redundancy, but we should go ahead and scan what is allowable by copyright and held in our institutions. ACTION: status of international agreements should be provided on a regular basis.
e. Finances Tom Garnett 10:15 – 10:30
Subawards to MBL, MOBOT, AMNH, NYBG and NHM (the NHM subaward goes through MBL) have all been executed. Invoices are sent to Tom as expenses accrue. BHL has not spent much money: IA is a bit slow in invoicing us, Tom's salary is still on a federal appointment until March 31, and Smithsonian scanning is still waiting on an agreement with LC (should improve in spring). Primary expenditures have been for travel, meetings, workshops and portal development at MOBOT. MBL has done the most scanning with BHL funds. Allocated 3 million through June 2009 and have actually spent
IV. Status Report on the Encyclopedia of Life – Jim Edwards 10:30 – 10:45
EOL launched on Feb 26. It was down for 6 hours the first day. Had 11 or 19 million hits the first day but only 70,000 people. Slashdot noted some of the problems. Now 15,000-20,000 users a day. There is a review on April 14-15 to go over architecture and issues. The launch went quite well. The EOL Steering Committee met in San Francisco:
1. Atlas of Living Australia requested to become a cornerstone, endorsed by SC and MOU in progress.
2. Naturalis (Leiden) proposed a regional EOL implementation to serve information about plants/animals of the Netherlands in Dutch with same info. in English on main EOL site.
3. TED meeting (14 EOL people attended, probably too many). Breakfast for EO Wilson on the 1-year anniversary of receiving the TED Prize; 150 people attended. Got about 60 contacts from the TED community. Lots of IT interest (Amazon, Sun); they offered to host backup and mirror sites, which is very relevant to BHL. Possible entrée into Dubai.
4. The June 2/3 SC meeting is about sustainability. DPG (development group: Chris Elias at Smithsonian, Wendy Skinner from MBL, Edith Wunn at Harvard...) is looking at funding and the dichotomy/conflicts of raising money at the institutional level vs. the EOL level. BHL is very interesting to donors: we need to think about how to leverage these possibilities. How can BHL have a member on DPG who can target development?
ACTION: BHL needs to identify appropriate person to suggest for DPG.
5. Taxon-based meetings: trying to get groups together on how they will produce good species pages for their taxa. BHL may get requests for specific materials to be digitized. The first was at MOBOT; next is an Ant summit at Harvard (EO Wilson donated money from TED to develop ant pages) and a Fungal meeting organized through Harvard (Anne Pringle) but held at the Mycological Society meeting at Penn State.
ACTION: Connie can attend the ant meeting for BHL; Tom can attend Mycological Society meeting.
Break: coffee and tea 10:45 – 11
V. Feedback from Architecture meeting/Data model for roles 10:45 – 11
Review of Action items from BHL Architecture meeting.
BHL Architecture Notes March 19-20, 2008
Field Museum publications are being scanned by U of I, so Field Museum does not show up as a contributor. We therefore need to develop a data structure to support the various roles that partners play. Targeting a proposal for terminology and understanding of content, with a first draft by the end of April.
Engage BHL in the social networking world using LibraryThing and Flickr. This will support EOL by posting tagged pictures, for instance.
Discuss with Marie Studer (EOL Education) the "storytelling" aspect of the BHL materials: online exhibits and cultural artifacts.
We have met the deliverable to get literature to scientists but have not engaged the citizen scientist and public community. Natural history literature has a broader impact than just on scientists. Tom Baione expressed concern that working on the educational component may dilute the work we need to do for the BHL core constituency. Jim E. noted that we don't know how people are going to use this information or what questions they will ask. We need a robust tagging system built in for text as well as pictures. The Aging project is a big tagging project. We have to have the technology available and let the users do the tagging. Chris noted that the technology for this needs to be the same in EOL and BHL. We should take advantage of Marie Studer since she is working on this for EOL. Not a major priority now, but it will be important in the future.
VI. Bylaws (to be sent and discussed before) Nancy Gwinn 11 – 11:30
VII. Lunch 12:15 – 1:15
VIII. Cooperation and Communication – What is working and what is not working. 1:00 – 1:15
Be frank.
The group in the trenches has begun a weekly conference call. Reading the minutes of the people in the trenches is very helpful. This group may need a face-to-face meeting. The wiki has a great deal of information and gives access to people directly working on BHL. The Exec group posts minutes of its weekly conference calls on the wiki.
Improvements? Architecture meeting seemed to be by invitation and directors were not aware they should pick someone to send. More communication might have been helpful. Make sure specialized meetings and attendees are more transparently presented. If meeting needs to be restricted, make it clear.
ACTION: Send exec minutes to all of the Institutional Council.
ACTION: More conference calls with all of the Institutional Council, quarterly, immediately after Tom's quarterly report.
Translation of highly technical communications. Chris could provide technical highlights for quarterly conference call or possibly quarterly report. A simplified version of the financial spreadsheet could also be provided. Tom needs to provide "money remaining". Jim noted that bullets that show areas of concern or disagreement would be helpful for EOL Steering Committee.
Graham asked Jim if EOL has a risk management/risk assessment document; the answer was not really. There was some risk management discussion in the EOL annual report.
Tom noted that he travels frequently to meetings and workshops and he could check in with members on these trips, talk to others in organization.
Presentations: need a simplified list that we can put in as we go forward. Sharing the longer term will be helpful for communication. ACTION: Tom will set up the current presentation page for this purpose.
Jim expressed that BHL sometimes seems like not quite a part of the EOL team because BHL has been around longer--make this into a virtue. Provide some language for the EOL blog to make the connection. When BHL members attend EOL or other meetings, it would help to introduce them and make the connection.
Graham noted that EOL points to BHL but not vice versa. ACTION: Jim and Tom should review ways to improve this relationship.
Jim suggested that when users search BHL and find a taxon, they should be able to get to the EOL page from that name. On the EOL forum there is a specific request for information from BHL, and responders are unknown. The BHL secretary should be responding to BHL questions on the EOL forum; how can this be arranged?
IX. Next Strategic Steps 1:15 - 3
a. Collections/analysis/selection
Doug Holland and Connie Rinaldo have volunteered to lead coordination of the analysis and collection selection issues within the BHL collection. This will involve evaluation of the OCLC analysis tool for this analysis. Tom can provide background documentation.
b. Long-term sustainability Tom Garnett
Must plan for long-term sustainability of the assets we are obtaining and curating. Need a host in addition to IA as a backup: a dark archive to allow rebuilding. Need mirrors in the medium term (BHL Europe?). Migration to Fedora Commons, built as a repository platform. All or portions of our collections might be mirrored in this way. DataNet: an NSF-funded program to create centers for the long-term management of scientific data. EOL is part of one proposal; we may be able to piggyback for mirroring, hosting or serving. For-profit offers: Amazon (pay as you acquire more storage; they do 24/7 backups and infrastructure).
Institutionally, we are part of long-term brick-and-mortar places. These relationships must be explored and developed. What is the brick-and-mortar institutions' commitment to the BHL Digital Library once initial funding is finished? It could range from nothing to financial support, with lots in between. The work is important, but taking it forward through time is critical.
Short-term, medium term and then centuries. Even Fedora is looking for sustainability. It is not just us--the partners are also looking for sustainability.
Have to play in the world of libraries, publishers, scientists--a prime use case. Internet Archive mission is to archive so we are ahead of the game by working with them, even if we need to back up in other ways.
Jim noted that MacArthur won't consider another funding cycle unless we have a sustainability proposal, so we must have something in 2 years. Need costs through 2017 and the steady-state cost. E-Biosphere (International Biodiversity Informatics conference) should have a BHL section with a presentation and also feedback. The first 3 days are open and will be spent getting feedback from users; the final 2 days would bring together the projects to identify how to move forward. To ensure involvement, BHL could be a sponsor ($20,000), but it is better to rely on EOL (David Patterson). Also Bryan Heidorn.
c. New Institutional members and new partners (Graham)
Dialogs in the EOL Steering Committee regarding new partners: EOL is moving out of start-up and is still an Anglo organization. Germans, French, Spanish, Japanese, etc. are interested, and we don't want regional BHLs. Need to think about how this group can function as we add members from the non-Anglo world. As EOL makes determinations here, they will inform how BHL moves forward. We have to move that way if our mission is to make all of the literature in all languages available. Need to think about members who are not simply reps of an institution or collection but have broader roles (grey literature, invasive species literature...). BHL can only move forward as a global project in this way. Partners may want some of the budget pie; the Atlas of Living Australia comes with its own money. From a sustainability perspective, we must think about institutional registration, i.e., incorporation as an organization (longer term). Levels of membership and participation should be explored. Europe is not as driven by institutional relationships as the US is.
Jim noted that some institutions are suggesting federal earmarks as a way to get more funding. Smithsonian can't participate. Grant money can't be used to lobby.
X. Future Fundraising 3 – 3:30
To date: MOBOT submitted and obtained Moore Foundation grant to migrate BHL portal to Fedora.
IMLS planning grant to look at existing practices for boutique scanning and look at ways to make it more cost efficient.
NYBG has a Mellon grant to scan Latin American literature--may be multi-year: will include oversized/fragile books. Might be outsourced.
DataNet solicitation, Moore proposal.
Please keep Tom informed about funding proposals--he can keep it confidential but can also help coordinate. BHL can also act as an endorser.
The EOL Development committee keeps informed through a summary on the EOL intranet so that there is no interfering competition. EOL has a form they would like us to fill out to identify money that can be used for institutional match. ACTION: Tom will distribute. Please have your development office fill out the form.
Need to put some pricing on bundles of material so that development folks know how much to ask for.
ACTION: Doug, Cathy, Susan and Connie volunteered to identify the cost of scanning particular bundles of information.
XI. Wrap up and assignments. Graham Higley 3:30 -4