4th gBHL Meeting Notes
Fourth Global BHL Meeting Notes
Fez, Morocco, May 27-28, 2013
Attendees
Ely Wallis, Museum Victoria, BHL Australia, Global BHL Chair
Nancy Gwinn, Smithsonian Libraries, BHL-US/UK, Global BHL Secretary
Anne-Lise Fourie, SANBI, BHL-Africa
Jiri Frank, NM Prague, BHL-Europe
Jinzhou Cui, Institute of Botany CAS, BHL-China
Dr. Magdy Nagi, Bibliotheca Alexandrina, BHL-Egypt
Martin Kalfatovic, Smithsonian Libraries, BHL-US/UK, Program Director
William Ulate, Missouri Botanical Garden, Global BHL Project Coordinator
Agenda
DAY 1: Monday, 27 May 2013
- Welcome and Report from the Chair
- Review of Agenda
- Report of the Chair
- Report from the Nodes
(achievements + challenges of last year & future plans)
- Australia
- Africa
- China
- Egypt
- Europe
- US/UK
- Brazil?
DAY 2: Tuesday, 28 May 2013
- Discussions
- BHL IT needs
- Content Synchronization
- Portal: is it necessary to have multiple global platforms? What would the use case/needs justification for a local platform be?
- Scanning workflow tools
- Biodiversity Library Exhibitions
- Membership & Governance
- Are existing by-laws adequate?
- Discussion of expansion of BHL membership model; what is the role of the nodes?
- BHL relationship to national, regional, and global aggregation (DPLA, Europeana, WDL, etc.)
- Library of Congress joins BHL: how can this assist with global BHL activities (e.g. involvement with other National Libraries, etc.)?
- Review Regional BHL Membership: What are growth areas? How should contacts be coordinated with areas currently not represented in gBHL (Russia, India, New Zealand, Southeast Asia, Japan, Taiwan, Latin America, Canada, etc.)?
- Funding strategies at the different existing global nodes
- Funding options to support Global BHL activities (and what this might mean)
- gBHL-CC Elections
- New gBHL-CC Meeting: Date | Place
Meeting Notes
Welcome and Report from the Chair
Since some of the participants were new this year, Dr. Ely Wallis, gBHL Chair, described the changes that have occurred in the Global BHL. Chris Freeland has moved to Washington University in St. Louis and William has been appointed Technical Director for BHL. Noha Adly has accepted a position in the Ministry of Information and Communication and Henning Scholz has moved to Europeana in The Hague. Dr. Wallis also welcomed Anne-Lise Fourie, newly elected Chair of the BHL Africa Steering Committee.
Dr. Wallis reviewed the goals of gBHL, pointing out the main objective is collaborating to further the aims of gBHL worldwide as part of a bigger entity. She emphasized that each node operates independently and pointed out it may be necessary to rethink how individual institutions and nodes work together.
Reports
BHL-US/UK
Martin Kalfatovic described the advances of the BHL-US/UK group. He reviewed recent events, retirements, new staff and content growth in the portal (statistics). There are now 92,413 “segments” and increased agreements with publishers for copyrighted materials. Fourteen BHL member institutions have contributed in-kind resources adding up to 14 FTEs, more than $1,239,200 in staff and other costs, excluding the cost of the Secretariat and Technical staff. Current statistics show a steady growth (3,628,088 users from 2007 to 2013) with more than 17 million page views of new (50.06%) and returning (49.04%) visitors.
Nancy added an update on the recent governance structure modification of the BHL-US/UK node for simplification, adaptability and sustainability, where partners would pay annual dues. Thirteen of the founding members were able to contribute. There are other organizations eager to become part of BHL-US/UK. The node structure was reorganized. Paying members now form a Steering Committee; officers are elected from this group. The officers plus the Program and Technical Directors form an Executive Committee, which meets twice a month with the third meeting open to all SC members, all via teleconference. There will be a second group of nonpaying participants called BHL Affiliates; these could be content contributors or providers of other services, publishers, or other biodiversity organizations such as GBIF, EOL, CBOL etc. A service fee structure may be developed to help cover the costs of integrating and maintaining content if an organization was simply seeking a host.
Currently there are 13 members and 2 affiliates. Martin explained that organizations like the Biblioteca del Real Jardín Botánico de Madrid have expressed interest in joining BHL-US/UK and could become affiliates. Martin has also talked to GBIF about becoming a GBIF member. Jiri Frank mentioned that BHL-Europe might find a home with the Consortium of European Taxonomic Facilities (CETAF), which has more than 30 partners from 8 countries.
Ely pointed out that it might make more sense for someone from Australia to fly to New Zealand to help an organization there with set up and training. Anne-Lise explained that an organization in South Africa is interested. Jiri mentioned that an organization with a lot of taxonomic publications has contacted BHL-Europe and is willing to pay fees. BHL-Europe can’t receive fees now, but this might be arranged through CETAF. The same applies to Acta Japonica. Anne-Lise mentioned that there is a lot of new content in South Africa that they want to put out.
Technical Developments (BHL-US/UK)
William presented the technical developments for BHL-US/UK. He described the process of integrating the BHL-Australia design with the US/UK portal, resulting in "BHL-AU-US-ome". William thanked the Australian team for all their efforts. Launched on March 18, 2013, the new design is based on the User Feasibility Tests from 2011; it’s much more than a design change because now BHL includes articles and allows for a finer granularity. Article boundaries have been incorporated from BioStor, Rod Page’s database in Scotland that analyzes Tables of Contents and Bibliographies. Other sources are expected later, which will require implementation of deduplication capabilities. A next task for Mike Lichtenberg, our developer, is to start working on the dedup algorithms. BHL content is now linked to Nomenclators (Aggregators of Nomenclatural Acts) like Zoobank and IPNI; and other potential contributors are being evaluated (like Plazi).
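To illustrate the kind of deduplication that multiple article sources would call for, here is a minimal sketch in Python. It assumes article records carry a title, a volume and a start page; the field names and the matching key are hypothetical, and this is not the algorithm the BHL developers planned or implemented.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", without_accents.lower())
    return " ".join(cleaned.split())


def dedup_key(record):
    """Hypothetical matching key: normalized title + volume + start page."""
    return (
        normalize_title(record["title"]),
        str(record["volume"]).strip(),
        str(record["start_page"]).strip(),
    )


def deduplicate(records):
    """Keep the first record seen for each key; return (unique, duplicates)."""
    seen = {}
    duplicates = []
    for rec in records:
        key = dedup_key(rec)
        if key in seen:
            duplicates.append(rec)
        else:
            seen[key] = rec
    return list(seen.values()), duplicates
```

A key this simple only suits Latin-script titles; titles in other scripts (for example Chinese or Arabic) would need a different normalization, and real records would likely need fuzzy matching and human review as well.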
Users want DOIs, particularly on legacy articles, and they expect BHL to incorporate them. This implies a cost, and potential DOI conflicts also need to be better resolved with the new CrossRef API and, most probably, human interaction.
Ely explained that BHL-Australia now uses the same Portal as BHL-US/UK, but still keeps a local copy of the metadata; this allows them to do the searches and other functions internally and, at the same time, works as a backup of the metadata and the OCR content.
BHL-Europe
Jiri Frank, from the National Museum in Prague (NMP), presented a status report on BHL-Europe. After the official project ended, there was no consortium or structure to carry it on. Graham Higley retired and Henning Scholz left Germany in November and his position as BHL-E coordinator was left vacant. Among the technical issues there were several bugs on the portal, on the ingest process, limited budget and slow performance.
Nevertheless, work continued at a slow pace. After the project, a good amount of content was harmonized; therefore parts of both the European and US content are on the portal, and the Executive Committee requirements were fulfilled (surveys, documentation and a soft launch in Dec. 2012). The Gemini feedback mechanism was implemented and dissemination activities continued. In January 2013, the National Museum of Prague took over responsibility for BHLE and coordinated the preparation of the portal. This made possible a successful official launch of the Portal in sync with BHL-US/UK in March. Since then Vienna, Berlin, London and Prague have remained interested and staff are actively meeting and collaborating in the node reactivation. The floor thanked the National Museum of Prague for keeping BHL-Europe going with local resources.
www.bhl-europe.eu now contains 6,000 volumes from 90 providers; it has all US content ready to put online but must show the Europeana content first to satisfy some requirements. Currently Pre-ingest is stable and bugless and the portal performance has improved, especially the speed of the content viewer. Zheng Li, a Chinese developer, was hired to improve the content viewer and help NHM to solve the performance issue, probably by separating, in the next month, Fedora and Lucene from the Portal in a third virtual machine.
The BHLE core group was re-united and re-established with active participation of Vienna, Berlin, Tervuren, and NHM besides Prague. At the CETAF meeting in Tervuren, NMP initiated communication with representatives of the BHLE core partners: J. Frank, J. Kvacek, J. Hoffmann, Ch. Hauser, W. Koller, H. Rainer, C. Sleep, P. Mergen. They are interested in setting up the Consortium as a group within CETAF and they will be the leaders for the project. Currently they are negotiating and preparing a proposal for CETAF, which has expressed interest.
The group is reviewing the previous BHL-Europe partners’ commitments:
- Prague -- BLE dissemination
- Berlin -- group coordination
- London -- technical director and hardware maintenance
- Vienna -- content ingest flow, technical support (Natural History Museum)
An official letter to the NHM technical, scientific director (Chris Sleep’s supervisor) is in preparation. Also, a newly contracted person from Berlin is working 6 months full time on harmonization/preparation of content, while Prague is doing the harmonization support and GfBS is doing content harmonization of ODE content and technical support. Technically, NHM is working on portal improvements and bug fixing. BLE will continue doing Outreach and Dissemination.
BHL-China
Dr. Cui Jinzhou presented BHL-China advances. The China portal now has 15,000 vols. containing 1.8 million species names, including 150,000 books from the BHL-US/UK node. The content is mostly botany, but zoology and microbiology are being included, and later they want to include marine biology and paleobiology. There is a newly designed website that includes a list of papers with new species identified. All Flora of China at the province level have been OCR’d so full-text search is available. They want to link BHL literature to Species 2000 species names and to mark up all genus and species of a family as pilot:species. Jiri pointed out how the changes in taxonomy could affect the hierarchy BHL-China wants to develop.
The Chinese are developing a NSII (National Specimen Information Infrastructure) -- a project to link biodiversity and geodiversity. It would include 9 million names from 100+ institutes and universities; it would include DNA barcodes and incorporate the content from BHL-China as a supporting database.
Chinese names are not being found now because taxonomies are not in Name Finder, so they need to develop a way to do this. It was suggested that we might contact Dima to improve the finding algorithm. Martin mentioned they have contacted the uBio team to improve the algorithm. Jiri also mentioned a proposal that included uBio. Part of the plans to develop the existing applications further is to take free text from e-floras and structure (SDD) the content to get data into a DwCA format.
BHL-Egypt
Dr. Magdy Nagi, from Bibliotheca Alexandrina, presented the new portal being developed by the Egypt team (bhl.bibalex.org). It then contained 6,865 books by 5,935 authors in 7 languages (now it shows more than 16,394 books and grows daily). Users with an account can have a bookshelf, bookmark, underlying capabilities, comparing, access private content, group content or public library.
Based on the PASI, they have developed archive servers that are optimal for BHL-Egypt: A server with 45 disks with 3 SATA Controllers and all does 3 works. An advantage is that price is cheaper than the previous Petabox.
Dr. Nagi explained that BA has a copy of the IA from 1996 until 2007 and keeps a live backup. They also belong to the Internet Preservation Group (IPC), led by IA, BA, the British Library and BNF (French library).
Egypt is gathering content from Arabic countries but hasn’t directly asked for it and will be asking for permissions later.
There are three flavors of Arabic content: no copyright, copyright and have a physical copy; in the latter case, they put up the copy (remove the one on the shelf and leave the digital one).
They plan to contribute Arabic biodiversity to BHL, but it hasn't happened yet because there has to be agreement on a common format for metadata. In two months, the BA will have all of the BHL content available on its new site.
Action Item: William will send Dr. Nagi the discussion on the metadata format.
Dr. Nagi also indicated that they are trying to collect content from institutions that are not available and Meta-meta aggregations and national libraries. ???
He suggested that we could look at the World Digital Library. Martin explained that we have some services in IA. The WDL asks for much more information than we presently include in the records. Dr. Nagi asked how Europeana does it. Jiri explained. Martin suggested that we take this up in the Portal Discussion. WDL is similar to general portals like Europeana and DPLA; BHL is devoted to taxonomic content. Ely mentioned that we could adopt the principle of trying to go where the users are, making sure the effort can be sustained. Jiri mentioned that, for Europeana, the OpenUp! Project has worked on helping to digitize the collections and make them accessible to Europeana.
BHL-Australia
Ely explained that the Atlas of Living Australia had originally funded Museum Victoria to lead BHL-Australia and to hire Simon, Simone and Joe (digitization manager). Joe also worked with Joel Richard on making Macaw available for other institutions. The Atlas funding finished last year and Joe’s contract finished in October. Only volunteers are now scanning, three days a week, and using Macaw to upload files; their work won an award from the Government. A librarian identifies the appropriate items to scan. They scanned in-house journals to the present day, as well as the Journal of the Field of Victoria up to 2 years ago. Many Aussie journals have already been scanned by BHL-US/UK; Bianca negotiated the rights to digitize the Linnaeus Journal. Joe and Allan, a volunteer, wrote a paper for the Inclusive Museum Conference about volunteer work and what volunteers get out of the project. Volunteer staff are primarily retired, highly skilled professionals who like to be supervised by Museum staff. The availability of Macaw made it possible for other institutions in Australia to contribute content. South Australia has been scanning and the Australian Museum is also interested. The real challenge is money. ALAS has received money for core activities, but no BHL money. Ely will attend the first meeting of the Atlas since last June.
The National Library of Australia now only links to IA, not the BHL interface.
Ely also showed the Curiosity Cabinet blog (blog.biodiversitylibrary.org/2013/01/curiositycabinet.html). The blog post was changed after several copies were found to be available.
About ALAS, there are name references found in the TROVE-NLA. They downloaded the full OCR to create an “uBio-like” service that looks for all text. BHL was criticized because it doesn’t have current literature and not considered valuable enough to be funded by current director.
Challenge for this year is to follow Jiri’s example and get a technician who can explain Macaw and make sure the institutions can scan and upload content to IA.
BHL-Australia’s biggest success was integrating the BHL-Australia interface with the BHL-US/UK portal. They are pleased with the result. Simon, Ajay and Simone worked on this. Content and metadata are constantly synchronized. The aim is for BHL-Australia to keep a complete copy of the BHL-US/UK metadata. BHL-Australia has only redirected some links to use the US/UK functionality.
BHL-Africa
Anne-Lise Fourie, Associate Director for the SANBI (South African National Biodiversity Institute) Libraries, talked about BHL-Africa's recent launch in Cape Town in April 2013 and described the structure of BHL Africa, the staff involved in its Executive Committee and the regionalization of Africa. The launch received media coverage and excited users are asking to use it. 38 organizations have signed the MOU, including those in Togo and Mauritania. They are working now on expectations for content creation, audience building, content identification and partnerships. The group stressed the importance of sending a letter to the different Regions so that bosses recognize the value of being part of BHL-Africa. Immediate challenges:
- Pointing out the uniqueness of the collection
- Understanding what the commitment to BHL Africa actually means
- Training and Infrastructure
- Funding
Macaw is now working at the University of Pretoria and they are waiting for installation of 2 Scribe machines.
BHL-Brazil
William gave a presentation about BHL Brazil, since no representative could attend the meeting. He described the BHL SciELO Network; 385 works have been scanned, including books about Brazilian fauna and flora and national scientific journals. The goal is to include around 2000 scanned works from Brazilian institutions. The Portal of biodiversity journals includes 38 scientific journals published in 5 countries of the SciELO network and an updated collection of 69 national laws and international treaties.
BHL-SciELO Brazil has completed converting the metadata from scanned journals into MARC 21 so they now can use Macaw to upload. They are currently experimenting with the Macaw version installed at Missouri Botanical Garden.
Tuesday, May 28, 2013
Technical Update
William presented a Technical update showing the current state of global replication and serving. All content is now replicated at Woods Hole and partially in London, Alexandria and Beijing. BHL Egypt has downloaded all the content -- or most of it. Australia is contributing content to IA directly – although it has OCR & metadata. BHL SciELO Brazil will start uploading content to IA but will install a copy of the portal and translate it to Portuguese, and BHL-Africa will also contribute directly to IA. Also, the rest of the content will be sent to China.
"Forking" the code -- SciELO wants to start with our portal, but translate it to Portuguese -- but if we make other changes, they will diverge -- trouble down the road
Macaw = Metadata Collection and Workflow System -- Joel mostly, but with collaboration from staff of Australia and Brazil.
BHL Flickr currently contains 1,427 sets with 73,523 images and a new front page. Most images are in the public domain and are all under the most liberal license; copyrighted ones are under a noncommercial license.
Synchronization for the various nodes needs to be developed so they can:
- keep track of who has what
- figure out how often things should be synchronized and
- what kind of information will be synchronized
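The three needs listed above could be explored with a simple per-node manifest comparison. The following is only a minimal sketch in Python, assuming each node can export a list of item identifiers with a checksum of what it holds; the record fields and identifiers are hypothetical and do not describe any existing BHL tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRecord:
    """One digitized item as held by a node (hypothetical fields)."""
    identifier: str  # e.g. an Internet Archive style identifier
    checksum: str    # hash of the metadata/OCR payload held by the node


def diff_manifests(source, target):
    """Compare two node manifests.

    Returns three sorted lists of identifiers:
    missing (in source but not in target),
    stale   (in both, but checksums differ),
    extra   (only in target).
    """
    src = {r.identifier: r.checksum for r in source}
    tgt = {r.identifier: r.checksum for r in target}
    missing = sorted(src.keys() - tgt.keys())
    extra = sorted(tgt.keys() - src.keys())
    stale = sorted(i for i in src.keys() & tgt.keys() if src[i] != tgt[i])
    return missing, stale, extra


if __name__ == "__main__":
    node_a = [ItemRecord("item-001", "aa"), ItemRecord("item-002", "bb"),
              ItemRecord("item-003", "cc")]
    node_b = [ItemRecord("item-001", "aa"), ItemRecord("item-002", "xx"),
              ItemRecord("item-004", "dd")]
    missing, stale, extra = diff_manifests(node_a, node_b)
    print("missing on target:", missing)
    print("stale on target:", stale)
    print("only on target:", extra)
```

Running such a comparison on a schedule would answer "who has what"; how often to run it and which fields feed the checksum remain the open questions the group identified.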
The Smithsonian Libraries is planning to replicate the BHL files. One reason is that IA doesn't provide a way to track changes in files, but SIL could apply proper tools for doing this.
William described the Art of Life project to describe and extract natural history images from the BHL. He also commented on the $450,000 proposal presented to IMLS to use gaming to improve OCR or transcribe field notebooks and seed catalogs in collaboration with NYBG, Cornell, and Harvard. It’s called "purposeful gaming." A proposal to do text mining to extract certain kinds of data was presented to Digging into Data, a challenge for humanities organizations, funded by multiple agencies in 4 different countries, including NSF and NEH.
One issue with the gaming approach is that you can't count on crowdsourcing to actually finish projects – it really depends on size of crowd.
William recommended that the group look at "Transcribe Bentham" on the web for statistics on who participates in crowd-sourcing and the reason to do so.
Workflow tools
It is still a challenge to resolve the Global approach to scanning and having workflow tools. Basically, group members need to know if someone has scanned something already -- now have to go to each node to find out. Because of differences in copies, there may be a local priority to scan something already included in BHL.
Martin asked about the best way to handle the workflow tools. He recommended not worrying about the pipeline of proposed scanning, as experience showed that proposed scanning may not be done. It’s more valuable to scan 5 copies of an item than to avoid scanning it because others indicated they would scan it. Building a large infrastructure may not give an acceptable ROI. It may be adequate to build a tool as an API in the BHL Portal. Nancy mentioned that, as scanning is getting more expensive, we may want to try to determine more easily what has not been scanned already and concentrate on that.
The BHL US/UK Collection Committee has compiled criteria on when to duplicate.
Portals
Discussion turned to concern about having multiple portals. China and Egypt already have separate ones and now Brazil may join them. There is talk about an African portal also, but that is unlikely to develop any time soon. Perhaps the issue is more about how we can be consistent in the branding and the message and the links. BHL Australia links to Trove, but not vice versa. Ely will keep working on them to recognize BHL.
A principle for gBHL could be that, at least for languages with Roman characters, rather than a separate portal, an API could be developed in the relevant language to show the country’s content. This way, the problem of forking code would be avoided.
Jiri mentioned it would be better to think about building APIs that could provide the services needed. For example, the RMCA needs to show their content to their patrons, so they create a screen in their place that shows the content in the BHL. You can show the content from your node through the API. It is possible to do this in BHL.
Action Item: Ely will present the proposal to the group, inviting the Brazilian team to make it simple, solve the synchronization problem and show the Portuguese interface
Dr. Nagi pointed out that beyond having an interface in their own language, one of the main reasons to have a portal of their own is that they would require a commitment of adequate support.
Ely asked about the mismatch between the content in Internet Archive and the Chinese content. Dr Cui said they have it uploaded to IA.
BLE
Jiri Frank gave a demonstration of the BHL-E Spice exhibition, which the National Museum in Prague developed. Others cover Expeditions, Poisonous Nature and Nature at Home, and there is one in development about Mushrooms. The tool was built in Drupal 7 Forms as part of OpenUp!, a Europeana project to digitize natural history, including images, specimens, audio and video. Jiri can create an account so any partner can use the forms and create an exhibition. The group discussed ideas for a Global BHL exhibition to promote the partnership.
Action Item: The Executive Committee will discuss further how to do this.
Membership and Governance
The group had no comments on the current draft of the bylaws and the document can be considered complete.
DPLA
Nancy presented information about the Digital Public Library of America (DPLA). The idea was explored at a meeting in October 2010 convened by Robert Darnton at Harvard University and got started a year later with a grant from Sloan and Arcadia and other foundations. It is a grassroots initiative where anybody could participate to make the cultural and scientific and humanities corpus available and serve K-12 educators, as well as scholars, and hopes to become a deep resource for researchers anywhere in the US. Its focus is mainly on materials from US institutions. Martin and Chris Freeland were co-chairs of the Technical Workstream, before Chris moved to WUSTL. Other groups looked at legal issues, content scope, and finances; Nancy served on governance. The initial secretariat did most of the organizing work. DPLA now has 501(c)(3) tax-exempt status and recently hired a full-time Executive Director.
DPLA content consists of metadata with links back to the digital asset: books, images, videos, sound recordings, whatever. Users would become curators and create their own collections but training would be required. There are two types of contributors: Service Hubs and Content Hubs. Service Hubs are organizations like state-based digital libraries (47 states have these) with the mandate to help others in the state create digital content. They offer a wide variety of services, from training to actually scanning, to storage. IMLS funded a pilot project and chose 7 state digital libraries to develop DPLA content within their states and create exhibits in areas where the state was strong. Georgia, for example, worked on civil rights, Kentucky on Prohibition, etc. Content Hubs are institutions with large bodies of content, like Harvard, BHL, HathiTrust, Smithsonian and LoC. Since the Smithsonian had already aggregated metadata from many of its museums and research centers in a Collection Search Center, it was logical for the institution to contribute about 700,000 records via FTP to the DPLA. DPLA provided the Smithsonian with about $55,000 to help set up the process. DPLA launched the portal (www.dp.la.org) two days after the Boston bombing, with what was available. Noha Adly has been a consultant.
WDL
Dr. Nagi explained that the Bibliotheca Alexandrina is hosting the World Digital Library and has adopted some of the specifications from Europeana. BA is responsible for the Arabic version and the internationalization of WDL. There’s only a single copy in the cloud (they are using the Amazon S3 cloud). The Smithsonian has contributed some items to the WDL. If the gBHL group wanted to become more visible, each node could contribute three books labelled as coming from the BHL, but also showing the original owner. There’s no limit on what each institution can contribute. For BHL we should be able to show the origin and we could all sign, as long as it’s non-copyrighted content. However, the WDL seems to be only about primary resources and is heavily curated.
Action Item: Dr. Nagi will follow up to smooth the transition. ???
Review Regional BHL Membership
There are more organizations interested in contributing to the BHL in areas like Russia and India. There is no Asian node. How would we wish to expand the Global partnership? There are also organizations where a regional node exists, but there may be reasons to be part of a different one.
Action Item: It was decided that the Executive Group could identify a subgroup to work on Global expansion.
Anne-Lise asked where she should start. Ely answered that she should start with any herbarium or organization that has a scanner.
Funding Options to Support Global BHL Activities
Grants from the Moore Foundation, Europeana and JRS have been completed.
Are there ways to build into your funding opportunities global collaboration?
Action Item: Decision: Nancy and William will coordinate a Survey tool to find out where the money is coming from and what the restrictions are.
gBHL-CC EC Elections
Terms of the Chair and Secretary continue. Jiri Frank was elected Vice-Chair.
As a point of order, Ely Wallis will inform our colleagues in Brazil about the election.
Action Item: Ely to communicate election results to Abel.
Next Meeting
It was discussed during lunch that next meeting would be planned in Australia in February. It was also requested that William should plan for a Technical Meeting at the same time.
Action Item: William should plan meeting for February next year with Technical Meeting too.
ACTION ITEMS:
✓ Group photo.
✓ Attach Terms of Reference to the Minutes.
✓ Upload Minutes of EC Meetings
William: Review new Copyright for BHL-US/UK content available at BHL-China
William: Send Dr. Nagi the discussion on the metadata format.
Ely: present a proposal to have one single logical portal to the group, inviting the Brazilian team to make it simple, solving the synchronization problem, and showing a Portuguese interface.
Executive Group: identify a subgroup to work on Global expansion
Each BHL node: Sign in to WDL and contribute 3 not-in-copyright books. Dr. Nagi will follow up to smooth the transition.
Nancy and William: Coordinate a Survey tool to find out where the money is coming from and what the restrictions are.
Ely: Communicate election results to Abel.
William: Plan gBHL meeting for next year in February with Technical Meeting too.