This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018.

BHL Staff 2009 Notes

Table of Contents

CiteBank (Drupal)
Ingest review/check-in
CONTRIBUTORS
DUPLICATES FOR SCANNING WORKFLOW:
SERIAL DEDUPING
METADATA/ IMAGE QUALITY
CONTROLLED VOCABULARY
INGEST METHODOLOGY
INHOUSE SCANNING
INGEST BEYOND
Portal Editing Discussion
Sequencing
Item level corrections
Creators
Pagination
Merging Titles
Uploading MARC records
Gemini Discussion
Workflow
Responding to users:
Error Button
Web Feedback Form
Midday Recap
QA
Public Facing Wiki
Architecture:
Encyclopedia of Life (EOL)
Mission and Collection Development Policy Discussion
Tom Garnett Questions Nov 09

In attendance via Skype:
Eileen Ref Librarian, Coordinator Digital Imaging Center, National Academy of Sciences, Philadelphia
Mike L., MOBOT

In attendance:
Christine G. Field
Diana D. Field
Diane R. MBLWHOI
Bianca L. BHL
Grace D. Smithsonian
Erin T. Smithsonian
Judy W. Harvard
Matthew B. AMNH
Matthew P. MBLWHOI
Joe D. Harvard
Suzanne P. Smithsonian
John M. NYBG
Don W. NYBG
Michele A. MOBOT
Chris F. MOBOT
Gil Taylor Smithsonian
Martin K. Smithsonian
Nancy G. Smithsonian


CiteBank (Drupal)

Facilitator: Chris Freeland | Note taker: Diana Duncan
Chris gave a recap of stats & usage: 2009_Q1-Q3_BHLDevelopments.ppt
The highest-use date was in October--more than 3000 visitors in one day
Diane R.-- Was latest peak when EOL on Jeopardy? -- Yes

He then spoke about global development via BHL Europe --building a community of developers--funded & volunteer
See the presentation which he will be posting for other details

Suzanne--asked about only giving the address of a page rather than linking by species name. Chris: OpenURL is reference linking, not linking by name.

EOL
522,000 species pages linked to BHL and is #1 referral site
also Wikipedia, Blog posting are getting in there
Harvard also high up--links are in catalog. Suzanne made the point that it is worth doing the work to put links in catalog. Joe didn't think it was from the catalog. Chris -- links to BHL help BHL.

He next shared what other consumers have done.

Development goals re: citations
Showed examples of PDFs from BHL pushed into Drupal/Biblio

He walked us through Citebank. It is built in Drupal and has Biblio module. Other projects are using it.
A search in Drupal returns an amalgamation of information from different sources: PDFs and keywords from users, and catalog data.
He displayed a BHL Data Flow diagram--it is a Google Doc--link?
Drupal & Biblio store just metadata for citation (like Endnote). It gets indexed by Lucene. Name services hit off Lucene.
Have ingested bibliographies into Drupal.

Chris is demoing this over next 3 weeks to us, Taxdag?, BHL Europe. He's looking for feedback as far as managing BHL.
Suzanne--how are bibliographies working in finding BHL titles? Chris--We can take those and articlize content. Suzanne is concerned about the workload for resolving problems, e.g. abbreviations for journals. Chris -- we are working on getting abbreviations for TL-2 and other bibliographies that may be sufficient for articlizing. Drupal also has 15000 titles.
Don W. asked about preferred format for bibliographies. John M. NYBG is working on a grant where they may be adding a large number of citations.

Chris showed us that you can import bibliographies and can choose a file standard. MARC does not work very well. You have to be a registered user to upload content.

Questions: Who gets to have a login--everyone, or do they have to give a reason? Are there worries of messed up metadata? There is a sandbox for testing.

EOL wanted us to take bibliographies from synthesis center. There were also criticisms about BHL interface. By pushing our citations into Drupal we get an extended search interface. Users are now coming in to a specific page or searching for specific author, title, etc.

Can do a keyword search now without matching exactly with Lucene. There are also facets.
Suzanne-- one thing we're still figuring out in the library world is how to break out facets. What can we give them to narrow down their search?
one thing might be monograph vs. serial. Chris made the point that reference mgmt systems cite article vs. book. Suzanne--do you want reference people who work with public to give you ideas? Yes.
Chris--we are faceting upon pieces of information in citation db. We can add in other publication types, but these don't exist in Endnote

Suzanne asked about relevance and filtering. Chris--can apply different weights to the criteria.
Judy--can you put in preprint or reprint? Yes. There are 31 Endnote types but not that level of granularity.
Chris--How much editing are we going to be doing in other people's bibliographies? We can make suggestions but cannot edit.
He showed an example of an article brought in through Decapod list without journal name.
Bianca suggested we should add a comment that we are limited in our ability to control it.
We do have a citation source, maybe we need to highlight this better. Do we need to provide feedback link to original source?
Suzanne made the point that we should only give logins to people who promise to curate their data. Judy did not agree and thought most people would be receptive to that. John M.-- if there is a protocol for it then we should bring it in.
Chris -- there is a distinction between Citebank and BHL. Do we need basic & advanced search? Don W. thinks a disclaimer will make it clear to enduser. Grace mentioned that EOL has a distinction between information that is scientifically vetted or not. Why could we not do this?
Diane--this is such a small subset. How is this going to be presented?
John M. suggested doing it like Google--with vetted material 1st and then non-vetted. Chris--also need to show with digital content or not because some bibliographies have works not in BHL.
Matt B.-- maybe treat it more like a consortial search
John M. -- that is searching by locations and we are talking about non-vetted and vetted. Matt B.--like Scopus only returns peer-vetted results
Need to be careful about wording. Kerri--fixable vs. nonfixable is what is important.
Matt B.--need clarity on different level of quality
Diane R.--could we do a little icon for BHL contributions vs user-contributed
Chris--We are not the core users of BHL. Each library should suggest a couple of users to run through a quick demo. EOL educational group did a usability study but they have not done it for us yet.

Chris thinks the best feedback comes from the users who actually use the site. Suzanne is worried about people who fail to find something and write us off. We want to find the users who have been giving us feedback on portal.

Name search still has to be incorporated back into Drupal.

Suzanne--Need Gemini here on Citebank

Don W. brought up born digital publications and new names. If BHL becomes a repository, then we could not allow this to be editable, among other issues. ICZ is deciding this soon.

Diane asked if taxonomic name searching was the only thing the portal can do and Citebank cannot. No.

Next steps--Chris to talk with IC about the issues brought up here and at TDWG.

Other issues--deduping bibliographies
Kindle now has BHL literature -OCR text

Action items
Check on linking Citebank records back to the title in BHL
Clearly identify contributor for each citation and more clearly brand BHL books that we have ability to edit
We need to provide different kinds of comments
Need to distinguish BHL content from user-contributed
We need feedback button going back to Gemini


Ingest review/check-in

Facilitator: Bianca Lipscomb fea. Suzanne Pilsk | Note taker: Michelle Abeln
Skyping in:
Eileen Ref Librarian, Coordinator Digital Imaging Center, National Academy of Sciences, Philadelphia
Mike L., MOBOT

Mike L. described the ingest process:

Bianca L.: we need a short term decision on ingest analysis, how to incorporate it into our workflow as soon as it's there
Consensus reached: yes, we acknowledge duplicates exist, as long as all contributors are correctly identified

CONTRIBUTORS

-when titles are merged, there are currently no fixed rules on which contributor displays on search results
Mike L.: we can take 'contributed by' note off, or show all contributors
-do we need this information to display on search results since it's attached to item level data? Individual items can still be traced back to contributing institutions
-contributing libraries should be marked 'archive.org:institution'; should BHL institutions be marked 'BHL:institution'?

DUPLICATES FOR SCANNING WORKFLOW:

Chris F.: 24,000 monographs downloaded from IA, pulled into spreadsheet, but can only be put into deduper with less than 1000 titles/ set
-Grace, Erin and Diane R. have been sending these lists against the monograph deduper
-Internet Archive added to institution list
Diane R.: when all items in a list have an OCLC number, 8.9% duplication for about 7000 titles
-however, about 45% of the ingested records did not have an OCLC number attached, so no dupes were identified
-in a random selection of titles manually checked against BHL, several were found, but none had exact title matches
-dupe percentage might be as high as 16-20%
Chris F.: Kai is working with BHL-Europe, experimenting with text mining, to get a 'fuzzy match' on title searching
Bianca: there are longer term solutions in development, but for now we need to prioritize deduping amongst ourselves. IA dupes are seen as acceptable, we've done what we can with the tools we have
-we recognize there are a set of records from IA we just can't dedupe against
-agreed that OCLC# and volume # together are best indicators of dupes
-reminder to make sure picklists have OCLC#s present on all titles
-suggested we check in in our next monthly call to see if this seems to be working for everyone
-Bianca will be sitting down with Erin and Grace to walk through the deduping process
-there will also be weekly lists of ingested monographs to be put against the deduper, Michelle A. volunteered to dedupe these lists
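
A minimal sketch of the matching rule agreed above (OCLC# and volume # together), plus splitting a picklist into sets of fewer than 1000 titles for the deduper. This is not the actual monograph deduper: the record fields (`oclc`, `volume`), the OCLC prefix handling, and the function names are assumptions made for illustration. Records with no OCLC number are set aside, since they cannot be matched this way.

```python
from collections import defaultdict


def normalize_oclc(raw):
    """Keep only the digits of an OCLC number (drops prefixes such as
    'ocm' or '(OCoLC)') and strip leading zeros. Returns None if empty."""
    digits = "".join(ch for ch in (raw or "") if ch.isdigit())
    return digits.lstrip("0") or None


def find_duplicates(records):
    """Group records by (OCLC number, volume).

    Returns (dupes, unmatched): dupes maps each key shared by more than
    one record to those records; unmatched lists records with no OCLC#.
    """
    groups = defaultdict(list)
    unmatched = []
    for rec in records:
        oclc = normalize_oclc(rec.get("oclc"))
        if oclc is None:
            unmatched.append(rec)  # cannot be deduped by this rule
            continue
        volume = (rec.get("volume") or "").strip().lower()
        groups[(oclc, volume)].append(rec)
    dupes = {key: recs for key, recs in groups.items() if len(recs) > 1}
    return dupes, unmatched


def batches(items, size=999):
    """Yield consecutive slices of at most `size` items (under 1000 per set)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Exact matching on the volume string will miss variants such as "v.1" vs. "vol. 1"; that is the kind of gap the fuzzy-match work mentioned above would need to cover.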

SERIAL DEDUPING

-serials ingest must be manually put into serial bidlist
Mike L.: about 4000 serial items were ingested from CDL
-some ingested materials have complete volume level metadata, some don't; only titles with complete data will be entered into bid list
-at this point no one besides BHL is scanning serials

METADATA/ IMAGE QUALITY

-do we want capability of inserting 'page unavailable'
-tabled until Tuesday's Gemini discussion

RESOURCE ALLOCATION:
-contributing libraries who are not scanning can participate in Gemini feedback, but funding is needed
-long term solution that the IC needs to decide

CONTROLLED VOCABULARY

-have been consulting with third parties to help with name reconciliation, but need to make sure our cross references are maintained
-we need to decide what we want vendors to provide/ offer us
-at present we can edit creators at item level, but we can't merge creators
-tabled; will follow up during Tuesday's portal editing discussion

INGEST METHODOLOGY

-Don, Suzanne, Erin, Christine [Becky?] volunteering to be part of a subset of the collection committee to do group analysis on what subjects, LC#s, Dewey#s to use for ingest
-should this be tweaked before each ingest?

INHOUSE SCANNING

-orange bag priorities lower due to ingest
-keep scanning, but tabled until next year
-IMLS Special Collections Grant: Harvard and Smithsonian retooling special collection, looking for a mobile scanning unit, considering Boston Imaging

INGEST BEYOND

-as users find IA content they want us to ingest, they can send us feedback asking us to include in BHL
-if IA record has no metadata, we can catalogue it and create necessary metadata (electronic records)
-a workflow can be established if needed

ACTION ITEMS:
-need a deaccession policy for non-BHL dupes with bad metadata
-divide and manually bid on ingested serial spreadsheet
-need an interface for creator merging in admin dashboard


Portal Editing Discussion


When editing, always hit SAVE
CF: for open URL purposes, no need to worry about the bib standard, the info passed is all numbers, could be "banana 7"

Sequencing

updates have been made to the numbering schema that has made sequencing much easier
What do we do re: policy decision?
default to minimum standard present for enumeration of serial
if the standard is in question, follow bibliographic standards when possible; if that is too overwhelming, enter a Gemini ticket

Item level corrections

click on the identifier for the item
in volume field box you make your edits, follow minimum standard existing for a given serial
What do we do re: policy decision?
default to minimum standard present for enumeration of serial
if the standard is in question, follow bibliographic standards when possible; if that is too overwhelming, enter a Gemini ticket

Creators

Users should be able to search on a variety of "correct" forms of a name, not only the preferred form
click on "Add creator" to select from the drop-down list of existing names and save
If name is not in the drop down, how do we get the name into the creator list?
Go to the Dashboard, click on Creators, enter field information and SAVE
if you want to edit the creator's name select the name from the creator drop down, then edit where appropriate and SAVE
There is a need to do creator merging...
Can you delete a creator entry? Not via the Admin interface
allow for creator search
creation of committee for creator stuff
CiteBank will prove just how necessary this is -- need to address this issue in the short term
ML: need for extra engineering to fulfill creator issues, currently only 1 table
discussion repeated issues brought up yesterday regarding drawing from existing name "authorities"
go back to the title by clicking on the title identifier
IF ANY QUESTIONS, ENTER AN ISSUE INTO GEMINI

Pagination

click on item identifier
all pages are listed, click on "paginate this item"
hit "lock for editing" so no one else can touch the record when you're paginating
indicated pages vs. page types
whenever indicated page is blank the default goes to the page type
clicking on the thumbnail icon brings up the page image on screen
click on ... of indicated pages
change prefix to whatever you want
OH, make sure to CHECK the box(es) next to the page you want to edit
assign = append vs. replace
books can have figures and plates and you don't want to confuse the 2
some automation for sequencing pages
use prefix as indicated in the book
implied checkbox inserts brackets around the number
are there standards for pagination?
to discuss via staff call
page type does have a drop down, example has an illustration and text?...there is no wrong, the important part is marking the illustration for the benefits of the users
mark all blank pages as page type blank
CF: we ran query to see how many page types = blank and found 30K
SP: wouldn't it be helpful to query how many page types are illustrations, plates, etc.
BE SURE TO UNCHECK pages before making further edits
year? volume? piece? for use in indicating pages within bound-withs - doesn't currently appear in user interface but openURL resolver DOES use this information, otherwise it acts on fuzzy match with volume info assoc. with the item. Important for assigning years for proceedings of a year that are published and bound within a volume of a later year for example.
Open up pagination to crowd sourcing efforts, volunteers, interns, etc.
difference b/w unlock or complete? carry over from windows client, probably one could go away
not sure about how easy it will be to make changes to the asynchrony b/w the page viewer and the paginator...Mike to investigate b/c portal has similar functionality...
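
A rough sketch of the display rules described above: a blank indicated page falls back to the page type, and the implied checkbox puts brackets around the number. This is not the paginator's code; the function name, its parameters, and the exact output format ("Page [12]") are assumptions made for illustration.

```python
def page_label(page_type, prefix="", number="", implied=False):
    """Build the label shown for a page.

    - If no indicated page number is recorded, fall back to the page type.
    - If the number is implied (not printed on the page), wrap it in brackets.
    - The prefix should be whatever the book itself uses (Page, Plate, Fig., ...).
    """
    number = (number or "").strip()
    if not number:
        return page_type
    if implied:
        number = f"[{number}]"
    return f"{prefix} {number}".strip()
```

For example, `page_label("Text", "Page", "12")` gives `Page 12`, and `page_label("Blank")` gives `Blank`.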

portal editing will be an ongoing task as part of BHL

Merging Titles

gap-fill example: Journal de zoologie. 12011 & 13355
merging 13355 INTO 12011 makes 12011 the parent record
who becomes the parent record?
all things being equal, it's up to you
whichever has better MARC record becomes the parent
MARC elements may be moved to improve upon the bibliographic record
write down "child" titleID
go to record you want to be parent record and hit pencil
add item, enter title ID or search full title and hit "search", select option you want and save
what happens to the record of the child title that was merged into the parent?
maintain duplicates when merged
"notes" field does not currently show in UI, but we need a place to enter notes for users, how can this note be incorporated into biblio data UI screen?
DON'T screw with vol. field b/c used for openURL resolver!
add additional notes field for public facing notes, maintain in-house "notes" field currently existing
change "expand all" to something else that invites users to look at volume information (added as action item)
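
The merge in the gap-fill example can be pictured roughly as below. This is only a sketch: the dictionary layout, the `merged_from` field, and the item names are assumptions made for illustration, not the portal's data model. It follows two points from the notes: the record merged INTO becomes the parent, and duplicates are kept rather than collapsed. What happens to the child record afterwards is left open, as it was in the discussion.

```python
def merge_titles(parent, child):
    """Return a copy of `parent` with the child's items added.

    Items are appended as-is (duplicates are kept, and volume fields are
    not touched). The child's title ID is recorded so the merge can be traced.
    """
    merged = dict(parent)
    merged["items"] = list(parent["items"]) + list(child["items"])
    merged["merged_from"] = list(parent.get("merged_from", [])) + [child["title_id"]]
    return merged


# Merging 13355 INTO 12011 makes 12011 the parent record.
parent = {"title_id": 12011, "items": ["v.1", "v.2"]}  # hypothetical items
child = {"title_id": 13355, "items": ["v.3"]}          # hypothetical items
merged = merge_titles(parent, child)
```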

Uploading MARC records

Ex. Bulletin -- United States National Museum no. 277 SIL supplied
cataloged as serial and monograph depending on library
HAR has cataloged as monograph
Joe has a method to use URLs to extract records via XML, but it must happen 1 at a time; need to use MarcEdit for batch uploads
MARC 21 used when?
upload MARC XML by browsing and selecting file from your computer
an error message appears even when there is no error; Joe considers it a good sign; email Mike the XML record so he can investigate
go back to dashboard
search for MARC record just uploaded and Add item, search for Bulletin -- United... and select no. 277
check "publish on the BHL portal"
error message received, but this is ok
and SAVE
while we're trying to deal with errors we need to communicate error occurrences to Mike via =debug URL
how does this affect the openURL resolver?
what do we do about a case where 1 item is numbered differently as part of different series/serials - is this common enough for us to need to figure out how to solve it?
you can have 2 or more MARC records that point to the same item but the item information needs to have multiple volume fields that can be associated with a given title ID
Ask a librarian for citation resolution as well
OH! and you have to activate the series statement -- Mike, can we make this the default? Yes, part of the import code.
simple fixes - everyone can do
major fixes - do in conjunction with institution who contributed it
Use ILL when possible for gap fills if the institution with the book is no longer scanning. However, when the MARC record is not associated or available at the institution who will scan, the institution who has the book will send it to their scanning center, but the institution still scanning will pay for it. The contributing library will be the library with the book; the sponsoring library will be the library paying for it.

SP: Do we create pool fund for gap fill billing to do this? So IC presentation item - create pool fund for gap fill scanning with institutions no longer scanning that do not have money to scan.

Gemini Discussion


BL: Four major things to cover: workflow, automated response for user feedback, error button, and web form available in portal - changes?

Workflow

typically, at Smithsonian, BL and GD enjoy Gemini. GD checks Gemini daily to address current issues. BL addresses backlog. Preliminary stats: 344 issues in Gemini; 396 components assigned to those issues (categories to classify those issues). In Gemini, there is Excel export function that allows you to export everything you want to see in Excel. Left navigation panel filters by component, status, who it's assigned to, date.

Backlog - 50 issues not addressed (unassigned). No one has looked at them and no one is taking care of them. BL doing them on an ad hoc basis. Group needs to commit to checking Gemini weekly - maybe 20 minutes or something to commit to Gemini to follow up each week. Problem with Gemini comes with back log issues not resolved for some time. Need routine.

SP: monthly scheduled calls - wants to create monthly call as deadline for everyone to make sure they've checked Gemini by then.

BL: Wants more regular checking than monthly. Grace checks daily - others need to address issues faster.
Only reason people get notes for what they're assigned for is because they're set as issue watchers.

ET: RSVP - we need to use this more. RSVP in the context of Gemini means we need to get back to user and let them know we got the feedback.

BL: If an issue gets assigned to someone, if you're trying to do triage, assign the issue to someone, select a component, (under component is a message that describes that component. ) Really never assign risk levels. type IDs are linked to web form. priorities usually stay with trivial. set status as assigned, and hit update. This is all that happens in Gemini. The only way people get emails is when someone is added as an issue watcher. If everyone has a regular time to check Gemini, we can eliminate issue watching in future. Question: what's realistic. Is it realistic to agree to a schedule? Is it better for an email?

DR: Like email better. Allows us to see quickly what the issue is and get back quickly.

SP: Cannot promise to check Gemini weekly.

DW: Could be both. Send email and people will decide how often they check.

ML: Does not want to be an issue watcher. Mike sets up outlook reminder to look at Gemini weekly. Issue watcher emails are extra stuff he doesn't need.

BL: Different workflows for different people.

Erin and Grace will continue to set people as issue watchers.

BL: Will go back and add people as issue watchers for backlog.

DR: Can you see everything assigned to you? Yes.

JW: What is the feedback mechanism for editing once you fix it?

BL: Go back into status and update to closed and resolution to something. Only thing to keep track of stats are components and status. Resolution will vary. Status - make sure you close the item when you're done.

DR: Do we respond directly to person? Will talk about.

BL: Drop down list in status has only four options. We really only use in progress, closed and assigned. When you first have item unassigned (default), the only three options are unassigned, assigned and in progress so you can't close item not addressed. Only after assigned can you close it. Grace and Bianca will assign - others will follow up and close the issue.
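
The status rules BL describes can be sketched as a small transition table. This is an illustration, not Gemini's configuration: the notes only state that an unassigned issue offers unassigned, assigned and in progress, and that an issue can be closed only after it has been assigned. The rows for "in progress" and "closed" are assumptions.

```python
# Allowed status changes. The "unassigned" and "assigned" rows follow the
# notes; the "in progress" and "closed" rows are assumptions.
ALLOWED = {
    "unassigned": {"assigned", "in progress"},
    "assigned": {"in progress", "closed"},
    "in progress": {"assigned", "closed"},  # assumption
    "closed": set(),                        # assumption: closed is final
}


def change_status(current, new):
    """Return the new status, or raise ValueError if the change is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot go from {current!r} to {new!r}")
    return new
```

With this table, trying to close an issue that nobody has been assigned to raises an error, which matches "you can't close item not addressed".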

Comments - really important for keeping us up to date on what's going on with issue. If it's the final comment for issue, select closing comment and set issue status as closed. Comment will then be visible as closing comment. But nothing's gone back to person with this. This is a public site, but some people are concerned about this.

BL: Mike, is there a need to make Gemini private? Is it okay remaining public?

ML: Because of the way it's used, it needs to remain public. Not sure if we can make only BHL stuff private.

Since we know it's public, it's okay.

BL: To set issue watcher, go to "watch this issue."

Responding to users:

Right now, what happens is users submit feedback and that's it - they get no response. People often submit feedback more than once because they're not sure if it actually goes through.

Two ways to think of this: handy to have automated response. On other hand, there may need to be issue specific response.

SP: We need an automatic response to say it's submitted.

JM: Contact user if appropriate. We don't need many auto responses. We just need something that says it's been submitted.

BL: Can also search by keyword in Gemini. Users say they want to know their feedback went somewhere.

SP: Thank you for your submission as automated response needed.

BL: Can we have automatic response.

ML: We can do that through the portal. They will get an automated message when they submit feedback.

ACTION ITEM: Someone needs to write automated response and send to Mike.

DR: Would like some generic message to send for certain issues in Gemini.

CF: Gemini is public. Do we want to let users get link to Gemini issue to follow issue. Group says no.

BL: Chris - what is status of Gemini upgrade? How do we get it?

CF: Gemini needs to be moved to the BHL-only environment and then it can be upgraded. Gemini is still used as the main issue tracking system for Tropicos, so they don't want it upgraded. We have to move to another server in order to upgrade for us.

BL: Upgrade needs to be priority. This is high priority.

SP: What is added functionality we get with new release?

BL: Get roadmap visually of all collections issues. We will get change log. Interface will get cleaner, etc.

CF: Answer to Mike - we can work around need to buy a new license to upgrade Gemini.

Error Button

BL: Error button: right now we have button that says report an error. Clicking on this takes you to feedback form - report an error not right language - button not clear - looks like page load error. Button also reflected in item view but no text. Was solution because real estate is issue on item title bar so we couldn't always also have text. Does have a roll-over tag on item level.

We need to concentrate on making the icon very clear.

BM: Many users just assume it isn't for them - think it's a portal error - not a user oriented feature. When they're actually in the book they don't know what the icon is for.

BL: Do we need a button committee?

Volunteers for button committee: JM, ET, BM, MP, BL, CR, GD. Also to address web feedback form.

Web Feedback Form

All on button committee need to pay attention to website feedback. Ignore wording at top - just focus on form. There have been suggestions that we change website feedback to something else to invite users to use it for things about book. Committee needs to figure it out. There are also suggestions to change language in subject radio button. Also need to make it clear to folks that when they push the button the url needs to be sent to us. Language needs to make that clear. Committee needs to tell them what needs to be done.

BL: Connie - what about BHL email? How should we proceed with feedback she gets via the email? How do we work it out?

CF: Do we need email now that we have an active feedback loop?

SP: Do we need to keep email address if we have this feedback mechanism now?

CR: Doesn't seem like we need the email.

BL: People want to send us info on permissions, contact to get involved, contact because of issue with book - different reasons. Do we keep all that in Gemini and then field comments to appropriate people?

CF: One tool for all outreach communications.

KT: They will have to send feedback via form, but users don't like that. They want email available.

CF: Increasingly we have websites that have no email - only feedback mechanism.

BL: Connie gets variety of things in feedback email.

CR: Is there way to copy email to Gemini?

DW: Is there way to have button say contact us. Hit this, opens form with multiple categories that gets automated and assigned?

Connie and Bianca will also be part of group to address this.

BL: Please call me if you have issues with using UI.

It's up to people assigned to issues to get back to users. Should these emails come out of BHL email? We can't send our personal emails to people. Need dummy email that we can all send from.

ACTION ITEM: For button committee - deal with dummy email to send emails to users from.



Midday Recap

return from lunch, 1:20pm

SP: - speaking of general and specific issues from yesterday.

Suzanne mentions Matt P brought up need for California Digital Library to be added as identity in serials mashup, so bidding can be done on behalf of CDL. - action item for Kai

SP:- going blue with portal - December

General portal discussion, BHL reminiscing...

QA

QA - led by Keri, Matt = note taker

1:31pm

KT asks for QA numbers.

Group: What were we required to bring?

MB: AMNH $38,800 total costs for Scanning.

DW: NYBG

a. $2200 total 2008; $800 total 2009
b. $200 one way

DR: MBLWHOI :

a. single non rb vol - free

b. Via cart = $785 per truckload each way

c. Hand carried $250 each way

d. 12,000 per year shipping avg

JD: MCZ
a. $150. per load

BL: Tom said- money may be available for transport

KT: at summit : cost of no QA = prohibitive.

DW: moot point with ingest...

KT: a. we can't fix ingest materials


MK: The above info, and this discussion, is to gather info for IC

JM: In a perfect world at NYBG QA 2009 would be done.

Martin: Emphasized this is all for information for IC.

JW JM - Not all institutions can commit to QA as a matter of reality

DR pointed out that what we spoke of in NYBG QA Summit was to develop best practices whether or not this would actually take place globally

SP: IC needs to hear status of QA, and what needs to be done, so IC will respond to this issue.

BL: Pointed to doc. in Wiki: Major QA Concerns

MK - he hopes that this is the document which will be presented to the wiki

GOAL for this is for IC presentation leading to deliberation

Where will transportation funds come from? MK: from contingency fund

BL: If money avail for shipping will all do QA

DR: a. if you are solo QA person = 23 minutes per book

b. for lib assistant = 2 books per day with interruptions

Grace - time takes 15 minutes per book

MK - possibility of payment for transport cost only

MB: AMNH - will not work for us

SP: Group needs to recommend
We will do QA if you give us some form of staffing to perform QA


Bianca - Look at this globally QA, Gemini all part of same process.

MK from this point forward do we want to ask for reimbursed staff time for this

DR If we add all of these to our plate of tasks

MK Does group feel that QA not possible for group to commit to?

SP Question of employing a roving BHL assistant/technician? For shipment periods?

DR Retrospective QA discussed in NYBG Summit?

ALL: Was decided in NYBG - NO.

Item being discussed is roving individual.

MK - 100-125,000 a year, is this what the group is speaking of?

Grace- Full QA done here does work in terms of IA improving their work.

SP - QA is needed, we do not have staff to do this.

DW is scanning $ over?

MK - this is how it is for next 2-3 years

DW- Down the road in 3 years, we would still need assistance.

BL -Smithsonian and MCZ and a couple of NYBG shipments is total of what is being scanned.

Will we describe all QA at "just in time" = Gemini will be our QA mechanism.

MB - AMNH committed to finding errors, noting them, and eventually rescanning.

SP - If IC finds more money, the future should include Scanning and QA money.

MK & JW: did an analysis: 20 cents added to 10 cents per page = 30 cents per page; the 20 cents goes to staff QA costs

2:15pm

The next time we go for money it cannot just be for scanning.
Looking into the future - who will take care of maintenance in 10 and 20 years.
Look at maintenance.

Martin: Philosophical point: We are doing BHL because we are persistent institutions, and because BHL is a part of our future as libraries, as we shift our activities in the future.

JW points out that IC always was concerned that staff money was not there for us from the beginning.

2:19pm



Public Facing Wiki


BL : Chris started it up, Grace Erin and Bianca brainstormed and came up with some ideas for our public facing wiki that they will present to all for opinions, thoughts, etc.
4 topics: architecture, email/contact info (this issue will be addressed by the button committee), pages in question and who will edit

Architecture:

(see page outline handout) idea is that it will link from the portal, instead of having the top links go to portal pages (feedback, about tools, etc) will link to the wiki pages. This discussion won't include layout of content on the portal or specific discussion of links on the portal and what we're doing with them.
-Home page of the wiki currently has "About" content, and links to presentations (slideshare). BL feels presentations give added info and should be linked to. HP also has links to collection development policy. MK thinks the presentations should be scraped from the private wiki and maintained there. DR feels some of the presentations need a little more context and may be confusing to users. BL explains that we can 'share' pages between our private and public wikis since they are both on wikispaces - this makes it easy to maintain, but we have to remember what info is public. HP also contains a link to affiliated digital projects (links to the page on the portals with online details) that lists all the various BHL members individual digitizing projects. DR suggests HP page is too long and should be broken up into individual pages

ACTION ITEM: make each section on the wiki HP into a separate page.
-Portal FAQ page (scraped from the internal wiki)
-HELP and tutorials: instructional videos
-Documentation: workflow, presentations, collection development policy and other documentation

ACTION ITEM: need to have a page for developer tools and APIs; use the current Tools page content.
This page needs to be a top level link on the navigation menu. Thought: have a separate page for user related tools. Thought from MK: make sure blog content about developer stuff is on the wiki page

-Licensing and copyright (the current portal 'copyright' page): CC info, and info about contacting for permissions. DR: also need a place for due diligence information (the link that fails from IA items, the BHL.org/permissions page, will redirect to the wiki licensing page). There is another permissions page. Maybe there should be 2 separate pages: one for 'collaborate with us' permissions type stuff and the other for 'use our stuff' copyright type stuff. Where these go on the top level links can be left for later.

-'Dissemination' page: call it Community or Networking or Labs... MK would like to see a page with a list of all the other mashups and other people's use of our stuff.

-Contact and Feedback: a page to explain use of and link to the Gemini web form and consolidate who to contact for what (the old contact page content). Becky suggests we look at the Zappos contact page. BL suggests that all contact info should come through Gemini and eliminate all email addresses from the page. Problem currently with the email address not forwarding: Chris says we should just remember to occasionally check the email, but take the address off our website. MK would like to see a list of BHL staff and IC staff names and contact info:
ACTION ITEMS: make a new page with that contact list and link to it from the main contact page.

Pages In Question: 1) (online details) members digitizing projects list with links See other decisions above
Email: see above

Who edits? Answer: everyone who wants to who is a member of BHL. GD: need to have a wiki master gardener who is tasked with making sure that necessary developments and news are included, and look and feel is preserved. Chris and Mike will be in charge of the tools and API pages, but for now BL, GD and ET will be the editrixes, pulling in other people as necessary. MK suggests that we put up a survey tool on the new wiki asking people what they want to see on it. BL: if we want the wiki to be integrated with the portal, we should have a uniform header with links and such. We need to decide what elements need to be in a header common to all our sites.
Admin site issue:
ACTION ITEM: Chris and Mike figure out the admin link placement.


Encyclopedia of Life (EOL)

with special guests Cyndy Parr and Katja Schulz

BeetlesBHL.pdf
Overview of EOL (slideshow)
Aggregation:
Names information primarily from Catalogue of Life. Also includes IUCN red list information. Maps from GBIF. Images from a variety of sources.
Number of text objects, sorted out by chapters. Each chapter or subchapter may come from a variety of sources. BHL content added.

Content Partner Login for those that want to provide content. Requirements for how data is put into EOL, licensing, etc. Partners can get information such as comments on objects and usage statistics so they can see what is happening to their content on the EOL site. EOL does not change any of the content that the partners provide - it is up to the content provider to fix the content. LifeDesk (in Drupal) will format the content for the provider. EOL is formatting some content for providers. Using many different options for data exchange depending on the partner.

Images with yellow background - not curated. Red backgrounds have been tagged as problematic. Aggregated information goes onto a page (yellow background). Curators then review and the general public can comment and tag, and then the final curated page is available.

Fellowship program - some money available to provide graduate students with 1/2 funding to develop pages, curate, and other tasks. Will have approx. 15 spots available this year.

Goal: page for every species. Have 177,000 pages with some vetted content (something on page that has been reviewed by a scientist). The vetted content number doesn't include BHL content (BHL doesn't count towards total). But for a significant number of taxa, BHL is the only content.

Users want improvement to how BHL links to EOL pages. EOL hasn't run focus groups specifically on the topic.

Beetle presentation (slideshow)
Beetles are more than 20% of described species so they are very important to EOL success. <20% documented online. Probably about half have only type specimen and initial species description.

Strategy for beetle development since very little to harvest:
Engage coleoptera community to locate relevant resources, compile bibliographies, identify type specimens, find volumes with descriptions for scanning.
(TG: having a names database that is easier to navigate would be helpful, so one wouldn't have to go through species by species)
Coleoptera community could also proof and correct pdfs, process plates, translate.
Good potential for crowdsourcing
(TG: some suggestions/best practices for compiling bibliographies would make the reuse of this information easier for BHL)
Possible funding opportunities?

Other crowdsourcing ideas:
divide plates into individual species and upload images to EOL
users can manually pull out relevant information and add to species pages
wishlist page for special projects
give users "karma points" for participating so people can show the work that they have done

BHL appearance in EOL - Ideas:
Lists are currently long, just alphabetical. How to improve?

Name finding tools give NameBank IDs. Patrick Leary just provided list of EOL species IDs so could link BHL and EOL better.
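A minimal sketch of how such a list could be used to link BHL names to EOL pages. The notes do not describe the format of the list, so this assumes it can be read as a NameBank ID to EOL species ID mapping. The function name, the data layout and every ID below are hypothetical placeholders, not real identifiers.

```python
def link_names(bhl_names, namebank_to_eol):
    """Split BHL names into those with and without a known EOL species ID.

    bhl_names: iterable of (name_string, namebank_id) pairs.
    namebank_to_eol: dict mapping NameBank ID -> EOL species ID (assumed format).
    """
    linked, unlinked = [], []
    for name, namebank_id in bhl_names:
        eol_id = namebank_to_eol.get(namebank_id)
        if eol_id is None:
            unlinked.append((name, namebank_id))
        else:
            linked.append((name, namebank_id, eol_id))
    return linked, unlinked


if __name__ == "__main__":
    # Placeholder data only; these are not real NameBank or EOL IDs.
    names_found = [("Example name A", 101), ("Example name B", 102)]
    id_list = {101: 9001}
    linked, unlinked = link_names(names_found, id_list)
    print("linked:", linked)
    print("still unlinked:", unlinked)
```

Keeping the unlinked names as a separate list would show how much of the BHL name index the EOL ID list actually covers.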

EOL pages include links to original page sources. Not sure how many of the original sources are linking to EOL.

Are there crowdsourcing things that EOL can help with? Contact Cyndy or use feedback on EOL page ("user voice").

EOL staff at SI do not have programmers - programmers are at MBL. They have to prioritize what they are working on. EOL Informatics getting new manager in Jan; EOL Product Development person going to start as well.

EOL Informatics Group monthly development updates will be forwarded to BHL list so we can see new things about EOL.

Mission and Collection Development Policy Discussion


Bianca's PowerPoint Collection Development Template for BHL.pptx

Working mission -
BHL is an international library collaboration of natural history museum, botanical garden, and biological research libraries working together to digitize the published literature of biodiversity held in their respective collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.”
BL: wants to establish a mission and collection development policy that we can check our work against. Ultimately, mission and CDP need to be approved by the IC and posted on public wiki. Serves as a guide for future digitization and ingestion of future materials. Conceptually bring everything BHL does together.
BL: Who are we? Why are we doing this? Who are we doing this for? Mission needs to reflect our broad audiences.
Judy: this is wording from MOU at outset of project.
BL: does everyone feel good as a start? No response – Yes. Do we need to add elements to this? Sufficient?
Doug: digitize is restrictive? Bring in content instead? Make literature available is better wording?

Remove the following: types of libraries (but keep the word libraries), the word "published" (manuscripts could be included), and "biodiversity commons." Make it short and clean.
How do we capture who we are?
Sustainable use is fuzzy and a better term – don’t use preservation. Open access and “responsible” use – Access is sustainable.
Bianca and Don will work to finalize statement - want it short, sweet and "tweetable."

Combination of strategic goals and mission goals – these goals are from the plan.

Est. a major corpus of digitized biodiversity publications on the web. – we’re scanning, working with content providers, permissions. Martin: est. parameters of what the collections policy / corpus (the biodiversity universe) is.

Improve access to accurate, document information about the world’s biodiversity. – how do we know it’s accurate? A lot of revision in literature.

Improve the efficiency of biological research for users. - provide open services for re-use of data, anywhere and by anyone.

Preserve the textual record of biodiversity for the future. – should be a goal! – BHL will ensure that the corpus of biodiversity literature is available for future generations – end sentence there. Add community contribution idea? Do we want to encourage that? Repository? Effective technical practices? Are we going to be there to make sure literature is available in the future?

(strategic) Sustain the project into the future.

Ensure that the BHL is widely known, understood, and used by scientists and the general public.

(strategy) Internationalize the BHL. – do we have to provide a global repository? Covered in first one? We’re already reaching out. Shouldn’t be its own separate goal? This isn’t just for use. Ensure that the legacy of biodiversity literature belongs to all of humanity. Cooperate with other efforts to ensure efficiency. Sharing workflows. Sharing knowledge. Cooperate with other BHL things, other international collectives of biodiversity literature. Use globalize rather than internationalize??

Martin: do we need to deal with this since IC is working on this? BL: wanted to set goals to work with mission. Don: we need to work on Policy and needs to be based on mission and goals. We need a base to work with. Suzanne: making sure we’re in line.

Suzanne: we’re doing action items – are we succeeding? – are we in the right direction?

Bianca and Don will work further on mission and goals.

BL: propose a template for CD Policy. Proposed elements for inclusion:

Nature of collection – analog content selected and digitized by BHL member and contributing libraries – preservation model – Martin – preservation is “loaded” – sustainable is a better word. Material acquired through various avenues falls under different preservation orientations: Archived, Served, Mirrored (CiteBank), Linked (material is hosted elsewhere and library points to that location)

Already digitized content ingested into the collection – portal model

Community contributed content – institutional repository model

Citation database – can/should citations be considered collection materials? – tool to access content. A narrow bibliography does though become an item (very valuable) – potential to be linked with what is in there. Metadata repository model? Is repository the best word? – reminds SP of a vault. Dynamic repository instead?

Becky: use of word Archives – we won’t have the level of control that speaks to archival practices. Is there a desire (regardless of feasibility or money) to do this? Ideally, we’d like to, but just not feasible right now. Use curated collection instead? What about going back to goals of preserving textual record? Do we want to have a preservation goal if we really aren’t digitally archiving? Preservation goal really means the print or original record. We don’t want to give the idea we’re scanning and tossing.

Are we really curating, or just digitizing everything?

Resources
NSDL Collection Development Blueprint, CDL
Need more sample policies – member libraries, local policies in digitizing analog materials, policies on inclusion of materials into portals, IR Policies?

Element 1
Why is collection being made? (from BHL about) – why we’re doing what we’re doing

Element 2 – Who?
Who is producing collection? BHL members, contributors, other institutions (ingest), vetted scientific community members

Element 3 – who for?
Intended for researchers, scholars, and scientists…
What about citizen scientists / amateurs?
Public?
Librarians? – connecting users to these materials – sometimes tools are clunky and difficult.

Element 4 – goal
To enrich and advance research in the field of biodiversity by digitizing, aggregating and making freely accessible the world’s legacy biological literature. Don: what is the purpose of the goal? Goal of collection? Should we remove “legacy” literature? What about current literature? Let’s remove “research level”? If we’re for the public, let’s make the impression it’s there for anyone who wants it. Primarily scanning scholarly works, but anyone can use it. CD Policy is a guideline. We need a targeted sense of who our users are. Conflict. Let’s talk about the kind of literature rather than the use. Primary collection goal is for scholarly literature. Scholars are the primary clientele. Let’s remember that open access is important. Let’s keep scholarly in there. Who we collect for and who we serve are 2 different things. We’ll est. parameters for what is included. The ingest called into question a lot of thoughts about the collection itself. We felt that things were more “controlled” prior to ingest. Having some things out of scope in our collection is okay. A CD policy needs to be the “ideal” guide. Is there “core” material being scanned? Strike balance between bringing things into collection and then taking a step back and detailing what our collection is and why we’re doing it.

Nature of materials = content +services = research materials, reference, datasets????, collections, serial publications, tools and products??????, community services????

Collection building strategy – selection of materials – who does and how? Rules? Materials in public domain selected by staff, requests vetted through librarians, permissions, ingest methodology, established bibliographies.

Selection criteria
Relevant to study of biodiversity? Some researchers are very specific, very loose term, some discipline-oriented, but it does relate to biodiversity – use “biodiversity disciplines”

takes a scientific approach,
high resolution and quality,
persistent, enduring, copyright compliant, open access philosophy

John – open access vs. creative commons license. Are they really compatible? If it’s open access, don’t you just take it and use it? Can you use it for commercial purposes? We are restricting how you might re-use it? Nothing we can do to stop people from re-publishing and re-using it? If they’re concerned about following the law, they’ll contact us and do it the right way. If they’re law-abiding, we can get money.

Support work of EOL? - we’re already doing this. Taxonomic literature.

Breadth

Core taxonomic literature – materials related to the wide range of fields that impact biodiversity research – zoology, botany
Supporting literature – anthropology, geology, paleontology, evolutionary biology, conservation biology – horticulture move here?
Ancillary literature - ecology, agronomy, horticulture, natural resources management – anthropology here instead?

Need a group to work on the divisions above. Maybe only 2 layers? Ancillary - some things from ingest may be related to this. There are fine lines: what is core today or even yesterday, what happens 100 years from now? Let’s introduce the concept of time into this discussion? What about a timeline? Things important in past may not have same impact now.

Collections committee

Defining more precisely what core taxonomic literature is. Are there subjects/ call numbers that directly correspond? Community vetted taxonomic bibliographies. Known taxonomic tomes. ID and review of biodiversity taxonomies such as in HALL to differentiate supporting from ancillary literature. Review of subject headings and call numbers used for ingest.

Let’s look at our favorite resources. Make Wiki page of subject, resources, etc…

What are the broad overarching categories we need to spell out?

Organization of the collection???

Deaccessioning statement – un-publish content out of scope? Removal of content in violation of copyright, materials missing large chunks of content?, duplicate copies – is there a cut-off for number of copies? Redundancy instead? Certain level of quality that is totally unacceptable?

Ask scientists – what are the core taxonomic resources?

Tom Garnett Questions Nov 09

- tgarnett Nov 9, 2009

1. On the question of would your library undertake quality assurance if transportation funds for the returned books were available – I gather the answer is no. Correct?
2. Several questions were raised about capabilities for CiteBank etc. From the unstructured discussions would it be possible to come up with a specific list for review? Even better if you can achieve consensus, a ranked list? The action items are close in this regard but need tidying up. This is important because there are several other parties with development priorities also.
3. On the Controlled Vocabulary section, “have been consulting with third parties to help with name reconciliation, but need to make sure our cross references maintained and we need to decide what we want vendors to provide/ offer us.” Glad to see research in this area. This may result in a funding priority and I am interested in the costs.
4. The notes state, “-orange bag priorities lower due to ingest.” That may be the consensus of this group but it is not mine. We have several extremely long-standing offers from parties wanting to give us already digitized articles with permissions to load into BHL. The longer we wait, the more likely they will go elsewhere. BioOne has an offer on the table to give us even more articles. Putting this at a low priority jeopardizes an important acquisition stream.
5. The excellent point about the need for a deaccession policy was made. Is anyone working on a draft?
6. This statement is in bold in the notes, “We will do QA if you give us some form of staffing to perform QA” How would this work? Suppose you have for a given year 300 hours of QA work and BHL central increased your subaward by $XX dollars. What would you spend it on? How would the receipt of the funds, which would be far below the amount needed to hire a full or even half time employee, assist your library in doing QA? If you answer that the funds would reimburse your library for existing staff to do this work instead of other work, how will this be documented? All the subawards have auditing requirements. Can we think of creative solutions here? We are still looking for scanning funding and I am quite confident to put a figure of >$0.30 per page as a relevant cost. But we need to understand how the money could be used.
7. Concerning the excellent discussion of the Public Wiki, the highest priority IMHO is improved help info linked to the appropriate sections of the Portal.
8. I was pleasantly surprised to see that your group reviewed the current strategic plan in some detail. The BHL IC did also. I welcome incorporating your ideas and suggestions in the next draft.
9. Sorry if I missed it but what is the “Button Committee?”
10. I don’t understand the “walled garden text” from the Action Items.