
BHLMeeting2008NovemberNotes

NOTES FOR BHL Meeting November 6-7, 2008
MBL/WHOI Grass Reading Room
  • 8:30 - 8:45 am: Welcome and round robin introductions (LEAD: Martin / RECORDER: Suzanne)

Welcome and housekeeping tips from Diane.
Martin: EOL bling: white pens made from Zea mays, brown pens made from recycled sticks; totes are also available. Contact EOL (Breen Burn).
Round robin introductions - old and new friends introduced: Attendance included:
Martin K; Chris, Keri, John M., Kevin; Gretchen; Don; Doug; Phil; Suzanne; Michael; Bernard; Maggie; Joe; Matthew B; Diane; Matt P., Jen, John F., Ryan

  • 8:45 - 9:15 am: BHL EOL Update and other news: (LEAD: Chris and Martin / RECORDER: Suzanne)
BHL: 9 million pages scanned to date. Chris and Phil attended the Taxonomic Databases Working Group (TDWG) meeting; the presentation on BHL was well received. The scientists there loved BHL's goals and what we have so far. They want more content, an API, services, data mining, etc. It is important to keep in mind what they want and what they do not want us spending time on: they don't want us focusing on solving all the "dirty" metadata problems if that gets in the way of the services we (and they) might want to build. What do taxonomists want? On duplicate scans, intentional and unintentional, the group seemed to be okay with duplication; they are not looking for a gold-medal, approved version. They understand that the metadata has flaws and that duplication happens ("embrace duplication"). It is critical that identifiers be persistent, but they are fine with duplication. They understand what we have and what we are doing, and they are on our side!

BHL Europe: NHM London and Berlin (Humboldt?) are the leaders of the BHL Europe extension. Open questions remain: issues regarding bidding and coordination with other scanning efforts will need to be kept in mind. Final project negotiation: 3.4 million Euros. Smaller universities and libraries hold some things we are probably missing in our BHL partnership. A truly international effort.

BHL exports of titles and pages for EOL species pages. The December release will have lots more BHL content, names, and links into BHL.
How does EOL get BHL data? We are writing a monthly scheduled job that runs on the 1st of the month and exports titles, volumes, pages, names found on pages, and alternative identifiers. Name services follow the EOL specs. The TDWG group wants services so that others can get the data as well. People want a simple download, tab-delimited, to lower the barrier: one simple grab-and-go.

Date-stamp the data for TaxonFinder and NameBank so it can be re-sent for a given time frame. Re-indexing for new words and names could be done using the date stamps.
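A minimal sketch of what such a monthly, date-stamped, tab-delimited export job might look like (the table and column names are hypothetical, not the actual BHL schema):

```python
import csv
import sqlite3
from datetime import date

def export_monthly_dump(db_path, out_path, since):
    """Dump titles, items, pages, and the names found on each page as a
    tab-delimited file, limited to rows touched since the last export
    (this is where the date stamping pays off for re-indexing)."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """SELECT t.title_id, t.title, i.item_id, i.volume,
                  p.page_id, n.name_found
           FROM title t
           JOIN item i ON i.title_id = t.title_id
           JOIN page p ON p.item_id = i.item_id
           LEFT JOIN page_name n ON n.page_id = p.page_id
           WHERE p.last_modified >= ?""",
        (since.isoformat(),),
    )
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["title_id", "title", "item_id", "volume",
                         "page_id", "name_found"])
        writer.writerows(rows)
    conn.close()

# Scheduled for the 1st of each month, exporting everything changed since the
# previous run, e.g.:
# export_monthly_dump("bhl.db", "bhl_export_2008-12-01.tsv", since=date(2008, 11, 1))
```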

Phil is on the Moore Grant that funds the Fedora build. The 1st year was a test of Tropicos as a preservation architecture: metadata for images, specimens, and life in the field. Export to Fedora and then build services on top of that base.

BHL and Fedora: scaling issues could be a problem, with some unknowns about the next version (version 3). There seemed to be some conflicting priorities that make it a bit shaky whether Fedora will work for the overall BHL architecture. BHL funding is tied to IA for the Fedora collaborative pieces.

The article repository will be in Fedora. The data model and interface to article content is being built in Fedora. Smaller pieces of information that already seem to be in Fedora-based systems might be the model for the article repository. The BHL main portal dream is distributing content (currently everything is at IA, and we all know that is a risk: 45 terabytes), so Fedora might be the container for transfer: a Fedora object built from all the files belonging to a "title" from IA, roughly 22 thousand title-level objects. It has been determined that Fedora at the page and name level will not work; go up a level, from pages to articles and from articles to volumes.
Fedora content can be based on standards like METS, FOXML, AtomPub, etc. With a content model, Fedora knows how to handle it. We haven't experimented with METS in Fedora yet.
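A rough sketch of that title/item/page granularity, modeling repository objects at the title and volume level while keeping pages as file references rather than as separate objects (the class and field names are illustrative, not an actual Fedora content model):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Page:
    # pages stay as file references inside an item, not separate repository objects
    sequence: int
    image_file: str
    ocr_file: str

@dataclass
class Item:
    # an item corresponds to a scanned volume coming back from IA
    ia_identifier: str
    volume: str
    pages: List[Page] = field(default_factory=list)

@dataclass
class Title:
    # one repository object per bibliographic title (~22,000 at the time of the meeting)
    title_id: int
    marc_title: str
    contributing_library: str
    items: List[Item] = field(default_factory=list)
```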

IA doesn't really produce things in a standard format, except for the MARCXML and Dublin Core. Kirtas does deliver in METS. We may need to come up with a book data model that we all could use; few work in this way. Everyone is doing their own content and looking at the OAI model for delivery. BHL is doing something different: a common repository holding the same stuff. We might have to innovate. The attitude might be that we need to do what we need to do and be the de facto standard setter for how this type of material is handled, stored, and delivered.

  • 9:15 - 10:15 am: Scanning Workflow: Serials Bidding: Part I (LEAD: Matt P. / RECORDER: Joe)
Preparing content
10/17 conf call discussed bidding. All libraries identify what to scan and bid on that.
Univ. Helsinki contacted MBL to offer volumes to fill in gaps of Finnish journals.
This is the time to figure out issues, how to fill gaps, etc. as project is in its infancy.
Bernard – serials list was meant to indicate intention of what to scan.
Suzanne – should we work to clean up serial mashup? Or in portal? Should they talk to each other?
Chris – they should talk to each other. Cleanup of data –
Looked at bid list online
Matt brought up the issue of successive vs. latest entry cataloging, discussed on the 10/17 phone call. How do we deal with this? Different libraries handle serial titles differently. What is done the most? Should there be a BHL standard for this? Nuances must be taken into consideration so that 200 years' worth of cataloging is presented accurately.
Chris – are we adding data that’s not in our catalogs?
Yes.
Maggie – successive entry is the standard now for cataloging.
Suzanne – not everyone has followed CONSER rules. Do we take time to correct problems?

People need to be looking for exact title in portal as it is now.
Bernard searched for an example in the bid list: Annual report of the state entomologist (Indiana), which appears 4 times on the bid list. Sorted by title; looked at the dedup view. Diacritics often pose a problem in searching for titles.
Merge records, then bid.
Key question: what constitutes a full vs. partial bid? Is everything out of copyright a full bid?
Takes a lot of time to find the right title. Best to search by title.
Make judgement as to which record to use. Matthew B. said he usually picks record with most institutions attached. Makes separate bid for different chunks, if he’s bidding on disjunct chunks of volumes.
Can use note field to indicate gaps.
Suzanne mentioned possibility of having BHL portal inform bid list of what’s been scanned.
Who in practice has applied a partial bid and other institution fills gaps?
Ex – Occasional Papers of the San Diego Natural History Society. MBL & Harvard did complementary bidding; the runs appear in the BHL portal as 2 separate titles, differentiated by contributing institution. Tools on the portal may be used to merge these 2 titles. Rich Pyle and other scientists want to know where a volume came from, with the records behind that; provenance thus is important and must be honored. We also need one point of entry for users. We can accommodate both. What's the best way to get there?
Bidding should be done before scanning.
Chris - Use the mashup serial title ID as the title ID put into wonderfetch; matching in the portal is done via this ID. Libraries contribute this new title ID via wonderfetch.
Primary matching would be done with this ID, which makes wonderfetch important.
Biweekly meetings with all BHL scanning center coordinators will begin, on a call with R. Miller.
Getting the ID from the mashup to the portal via IA is the preferred method. It could also be imported via bib records or manually after scanning.
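A minimal sketch of the matching step described above, grouping scanned volumes from different libraries under a shared serial title ID (the `title_id` field name comes from the notes; everything else is hypothetical):

```python
from collections import defaultdict

def group_scans_by_serial(scanned_items):
    """Aggregate scanned volumes from different libraries under one serial title,
    keyed on the mashup serial title ID carried through wonderfetch."""
    serials = defaultdict(list)
    for item in scanned_items:
        # title_id is the mashup serial title ID passed along at scan time
        serials[item["title_id"]].append(item)
    return serials

# Hypothetical example: two libraries contribute different volume ranges
scans = [
    {"title_id": 123, "library": "MBLWHOI", "volume": "v.1-5"},
    {"title_id": 123, "library": "Harvard", "volume": "v.6-10"},
]
for title_id, items in group_scans_by_serial(scans).items():
    contributors = {i["library"] for i in items}
    print(title_id, sorted(contributors))
```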
Diane – this may help with monographic series. Monographs could be grouped under serial title, using title id. Can monograph titles be searchable (indexed) if this was done? Would have to establish a rule as to when monograph title would be used in record in portal, but it is possible.
Mike L. – idea of full text search under investigation. Currently search only on 245 in BHL portal.
Definition of partial bid & what other fields are needed in mashup.
How to indicate gaps?
Perhaps more “subranges” to indicate “starts” and “ends”
Perhaps a button with option to add an additional range, rather than going through the whole bidding process 2 or more times.
Possible to get feedback from bhl portal to indicate gaps? It is possible for portal to send a list to mashup of what’s been scanned. Have to establish schedule and timing for this.
Action items:
Portal – minimal cleanup & merging in bid list
Chris, Keri, Bernard to review following proposed serial title ID process:
Assign id to newly merged titles in bid list
Id put in title_id field in WF lists
Portal uses this as field to aggregate scans for serial titles from different libraries
Currently using ILS bib id # in title_id field in WF
Matt will organize serials call to define partial bid, related to copyright. Some email discussion will precede this.
END

    • Bidding - fulls and partials: current definitions and future definitions
    • Partial Bids
    • Merging
    • Portal notification of mashup
  • 10:15 - 10:30: Coffee/Tea Break
  • 10:30 - 11:30 am: Scanning Workflow: Serials Bidding and Monographic Deduping: Part II (LEAD: Diane / RECORDER: Keri)
    • Workflow of bidding, partial bids and after scanning
    • Monographic series
    • Monographic deduping
    • Future Clean up work

Introduction of Ryan – programmer on the deduping tool.
Diane Question – are people going back and modifying bid after scan? I think we’ve decided to use the portal to rectify scanned vs. bid. Action item Michael, Bernard, Chris will talk about using Portal to rectify serial holdings vs. actual scans. Currently only AMNH is going back and editing bid list after finding gaps, though they aren’t having many post-scan rejects.
Matthew B. – Q: What happens if we bid, then find a gap, but no one in BHL can fill the gap? Should we ILL the missing volumes? Suz – should we just keep a list of these (parking lot/wish list) and maybe go back later, when we have more partners, to see if we can fill them in? We should maybe take the long view and leave the titles in the 'parking lot' until we can fill them (Helsinki example). Action Item: Need to create a dedicated page on the wiki where we list these items, though they should also be findable in the mashup or in libraries' individual inventory systems. Is there any reason we need to know why a book was rejected? Probably that is more of an individual library issue. Bernard – how much of a problem is this, percentage-wise? Shouldn't we just individually keep track of our post-scanning rejects; then at some point we can pull that information together, put it into the mashup, and come out with a list of things we need to fill in. For now, we would also like to put a list of these titles in the 'parking lot' on the wiki.
Diane – mashup & monographic deduping. Are we checking in both places for monographic series type stuff? MBLWHOI is, no one else really does. Seems this is not a huge problem yet
Suzanne – now that we’re going to do the serial title id linker in the portal, for those items that I know are cat both as a serial and a monograph, should I be using the monographic record, but also passing the serial title id so we can find it both ways? Mike – currently we can’t find that title both ways (through the monographic title and the serial title). An additional problem is how we’ve noted in the monographic record what the associated serial title is. We don’t want the portal to be limited by our old MARC centric ways of cataloging. If we make things too broad, though, it will lower the precision
Note: for bound-withs, we aren't deduping on the second title if it doesn't have its own MARC record. (groan) We do have OCR, though, so perhaps eventually we will be able to break down the components, similar to structural markup of articles. The other part of the equation is training IA to do bound-withs: start and stop scanning based on notes we supply. Suz wants to know what others are doing – are they cataloging at the point of finding a bound-with? Is their catalog just clean? SI's is a mess.
Monographic deduping tool – what other functionality would we like to see? Ryan has recently updated the tool: we now strip out punctuation, but we are still matching on exact title. We would like to move to indexing the titles so we can do fuzzy matching (see the sketch below). Now have .csv export. Coming soon: mark as rejected, and export of an updated list with rejects/deletes.
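A minimal sketch of the kind of normalize-then-fuzzy-match step being discussed for the deduper (the threshold and normalization rules here are illustrative, not the tool's actual behavior):

```python
import re
from difflib import SequenceMatcher

def normalize(title):
    """Lowercase, strip punctuation and extra whitespace before comparing."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def likely_duplicates(title_a, title_b, threshold=0.9):
    """Flag two titles as probable duplicates when their normalized forms are
    similar enough, instead of requiring an exact string match."""
    return SequenceMatcher(None, normalize(title_a), normalize(title_b)).ratio() >= threshold

print(likely_duplicates(
    "Bulletin de la Société botanique de France.",
    "Bulletin de la Societe botanique de France",
))  # True: punctuation and a single diacritic difference no longer block the match
```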
John M – would like to have bulk delete of titles ( “select all” + delete). This can be done.
Question – are we updating the deduper when titles are rejected? Diane, MBL is (either by deleting and uploading new list, or individually deleting titles). NYBG has not been updating after deletes. SIL has not consistently done this.
Is there any reason the portal and the deduper would need to talk? What are we doing about gaps in series? Have been handling these on ad hoc basis – either because we find they are scanned as a serial, and we can get the volume that way, or have been asking other members to fill in. Not often. For those that are cataloged both ways, should we bid on the title as a series, but scan as a monograph?
Suzanne thinks the librarians want to get to it both as a series title, but also at the monographic title. Should we just edit the record in the portal? Should we try to clean up our catalogs? Seems like we should try to clean it up at the portal level.
The serial title ID solution solves the problem of provenance and collocation; can it also solve our monograph/series problem? Yes, theoretically it could, BUT where do we get the series or monographic MARC record (whichever we lack)? This would be difficult. OR could we put the authority record that includes the series title trace into the portal?
The flip side is for a serial where each vol. is also essentially a monograph (memoirs of MCZ) .
Let's just keep doing them as we have them in our ILS currently (if it's cat as a mono for you, do it as a mono, etc.) with the addition of the following: if it does appear on the mashup, include the serial mashup ID in the wonderfetch of the monographic record from your ILS.
Final note about the mono deduper – are people keeping track of their rejects so that we know what we haven’t done from each member’s collection? “Reject” button vs. just deleting the title. Do we try to track what’s being rejected (reject button), vs. just have a huge list in a few years with everyone’s rejects (use delete button, but each library keeps track of what got rejected).
Conflated with the issue of member A scanned a title that both A and B own, B wants to point to the portal with A’s book – how do we keep track of that? (not answered).
Most duplication occurs among MBL vs. everyone else and among the Botany libraries. Perhaps we should just keep plugging along, using the deduper, and leave the monographic gaps for future generations to ponder.
Suggestion: load the electronic invoice from IA, which includes rejects, back up to the deduper so we can rectify.

Summary/Action items for Serials Bidding:
For things that are also cat as a serial, make sure to include the mashup serial title id in the wonderfetch when sending.
Monograph gaps – for future generations.
Action item Michael, Bernard, Chris will talk about using Portal to rectify serial holdings vs. actual scans.
Action Item: Need to create dedicated page on the wiki where we list serial gaps.
Action Item: All BHLers will keep track of what is getting rejected on their local systems
  • 11:30 - 12 noon: Summary and review, assignment of action items (LEAD: John Mignault / RECORDER: Suzanne)
  • 12 noon - 1 pm: Lunch at Swope
  • 1:30 - 2:45 pm: Working with the BHL Portal (LEAD: Chris / RECORDER: Kevin Nolan )
    • Handling duplicates in portal (see below for details)
    • Implications of editing & merging in BHL Portal (see below for details)
    • Collocating series and serial runs (see below for details)
Chris on how content gets into the portal—admin examples.
To get to admin pages: click on About in the BHL portal, scroll to the bottom of the page, and click “for users with administrative rights”. Users are then taken back to the About page and click on Admin. First person to log in on any given day must wait for system to get up and running; usually Bernard in the UK. Admin dashboard—similar to Botanicus. Once in, admins have full editing ability.
Chris on BHL portal admin:
Dashboard includes: admin functions, library functions, reports, stats. And data harvest: all updated information from Internet Archive. Includes items pending approval from IA (45 days). Stats pages contain e.g. lists of titles in production and what is ready for scanning; can be viewed by individual library.
Question from Chris: when to merge and when not to merge?
Answer: “Embrace duplication”: duplications can remain.
If a book is sent to be rekeyed or OCRed again, it is only done once. When to take a volume dark? Volumes taken dark are redirected at the server level.
Question: whose record do you merge into?
Answer: use the best record, record with the richest metadata, and this comes down to cataloger’s judgment.
Use the title string to group records and find duplicates. Suzanne says the best way to check duplication is to view the OCR. Chris says it's a place to look but not fail-proof.
Example of a book that has been duplicated: the Transactions of the Academy of Science of St. Louis, scanned by both NYBG and Harvard.
Mike runs through another duped example: Bulletin de la Société botanique de France, scanned by both NYBG and MOBOT. How to merge duped titles? Diacritic problems might account for some of the cataloging problems. This issue is mostly an IA problem, and it looks like IA will not solve the diacritic problem, so it needs to be solved internally by BHL. Need item IDs in order to merge; take the ID from the URL. Click on the pencil icon in the upper right of the record screen (this appears with edit rights).
Metadata can be edited in the next view. Click to add an item. Add item ID. Click search. Click Add (items have a single title which is their primary title; need to make the merged-to title the primary title) Can also be edited from the item side.
Action: need the ability to add multiple titles when merging.
Important: Must save after editing, otherwise you will need to re-do it. The non-primary title is no longer searchable at this point. Items keep their ownership. Both are linked to the scanned items, with the chosen one being the primary title; the secondary title is still there and can be accessed through its ID #, and links previously made to the secondary title will still work. Once merged, attribution will not be shown at the record level. Right now, in the volume list, the item shows who contributed it.
Chris asked about pop-up window that might show attribution. Do users care? John M and Bernard put forth that at the title level the user does not care. Action: More info at the item level to include contributor, sponsor, etc. in display.
To get to a secondary (hidden) title, go to advanced search and there is a secondary title available button in the search.
Example: Bird Gods—Chris asks group: how do you want View to look? More links are to run down the right side of the page. Possibly collapses like Windows plus/minus. Volume, download, more info. Icons or text? Action: Martin suggests sub-group: Mike, John M, Chris, Maggie agree to work as a subgroup to come to decision on how the View section will look, and what it will include. To be held next Friday, 11/14—phone call.
Biologia Centrali-Americana (BCA) is added into the View differently: all monographs under one title, for example, while IA adds them as individual titles. The question is: how to deal with this? Will need to pull OCLC numbers from WorldCat? Will need some way to get the ID to BHL, with responsibility falling to the person doing the merging. The primary/secondary issue applies only to dupes (intentional and unintentional); with citations we want to be able to see All. Action: need to do some more modeling on this: Chris and Mike.
Question: do we care about keeping IA in synch with BHL and the latest BHL changes? Is it necessary? Action: make sure the IA archive links to the BHL portal for all scanned books, and put BHL first. IA will not allow BHL to edit items they have dropped off; we cannot change what IA has added. What happens if IA pulls out, or BHL decides not to use IA as the sole scanning vendor? Nice to synch, but it is probably not worth the effort at this point. IA sponsor info: that's how they do their billing. Link in the IA site directly to BHL in the view?
Editing in Portal admin: we are not editing the MARC record, just what is pulled from the MARC record into the database. Data comes from MARC and can return to MARC. Action: the MARC bib ID needs to be made un-editable; this bib ID differs from institution to institution. Chris runs through the fields in Title. Action: Is Uniform Title being populated? Call Number is useless in an electronic environment. Institution needs to be suppressed. Action: ability to apply multiple languages in the admin Portal function; change editing to use the 260 subfields and build publication details. TL2 Author is a hold-over from Botanicus and will be eliminated. Add Creator is from the Authority Table of authors.
All of this is available in live and in beta. BHLers can use their logins to go in, use it, and test. Action: Add preceding/succeeding titles in admin; the ability to edit the records of other institutions is okay, but we need date and user as an auditing tool, plus the last few changes. Action: Matthew and Suzanne to work on this and put recommendations to the BHL group.

  • 2:45 - 3:00 pm: Coffee/Tea
  • 3:00 - 4:00 pm: BHL Portal: What is YOUR Vision of the BHL portal? (LEAD: Chris / RECORDER: Keri)
    • Review "Desiderata" page
    • What information should be delivered?
    • How should you navigate the portal?
    • Should there be an "OPAC" skin on the portal?
    • Serving up of same item to more than one user – implications

What do you want to get out of the portal back into your ILS? Chris – what we have now is a tab-delimited export that can give you standard numbers, title, author, taxon names, etc. Suzanne – what we would like is a way to separate monographs from serials, because they will go through separate processes; this is related also to how our users want to filter their search results and display. How are we distinguishing (leader)? Bernard – the problem is the accuracy of the data: you miss out on some titles if you try to limit to mono or serial only, particularly with series. Chris – we have also had a request from BioOne for "what journals have you scanned". The problem is that we don't have the code correct in many records. Suzanne suggests incorrectly coded records should be corrected in the portal. But we do provide the CAS/CAM info when downloading, so you could export everything and then filter based on CAS. Leave discussion of whether we want separate monograph/serial downloads in the user interface for later.
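A minimal sketch of filtering such a tab-delimited export into monographs vs. serials on the leader code, assuming the export carried a `leader` column (the column name and file layout are hypothetical):

```python
import csv

def split_monos_and_serials(export_path):
    """Split a tab-delimited portal export into monographs and serials using the
    MARC leader bibliographic level (position 7: 'm' = monograph, 's' = serial)."""
    monos, serials = [], []
    with open(export_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            bib_level = row.get("leader", "")[7:8]
            if bib_level == "m":
                monos.append(row)
            elif bib_level == "s":
                serials.append(row)
            # records with other or missing codes fall through; the notes point
            # out that many records are miscoded and need correction in the portal
    return monos, serials
```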
Bowker – BHL sends data to Bowker, and Bowker assigns an ISBN for the digital manifestation of everything. OCLC will create a new OCLC record, for English-language monographs only, with its new ISBN. So we will get new ISBNs for everything, and new OCLC records for a subset.
ACTION ITEM Suz needs to look up and see where the old OCLC print number can live in the new OCLC record for the electronic version.
SHOULD the new OCLC record overlay the current book record we got from the ILS? All we get back is the OCLC number (not a file with the new OCLC MARC records). What should we display? All the OCLC numbers, all the ISBNs, etc.? ACTION ITEM: let's play with the return data from OCLC/Bowker and see how we'd like to display all that info. JM suggests that if we want to put an ILS interface on top, we should use MARC. Suz – these will only be a subset (like 12,000 titles or so) of BHL. If we want new MARC records, with new OCLC numbers for the e-versions, we'll have to create new MARC records for all. Suz points out that if we do create new records for ALL the e-versions, this would allow us to import the MARC records for all BHL titles into our ILS (since we don't own the actual book which the old OCLC record describes).
ACTION ITEM: Cathy (and Suz) should ask OCLC to create new OCLC records for everything we're getting a new ISBN for from Bowker.
Mike wonders – if they assign ids to a thing and we change it, does it invalidate the id? Suzanne can check to see if there is substantial change to the metadata. Maggie – ISBN is for a publishing thing, so it might depend on how minor/major the changes are to determine if you have a new ‘edition’.
ACTION ITEM: review MARC to Bowker mapping. Maggie and Suzanne. Suz will send email to Maggie with Bowker stuff.
Back to interface/display of portal & desiderata. Review and add/change things on the desiderata list.
Open Source ILS on top of BHL? Do we want it, do we need it? Or, is what we have now the new catalog, and we just need to keep developing it. Chris suggests – could we get a grad student or something to work over the summer to put the OS OPAC on top of our data, and our current Portal runs in parallel. We like the idea of a test project (many nods). We don’t know exactly how much work this would be – is it worth it, or should we just work on the current portal. ACTION ITEM: Martin will look into finding a victim (U Ill?) to work with MoBot over the summer. Also will work with John F. to evaluate what product (VuFind, Koha etc.) we might want to try.
Larger question: how do 'users' want to use the catalog? Martin thinks we have 3 main groups: the 'suck it in' data people, the libraryland OPAC folks, and uh one other that I missed. User analysis could be a HUGE problem, and a slippery slope. Do we need to go forward and ask for a proper user survey? PARKING LOT: user survey.
How do we want to search, what fields do we want to search on. Currently we have title, author, names, subjects. [question are we indexed by Google? Yes at the title/item level not fulltext]
ACTION ITEMS – additional search functionality (see the sketch after this list):
  • Date range search
  • Anywhere in the record (index is the entire MARC record) and choose Boolean or phrase
  • Ability to do combined search author=X AND title=Y
  • (for examples look at worldcat.org or the MoBot Voyager Catalog, or Zoo Record, iTunes)
  • Filter and sort
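A minimal sketch of the combined fielded search with a date range, shown here over an in-memory list of records rather than the portal's actual index (the field names and sample data are hypothetical):

```python
def search(records, author=None, title=None, year_from=None, year_to=None):
    """Combined fielded search: author AND title, with an optional date range."""
    hits = []
    for rec in records:
        if author and author.lower() not in rec.get("author", "").lower():
            continue
        if title and title.lower() not in rec.get("title", "").lower():
            continue
        year = rec.get("year")
        if year_from is not None and (year is None or year < year_from):
            continue
        if year_to is not None and (year is None or year > year_to):
            continue
        hits.append(rec)
    return hits

records = [
    {"title": "Biologia Centrali-Americana", "author": "Godman & Salvin", "year": 1879},
    {"title": "Bird Gods", "author": "De Kay", "year": 1898},
]
print(search(records, author="godman", year_from=1870, year_to=1890))
```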
END

  • 4:00 - 4:30 pm: Non-BHL Scanned Materials: What Does it mean for BHL? (LEAD: Martin / RECORDER: Doug H.)
    • Simpler items: things scanned by the Internet Archive (e.g. CDL)
    • Harder things: AnimalBase
    • Policy issues vs. Technical Issues
    • BHL-Europe
    • What does this mean to BHL member workflows in terms of de-duping, serials bidding, etc.?
Internet Archive and CDL: A de-duping tool was set up for BLC to de-dupe against Google Books and IA. Used Google API to run a search against Google holdings.
Trying to incorporate IA into de-duping and bid list tools would bog down current version of tools.
If materials are ingested from IA, what do we do with the new materials from Google in IA, both duplicates and unique items? Do we need to be careful about Google Books being ingested into BHL?
ACTION ITEMS:
  • Investigate how to incorporate IA titles into BHL Deduping Tool and Bid List (John F., Bernard S., Mike L., Chris F)
  • Explore further ramifications of ingesting. Should we go directly to CDL to get data? (Martin K, Chris F. and Tom G.)
  • Communicate to the IC the importance of communicating back the decisions made about this, because it will be difficult to implement and will require the current tools (Bid List, De-dup) to be made more robust. (Doug H.)
  • Outline framework for community and institutional “in kind” involvement in BHL. (Matt P.)

  • 4:30 - 5 pm: Summary and review, assignment of action items (LEAD: John Mignault / RECORDER: Suzanne)

November 7 (Friday) - Grass Reading Room, 2nd floor Lillie Building
  • 6:00 - 6:30 am: Meet for brisk walk around Eel Pond, counter-clockwise, ending at Pie in the Sky
  • 8:00 - 8:30 am: Breakfast at Swope
  • 8:30 - 9:00 am: Scanning output quality: Opportunities and Challenges (LEAD: Jen Walton / RECORDER: Joe)
    • Experiences with IA scanning
    • Review of materials
    • Types of errors
    • Rejection rates, etc.

Martin: create a list of ways that IA can become a better partner. Interactions, documentation need to improve.
Google more transparent than OCA – comment from U. Illinois.
Action: Martin - Need regular phone meetings (every other week) with 3 scanning centers with reps from each of the contributing libraries.
Strive for consistency across the board, for all scanning centers.
Action: Everyone - contribute to wiki grievance page
What problems are people finding?
What is dark and different shades of dark?
So called dark pages can still be found, but there are pages that are truly dark.
Suzanne: do we care if non-significant pages are missing? Such as covers.
Don: can matter – in botanical literature, covers can be important.
Matt: looks at cover content. If insignificant content – doesn’t matter.
Suzanne: foldouts – a whole book gets rejected for a foldout. Can it be scanned and the foldout added later? Not at IA.
Botanicus – can add pages later. Skipped pages may be added later, but whole volume shows up as recently scanned. At IA, file is closed once item is scanned. Can’t add pages later.
Should we develop this ability?
BHL will eventually have all content and we can do what we want with it.
Parking Lot: Want to be able to add in missing pages in workflow outside of IA
Parking Lot: Frankenbooks – take best parts of each scan
Do we need to follow up with IA on different levels of dark?
Some things with errors do not go fully dark. Must tell IA what we want to be fully dark.
Action: need for completely dark – Martin to discuss in meeting with IA. Completely dark means ID pulls nothing up. Partially dark – curation changes to frozen.
Quality of foldouts – are they sufficient? Some libraries say yes, some no.
Action: foldout scanners from Boston, DC, NY to get together – Martin will organize foldout summit, to include Don, Diane, Bernard.
Suzanne: focus on access, need to get stuff out at LC scanning center.
Jen: quality falls off when there is a lot of shading.
Suzanne: charts illegible
Martin: need to decide what is suitable quality for foldouts.
Diane: found that JP2 often was the only file with good quality.
Volunteers for Martin's foldout summit – Don, Diane, Bernard
General experiences with IA scanning:
Jen: Feb issues finally resolved. All pdfs now up. Things are coming up currently. Some things were rescanned. Others re-derived. IA said they would do 100% check of things done at end of last year/early 2008. MBL did about 10% check
Action: for Robert Miller, official definition of QA by IA. What does it mean across all of their scanning operations?
Billing issues – how are people dealing with billing. How verify invoices?
Diane: keeping track of everything coming back. Using metamanager to compare. Over $5k in credit required. Error in page counting on part of IA.
Martin: proposed alternate billing method – only billed for things that appear in portal
Diane: Some rejected items are being billed
Shall we explore new method of billing?
Action item: discuss with exec committee need for new billing method. Possibility of being billed by deliverables – what appears in the portal.
Should better IA cash flow.
Diane putting a lot of effort into checking bills, comparing metamanager, books returned, etc.
John M. uses IA page for stats
Matt: praise for Paul N. at Boston center
Don W. reports problems with mixed up metadata. Keri thinks it’s due to poor workflow at IA. WF helps because they just have to click on the link to retrieve library supplied metadata.
Keri has had to show them how to use WF.
Diane: has to be consistency among scanning stations. All have to use WF, all must be able to do divides, provide e-packing lists back to libraries, all have to be able to do foldouts.
Suzanne: their center doesn’t do divides.
Matt: critical to be able to do divides. So many volumes are bound together. Should be a basic level of service.
John M. – center won’t do volumes w/o contiguous pages – can’t handle
Diane: excellent service, email communication, photos from scanning center. Direct communication with book loaders as well as center supervisor.
Chris and John M.: should we discuss finding a new vendor? May be time to move on. Lack of standards and training across board. No librarians involved in making standards.
Action: Chris and Martin will express to the exec council our unhappiness with the current level of service from IA. We must start insisting on improvement by a certain date; June is when the funding period renews. We have leverage because Boston is doing a good job and others can follow; we are not asking for the impossible, so we know it can be done quickly. Incentive: they will lose funding if they don't improve.
Suzanne: do we need to investigate alternatives?
Chris: research would be good. Higher priority for researching where files will live, etc., if IA is no longer vendor.
Action: let exec committee know we’re willing to appoint a group of researchers to investigate other vendors. NY has already been looking at alternatives. Also NHM.
NHM tried Kirtas – some reservations
John M. reports good things about Kirtas
Tentative fact presentation group – Bernard, John M., Doug H. (ex officio), Kevin, Keri, Martin
Martin: some have worked with Brewster for many years, a partnership element exists for some BHL members. Some think of IA as vendor, others as a partner. Brewster wants to be our friend. We don’t need a friend, we need a vendor.
Continue to add to issues page. General and specific issues (a page for each)
John M.: trust issue – we can’t trust them. Have to check what they’ve done.
END

  • 9:00 - 10:30 am: Managing bibliographies & prioritizing literature collections (LEAD: Diane / RECORDER: TBD)
    • What are we scanning?
    • What should we be scanning?
    • How can we manage it without overlap?
    • Suggested formats for submitting vetted bibliographies from taxa groups
    • Reporting back to taxa groups what's been scanned
    • Can we use tools such as Connotea, RefWorks, or other bibliography mgt system to allow groups to upload their bibliographies?
    • Does the OCLC Collection Analysis tool play a role? Should we renew our option for this service?
    • Permissions received and the workflow on bidding/deduping and prioritizing http://www.sil.si.edu/BHL/BHL_permissions.cfm

Decapod list hosted on wiki. More will come from various taxonomic groups.
How do we make this work in our workflow?
Permissions list. Bib lists. Chris pulls out titles but still need lots of cleanup. Serial titles are especially in need of cleanup and prioritization.
Multiple lists will be problematic.
Groups want to know when their stuff is scanned. Reporting important.
Most lists come in through Endnote. Chris sorts and prioritizes.
How do we get the content and bring it in so that it’s centrally available and visible?
Permissions list – SQL dbase
Journals may be listed on many lists. Open for ideas.
Bernard: virtual taxonomic library. Federated search for resources. Link resolver. Edit scratch pads for taxonomists. Creates massive list of bibliographies.
Action: Chris – find out how scratch pads could work for this.
Refworks? John F. looked into it for deduping. Would it work for bibliographies and aggregating citations?
Merging title variations a lot of work.
Already structured information. How can all different bibs be pulled together? Should we just use Endnote?
Publicly available, web visibility – requirements
Diane – can we have different list, but then aggregate to find out what top journals in lists are?
John F.: BibApp can do this. Groups can be based on whatever you want. MBL using it based on scientists and groups in Woods Hole.
Zotero Commons
Subgroup to look at options for pulling bibs together to get one master list.
Master bib list, permissions list, internal library lists – 3 current systems. Don’t currently talk to each other.
Erin updates dbase.
How would we use list?
Suzanne: should have one place to go for bidding intentions.
Bring Permissions list and Bibliography together? Can we get everything to talk together?
Can have different functionalities within same tool.
Should the bid list be re-invented now that we have more knowledge and experience? Need to integrate systems. Monographs are additive.
Unified tool:
Accept vetted bibliographies in some format (RefWorks, etc.). Needs a way to report back out.
Ability to easily id and reconcile rejects.
Bid list can be expanded to track rejects. Can create extension with bulk loading capability.
Priority to create unified system – union of 3 tools we currently use.
Develop in parallel a unified tool. Use BHL Europe development funds? Or existing BHL funds?
General workflows and policies are good. Need to modify tools. Make it scanning vendor neutral. Standardized reporting method.
Action: Bernard, Chris, Martin – explore this (using BHL funds or BHL Europe funds)
Permissions data is similar to that in bibliographies. Permissions list is basically a suggestion list.
Creating bibliographies requires scholarship. Collections managers should be involved. For now – manage bibs in Endnote/Zotero. Pull lists together in one master list – publish. Report top journals.
What to prioritize? 20 citations and more? 10 and more?
Action: need a ref management system for managing bibs at least on a temporary basis. Bernard, Chris, John F. to talk to Julius in London.
OCLC Collection Analysis Tool:
Why doesn't it work? Basic doubt about the data: the numbers don't make sense. It doesn't layer on deduplication. The uniqueness numbers don't make sense. Overlap: supposedly all libraries share only 25 books, so something is wrong with the data. It takes a lot of time, and skill, to get something useful out of it by manipulating the data. It doesn't provide much useful information.
BHL data includes only what has been done to date – not what we intend to scan, or may never scan.
Action: Doug H. Talk to IC about OCLC Collection Analysis Tool and recommend non-renewal.
END
  • 10:30 - 10:45: Coffee/Tea Break
  • 10:45 - 11:30: The BHL Article Repository: A New BHL Paradigm (LEAD: Chris and Martin / RECORDER: TBD)
Martin thinks this won’t take much time to discuss. He is an optimist.
Repository idea has come up from the “busy as beavers” community of scientists. Been a topic discussed at IC, but was not well received. Concerns over copyright.
Goal: get BHL articles & other content into this repository
Tom & others have investigated possibilities with the EFF. The "safe harbor" concept is possible, yet fuzzy. The IC had concerns about big legal issues and about being a target. Harvard had concerns.
Napster vs. YouTube model. Part of the issues & concerns could be allayed by clear explanation and definitions.
Suzanne has concerns about joining preprints with prints with reprints, etc.
Maggie suggests that the intellectual content of a printed paper can be republished, just not the layout.
Suzanne – the scientists want this. There is a clear need.
Martin – how can we integrate existing repositories for a ‘quick win’? Probably can’t. Will frustrate Suzanne.
Action: Continue to build your own Institutional Repositories. Chris & Co. will investigate aggregation/harvesting of these to pull into BHL Article Repository as it is developed. Matt will tell us who runs AMNH’s DSpace.

  • 11:30 - 12 noon: Summary and review, assignment of action items LEAD: John Mignault / RECORDER: Suzanne)
  • 12 noon: Adjourn; box lunch provided for your trip home!

Catering Plans
10/29 - Dinner on Wed. will be at Cathy Norton's house - we'll pick people up in front of Swope at 6:00 pm. Meals on Thurs. will be in the Swope dining hall. Breakfast at Swope and a box lunch will be provided on Fri.
Diane
----


Topics:
  • Workflow for deduping, bidding, scanning
    • What do we need to improve?
    • Are there too many dupes, poor scans?
    • Rescans and do-overs - making sure the correct copy is in the portal
    • Interactions between bidding systems & portal
      • When portal brings in content from IA (18,000 vols from California Digital Library) or other data providers, how do those titles make it down to the bid/dedupe tools so that partner libraries don't scan?
      • Does the OCLC Collections Analysis Tool help in any way? Do we even want to keep it going?
  • Managing bibliographies & prioritizing literature collections
    • What are we scanning?
    • What should we be scanning?
    • How can we manage it without overlap?
    • Suggested formats for submitting bibliographies
    • Reporting back to specialist group what's been scanned
    • Can we use Connotea, RefWorks, or other bibliography mgt system to allow groups to upload their bibliographies?
  • Collocating series and serial runs (and monographic series) that are partially scanned by 2 different libraries.
    • How do we merge items to new titles?
      • BCA, Fieldiana examples
      • How do we represent cat sep and series and serials? (SI example of volume in serial missing, cat sep found - Z39.50 fetch or trick?)
    • How do we merge titles?
      • MBL & MCZ together scan complete "Journal of Conchology"
      • Still want provenance of 'this book came from that library' for attribution, research, finding physical object, potential credits from print on demand, local "branding"
    • Same issues for monographic series
      • 'Bound With'
  • Handling duplicates in portal
    • Different than a merge?
    • Do we ever take a title offline because it's a duplicate?
    • How do we handle requests for records "orphaned" by duplication correction or record merge? (redirect, 404, etc)
    • Possibilities to offer duplicates to the user (without at the same time confusing them: annotated RB version of a title done by Library Y and "regular" copy done by Library X)
    • Example:
  • Implications of editing & merging in BHL Portal
    • BHL portal is built from massaged data pulled from IA XML
    • Workflow of editing on BHL portal: edit only "your stuff" or set up "areas" for different libraries to edit
    • We make changes in portal that don't get committed back to IA, or in any physical file currently.
    • Do we want to keep IA & BHL in synch? Will they let us? _bhl.xml file(s)?
  • And what about the rest of the world?
    Whom do we consider checking against for duplication? Whose content do we want to pull in?
    • Google Books
    • Madrid, Gallica, AnimalBase
    • Other IA participants