BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

collectionscommittee


Collections Committee Calls are regularly scheduled for 2pm EST every OTHER Monday | Dial 1-877-860-3058 pcode: 961479

Table of Contents

July 23, 2018
June 11, 2018
May 14, 2018
Apr 30, 2018
Apr 16, 2018
Feb 5, 2018
Jan 22, 2018
Dec 11, 2017
Nov. 27, 2017
Oct 30, 2017
Oct 16, 2017
Sep 18, 2017
August 22 (rescheduled from Aug. 21)
July 24 call canceled
July 10
June 26
June 12
May 15
May 1, 2017
Apr 17, 2017
Apr 3, 2017
Mar 20, 2017
Mar 6, 2017
Feb 6, 2017
Jan 23, 2017
Jan 9, 2017
Notes for previous years

To Discuss
  1. Support BHL collection analysis activities, if resources obtained to perform work (Alicia Esquivel)
  2. Page redaction policy
  3. Segment types in BHL
  4. Future of uploading electronic-born content to BHL? (action item from BHL Annual Mtg 2018)
  5. Linking related information to BHL content such as with external contents lists, archival finding aids, etc.
  6. Linnaeus Link and BHL collaboration (NYBG, Harv Botany) (Non Coll CMTE BHL partners: MBG, Kew, NHM London, CBG?) Linnaeus Link Partner list here
  7. Requirements for adding segment metadata to BHL in bulk (Susan Lynch)
  8. Draft of BHL ILL guidelines (Connie Rinaldo)
  9. Collections/Contributor browse page changes (Pam McClanahan) -- Contributor browse map interface is for BHL participants, not for content by geographic subject area, correct?
  10. Should we incorporate MESH subject headings into our ingest criteria? see http://biodiversitylibrary.countersoft.net/workspace/0/item/61160

July 23, 2018

  1. MARC LDR 06 issue http://biodiversitylibrary.countersoft.net/workspace/0/item/79356
  2. Copy specific info question -- [link to doc] Incomplete parts/pieces/nos. ok?
  3. Is it possible to DRAFT general guidelines for adding E-content to BHL...? ==> When a publisher of electronic born content approaches BHL to contribute materials...
    1. ACCEPT if...
      1. Relevant to the BHL collection
      2. A catalog record exists in a BHL partner collection / in OCLC
      3. Publisher can provide access to electronic items at the book, volume, or issue level
      4. ?
    2. REJECT if...
      1. Irrelevant to the BHL collection
      2. No catalog record exists for the title in OCLC (Or would BHL be willing to create a record, and if so, who would do this work?)
      3. Publisher can only provide access to electronic items at the article or piece level = we do not have the resources to stitch electronic pieces together to create a full issue/volume
      4. Scope of work to upload the full title is beyond BHL resources and already available online elsewhere e.g.: https://pubs.er.usgs.gov/browse/Report/USGS%20Numbered%20Series/Bulletin/
      5. ?
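The draft ACCEPT/REJECT criteria above could be sketched as a small triage function. This is only an illustration: the `EContentRequest` fields and the `triage` name are hypothetical, the open "?" items are left out, and the question of whether BHL would create a missing catalog record is not modeled.

```python
from dataclasses import dataclass


@dataclass
class EContentRequest:
    """Hypothetical description of a publisher's born-digital title."""
    title: str
    relevant_to_bhl: bool        # in scope for the BHL collection
    has_catalog_record: bool     # record exists in a partner collection or OCLC
    full_units_available: bool   # files provided at book, volume, or issue level
    too_large_and_online_elsewhere: bool = False  # beyond BHL resources, already online


def triage(request: EContentRequest) -> tuple[str, list[str]]:
    """Return ("ACCEPT" or "REJECT", reasons) following the draft criteria."""
    reasons = []
    if not request.relevant_to_bhl:
        reasons.append("irrelevant to the BHL collection")
    if not request.has_catalog_record:
        reasons.append("no catalog record in a partner collection or OCLC")
    if not request.full_units_available:
        reasons.append("files only available at the article or piece level")
    if request.too_large_and_online_elsewhere:
        reasons.append("scope beyond BHL resources and already online elsewhere")
    # Any single reject reason is enough to turn the request down.
    return ("REJECT" if reasons else "ACCEPT", reasons)
```

Returning the list of reasons alongside the decision would make it easier to explain a rejection to the publisher, or to revisit the request if circumstances change.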

June 11, 2018

Page redaction policy is complete
  1. should this policy be included as part of our digital imaging specs?
  2. OR be a stand alone doc that is referenced by digital imaging specs (and collection dev policy)?

E-Content examples for BHL
  1. http://biodiversitylibrary.countersoft.net/workspace/1427/item/61159 - e content to add to BHL?
  2. http://biodiversitylibrary.countersoft.net/workspace/1427/item/59605
  3. http://biodiversitylibrary.countersoft.net/workspace/1427/item/60832 - link to full series https://pubs.er.usgs.gov/browse/Report/USGS%20Numbered%20Series/Bulletin/

Quality issue
Poor copies of "Essai sur l'histoire naturelle de la France Equinoxiale" see https://www.biodiversitylibrary.org/item/226440 (1741) and https://www.biodiversitylibrary.org/item/226429 (1749) -- would we consider their quality "unusable" as per our deaccession policy? What do we think about approaching Manioc.org to try and replace with http://www.manioc.org/patrimon/GAD11005?
--otherwise I have submitted a scan request for replacement, as multiple BHL partners hold the 1741 and 1749 copies - lipscombb, Jun 7, 2018

NOTES
Attendees: Bianca, Kelli, Joe, Gil, Diane, Michael, Deborah, Susan, Matt

Page redaction policy nearly complete; Bianca to finalize, address transcriptions and add to collection development policy

E-content and permissions issue like chicken and egg -- which comes first?
we need someone to upload but we need a commitment of uploading before we seek permission
ADD IF = if full volume PDFs are available then it has potential to be added
REJECT IF = only individual article PDFs are available (not worth downloading them one by one); worth asking, but if the publisher cannot provide full volume PDFs then we can't do the work

should we include e-content moving wall content in this queue? Worth trying to put everything together for now and see how it goes
moving walls are an interesting case b/c issues never close
Bianca to periodically remind people
e-content easier for people who don't have scanning funds or expensive equipment in house
would a monthly digest via email encourage you to take a look? Might be worth trying to send something out once every 2 months

Bianca to create new e-content queue for folks to sign up for titles before seeking permission; take out moving wall stuff later if need be

NYBG working on some titles right now and Susan to edit Gemini issues
ADD IF = if an OCLC record available somewhere then it's OK to include
BUT IF = NO OCLC record then can still be added but needs original cataloging which might be prohibitive

Decisions on Titles
DLXS old U Michigan system with single page by page navigation, bitonal images so images would not look very good, may not be up to good quality - not sure about DPI - generally things of DLXS are to modern standards
[x] get BHL partners to fill in gaps if possible
USGS Bulletin is a really big ask b/c of the volume of content, we wouldn't want to spend the time
how do we want to slice this going forward?
Kelli to check with Susan about digitizing the USGS bulletins?
this is one that we should leave to requests and digitize small parts as necessary b/c it is so large and a lot is out of scope
BHL partners to slowly pick away at it since boundwiths
do what's on your shelf vs. e-content? it depends...older copies might not be good enough content but newer copies may be OK

having various IF THEN statements would be helpful - good to codify - in fielding requests for upload/digitization
basically it depends

Poor copy but better some access than none at all
Look at partners page for contact us to see if they might let us upload their copy but if not then have BHL partners digitize
[X] Bianca to ask Manioc if they might be willing to contribute this volume to BHL.
Otherwise, see Gemini issue to request replacements
http://biodiversitylibrary.countersoft.net/workspace/292/item/79458

May 14, 2018


  1. E-content uploading into BHL collection
    • EABL Moving Wall digitization status?
    • New field needed?
Content... Or (simply)...
  1. FYI Page redaction policy -- Bianca, Joe and Diana talking tomorrow
  2. FYI Segment types -- Susan, Diane, and Bianca to discuss further (not yet scheduled) see http://biodiversitylibrary.countersoft.net/workspace/0/item/61182
  3. Page types = https://docs.google.com/document/d/1Tz1T5cByPq2V-VQL-WPQfEcnMw6MdftzOjEc6dSfmTg/edit?usp=sharing
  4. Status of ILL guidelines?

From BHL Annual Mtg 2017 notes re ILL:
This motion carried at the Annual Meeting: “BHL will no longer have an explicit ILL agreement as part of the BHL MOU.” (This is the key decision. There is no explicit agreement.)
Logistics (suggested) for informal ILL needs include:
No further action required unless someone opts to develop the wiki pages or list-serv.

NOTES
Deborah Cooper with Mann Library at Cornell, new Digital Collections Librarian, has been there for 2 years, familiar with BHL but now coming on and doing more work with BHL

Moving wall digitization for EABL is still outstanding
MCZ has volunteered to help with MW digitization going forward but so far busy with field notes digitization
close out date for EABL? end of March - work should be completed by now; majority of the work is completed according to Don W.
Short term, for 2018, Susan Lynch has a volunteer who is extremely interested in the project and investing significant time
BHL members (Harvard Botany, Harvard MCZ, Field Museum) have offered to help - we should be OK for this year but not sure about next year
Susan still working on sending EC recommendations for EABL moving wall

Important to have separate roles
over time we can confuse ourselves by overloading fields
should be a period of time soon where misc requests can get attention
next Monday we could put forward to the Tech Team
what is the recommendation of the group?
depends on how data is used, when too many variations used then you can muddy the field
give credit where credit is due, especially to institutions that want to make an in-kind contribution
If a PDF gets uploaded then it would be nice to know who uploaded so that we can request curation of this institution
If we define another role then it's like adding to a controlled vocabulary rather than a new field
Making the distinction about uploader vs. scanner is good b/c sometimes no physical
"Uploading Institution" -- BC to create Gemini for Susan to take to Tech Team next week add as follower - DONE

Would like to see an E-content upload queue
where titles like Agronomia Costarricense could become a part
then Bianca would seek permission?
nice to have not a need to have
if the publisher initiates request then there is clearly expressed interest in including in BHL
e.g. Agronomia Costarricense
a new e-content upload queue could be helpful, there might be some here that could move more quickly
if already available on other platforms then why go through effort to upload to BHL?
processing via Macaw and PDF quality?
maybe part of criteria to add to e-upload content queue is to have physical copies held to get started
needs to be held by a BHL partner even if born digital b/c we need the metadata

Bianca - Joe wants to join in with segment types discussion

Apr 30, 2018


NOTES
Bianca, Matt, Allie, Diane, Connie, Susan, Michael, Joe

looking at Buceros - how can we best label the uploading institution to give them credit?
should we change the Scanning Institution label or add a new field?
If Holding Institution and Scanning Institution are the same then only Holding Institution is listed
done b/c of request
Susan in favor of adding another field so that fields could be labeled appropriately
maybe only change is to Macaw? b/c if uploaded from digital version don't need to update PMA for IA
need to define a role in BHL dataset, plus also harvest code to pull from IA and Macaw
Yes it's important to have the name of the uploader recognized
Statistics and metrics for scanning institution counted?
A generic term might work well without making a whole new field

Susan and Bianca to remember how Contributor works and report back
Bianca could come up with some possible terms to rename the current field
Tech Team to start talking sometime in July
Harvard Botany and one other BHL member have generously offered to get involved, no other updates
using real life example of born digital content for uploading during BHL training workshop very helpful
if we digitize from print there are never any hyperlinks, only born-digital and yes BHL loses hyperlinks
could Holding Institution upload? if they wanted to digitize or upload it's up to them

how would we handle if BHL partners didn't hold?
could we open up a scan request like queue in Gemini so that institutions could take something on as they desired - using open Gemini queues is a good idea - should we have one or two?
queues in terms of funding - folks that don't have funding to do scanning could take on born digital content if they so choose - would need to get Bianca involved
Affiliates who don't have access to scanning funding could do the upload
at the time permission is requested, we make clear that it's a best effort - set expectations realistically at the time we get permission
we're not staffed to pursue permission for anything that doesn't have a MARC record - would need to be a case by case review
yes appropriate to have BHL Collections Committee approve > request for commitment > permissions
Bianca to come up with a better example to discuss at next call

obituary is a reference question we often get
how do you weigh the time you have to put in to describe the obit vs.
you could have so many different segment types that it might get overwhelming: reviews, editorials, etc., e.g. review
if people are looking for obits then it's something worth putting in
Susan also had "Index" as another candidate
BC to follow up with Susan and Diane about segment types

Apr 16, 2018

  1. FYI Collections presentation for BHL Annual Mtg 2018
  2. Action Items from BHL Annual Mtg:
    1. EABL Moving Wall digitization
    2. What is the future of born digital uploads for BHL?
      1. BHL Digital Content Internship
  3. Page redaction policy -- did anyone have a chance to review the document?
  4. segment types update http://biodiversitylibrary.countersoft.net/workspace/0/item/61182 -- tech team to revisit in summer
  5. page type definitions (on behalf of BHL Australia) https://docs.google.com/document/d/1Tz1T5cByPq2V-VQL-WPQfEcnMw6MdftzOjEc6dSfmTg/edit?usp=sharing

NOTES
Bianca, Kelli, Joe, Michael, Diane, Connie, Matt, Polly, Allie, Susan

Success of EABL project made the issue of uploading Moving Wall e-content titles more pressing
EABL got smaller publishers to give permission for adding a lot of in-copyright titles, many of these have moving walls
some are print only, some are digital only, some are both print and digital
BHL based
in most cases better results when we digitize the print…
EABL team recommends that moving wall titles are identified and uploaded early in the year each year
what are our options
1) how should BHL consortium share the workload?
2) get interns for spring semester to do the bulk of the uploading?
3) adding to BHL paid staff?
Susan would like to see the BHL UI enhanced to display “uploading institution” as a new field
on any given year there is usually a new member/affiliate receiving training
at the end of training perhaps these moving wall titles would be good practice
Susan working on writing up notes from Annual Mtg workshop
EC to make a decision?
what is the importance of adding a new field?
how likely/unlikely is it that someone from your institution would participate in this?
if you indicate yourself as the uploader then you’re also claiming responsibility for curating the content going forward
what if there’s an issue with the content?
for the duration of the EABL project, the holding institution was the publisher
vast majority of cases, born digital content is on the publisher’s website
uploading institution should make an attempt to resolve the problem, contacting the publisher and asking for a better copy ==> if uploading institution can’t do this work then need to reassign to another institution
How many moving wall titles need this work? about 40…
most publishers are moving from print to digital so it’s likely that this number would increase
no scanning costs but resource costs
problem with an intern is what if you don’t get an intern for one year or the intern doesn’t finish
if each member could take 3-4 titles then there shouldn’t be a huge time commitment
the issue of uploading via Macaw raises the issue of file format…yes Macaw will accept PDF files ==> some BHL partners may not have experience with this in Macaw
would we want to standardize with publishers what they submit? so far we’ve only had to deal with PDFs
have to upload full volumes at a time, can’t upload article by article, on occasion articles have been stitched together using Acrobat to create a volume — haven’t been too many of these
recommendations to be passed onto Members from EC
EABL titles do need to be worked on this year - Susan looking for a volunteer
Diane willing to help - suggestion of Affiliates taking on this work
Susan would like to see folks recognized in UI if they will be helping with uploading

[my summary about future of born digital content and impact on BHL]

very common for electronic content to have hyperlinks
BHL loses all hyperlinks in born digital content
small problem now that will become a bigger and bigger problem as years go on
what happens to the links? Macaw internally breaks the PDF into component JPEG files so hyperlinks just interpreted as images
Joel decided to store the original PDF in IA in case we can use it for the future
in order to treat born-digital PDFs
only been capturing PDFs for a few weeks now

what are BHL’s commitments for born-digital content today?

need to make it really clear to the user that something was redacted in case they want to
opposite coloration
black box with white lettering “redacted content”
can we come up with a standard redaction object that we all use going forward?
we have the opportunity to use this with Macaw
with materials bound in - how do we mark this for IA to avoid scanning
URL to policy statement in redacted indicator
standard disclaimer in document - common indicator can help
if anyone redacts any content we should all do it the same way
people could think it’s a part of the original if the redacted indicator isn’t consistent - artifact
could speed things up if we use a common indicator
so far all redacted content has been done in-house? check with Field…?
Bianca to check in with Joe and Diana

Feb 5, 2018

Attendees: Michael, Kelli, Joe, Matt, Diane, Gil, Bianca, Connie, Don
  1. review our new cropping recommendations in our Digital Imaging Specs https://biodivlib.wikispaces.com/Digital+Imaging+Specifications
  2. follow up on new segment types
  3. thinking about Collections presentation for Members meeting 2018

Harvard Library Imaging Services
looks good
send to staff and change if needed

DR I’m not only a driver but the sole passenger too, doesn’t sound like others need it
can we get data about existing segment types - what’s in there now?
then DR to propose to add 4/5 types
to review with tech team
existing types might need to stay as they are b/c of BioStor
to pursue: “Bibliography,” “Index,” and “Review”; also “data” (charts, tables, etc.) and “list”
segment type called “book” - should this be published or unpublished?
DR to work with Bianca and Tech Team to get additional segment types for archives
shared ML’s comments about segment types: confusing possibly for same segment and page types but likes review
DR about 98% done with uploading field notes and getting ready to segment
JdV has a potential use case for “Treatment” with MCZ
can use batch article uploader
DR articleizing all correspondence but then getting ready to do field notes and journals
SL articleizing Arnoldia - been putting a lot in in the last month or so
BC needs to work on documentation

have stats with context - have raw numbers available but tell more of a story of what the stats mean
trends with comparison
successes/challenges about working with partners to resolve issues, curate content…need to better formulate
permissions mostly about EABL numbers
[X] send Susan a message about presenting on EABL numbers (she will be presenting on day 2)
highlight key things would be better for collections CMTE
identification of problem and how we came to solve it…
this is what we contribute to the project and how we’ve made it better
page redaction may be useful example
conversation about what Alicia did in nearish future
start with Connie and then expand - convo for after the Members Mtg
[X] BC to upload notes for Dec 11!
ILL guidelines need to be dug up but don’t need to discuss at Mtg
Kelli to send out from last May
segment work and tech team back and forth - doesn’t eliminate BioStor but eliminates dependence on BioStor
[X] BC to check in with Tech Team about presenting on this
yes present Gemini stats in context
people have Gemini backlogs - how do we possibly pass issues to others since we all feel overwhelmed…
DR: quite a lot of issues are scans in progress - she finds it helpful to filter
finding the ones you’ve forgotten about…
==> Find a better way to SHOW collection management work rather than tell...

[X] BC to continue work on page redaction policy
BC to get draft of presentation to CMTE for comment [was sick and out of the office so this didn't happen unfortunately :(]

Jan 22, 2018

Attendees: Bianca, Joe, Don, Matt, Gil, Kelli
  1. Page redaction policy update (Bianca)
  2. Descriptive tags for Archival items:
    1. Kelli did some searching - not any great ones out there already, see https://www.loc.gov/aba/publications/FreeLCGFT/freelcgft.html for LC genre/form terms
    2. Diane asked via BHL Listserv and other partners not planning to segment archives at this time
    3. Working doc https://docs.google.com/spreadsheets/d/1fo4nHKEBOimiTfWZZQakttdzoIZLzB4wm3u1rC2D25Y/edit?ts=5a2ee37a#gid=0
  3. Updates to BHL browse by collection and contributor update (Bianca)
    1. Marissa suggests looking at how others list their contributor names via this doc https://docs.google.com/document/d/11I5YxHLtltVA8CpA-7FNfMjArnnQIuAiW3tcr4Yt1xE/edit
  4. Page cropping best practices with SI
    1. "General SIL policy is to crop just inside the page, unless text, image, or annotation runs off the edge of the page, in which case cropping should be just outside the page.
      If scan is a request, and the requestor wants it cropped outside the page do that.

      Each page image should be de-skewed to make the text as close to horizontal (180°) when viewed on a monitor as possible. This is to maximize OCR quality.

      If a book is bound so poorly that to deskew to 180° for each page would distort the original appearance of the book, or produce visual dissonance, and the text is unlikely to OCR well (manuscript, blackletter, etc.) pages can be deskewed to the degree the imaging specialist thinks will maximize readability.

      Unbound manuscript material should always be cropped outside the original, on a black or dark gray background. Each manuscript page in a folder should be photographed using the same focal length, in order to provide the correct sense of scale for all items in that folder, unless a very small item would be unreadable if scanned at a longer focal length."

Notes
Page redaction:
Specific locations of endangered species could be something we might redact
verify with publishers/authors for contemporary literature of the need to redact
BHL Partners need to be responsible for knowing the content of archival materials in case there is sensitive information
maybe need to be explicit about the fact that we won't redact insensitive terms, a disclaimer
MP suggests: it's important to be general, you don't want to get caught in an argument about what is stated by the policy
want to be able to retain the ability to make a decision just in time / plausible deniability
language important to protect BHL but better to not be so specific, our corpus is filled with PII
crosses a line of our determination,
policy in place to help guide inclusion of materials that have been rejected in past because of BHL's all or nothing open access model

Segment types: Diane has a good list started, no other good references out there that fit BHL's needs
change "letter" to "correspondence"

Collection Browse recommendations:
CMTE members like the display of contributors via a map, see Hathi https://www.hathitrust.org/community
Just lists of partners OK like BHL already has or DPLA: https://dp.la/partners

Cropping recommendations for Digital Imaging Specs
We have gathered recommendations from Widener, SIL, and UIUC
[X] Bianca to summarize and add recommendations to our BHL imaging specs - DONE

Dec 11, 2017

Attendees: Trish, Diane, Connie, Susan, Bianca, Joe, Gil, Matt, Kelli, Michael

Alicia's Collection analysis work
see her presentation linked in 10/30/17 notes
Alicia assessed gaps in BHL via four areas: 1) temporal 2) taxonomic 3) topical and 4) geographic
Data exports used to perform analysis for 1) and 2)
Full text used to perform analysis for 3) and 4)
based on recommendations from: previous collections cmte work and GBIF data gap analysis best practices
How does AE's temporal analysis differ from Bianca's?
Can we replicate AE's analysis post GNA re-index? (not sure when that is happening for BHL)
relationship with JSTOR labs? Bianca needs to follow up
geographic analysis was problematic and most complicated to do
which strategy helps us identify gaps the most?
capture, recapture per concept
Connie: analysis consistent in smaller ways maybe at family level - genus would be too problematic with synonymies and scale
specific areas - we have recipe
name algorithm complications
Alicia's collection analysis work very useful for internal purposes
presenting to public requires additional context and explanation
plan is to share NDSR reports and webinars but specifics haven't been discussed yet
share with folks re: Python but how to handle questions?
status of JSTOR file?! needs to be private

Archival material types
Diane Rielinger: As more archival materials are added to the BHL, I’ve been looking at the genre/segment types, especially for the materials Harvard Botany has been adding as part of the Field Notes Project. Some of our materials don’t fit nicely into the existing categories, for example unbound collections of plant lists or notes. Therefore, I’m soliciting feedback on possible new types.

Here’s what exists now for segment/genre types:

Please note that this is different from material type, which comes from MARC Leader position 06 (LDR 06), and already includes a value of “archival materials”.

Ideas???

Spectrum of materials from MARC Title types <==========|segment types|===> page level types
Susan @ NYBG planning to inter link letters
letter already a type, list is made up on the fly
is there a pre-existing list of controlled terms somewhere in the library community? Bianca and Kelli to review
some from BioStor, some from CiteBank
would be good to have Mike run a query
can we make use of segment types? they are just a label in BHL for now
used for faceting or harvest by DPLA? no DPLA not harvesting articles
beta.biodiversitylibrary.org shows BHL 1.5 with faceting capability
anyone else segmenting archival materials? Diane to ask BHL Staff
Admin Dash should provide definitions for segment types - need to come up with definitions
"list" vs. "plant list" pagination documentation model keep talking as CMTE "book review" add to types? "Index"
goal to flesh out types for 2018
RJB Madrid cataloged @ item level


Nov. 27, 2017

NDSR webinar (Marissa Kings)
Recorded webinar sessions to be sent out TBD

Nov. 13, 2017
Call cancelled

Oct 30, 2017

Alicia Esquivel's Collections Analysis Presentation
Powerpoint presentation link with notes

Oct 16, 2017

Attendees: Bianca, Alicia, Gil, Joe, Katie, Don, Diane, Kelli, Patrick, Matthew, Susan, Michael (Regrets: Connie, Trish)

Collection Analysis
Alicia created a new visualization
BHL Kingdom Coverage.png
kingdom level, with updated estimate for species of life
can add it when notes are available
getting close to end of residency so wrapping things up with documentation
leaving scripts in Github
all analyses are done except for statistical analyses
officially ends the 1st week of January
webinars to be available to everyone in BHL in November, then publicly
general overviews of each of the projects
Bianca and Alicia to get virtual presentation for November

1. LCSH analysis tool code to Mike — not a good fit
2. JSTOR Labs topic analysis - status with MK?
3. biodiversity literature universe scope - capture re-capture
4. collection over time analysis
5. geographic names case study - processing time is high
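
Item 3 refers to capture-recapture as a way to estimate the size of the biodiversity literature universe. The notes do not say which estimator was used, so the sketch below shows only the classic two-sample Lincoln-Petersen estimate as one possibility. The function name and the sample numbers are illustrative assumptions, not BHL data.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Estimate total population size from two independent samples.

    n_first  -- items found in the first sample (e.g. titles in one bibliography)
    n_second -- items found in the second sample (e.g. titles in another list)
    n_both   -- items that appear in both samples
    """
    if n_both <= 0:
        raise ValueError("need at least one item in common to estimate")
    return n_first * n_second / n_both


# Illustrative numbers only, not taken from the analysis.
estimate = lincoln_petersen(n_first=200, n_second=150, n_both=60)
print(estimate)  # 500.0
```

The estimate assumes the two samples are independent and that every item is equally likely to appear in each, which is a strong assumption for bibliographies compiled with different scopes.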

Page redaction policy
Status? It’s on Bianca to follow up
Not discussed while she was away

Uncropped pages
Joe spoke with folks at Harvard University Widener
Yes, Widener does run a cropping routine
Basic practice but no quantifiable standard b/c of variability of materials being scanned
“make an acceptable 3 to 10mm band”
where no size variation - 40 pixel border
archives keep border proportional to size of page
Joe sent messages out on July 12
philosophical questions from image producers
DR we all looked at images from Harvard and they looked good
how do the borders affect PDFs?
haven’t heard any complaints about it - might be a lot of black border for hard copy
affects file size?
aesthetic point of view, effect on OCR is none
cost of the time to produce images vs. uncropped
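
The Widener figures above mix millimetres (a 3 to 10mm band) and pixels (a 40 pixel border); converting between the two depends on scan resolution. A minimal sketch of the conversion follows. The 300 and 400 DPI values are example resolutions, not values taken from these notes, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4


def mm_to_pixels(mm: float, dpi: int) -> int:
    """Convert a physical width in millimetres to whole pixels at a given DPI."""
    return round(mm / MM_PER_INCH * dpi)


def pixels_to_mm(pixels: int, dpi: int) -> float:
    """Convert a pixel width back to millimetres at a given DPI."""
    return pixels * MM_PER_INCH / dpi


# Example resolutions only; the notes do not state a scanning DPI.
for dpi in (300, 400):
    low, high = mm_to_pixels(3, dpi), mm_to_pixels(10, dpi)
    border_mm = pixels_to_mm(40, dpi)
    print(f"{dpi} DPI: 3-10 mm band = {low}-{high} px; 40 px = {border_mm:.2f} mm")
```

Under these assumptions a 40 pixel border is about 3.4 mm at 300 DPI, inside the 3 to 10mm band, but about 2.5 mm at 400 DPI, slightly below it.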

D Hobert might know for SIL - BC to check
Kelli Trei from UIUC
Bianca to check with IA
especially useful for archival materials

DR to flesh out proposal about linking to related information
Bianca to help
need to have discussion about what we may/may not link to
various use cases are now listed but still need to get feedback
BC and DR to get back to committee in 1 month
BC to check with Secretariat/Tech

Susan Fraser attended Linnaeus Link meeting in EU
Asked for an updated spreadsheet
NYBG received new spreadsheet but it’s confusing
SF doesn’t want to share until it’s cleaned up and comprehensible
NYBG got help from library school intern and SL pretty happy with it, entering into Google
SL to review spreadsheet with SF about sharing the spreadsheet with others
LL bibliography is based on Basil Soulsby’s work, which ended in 1933
goal to list where copies can be accessed as well as digital copies
4-5K entries total
spreadsheet only lists titles with no known digital copy
lots of titles on the spreadsheet are already in BHL, many that aren’t in BHL yet should be
SL has started entering Gemini tickets
in a couple weeks hoping to share with others and meet with folks who would like to participate and talk about workflow
how many are not scanned yet? 1000-1500
bibliography lists multiple editions
finding lots of LL entries as articles in journals in BHL, SL adding article definition
DR and JC have already jumped in
SF is BHL’s interface to LL so contact her going forward

A lot of progress has been made for the addition of bulk segment metadata
one of ML’s highest priorities
requirements document has been finalized and ML working on it
will be available for testing very soon
SL working to get citations together that need to be applied to BHL for testing
testing with articles from Arnoldia + another Arnold Arboretum journal
once this tool is available we’ll no longer be dependent on BioStor
put article metadata into spreadsheet and upload then articles defined in BHL
Who has citations in spreadsheet format that are polished and clean? There’s room for folks to participate in testing
DR has lots of entries for testing with archival materials
Nicole Kearney probably has lots of materials to test with
One of the requirements is that the tool do as much duplicate prevention as BioStor, which is fairly good
ML looked at the BioStor code and is repurposing its algorithms to ensure similar duplicate detection
Do we have articles + scanned volumes for JEANH? Bianca and Susan to look

long list has been ok thus far
add dividing line between notes
MP didn’t notice any problems
maybe it’s a problem with length of page?!

Sep 18, 2017


AGENDA


NOTES

Facilitator: Trish
Present: Diane, Trish, Don, Kelli, Susan L, Marissa, Michael, Katie, Joe, Matt

Content analysis update (Alicia): Alicia could not attend meeting but she sent the following links to work she is doing with geographic names and her presentation at the BHL Tech meeting in St Louis

Trish and Susan L gave outcomes from BHL Tech meeting relevant to collections group

BHL v2 will need data model to accommodate non-book objects (visual resources, archival materials, and future content types yet unnamed)
Visualization of content will be useful both internally for identifying gaps but also externally on the BHL site as way for users to explore our content by geography, time periods, taxonomic Kingdoms, etc
In order to make visualization functional there will need to be some data cleanup and normalization of bibliographic metadata (e.g. publication date info is not stored in both date and volume fields )

Current version of BHL will continue for about 6 months and Mike will finish current commitments (full text search, bulk uploading of articles, making existing transcriptions available/viewable in BHL e.g. William Brewster papers)
Alicia is doing a proof of concept to put BHL citation data into Wikidata (Linked Open Data). This is beneficial both as a way to share BHL data more widely but also as a way to normalize some of our data such as author names
Rewriting the old deduper tool – was put on back burner indefinitely, lower priority for BHL

Linking related information to BHL content How might we introduce this visually? (Diane)
Diane has nothing new to add to the discussion. She is still looking for comments on the Google document and would like other case studies to add. She’ll give a few more weeks for comments; next steps would then be writing requirements for submission to BHL 2.0.


August 22 (rescheduled from Aug. 21)

TUESDAY August 22, 2pm AGENDA BELOW
(DUE TO THE ECLIPSE MONDAY, THE CALL HAS BEEN MOVED ONE DAY TO TUESDAY)

NOTES
Matt Person, facilitator, Attending: Gil, Matt, Alicia, Kelli, Michael, Diane R., Trish
Thanks go to Trish, and Carolyn Sheffield who assisted us in moving the meeting from Mon to Tues.

Content analysis update from Alicia:

EAST (Eastern Academic Scholars' Trust) serials shared print repository project update from Matt

Agriculture titles discussion - Michael

Linking related information to BHL content - Diane


Next meeting to be facilitated by Trish, Sept. 18
Thanks everyone !


July 24 call canceled



July 10


Attendees Bianca, Gil, Katie, Marissa, Kelli, Joe, Don, Michael, Patrick, Connie, Matt, Trish

Notes
Alicia has emailed Martin about how to proceed with JSTOR collaboration.
JSTOR topic graph of BHL topics
Based on full text of 10,000 items
DW: where are the links to the items that involve these topics? Can we go back in and find them in BHL?
Topic graph nice to have but needs to connect back to original source material
According to JSTOR Topicgraph tool info, this does seem to be the ultimate goal
BHL now has 205k items
Would be good to have up-to-date visualization of BHL topics for show/tell but ultimately most useful if topics link back to source material
yes folks like the idea of pursuing with JSTOR further

Page redaction policy in progress
Bianca to follow up with Connie, Monica (BHL MX) and Diana Duncan
Diana Duncan to provide real examples of redacted pages in BHL soon
discussion started as a result of Field Museum archival content digitized as part of CLIR grant that had sensitive information

Joe and Bianca to follow up on uncropped pages examples / issues
downloading PDFs of uncropped pages?
Image you upload in Macaw is exactly the image you get in BHL?
IA ready JP2s there is no processing of images
but if you leave it unchecked there may be modification - maybe resizing…no cropping
Bianca to check with Joel about this…
Joe hasn’t seen anything that the uncropped pages are affecting, besides aesthetics
Widener-produced images from Harvard that were sent to MBG were never cropped
What is too big for a margin anyway?

EC still talking about ILL Draft

Bianca making list of Collections CMTE discussion topics at the top of the wiki, please continue these discussions as time allows in her absence.

This is Bianca's last Collections CMTE call for a while. BIG thanks to Trish and Matt for leading future calls and to Patrick, Trish and Michael for reporting on staff calls.
Thanks everyone!


June 26

Agenda

Notes
Attendees: Bianca, Marissa Kings, Katie & Joe, Don Wheeler, Diane, Trish, Matt, Susan, Patrick Randall,
Bianca plans to have DRAFT of page redaction policy for review before maternity leave
archives world has some good policies

Susan Fraser is at annual meeting for Linnaeus Link

Uncropped pages issue: how big of a margin is acceptable? need measurements
uncropped vs. cropped with a margin
Harvard has gold embossing, Harvard treats all items the same, JdV not sure about their workflow
capture images as book as an artifact
not sure if they do any cropping at all to keep margins narrow
harvard trying to help users see exactly what the edge of the page looks like?
Joe has links to examples in staff call
for single book there could be variable margins
are they cropping or are images truly uncropped? JdV to ask
pages cropped right down to the text block for BHL
Bianca to send around examples already in BHL for BCA and BHL SciELO - margins fairly large b/c they have color margins and measuring tape
Fauna Japonica several cropped volumes - cropped volumes look so much better
cropped images look better in flickr
as long as pages are connected together as digital book, does it matter how big the margins are?
select pages to download might be a bit off - might be smaller b/c of margins of images
printing would result in a lot of black toner used
Bianca to get examples into digi img specs and send out to group soon!

whether or not it’s ok to have duplicate articles defined in BHL?
BHL typically tolerates duplicate publications, as well as duplicate items
but what about having duplicate articles within a single volume?
multiple same articles defined for the same volume
BC: ideally it would be nice to have singular articles within a given volume but there’s always been
DW: quality issues, 2 different scanned versions of the same publication, how good is the metadata? it should be our responsibility to curate articles to make sure they have the best metadata
BioStor does put in good effort to deduplicate article metadata
article metadata comes in from different sources
we get metadata from BioOne and from JSTOR say so there would be overlap with this article metadata
multiple BHL partners could be working on the same article metadata
or perhaps an error occurs and article metadata could get submitted twice
limited citations vs. in-depth citations - how would we deduplicate these?
most obvious deduping process is looking at 2 articles that start on the same page, with the same author
some things to dedupe could be done cheaply
part/239993 is example of how duplicates are identified now
DR points out that it’s not actually a duplicate, there are 2 different parts, part I. vs. part II. which is great to see the relationship
Mike’s algorithm does its best to identify related content and connects them together
SL explains that this has come up as a result of defining articles in bulk outside of BioStor
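
The "cheap" check discussed above (two articles in the same volume that start on the same page with the same author) could be sketched roughly as follows. This is not Mike's algorithm or BioStor's; the record fields (part_id, item_id, start_page, first_author) are hypothetical, and the author normalization assumes the surname comes first. As the part I. vs. part II. example shows, a match is only a candidate for human review, not proof of a duplicate.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_author(name):
    """Lowercase, strip accents and punctuation, and keep the first token.

    Assumption: names are written surname first, e.g. "Müller, F.".
    """
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    tokens = re.findall(r"[a-z]+", ascii_name.lower())
    return tokens[0] if tokens else ""


def candidate_duplicates(articles):
    """Group article ids that share volume, start page and first-author surname."""
    groups = defaultdict(list)
    for art in articles:
        key = (
            art["item_id"],
            art["start_page"],
            normalize_author(art["first_author"]),
        )
        groups[key].append(art["part_id"])
    # Only groups with more than one article are worth a second look.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Limited vs. in-depth citations would still land in the same group under this key, since titles and end pages are deliberately ignored; deciding which record has the better metadata remains a curation step.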

Bianca to cancel call for July 24, her last call will be July 10
any volunteers to host August or September calls? not at this time
Matt doing August BHL staff call, other folks on CMTE also helping with Gemini
Trish will provide collections cmte report for Aug 17 staff call, thank you!
no one volunteered at this time for just in case Thurs July 20 or Sept 21 call


June 12


Notes
Attendees: Gil, Bianca, Katie, Joe, Patrick, Alicia, Michael, Kelli, Trei, Susan,
Regrets: Diane, Connie, Matt
No volunteers for staff call :(

Offer for partners to join Collections CMTE sounds good
Calls work well as currently scheduled so stick with same time and frequency

thanks Connie for update about ILL in agenda above

Alicia has been talking to folks at JSTOR labs
tool pulls out different topics from full text
based on a controlled vocabulary
AE sent sample to them to see what they could pull from BHL
don’t share widely! JSTOR requests
can share within BHL
now is the time to get serious with JSTOR to establish relationship if we want to pursue further
spreadsheet has list of subject matter
corpus profiler can drill down and has links back to source material
JSTOR to also look at subject headings as well as a test
Susan: what kind of vocabulary do they have, does it have a name?
AE took a look at vocabulary and found it to be biodiversity relevant
JSTOR open to working with us since their tool is still in beta
AE: visualizations could be available for users as part of websites? or simply use tools for collection development?
do whole analysis for us or pass a workflow onto us?
could be other types of visualizations that could be made with data, AE not sure what JSTOR has
Visualizations useful for end users
Some folks like the idea of integrating into browse functionality
JSTOR has beta version up online, AE saw this and asked about bulk uploads
currently no API available
would need to talk to them further about how to administer tool depending on what we would want to do
AE to add links to their topical analyzer to the wiki
Alicia sent a list of 10,000 items for JSTOR to work with
list of URLs to items, JSTOR just pulled OCR
bhl_ta_experiment spreadsheet is list AE sent
JSTOR wants to know more about where this is going before they do all 200,000+ items
could do focused analysis or analysis of collection overall
Bianca would like to see all BHL items run through this tool and would like to work with Secretariat
Michael for it, Gil too
could do zoology vs. botany for example as more focused approach
tree of buckets that mirrors taxonomic tree in some way - this would rely on controlled vocabulary
AE is working on a separate project to assess BHL topics in comparison of taxonomic tree
BC: is there a way to know how many items or what percentage had issues with OCR that had poor results
JSTOR has challenges with other languages
AE to send BC white paper and other info about JSTOR tool - DONE

Humboldt expedition another good mono series to cite as part of https://docs.google.com/document/d/1M_3a8Sz6ryurRz-byptU2cgHYT3yPKrXvBDAtalu2XM/edit?usp=sharing
MARC 555 sub a description sub u has hyperlink for Engelmann Correspondence collection
MARC record complete, IA complete but MARC 555 does not display in BHL
Diane and Bianca to continue to flesh out doc for use cases and possible strategies to incorporate series guides into BHL


May 15

Agenda

Notes
Bianca Crowley, Marissa Kings, Patrick Randall, Michael Cook, Matt Person, Susan Lynch, Katie Mika, Alicia Esquivel, Don Wheeler, Gil Taylor, Diane Rielinger, Connie Rinaldo

Ideas for BHL ILL Guidelines
Key Takeaways to present to BHL EC:
  1. BHL could consider adopting a flexible ILL "understanding" rather than a strict policy (as CBHL does)
    1. Needs to be on a case by case basis; some can do international ILL while others cannot; some can provide ILL for free while others are required to charge; some restricted by copyright policies
    2. bottom line, if we can, we will supply for other members the content that we can for nothing
  2. ILL process needs to be considered
    1. Should ILL be limited to BHL Members or include entire consortium?
    2. How should ILL requests be circulated? Via a listserv (to avoid charges but lose tracking stats) or through standard ILL software (helps with tracking and fulfilling requests but may require issuing refunds)
    3. In what formats can ILL be provided, physical books vs. PDFs
    4. E.g.: IAMSLIC uses http://www.iamslic.org/wp-content/uploads/2010/07/ResourceSharing_ILL_Brochure_2014_SW.pdf but AMNH found this tool to be unreliable

Council of Botanical and Horticultural Libraries (CBHL) Members represented in BHL:
NYBG
Harvard Bot
Patrick Randall (MCZ)
Cornell
CBG

Don Wheeler: CBHL doesn’t have a set policy; it’s an understanding that CBHL members share with each other at no cost
requests come outside typical ILL request software -- doesn’t come through OCLC for example, just through phone call or email
no real structure
a number of ILL’s come in through CBHL listserv but downside of this process is that if you keep track of ILL statistics an email doesn’t auto factor in - you’d have to keep track on your own if you need to keep track
email is helpful for keeping track

Susan Lynch: was ILL librarian at AMNH, they were using WorldShare ILL OCLC product, most members use this - SL likes to keep things within this system, simplifies record keeping and helps with tracking
AMNH asked for ILL from ANSP and was charged, then refunded after Susan emailed to let them know they were BHL Members

ILL has been a benefit for BHL Members but not all Members can participate
Needs to be on a case by case basis
MCZ no longer does their own ILL, it’s part of Harvard’s centralized system, but CR specified that BHL Members get free ILLs
DR: is it a limitation of software that’s not letting Members participate?
CR: more of a process issue, Harvard hasn’t taken over Botany’s ILL yet - Harvard agreed to honor reciprocal agreements so long as coming from MCZ; MIT has a completely different way of doing ILL for example
CR has allowed ILL for all BHL partners, not just Members

maybe some Australian, Singapore, European partners cannot participate for free
there’s a BHL ILL recommendation written (as part of MOU?? -- where is this?) but it’s practically more of a good-faith agreement
discussions at Members meeting mainly around being flexible
some folks can’t do ILL lending/borrowing in general, or for copyright reasons, or for free
some institutions are required to get paid as part of this process
AMNH had a very strict policy about not sending books through customs, i.e. limited international ILL — some institutions will lend internationally, some will not
CR needs to double check about whether or not there is a task group for ILL - can bring it up with EC

needs to be up to each library, no overall structure for CBHL
bottom line, if we can, we will supply for other members the content that we can for nothing
BHL will no longer have an explicit agreement as part of MOU
SL: this works for PDFs but not for difficult books

IAMSLIC - do they have ILL? yes they have an ILL channel via a listserv, folks who can fulfill do - MP to check with MBLWHOI’s ILL librarian to get more details
there is a z39.50 interface internal to IAMSLIC http://www.iamslic.org/wp-content/uploads/2010/07/ResourceSharing_ILL_Brochure_2014_SW.pdf
AMNH would only supply PDFs for free, found that Z39.50 interface was pretty unreliable
LC uses ILL requests as justification for digitizing, AU can’t do international ILL
SIL Gil going to reach out to Robin
CR to bring to EC to see what they think so far
via email Diane Rielinger points out: "IAMSLIC is an interesting organization given its large international participation. BHL is probably too large to do an internal system like IAMSLIC but the list of lending contacts, what you will lend and not, etc. from section 4 of the document that Matt sent is very interesting. Maybe BHL can have a central database with that info and some more so you know who to ask and who not to. Recordkeeping would be up to the individual institution as to however you can manage it."
Matt: below is an example of an email dialogue between IAMSLIC libraries - when the above Z39.50 interface does not work, librarians follow up with free text email communication, e.g. :


Dear Alexandra,

I checked the reference and I think the exact journal title is "Zeitschrift für Geomorphologie".

Title: Zeitschrift für Geomorphologie : a journal recognized by the International Association of Geomorphologists (IAG) = Annals of geomorphology = Annales de géomorphologie

Published: Berlin ; Stuttgart : Borntraeger

Numbering:1.1925/26 - 11.1939/43; N.F. 1.1957 -

Maybe you can search again for a library holding this journal/volume. In case you can't find a library in Z39.50 let us know. Our local Rostock University Library has this volume on stack. It would be possible to order the article and send it to you.

Best regards,

Olivia



ORIGINAL EMAIL:

Dear Iamslic,

I am looking for the following article that is not in the Z39.50. Could anyone provide?

Inbar, M., Risso, C. & Parica, C. 1995. The morphological development of a young lava flow in the South Western Andes-Neuquén, Argentina. Zeitschrift Geomorphologischer Natur Forschungen, 39(4): 479-487.

Many thanks in advance and regards to all,

Alexandra

Flagging uncropped pages
Michael Cook suggested to just flag the volume if possible not every page
technically not necessary at this point so doesn’t seem worth the work to do the tagging
uncropping is OK as exception to rule but not new standard
Digital Imaging Specifications - BC to look at this and send around

Collection analysis updates
Alicia talking to folks at JSTOR labs they have analysis tools in beta
topic finding tool based on a thesaurus they’ve created
Alicia has had several meetings with folks to see if these tools could be useful for BHL
AE sent sample of materials for JSTOR labs to run through their tool, look through parts of their thesaurus - are biodiversity terms available, what’s missing?
JSTOR seemed very excited about working with BHL data for free
Thesaurus does look pretty inclusive of biodiversity terms - worked with controlled vocabularies, worked with global plant thesaurus and ecologists
Did the sample sent include BHL OCR?
Alicia sent a list of URLs and JSTOR labs was able to work with this since publicly available
beta tools can work from PDFs
labs are in Michigan, New York and New Jersey; AE has met with 3 people in different locations

AE also got project info about LCSH which she sent to Mike for review - would it be worth using for BHL or requires rewrite and if so, how much work would this be? It’s a very technical program
code only made available to ML last week
might be reviewed on a tech call in the future - may be good to run by Joel to see if he might have any resources to lend to this?

statistical analysis on universe of biodiversity literature, AE to look at one organism over time

Including bibliographies to major monographic series in BHL somehow
if there’s some resource that BHL can refer to to help people navigate the title that would be useful
since we don’t require all mono records sometimes it’s hard to know what to look for
Do we want to put something in BHL? putting something in as a volume of the series
Or do we want to link out?
U.S. Exploring Expedition
analogous to archival finding aids - they tend to show up in 500s of MARC records where you could put a URL
would our current BHL architecture support this as part of a notes field?
would it be hyperlinked? SL thinks that the hyperlinks aren’t supported in the BHL notes fields at present
seems like a good idea but how much can we really do to provide resources?
link out puts responsibility on third-party vs. ourselves but if we load ourselves then we have to maintain to keep current
can these works be collections? have a collection of mono seps and series titles with finding aids
easier we make it for people to find things the better…but how…?
NYBG recs for archival finding aid MARC 555 not getting ingested into BHL - SL thinks it should be in BHL - contains URL not digitized content - NEH grant for John Torrey correspondence
What are other CLIR folks doing, using 555 or something else?
Handful of use cases
- external contents list
- Expeditions
- archival finding aids
DR willing to help
Maybe we should have 555s indexed in BHL?…
BC to create google doc to circulate to group and collect ideas

May 1, 2017


Notes
Bianca, Diane, Michael, Connie, Don, Marissa, Kelli, Joe & Katie, Trish, Alicia, Matt, Susan

Guidelines for handling monographic separates and series records is clear, ready to share with BHL Staff
BC to make sure it's shared with Staff - added to Staff call notes
BC to invite Cataloging group to add to new BHL requirements tab - DONE

Joe receives uncropped JP2s via in-house scanning operation; this is how they are presented in the Harvard repository but not typically how we display in BHL
has quite a backlog
Joe has been cropping each page and batch cropping is not always an option, fold-outs etc.
Adding a lot of time to Joe’s Macaw workflow with a long way to go in their queue.
Joe said it’s 5 times faster to upload to Macaw without cropping
Is this acceptable?
Most folks think uncropped page images are acceptable, also agreed that processing of images takes the MOST time
Example Joe provides in agenda is typical of what a Widener scan looks like
SL says margins look good, not too big
Joe says margins are generally like this
DR says even foldouts look good, they are crisp and nice, sometimes IA foldouts have larger margins
TRS: any effect on OCR with margins? SL says that OCR still looks pretty good
MC actually prefers the uncropped pages, rather than the fake stack of pages presented in book viewer
OH yeah and image algorithm for Art of Life, TRS to investigate if she has time
file size difference for cropped vs. uncropped version? maybe just trivial difference
Joe is resizing images from 40 MB to 6.5 MB for Macaw but could also be an IA issue…but this was several years ago
SL uploads files at about 30 MB without difficulty but could not upload 48 bit color depth b/c ABBYY OCR reader failed
had to reduce color depth from 48 bit depth to 24 in order for IA to process correctly
DR has done 12-15 MB per image, one of them was 1700 pages! derived ok at IA
Joe to email Mike and Joel and cc Bianca
also PDF quality issue https://ia601504.us.archive.org/20/items/geologynorthame00marc/geologynorthame00marc.pdf
Michael Cook wonders if there’s any way to flag uncropped images?
SL suggests that this could be captured in embedded technical metadata but still this info isn’t very accessible
BC to check around to see if/how this is possible…(flagging uncropped images in embedded metadata?) - DONE

EABL has been working with the kind of PDF that Vicki Funk wants to have added to BHL
permission is needed
Would SIL catalog this?
Mariah Lewis could also catalog
BC to prioritize getting SIL to catalog
BHL filled with content like this typically embedded in a journal
EABL could take on uploading,
BC to work on permissions and see if SIL will catalog - YES SIL is in the process of cataloging!

Alicia has been at a couple conferences lately DPLA fest and NDSR symposium
Met with biologist last week at CBG re: capture/re-capture analysis
Alicia to do a stratified sample to look at a specific species for a specific time period
likely to work with a botanical species since working at garden
more important part is assessment of process and workflow
what about something agricultural such as corn or rice since both biodiversity and agricultural…
for any species with name changes, Alicia would chronologically break up literature based on date of change and review separately
to be working on this over the next couple of weeks

contributor/collection browse update - Pam and Bianca met to review details for what Gemini issues need to be created covering all requested changes for collection and contributor browse pages
Pam to run these issues by committee when ready, she has been attending multiple conferences lately
Pam to help come up with some ideas about incorporating contributor landing pages on browse page, not sure how to include these landing pages in UI, current page needs redesign to accommodate

BC would like to have 2-3 volunteers to take on collections CMTE calls while out on maternity leave

Apr 17, 2017

Agenda
Notes
Michael Cook, Susan Lynch, Connie, Bianca, Gil, Don, Patrick, Diane, Kelli, Alicia, Katie, Marissa

Bianca provided update about how all OTS externally linked articles have been removed from BHL

is there a character limit for copy specific information?
Mike confirms that there is no limit but please BE MINDFUL of space considerations and review our specs http://bit.do/BHLcopyspecific
what they see in the BHL may not be everything that’s published
KBART implications for adding “incomplete”
BC to add to vol standards doc about copy specific info

Carnegie Institute
over half are relevant
if journal then we’d digitize all
what do we want to do about digitizing all monos?
Carnegie wants us to digitize all…
how long are they? PR doesn’t have a good sense of average length
DW noticed one astronomy book at 1000+ pages
some are like 30 pages
some astronomy, legal, letters of members of continental congress, not many humanities but there are a few
Carnegie didn’t insist upon us digitizing all but would desire us to…
Let’s prioritize relevant ones…
have other folks digitized? maybe we can ingest already scanned content?
Let’s start with the most relevant and revisit w/ Carnegie when done with that
Collection development policy says we are focused on biodiversity literature - so can’t we say no?
PR to post list of biodiversity relevant materials to Gemini issue and get in touch later
scan as monos or series?
let’s do both - but for DPLA purposes let’s make mono record primary
mono series are different b/c they are intended to stand alone whereas journal articles are not intended to stand alone

if we start redacting whole pages…
understandable that we need to redact some things, it’s time intensive
it would be better to not have whole pages redacted
intentionally redacted explanation to be present
good to have standard statement that we could all use for redacting
this is the first time we’ve redacted
we have not put things in BHL that we might have b/c of presence of PII
would this have to be done via Macaw? or could you do via IA?
not against redaction
BC to come up with DRAFT specifications, details to add to Digi Imaging Specs
Connie to help then BC to ask on staff call - Monica with BHL Mexico has volunteered
BC to check in Diana Duncan about her timeline for this - would she want to help?

Idea to use capture-recapture method for scope of biodiversity literature
has been used for other calculations
used to calculate population sizes, quantity of medical literature, size of Google Scholar
Alicia to meet with population ecologists to understand methodology
needs to choose best statistical model based on the data type
what kind of sample to choose? random or more calculated?
what are some of the sources for getting this data?
start with subset of biodiversity literature - like botanical literature, searching on web Google Scholar and other databases + hard copy bibliographies and compare them to each other to find statistical probability of total population
fuzzy matching on titles in fuzzier manners
if you know any population ecologists please let Alicia know
Also been looking at how to visualize taxonomic names that we have in BHL
package to match names to taxonomic trees
but needs a good way to export taxonomic names - good question for Mike
we do have data export file but not sure if it retains relationship with pages
Alicia explored LCSH analysis w/ Tech Team but need to assess how much time it would take to apply to BHL
ML would have to do the work really b/c very technical
Alicia has requested that the researcher send the code to Mike to have him look over to assess if worth moving forward
since we have all the keywords in the metadata available to us but still working on a way to come up with a meaningful representation
also possibility to text mine BHL and come up with visual representation that way, based on sample - not whole corpus all at once at first
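
To make the capture-recapture idea above concrete: with two independently compiled lists of titles (say a database search and a hard copy bibliography), the overlap between them gives an estimate of the total population. The sketch below uses the standard Lincoln-Petersen estimator and Chapman's small-sample correction; it is only an illustration, not the model Alicia has chosen, and it assumes titles have already been matched and normalized so that the same work has the same key in both lists (the fuzzy matching step is not shown).

```python
def lincoln_petersen(source_a, source_b):
    """Estimate total population size as n1 * n2 / m.

    n1 and n2 are the sizes of the two samples, m is the number of
    titles found in both.
    """
    a, b = set(source_a), set(source_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between sources; estimate is undefined")
    return len(a) * len(b) / overlap


def chapman(source_a, source_b):
    """Chapman's variant, which is less biased for small samples
    and still defined when the overlap is zero."""
    a, b = set(source_a), set(source_b)
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


# Made-up example: 6 titles in one list, 9 in the other, 3 in common.
list_a = ["t1", "t2", "t3", "t4", "t5", "t6"]
list_b = ["t4", "t5", "t6", "t7", "t8", "t9", "t10", "t11", "t12"]
print(lincoln_petersen(list_a, list_b))  # 6 * 9 / 3 = 18.0
print(chapman(list_a, list_b))           # 7 * 10 / 4 - 1 = 16.5
```

Both estimators assume the two sources sample the literature independently, which is unlikely to hold exactly for bibliographies and databases; that is part of why the choice of statistical model and sampling approach matters.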

enter mono sep vs. serial record policy into BHL metadata requirements, not really collection dev policy stuff
BC to add rule about mono separates to BHL metadata requirements
check in with PR about Carnegie witch to see how that plays out
mono rec should be primary
nothing precludes someone from throwing in a mono record if they want

Apr 3, 2017

Attendees: Michael, Kelli, Matt, Gil, Pam, Bianca, Diane Rielinger, Don, Trish, Susan, Joe, Katie, Patrick, Alicia, Marissa,
Agenda:

Notes
BHL Annual Mtg recap
Los Angeles will be next year’s Annual Mtg
MC: went really really well, organizers did spectacular job, very well organized, lots of different locations made meeting very interesting, great tours
blog post outlines good summary of events
Things of note: Kew has to defend their membership to BHL, MC hopes other folks don’t have to do that
scope of BHL collections - Agriculture sits just outside BHL’s scope but BHL has a significant agricultural collection, would this be considered core at any point?
folks said no, agriculture would obscure true taxonomic core:
There is a symposium at TDWG 2017 regarding the importance of agricultural information in biodiversity. We often ask the TDWG attendees about their thoughts on BHL. This might be a good question for them.
Dr. Ng’s presentation: needs to be clearer option for high resolution downloads - you have to know where to click and it’s a convoluted process to get to the high rez images
Taxonomist from Malaysia delivered long rant about mistakes in literature by Linnaeus - BHL reproducing literature without pointing out the mistakes in literature is a problem
parallel discussion of the content would be nice - we used to have Disqus in BHL but had to be removed for privacy reasons - would really like to see this feature come back
Partners meeting had a lot of lightning talks about their status reports see
MC excited to see Tech priorities: full text search - when is this anticipated?!
Macaw to expect article metadata at the time of digitization
Adding landing pages for contributors
Downloading options for articles and embedding DOIs
MC did not bring up the bad IA PDF issue
not clear that the build your own PDF option
focus on in copyright materials for the future
BHL Annual Mtg notes?
valuation discussion from Kew prompted good discussion from the rest about how to sell value of being involved in BHL
how global that BHL has become - how to incorporate new Members/Affiliates into Collections CMTE and other groups
Strategic planning discussion was good: full text search, scientifically relevant illustrations, ability for people to annotate, downloading high rez images
levels of membership and funding needs to continue to be hammered out, some members can’t utilize IA scanning fund pool
Ely Wallis of BHL AU - look at institutional size, FTE, etc. for sliding scale of membership
CS has pics from start planning mtg and is in process of organizing notes to share more widely, maybe they need to be approved by new Secretary
MC and KT didn’t know too much about BHL Egypt stuff
BHL China - mostly focused on ingesting BHL, but MC’s not sure how much they’ve contributed to BHL

OTS articles
While removing the EndNote export, Mike Lichtenberg noticed some issues with Organization for Tropical Studies articles as outlined in the agenda above
William Ulate has been in touch with OTS folks (in Costa Rica) to try and understand the links to copyrighted material - should we be able to show or were these mistakes?
Susan Lynch points out that article metadata is “kinda crazy” - page numbers stuck in volume info field, also other metadata anomalies, copyright problems for link outs,
OTS put files onto web server in Costa Rica, links in BHL go to in-copyright content on a server in Costa Rica
over 100 articles from Zootaxa - which has specifically told BHL they don’t want their content in BHL
BHL def doesn’t want to link to content that is not accessible
there are existing broken links as problems too
would be good to err on the side of caution
could we do a global unpublish with possibility of restoring later? lots of folks agree!
but it’s been up for 5 years! BHL would prefer to stay out of the potential melee of the situation and take down
there’s some concern about losing access to legit open access content
Action Item: unpublish OTS content immediately and work backwards to open back up
static data, if they’ve made changes then we wouldn’t have gotten updates
can we get permission? BC has a long queue but can certainly add them to see
BC, SL, and TRS to work with Mike about unpublishing
BC to work on working backwards plan to get this content back when/if possible, work with CONABIO for Mexican titles
We are still waiting to hear from OTS who is supposed to be getting back to us in the next week.

Redacting pages in BHL?
Not uncommon to have redaction happen in archival world
BHL not opposed but need some policy decisions
Field Museum needs to redact due to personally identifiable information (PII)
they are looking into adding notes into the MARC field
can add Copy specific information to item but this is a manual process
remember can download from IA!
as we do more field books this will continue
redaction is OK but we need policies about how to handle exactly

Collection analysis update:
We ran out of time unfortunately but Alicia was kind enough to provide the update below, please review!
https://docs.google.com/document/d/17jieoqKbnJIBeD6DcLGpRnMRp9xhMwVWPKOE39f3Ljk/edit?usp=sharing

Mar 20, 2017

Agenda

Notes
Attendees: Don, Trish, Matt Person, Joe deVeer, Katie, Gil, Pam, Diane, ME, Alicia, Marissa, Susan,

BC to send out Prezi link re: Annual Mtg presentation

Alicia has sent Susan the info about the LCSH pilot she mentioned last call
tech team hasn’t had time yet to review
Alicia working on possibility of pulling info from BHL full text
reading over Mining Biodiversity project, completed last year
reading publications that have come out recently from U of Manchester re: tools used for MB project
Could we pull location data to cross that with species data to see location coverage for example?
Trish recommends contacting William Ulate about Mining Biodiversity
Trish confirms that most of the work for Mining Biodiversity is done
William Ulate gave a presentation for CBHL about the project, Trish to ask him to share presentation with BHL, slides at least

Collection/Contributor Browse Page Change Requests
BC to talk to Pam about adding Gemini request(s) for making BHL collections browse page in similar format to Columbia U old coll browse page (http://library.columbia.edu/find/digital-collections.html)
Columbia U has new digital library page but old page a better model for BHL
Goal is to have institutional collections eliminated in favor of sticking with Contributor browse as institutional collection is duplication of contributor browse option
BUT need to add splash page functionality to contributor browse…how?
BC to revisit institutional collections and follow up with those that still have them: NYBG,
SL mentions that we’ll be having faceting by contributor
look at DPLA for faceted searches by contributor >> fulfills adv search request to be able to index by contributor
DW: would be fascinating to have map visual for contributors!
SL suggests taking usability and visual appeal into consideration first before checking in with Tech Team
Pam to come up with ideas about how to incorporate contributor landing pages into contributor browse page
then to discuss with coll cmte before tech team
SL suggests landing pages for contributors should have guidelines about how to design them, should be consistent so that users know where to go to find links to content in BHL in the same place
BC: maybe we could have a template that contributors fill out when creating their landing pages?
BC to check on status of collections browse Gemini requests for Pam. BHLFEED-58498. (This from Mike.)
she will be looking into BHL v2 wish list

review of Novara Expedition publications in BHL http://www.biodiversitylibrary.org/search?searchTerm=Fregatte+Novara#/titles

having records for both monos and series are nice to have…
some of the monographs have very specific information that is hard to capture in series record
Challenger expedition has nice volume descriptions http://www.biodiversitylibrary.org/bibliography/6513#/summary - you can understand volume content more easily
some users look at the publications different ways, series v. monograph

BC: So if we have monographic records for some volumes of the series, does this mean we need to add monographic records for all volumes?!
DR: Do it for people that have monographs in their collection
but don’t proactively add monographic records
In some cases this would mean a LOT of mono record editions and a LOT of work to hyperlink all
BC to follow up on this for coll deco policy for next call
DR: expeditions tend to have websites bibliographies with keys out there…do we want to scan these in? have available in BHL in some way?
Great question for follow up for next call.

Mar 6, 2017

Agenda

Notes
Attendees: Bianca, Michael, Gil, Pam, Joe, Patrick, Alicia, Diane, Kelli, Susan, Trish, Connie,

Do BHL libraries loan their journals? esp. BHL scanned items?
Most people just don’t loan journals at all, BHL scanned or not

What about withdrawals? Do folks withdraw BHL scanned items?
Sometimes UIUC items get withdrawn w/out notifying original collectors
Cornell hasn’t come across issue yet - generally if important enough to scan then important enough to retain
UIUC does local loans only for grads and faculty
MCZ does loan to museum complex - up to discretion of circulation person, limited to local population
MCZ won’t withdraw but will send BHL items to repository w/ caveat if multiple copies, critical to core, etc.

Cornell still binds
MCZ hasn’t bound since 2012 - no longer does in house and currently lacks resources
UIUC still does
Work with folks in bindery to set aside barcodes and refer to that barcode for digitization
SIL to reassign issues where they run into trouble to other BHL partners

as BHL has expanded to take more and more unpublished materials
field notebooks and journals have book-like appearance and fit in easily
digitized correspondence 5-6 letters w/ MARC record
HUH has large collection of correspondence by Asa Gray
to/from, year ranges, other specifics
misc correspondence can pose problems however, not specific enough for a nice MARC record
how to submit these as a collection into BHL but then also differentiate them?
add date before “Page” to keep page metadata clean but still see in page nav
but if more detail required, submitting correspondence details like articles to Rod Page to be processed via BioStor
Diane is doing this via Macaw, submitting page metadata as .csv and segment metadata as RIS file to Rod Page
see http://www.biodiversitylibrary.org/item/185537#page/1/mode/1up for example
segment type? listed as articles rather than letters
process a LOT easier than hand segmenting
2 kinds of documentation 1) page metadata via Macaw and 2) segment metadata to BioStor
Diane to write up soon
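
A minimal sketch of the segment metadata half of this workflow, assuming each letter is described by a small dictionary and written out as one RIS record. The field names (title, author, year, start_page, end_page), the sample letters, and the choice of the generic RIS type GEN are illustrative assumptions, not the format actually agreed with Rod Page for BioStor.

```python
def letter_to_ris(letter):
    """Render one letter (a dict) as a single RIS record string."""
    lines = [
        "TY  - GEN",                      # generic type; an assumption
        f"TI  - {letter['title']}",
        f"AU  - {letter['author']}",
        f"PY  - {letter['year']}",
        f"SP  - {letter['start_page']}",  # first page of the letter in the item
        f"EP  - {letter['end_page']}",    # last page of the letter in the item
        "ER  - ",                         # end-of-record marker
    ]
    return "\n".join(lines) + "\n"


def letters_to_ris(letters):
    """Join the records, leaving one blank line between them."""
    return "\n".join(letter_to_ris(letter) for letter in letters)


# Hypothetical letters, for illustration only.
sample_letters = [
    {"title": "Letter from A. Example to B. Example", "author": "Example, A.",
     "year": 1850, "start_page": 1, "end_page": 3},
    {"title": "Letter from B. Example to A. Example", "author": "Example, B.",
     "year": 1851, "start_page": 4, "end_page": 5},
]

ris_text = letters_to_ris(sample_letters)
print(ris_text)
```

The page metadata .csv from Macaw would still be needed separately; this only covers the RIS side.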

Kelli says the Hemipterist isn’t available in Oryx…?
Collections CMTE skeptical: seems like just some guy’s website
seems dubious…
MC can pass on to Ent department to check in with subject experts
Editor is also sole author, no clear peer review or submission qualification process
he’s the one who reached out not someone who wants to use it
check in with NHM about receiving
use caution before accepting, time to ask for more info
BC to continue to check with NHM
looks like vanity publication, group is dubious…

Alicia updates:
researcher took a science collection and used LCSH to create a tree for browsing purposes
Alicia to try something similar for BHL collection
created tool to parse through authority files and compare with MARC
big project that he has written in C# which Alicia does not know
his program breaks down LCSH strings to match against authority files
would this be beneficial…?
Is the program in github? there are thousands of lines of code
SL happy to take a look and bring it up on tech call, has experience in C++ and Mike L. has experience in C#
build a tool for browsing a collection
subject keywords for a collection and turn it into a hierarchical tree
matched as many headings as he could, those that didn’t match he would break down to try and find matches
AE thinking that BHL keywords might match more easily since already broken down, but not sure
only screenshots available so far, not sure if tool is available for users to check out online
BC suggests that there may be a way to get access to original LCSH for MARCXML passed at time of scanning for BHL items
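
A rough sketch of the browsing-tree idea, assuming subject headings arrive as strings with subdivisions separated by "--". This is not the researcher's C# program (which also matches against authority files); it only shows how broken-down heading strings could be folded into a hierarchical tree. The sample headings are made up for illustration.

```python
def build_subject_tree(headings):
    """Fold 'Main -- Subdivision -- ...' strings into nested dicts."""
    tree = {}
    for heading in headings:
        node = tree
        for part in heading.split("--"):
            part = part.strip()
            if part:
                # Reuse the branch if it exists, otherwise create it.
                node = node.setdefault(part, {})
    return tree


def print_tree(tree, indent=0):
    """Print the tree with two spaces of indent per level."""
    for term in sorted(tree):
        print("  " * indent + term)
        print_tree(tree[term], indent + 1)


# Hypothetical headings, for illustration only.
sample_headings = [
    "Botany -- Brazil",
    "Botany -- Brazil -- Pictorial works",
    "Birds -- Classification",
    "Botany -- Classification",
]

subject_tree = build_subject_tree(sample_headings)
print_tree(subject_tree)
```

BHL keywords that are already broken down would simply enter the tree as single-level headings.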

BC to be out on maternity leave starting end of July! Will need volunteers for Collections CMTE calls over rest of summer

BC will send out outline for Annual meeting presentation to group ASAP - DONE

Feb 6, 2017

Agenda
Hathi Trust call follow up
Soulsby numbers can now be added via Admin Dash in support of Linnaeus Link

Notes
Attendees: Connie, Matt, Diane, Kelli, Bianca & Pam, Don, Joe, Katie, Patrick, Michael, Alicia, Trish & Ari, Susan,

Bianca shared notes from Hathi Trust call
There is a Fed doc depository
SUDOC - Superintendent of Documents Classification System… just a way to organize like call numbers
Kelli says "it’s kinda complicated" and here's some info on it via UIUC http://www.library.illinois.edu/doc/researchtools/guides/usfederal/sudoclist.html
Departments may keep lists but there isn’t necessarily a list of all
Fed docs just get shoved in storage, hard to find

Shared print is a thing!
University groups do these kinds of programs - e.g. Big Ten Academic Alliance mostly serials
California too
project in the NE called East - shared serials & monographs
ASERL in SE - shared serials & monographs

Springer, Wiley & Elsevier - put print holdings in storage in Indiana so that folks can drop print and keep digital; use space for new holdings so long as they have digital too

MP: East Project - 60 institutions, taking stock of holdings, rather than hold/store and pare down, repository copies stored distributed throughout institutions; grant from Mellon foundation to assess minimum number of copies to save, given caveats and issues with missing volumes and catalog inconsistencies - but grant only covering mono portion; MP on serials portion of committee;
East project used GreenGlass to compare monographs - presentation at Charleston conference that Michael thought to be impressive, product used to be called “OCLC Collection Analysis” tool but it’s a completely redesigned redeveloped tool; DR has heard good things about it
Tutorials: https://www.oclc.org/support/training/portfolios/library-management/sustainable-collections/tutorials.en.html
How much does GreenGlass cost?

Cornell - shared print often overlaps with print retention agreements so that folks can retain minimum threshold within consortium

CRL PAPR as print archives registry, not sure that they’ve settled on minimum number of titles

Boston Library Consortium overseeing management of East project which is larger than BLC (18 or so libraries); East project is about 60 libraries - take a look at presentation from Charleston conference; serials part of East just getting underway now

Harvard Library has its own lib consortium group and not a part of East

MP to keep us posted on his participation with the East project for serials; did analysis of each library on serials committee - identified small sampling of 50 titles held in the fewest OCLC libraries

KT helped clarify that HT may have used GreenGlass tool to dedupe member holdings

Some other shared print examples (thanks to Matt Person):

Linnaeus Link network = works by Linnaeus and his students and works written about him
Soulsby numbers can be specified on title and segment records
NYBG actively digitizing Linnaean materials, maybe Harvard Botany too
Soulsby numbers typically in a 510 field - need to add as an identifier
possible using APIs to search title/part records by Soulsby identifiers
don’t have ability to pull everything from BHL database w/out Mike’s assistance
no way to search BHL UI by identifier
would be great to be able to search by this via our UI esp. for Soulsby, TL2 numbers, etc.
at least in Botany folks using bibliographic tools
would help our collections group too!
BC to add ticket in Gemini to request search by identifier
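
To illustrate what search by identifier would amount to, here is a small sketch that works on an exported list of records rather than on the BHL database or API. The record layout (title_id plus a list of type/value pairs) and the title IDs, OCLC and TL2 values are hypothetical; the two Soulsby numbers are borrowed from the Linnaeus Link status list in these notes purely as sample values.

```python
from collections import defaultdict


def build_identifier_index(records):
    """Map (identifier type, value) to the list of title IDs carrying it."""
    index = defaultdict(list)
    for record in records:
        for id_type, id_value in record["identifiers"]:
            key = (id_type.lower(), id_value.strip())
            index[key].append(record["title_id"])
    return index


def find_by_identifier(index, id_type, id_value):
    """Return matching title IDs, or an empty list if none match."""
    return index.get((id_type.lower(), id_value.strip()), [])


# Hypothetical export rows, for illustration only.
sample_records = [
    {"title_id": 1001, "identifiers": [("Soulsby", "76"), ("OCLC", "000001")]},
    {"title_id": 1002, "identifiers": [("Soulsby", "3648")]},
    {"title_id": 1003, "identifiers": [("TL2", "0000")]},
]

identifier_index = build_identifier_index(sample_records)
print(find_by_identifier(identifier_index, "soulsby", "76"))
```

The lookup is case-insensitive on the identifier type, so "Soulsby" and "soulsby" behave the same.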

SL reports that there’s a DRAFT design for new BHL search
design for searches and faceting
dedicated search hardware

SL & NYBG getting off ground to add Soulsby numbers
nothing retrospective but has been processing 50 or so new titles, some monos, some articles
post digitization via Admin Dash
could be possible to add the Soulsby ID for mono records at the time of digi but SL hasn’t explored w/ ML
list of Soulsby titles is finite, not growing

Alicia:
exploring an idea of a registry just as HT did for fed docs, for BHL based on bibliographies
taking OCR of subject bibliographies to populate spreadsheet
compare to BHL exports…
based on fern bibliography project
SL reports that fern bibliography ended up prompting digitization of new editions vs. unique content
digital objects of interest to antiquarian book sellers not necessarily scientists
CR agrees may not be the best use of our time to digitize multiple editions, some are unique but this is more of an exception
Has to be analysis of the bibliography to eliminate some of these edition and other similar issues
might be problematic to generate OCR if post-1922 in-copyright content
also looking at Rod Page’s list of high priority serials
CR to see if she can find some of JJ’s spreadsheets
BC to seek out Rod Page’s priority serials list background
SL: could we try to identify titles that HT holds that we do not hold within our collection?
compare HT and BHL with regard to a smaller subject area, do LCSH and call number comparison
HT might be willing to work with us
SL and AE to email about HT comparison idea
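
One way the HT comparison idea could be prototyped, assuming both sides can export rows that carry an OCLC number. The column names (oclc, title), the sample rows, and the normalization rule (keep digits only, drop leading zeros) are assumptions for illustration; real exports would need checking against how each system stores OCLC numbers.

```python
def normalize_oclc(raw):
    """Reduce strings such as '(OCoLC)00012345' or 'ocm12345' to '12345'."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits.lstrip("0")


def titles_missing_from_bhl(ht_rows, bhl_rows):
    """Return HT rows whose OCLC number does not appear in the BHL rows."""
    bhl_numbers = {normalize_oclc(row["oclc"]) for row in bhl_rows}
    bhl_numbers.discard("")  # rows without a usable number never match
    return [row for row in ht_rows
            if normalize_oclc(row["oclc"]) not in bhl_numbers]


# Hypothetical export rows, for illustration only.
ht_rows = [
    {"oclc": "(OCoLC)00012345", "title": "Hypothetical title A"},
    {"oclc": "ocm67890", "title": "Hypothetical title B"},
]
bhl_rows = [
    {"oclc": "12345", "title": "Hypothetical title A"},
]

missing = titles_missing_from_bhl(ht_rows, bhl_rows)
for row in missing:
    print(row["oclc"], row["title"])
```

HT rows with no OCLC number at all are reported as missing, so they would need a second pass by title or call number.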

Jan 23, 2017

Agenda:
Welcome Alicia Esquivel and the NDSR cohort!
NDSR project areas Coll CMTE can support:
  1. collection analysis
  2. collection browse pages for BHL 2.0
  3. else?
bringing in this microform? http://biodiversitylibrary.countersoft.net/workspace/292/item/56499
It's not necessary to link editions of the same work in BHL is it? Are they linked in other library catalogs? (http://biodiversitylibrary.countersoft.net/workspace/1137/item/5509 and http://biodiversitylibrary.countersoft.net/workspace/0/item/48567)
Anything further for HathiTrust call?

Notes:
Attendees: Connie, Don, Matt Person, Trish & Ari, Diane, Kelli, Alicia, Joe & Katie, Gil, Bianca
Katie Mika NDSR
crowdsourcing transcriptions
different types of resources that need to be transcribed
any kinds of unique features to those materials
selecting a transcription
Trish has had some experience with Purposeful Gaming
working with Joe to look at tools from PG project
what might be optimal solution for permanent transcription process for BHL
talked to Tech Team about what to do regarding BHL integration
to talk to Susan Lynch about John Torrey correspondence
what types of primary source material, in the future, are going to be a part of BHL
[x] add Katie to listserv

Ari - MBG; improving access to illustrations working with Trish & Doug
wants to learn more about Visual Resources document we put together
focused on images already IN BHL
DO people want to access the images? How important is the context to these images?
looking at Alicia’s project for subject analysis - could this be applied to the images?
visualization tools to discover images in the portal
potential applications for images
bring Ari in where there is something relevant
Ari to check in about VR doc and get back to us

Alicia - content analysis of what is/isn’t in BHL
has been looking at past documentation
read 2007/2008 collection analysis doc based on OCLC tool
wants to see if OCLC tool has improved since
strategies for coll. analysis page
bib analysis from 2015 - fern analysis
[x] put Alicia in touch with Robin and Jackie
looking at other libraries for content analysis work
talking to Marissa about best practices in libraries
analyzing options for ROI time/labor

collection browse changes could fall under Pam’s work
identifying requirements
share with Marissa (best practices) and Pam (user priorities)
CR: more participation by residents in calls is better; hear about what we’re working on; more reasons to ask questions

any time we have something relevant to NDSR
MBG: Ari - Trish & Doug = Illustrations
MCZ: Katie - Joe & Connie = Transcriptions
SIL: Pam - Carolyn = User Priorities
CBG: Alicia - Leora = Collection Analysis
NHM: Marissa - Richard = Best Practices
send CR anything that needs to be reviewed for NDSR

no to microform Pat request
good to check in with Joe’s list about BHL Europe materials - google doc
editions: not really something you put in a bib record so how would we go about linking?
could be slight title changes
little different than serials - even if next edition comes out, previous edition could still be highly used
not worth pursuing linking editions
would be cool to do it but would have to be very manual
we don’t have man power and precedent
when someone searches on the title, not bibliographically linked but visually
could there be a see also hyperlink?
not really something that goes in the bib record so would have to be something new in BHL dataset
is linking editions a user priority?
all NDSR residents should be using all BHL tools including Gemini post boot camp

Susan Lynch talking about the fact that Soulsby number in BHL portal not searchable
could she add Soulsby number to Admin Dash
Soulsby to be added as an identifier category into Admin Dash
good to know for future portal edits
through APIs folks can get content via identifiers

Jan 9, 2017

Agenda
  1. HathiTrust Collections CMTE call prep
  2. Relevance questions
    1. http://biodiversitylibrary.countersoft.net/workspace/0/item/58857
    2. http://biodiversitylibrary.countersoft.net/workspace/1137/item/48722
    3. http://biodiversitylibrary.countersoft.net/workspace/1137/item/48737
  3. Year data project update
  4. Linnaeus link update (Susan Lynch)
Status | Count | Soulsby #s
Completed IA harvests | 5 | 78, 148, 336, 765, 2664
Article definition | 2 | 76, 3648
Scanned by NYBG, Gemini ticket | 2 | 162, 172
Copyright Concerns | 3 | 3722, 3890, 3984
Opened Gemini tickets | 12 |

Attendees: Connie, Diane, Matt Person, Joe, Patrick, Gil, Bianca, Don, Susan, Kelli,

University of Toronto has been approved for Membership in BHL
let's ask Rod which edition of Century Dictionary he would prefer 1889-1891 is first edition
no one opposed
W. H. Hudson bibliography sounds good, bring it in
quite a few NIH Bulletin in Hathi (but might be Google scans and not great)
20 linear feet of shelving
a lot of material on parasites
would first think of looking in NIH
does Medical Heritage Library have this?
MHL based out of ? but collab among Wellcome, Yale, etc. - collection at IA
a number of bulletins on IA digitized by NIH - ingest NIH digitized copies - Not Google
worth ingesting already digitized copies
but not worth us digitizing ourselves...

FYI Year update metadata

MP: HathiTrust has always been on our radar, has BHL been on their radar? Weeding project, in some cases linking to Hathi; Has Hathi done foldouts?
BC: feel like BHL has a lot to offer HathiTrust
DR: report felt a lot like déjà vu
BC: collaboration on collection analysis tools would be a great opportunity; NDSR resident also L.A. resident working on tools that we already have that we can share; seems like a friendly discussion, no agenda
would be nice to invite a HathiTrust member to our Collections CMTE call if relevant

NYBG has a volunteer that put together a spreadsheet
matched up rows from Linnaeus Link spreadsheet with bib numbers from NYBG catalog
NYBG records indexed by Soulsby number (in 510s and searchable) so able to match Linnaeus Link list to their
SL has been looking for low hanging fruit for ingest or digitization
lots of the Linnaean publications are classics - Wellcome library has produced great scans with good MARCXML - can be ingested from IA with little work on our part
SL has been able to satisfy Linnaeus Link request via articles in BHL
SL also opening Gemini tickets to prevent duplication of effort, many for NYBG and some for Harvard Botany - providing both Soulsby and OCLC numbers - mapping the numbers quickly can happen via NYBG catalog
publications can be confusing due to Latin titles and multiple editions
Soulsby numbers should display from 510s in BHL but won't be indexed b/c Notes aren't searchable
would have to add Soulsby numbers as
MO might have them in their catalog - have to have Soulsby numbers in their catalog if they're contributing - this is how Linnaeus Link pulls records, they look for the Soulsby numbers in the 510 (is Linnaeus Link pulling from Botanicus or MO catalog? does it matter?)
SL could make NYBG spreadsheet shareable at some point
Only BHL members that LL can pull from are NYBG and MO, so works for print copy but if NYBG and MO don't hold then LL can't Zfetch
Hopefully once LL fills its vacancy we can restart discussions
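
For reference, a hedged sketch of pulling Soulsby numbers out of 510 fields in a MARCXML record so they could be added as identifiers. It assumes the citation source is named in subfield a and the Soulsby number sits in subfield c, which is the standard 510 layout but may not match how every partner catalogs; the sample record is hypothetical apart from reusing Soulsby number 76 from the status list above.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def soulsby_numbers(marcxml):
    """Return the subfield c values of 510 fields that cite Soulsby."""
    root = ET.fromstring(marcxml)
    numbers = []
    for field in root.iterfind(".//marc:datafield[@tag='510']", MARC_NS):
        source = field.findtext("marc:subfield[@code='a']",
                                default="", namespaces=MARC_NS)
        location = field.findtext("marc:subfield[@code='c']",
                                  default="", namespaces=MARC_NS)
        # Keep only citations whose source name mentions Soulsby.
        if "soulsby" in source.lower() and location.strip():
            numbers.append(location.strip())
    return numbers


# Hypothetical record, for illustration only.
SAMPLE_MARCXML = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Hypothetical Linnaean title</subfield>
  </datafield>
  <datafield tag="510" ind1="4" ind2=" ">
    <subfield code="a">Soulsby</subfield>
    <subfield code="c">76</subfield>
  </datafield>
  <datafield tag="510" ind1="4" ind2=" ">
    <subfield code="a">Another bibliography</subfield>
    <subfield code="c">12</subfield>
  </datafield>
</record>
"""

print(soulsby_numbers(SAMPLE_MARCXML))
```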

Notes for previous years