Cornell
BHL welcomes Cornell! The purpose of this page is to help document Cornell's integration into the BHL project.
January 9, 2012, 2:30 EST, Discussion
1-866-748-1119 passcode 1912897#
Agenda
- Getting Cornell U content from IA to BHL:
- 4093 items to be ingested from Cornell over the course of the next few days. Would be great to announce additions once complete, perhaps end of next week? PS This ingest will push BHL over 50k titles and 100k volumes!
- Inventory of Cornell scanned content:
- Marty, Mary, is it possible to get a list(s) of content you have already scanned? It would be good to know things such as what file formats you have for the scanned content, how many books vs. serials, any collections concentrations, do you have article content, etc.
- See overview of 4 sources of scanned content in notes below
- Future Cornell scanning for BHL
- Blog post: And while I'm thinking about it, it would be good to have a blog post about Cornell U joining the BHL. Marty, did you have someone that I could work with to help put this post together?
- Curious about arXiv (now run through Cornell U), http://arxiv.org/ - do they use a safe harbor model? Or how do they manage peer-to-peer sharing?
Notes
1/9
Attendees: Keri, Joel, Bianca, Martin, Suzanne, Marty, Mary, Joy Paulson, Francis Webb, Nick P.
Yes! Over 4,000 Cornell records already in IA to be ingested into BHL over the next few days. Should see records in BHL early to mid next week.
Joy got a list of content that had been sent to IA and Marty estimated how many were relevant to BHL, est. about 5000
Eveline & Jim interviewed Marty and have come up with a draft for a blog post
[ ]Bianca to follow up with them re: blog post
Have started working on a draft of content that has been scanned, (Joy & Francis)
[ ]Cornell should add to wiki and feel free to wiki away
195 volumes digitized for Entomology collection thus far - either bee keeping or rare, scanned locally, on drive or DVDs, all public domain, all scanned with Kirtas machine (JP2s) or by scanning vendor in Montreal (TIFFs); they can deliver in any format
It is better to go through IA than directly into BHL. There is no clear direct path into BHL, and going via IA is the path of least resistance
Other staff at Cornell have submitted to IA but Francis has not yet
Question: How does the structural metadata transfer/translate? It works so long as it's compatible w/ IA - ensure the specifically needed, appropriate structural metadata file is there; a scandata file is also needed
Core Historical Literature of Agriculture (CHLA) - some is still under copyright
Cornell has article level metadata in serials!
Cornell (c) concerns? pre-1923 content OK is the BHL policy, basically we don't want anything that Cornell isn't comfortable providing
there's some content w/ permission in CHLA, even some of Cornell's own publications would require permission before putting into BHL
what about collections boundaries? BHL is very broad in its definition of biodiversity, we welcome Agriculture, a well rounded biodiversity library benefits from supporting literature
soil science OK
historic home economics materials as well but not relevant
Asia collections, SE Asian travel narratives, but not sure how much natural history content
planning to continue to digitize, how should Cornell dedupe? Might be a good trial run to submit a list of content that is already scanned and being considered for inclusion in IA, as a test
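A trial dedupe run like the one suggested above could be sketched roughly as follows. This is only an illustration: matching on a normalized title plus year is an assumption, the function names are hypothetical, and it does not describe how BHL's own deduper works.

```python
import re
import unicodedata


def match_key(title, year):
    """Build a rough comparison key from a title and publication year."""
    # Strip accents, lowercase, and replace punctuation with spaces so that
    # minor cataloging differences do not prevent a match.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return (" ".join(text.split()), str(year).strip())


def split_candidates(candidates, existing):
    """Split (title, year) candidates into (possible duplicates, apparently new)."""
    seen = {match_key(title, year) for title, year in existing}
    dupes = [c for c in candidates if match_key(*c) in seen]
    new = [c for c in candidates if match_key(*c) not in seen]
    return dupes, new


# Made-up sample records, for illustration only.
existing = [("Notes on Bee-Keeping", 1884)]
candidates = [("Notes on bee keeping.", "1884"), ("A Sample Insect Catalogue", 1876)]
dupes, new = split_candidates(candidates, existing)
```

Anything flagged as a possible duplicate would still need a human check, since different editions or volumes of a serial can share a title and year.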
Cornell to do a project to borrow books from others to replace pages, to date have put in missing page notices, BHL against "Frankenbooks"
relatively small % of monographs have missing pages, serials have a much higher % of issues
Can we display notices of missing pages somewhere for BHL users? Would be great to record if a page is missing but would this show up in BHL? Hmmmm
Gemini is BHL's feedback system, used to collect info from users when they encounter, for example, missing page issues; see
http://biodiversitylibrary.countersoft.net/ -- follow up with Bianca re: Gemini questions
Cornell inserts image into book to indicate page missing
So what is step 1 for getting content into IA that was NOT scanned via IA? Send 4 key things:
- page images in compressed JPEG format
- page structural metadata
- MARC data
- Metadata that IA wants in its own format? Title + item metadata but mostly title metadata
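As a rough aid for the test upload, the four items above could be checked before sending with something like the sketch below. The filename suffixes are assumptions made for illustration, not confirmed IA requirements, and the function names are hypothetical; confirm the real naming with IA before relying on it.

```python
from pathlib import Path

# Hypothetical filename suffixes for the four components listed in the notes.
# These are assumptions for illustration, not confirmed IA requirements.
REQUIRED_COMPONENTS = {
    "page images": ("_images.zip",),
    "page structural metadata": ("_scandata.xml",),
    "MARC data": ("_marc.xml", "_marc.mrc"),
    "IA item/title metadata": ("_meta.xml",),
}


def missing_components(filenames):
    """Return the names of required components that no filename matches."""
    missing = []
    for component, suffixes in REQUIRED_COMPONENTS.items():
        # str.endswith accepts a tuple, so any listed suffix counts as a match.
        if not any(name.lower().endswith(suffixes) for name in filenames):
            missing.append(component)
    return missing


def check_package(directory):
    """List the files in a directory and report which components are missing."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return missing_components(names)
```

An empty list from `check_package` means all four components appear to be present; it says nothing about whether their contents are valid.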
Issues have been encountered with OCR, for example. You can send an OCR file if you have it; the IA OCR file is a .txt, but the structural information is in a different file, the scandata file
Google doc digitization specs - should be updated
2 different scanning projects: 1 million book project & Google books
not sure about whether or not 1 million book project can be used, but some going into CHLA so would be brought into BHL via CHLA to IA upload
Natural History of New York materials have been scanned but not available online yet
Would be useful to dedupe re: rare Entomology collections
[ ]Bianca to get Cornell deduper log-ins to Joy and Francis
Trying to get as much Ento content online and available to Cornell users
How much content from BHL is in Summon? Summon had some trouble with the ingest
[ ]Martin & Suzanne status check re: BHL content in Summon
Cornell 1st phase putting in links to Ento content online, 2nd phase is FT search of Ento content
[ ]Keri to follow up w/ Cornell re: Summon since SIL considering discovery services
[ ]Suz to add Francis & Joy to BHL Staff list
Who should Francis talk to re: IA uploads - Keri Thompson; the scandata file can be a real doozie
Cornell needs permission to upload to <biodiversity> collection;
[ ]Keri to coordinate with IA to help secure this for Cornell...
useful to do a small test to make sure things are uploading to IA smoothly...
[ ]Keri could do a small dedupe test - against what's not been scanned yet in the SIL collection, Bianca can help
starting with monographs to see how it goes, fear of losing article data, lots of opportunity to learn from Cornell's process
October 7, 2011, 10AM EDT, Discussion
Attendees: Mary Ochs, Marty Schlabach, Bianca Crowley, Suzanne Pilsk, Grace Costantino
1-866-748-1119 passcode 1912897#
Agenda
Notes
Agriculture & Life Sciences, digitizing for a no. of years
core historical literature of Agriculture
project to digitize more Entomology (Comstock Library) b/c closing down, need to create a virtual library
Marty head of collection development (Mary used to be)
Rare collection is targeted for digitization
Range of digital projects existing <--> pursuing
in IA ~90,000 items but not all Life science materials, participated in Microsoft digitization project ==> content uploaded to IA, some materials from Mann Lib but not sure
some Home Ec content, large scope, Human Ecology, Food science, Social sciences -- Need to identify most relevant materials:
Action Item: look into why Cornell content was not ingested via the ingest criteria
Action Item: identify specific call no. ranges or subject headings to pull in content that does not get ingested via the ingest criteria
Goal to get funding for scanning - membership in BHL could help ==> recommend discussions w/ Martin K. re: potential funding sources
a lot of material in Hathi, Microsoft ended, Google continues - 250,000 vols. have been scanned, not all public domain
targeted Mann Lib as Ag content gap filler
4 sources of content
- Microsoft scanning (earliest scanning project) ==> in IA mostly -- how's the metadata for these records? why weren't these ingested into BHL already?
- Google, 30,000 vols. ==> in Hathi -- should we incorporate these scans into BHL?
- Core historical literature of Agriculture, 11,000 monographic vols. & serials, running for several years
- Entomology content: titles/vols. already digitized 2 years ago are sitting on discs; tried to dedupe against BHL; 70 vols. have no home; rare; planning to digitize more - Canadian vendor for scanning, high level
have rare Entomology collection
open to considering gap-filling but sig. more expensive to do one off scanning
DLXS platform for core historical lit of Ag digital content
Wordsworth collection all in IA, Cornell created local interface
Gemini - Digital collections librarian (in Africa) & tech person - they'll get back to us
Cornell has contact w/ Chinese Academy of Sciences, also interested in Africa meeting post Life&Lit, lots of contacts, digital library of Ag journals
Q: BHL records into Cornell catalog through Serial Solutions
Q: Entomology virtual library want to access content from multi
BHL-staff list: Mary, Marty, maybe Joy
BHL-collections group: Marty
Marty coming to Chicago
3pm wiki agenda send to Marty & Mary
Gemini logins TBD
blog post re: Cornell U joining BHL
Gemini issue