CiteBank Oct 2010
Conf call re: Citebank and Orange Bag Inventory
Attendees (10/6/10): Chris, Bianca, Trish
Discussion
What date can we give contributors as to when Citebank will go live?
Dec 1st (Phase 1, “soft launch”). This only means the site will be technically ready for content to be loaded; it doesn’t mean all content will be in by then.
To date, David has secured OAI bridge in staging site. Production site (Citebank.org) does not reflect updated code on staging site.
We don’t have a service level agreement yet that defines what we’ll do for them and in what time frame, so we can’t promise any definitive timelines. For persistent inquirers we can say: “We’re sorry we don’t have a firm date we can give you, but thank you for your interest in CB.”
Currently Citebank content is populated from OAI feeds (merged titles, publishers). Sources include SciELO (only content related to BHL), Smithsonian DSpace, BHL Portal, ZooKeys (online publisher), and BHL user-generated PDFs.
By 10/9/10 the Citebank development interface will be put into a staging area for internal review. Need to review Solr faceting. We will have until Dec 1st to incorporate internal feedback and make fixes. After Dec 1st we will not change URLs (links should remain stable to the public).
We will only announce CB (Dec 1) to select groups in the hopes of getting more feedback before we do a wider announcement in Spring 2011.
Where do publishers fit into Phase 1? We should test import of publishers’ data with stuff we have in hand (i.e. physically in our orange bag or stuff that has been sent to us electronically, such as American Mosquito Control).
As we test data import in the next 2 months, need to document the process of bringing it in and create guidelines for content providers.
Need to identify “Classes” of content providers (types would include: publishers, individuals, groups/associations)
Another distinction: those that want us to point to their content on their site and those that want us to load their content into CB (note: it hasn’t been decided by the exec committee whether we want CiteBank to serve the role of pointing out to content).
Also identify level of effort (easy, moderate, hard?). Criteria include: whether we can get their data via OAI feed or API (if so, they fall into the easier category); what format the content files are in (TIFF or PDF; PDF is easy, TIFF is difficult); what format the md is in (EndNote, commonref format, XML, MODS).
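The level-of-effort criteria above could be sketched as a simple scoring rule. This is only an illustration: the `Provider` fields, the one-point-per-difficulty scoring, and the cutoffs are assumptions, not anything decided on the call; Trish & Bianca (with dev team review) would set the real definitions.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical record for one content provider."""
    name: str
    has_oai_or_api: bool   # can we harvest via OAI feed or API?
    content_format: str    # "PDF" or "TIFF"
    has_metadata: bool     # does usable md already exist?


def level_of_effort(p: Provider) -> str:
    """Return "easy", "moderate" or "hard" (assumed cutoffs)."""
    points = 0
    if not p.has_oai_or_api:
        points += 1  # no feed/API means manual loading
    if p.content_format.upper() == "TIFF":
        points += 1  # notes say TIFF is difficult, PDF is easy
    if not p.has_metadata:
        points += 1  # md would have to be created first
    if points == 0:
        return "easy"
    if points == 1:
        return "moderate"
    return "hard"
```

For example, a provider with an OAI feed, PDFs, and existing md would come out as "easy" under these assumed rules, while one with TIFFs and no feed would come out as "hard".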
CB accepts single document upload so articles, book chapters, etc. in a single file are permissible; multi-file objects are better accessed via BHL portal
!!! Need guidance from Tom and execs about how much effort we should expend to get providers’ content into CiteBank [This will be based on what Trish & Bianca learn during test cases w/ input from dev team]
Document process for adding one’s content to CB themselves. We will build tools to help folks be self-service. We won’t scrape websites to get content. Bib formats we support: EndNote, BibTeX…. Can either upload citations one by one or upload bibliographies.
Trish task – need to identify md fields that need to be in an Excel spreadsheet for upload (this should be same as md fields on online form for creating md)
Dev team to develop a means to allow content providers to contribute md via spreadsheets
Trish and Bianca go into Citebank.org to sign up for an account and evaluate import and add/edit functionality.
Go to BHL internal wiki (search on “bibliographies”; can use this data to test loading features). Give feedback to dev dudes, cc Chris on any requested changes. Questions/discussion points best relayed via email. Redmine is the tracking system for Citebank and where we put requests for changes:
http://projects.biodiversitylibrary.org/. *Trish needs to get access to this system.
No current way to upload PDFs in bulk, but would want to request this as a feature in Citebank from dev guys. Willing to meet providers halfway – if they provide metadata we will handle loading of content and syncing with md.
If there is no existing md but contributors are willing to create it, what is the minimum standard/format for them to enter their data? Trish and Bianca should decide (include in documentation to providers). The fields should parallel what we require in our online md form.
CB as repository for md and CONTENT, also links to BHL book content (BHL PDFs will live in CB); could provide links to other content but then not replicable or avail for taxonomic intelligence services or BHL APIs = not ideal [BC checked w/ TG: if providing links in CB to another open access repository then no permissions agreement is required]
Need to synchronize CB content types (book chapter, article, species description, original descriptions/treatments [to accommodate PLAZI] etc.) with user-generated-PDF creation
Orange Bag Inventory
Other info to add to the orange bag spreadsheet to help manage this info
- Class*
- Level of effort*
- contact email or phone #
- initial contact date
- last contact date and who talked to them from BHL
- content file types (e.g. PDFs, TIFFs)
- content file condition (e.g. missing PDFs for Vol 2)
- has contributor sent files physically or virtually?
- md file types (e.g. Excel spreadsheet)
- md file condition (have titles only; md is ready to be ingested but author names are not properly parsed; no md exists and contributor cannot create for us)
*Need to establish working definitions of these immediately in order to make sense of what’s entered into spreadsheet; Trish & Bianca to determine initial level of effort, but dev team will need to review/confirm
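One way to pin down the added columns is to generate a blank spreadsheet template from a single list. A minimal sketch, assuming CSV output; the column names are invented here from the list above (the leading `provider` column is also an assumption), and nothing about the actual orange bag spreadsheet layout is implied.

```python
import csv
import io

# Hypothetical column names derived from the list above.
COLUMNS = [
    "provider",                # assumed: existing identifying column
    "class",                   # publisher, individual, group/association
    "level_of_effort",         # easy, moderate, hard
    "contact_email_or_phone",
    "initial_contact_date",
    "last_contact_date",
    "last_contact_by",         # who from BHL talked to them
    "content_file_types",      # e.g. PDFs, TIFFs
    "content_file_condition",  # e.g. missing PDFs for Vol 2
    "delivery_method",         # physical or virtual
    "md_file_types",           # e.g. Excel spreadsheet
    "md_file_condition",
]


def blank_inventory_csv() -> str:
    """Return CSV text containing only the header row."""
    buf = io.StringIO()
    csv.writer(buf).writerow(COLUMNS)
    return buf.getvalue()
```

The resulting header row can be opened in Excel and filled in by hand.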
Timeline
Conversation about where we are on the technical dev side of Citebank (include dev guys, Chris, Bianca, Trish) this Friday at 9:30 CST.
Oct 19th: larger discussion about CiteBank goals, vision, priorities (include Tom, Bianca, Chris, Trish, dev dudes). Trish to set up.
Trish to test uploads this week; Bianca to join next week.
Documentation to create by Dec 1st
- provider classes and their definitions
- Feedback on test cases
- Test md uploads with contributed bibliographies
- Test md and content loading needs for specific providers
- Designated as #1 priority by Tom (Journal of East African Natural History, Journal of Ethnobiology, UniBio Press titles).
- Bianca’s priorities – permission titles requiring part digitization, part PDF loading (Asiatic Herpetological Research, Mitteilungen der Münchner Entomologischen Gesellschaft, Opisthobranch/Shells & Sea Life, Journal of KY Academy)
- guidelines for content providers (major/minor publishers) to get content into CB (not for individuals to upload their content)
- list of questions to ask content providers to assess the nature of their md + content (to be approved by TG & CF):
- Do you have an OAI feed or APIs?
- What format or file types do you have for your articles/documents?
- Do you have metadata for your articles/documents? [If no then contributors will be invited to create/enter their appropriate md into a suitable format, otherwise no dice].
- What format is your md in?
After Dec 1st, how do we prioritize which contributors to load?
- BC proposal: send out list of questions to remaining content providers on “Orange Bag” spreadsheet and evaluate from there