SL07_TechnicalDiscussionNotes
Technical Discussion Topics II (Thursday, September 13, 2007: 9:00 – 10:30)
BHL PORTAL DEVELOPMENT UPDATE
Use Case
Perhaps use case development would be a good way to involve members' reader services staff - it was suggested that we devote a wiki page to use cases.
Example of a researcher looking for one volume of a monographic series - this is not always easy. What would help: citation resolving. Another useful tool: being able to search on taxonomic name. Chris demo'd the names search in Botanicus (excellent!) - this is a very relevant search, since it searches *only* meaningful names rather than the full text. Later enhancements would include common names in the search, though that will require a resolution tool with more common names, etc. This is somewhat complicated. Adding and resolving common names would be a great way for users to contribute.
EOL and GBIF are working to develop a 'global names architecture', of which common names are a part. There would definitely be an opportunity for the community to contribute there.
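To illustrate the difference between a names search and a plain full-text search noted above, here is a minimal sketch. It is not how Botanicus or uBio actually work; the sample pages, the `KNOWN_NAMES` set (a stand-in for a name-finding service's vocabulary), and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: page id -> OCR text (not real BHL content).
PAGES = {
    "p1": "Quercus alba is common along the river.",
    "p2": "The town of Alba hosted the meeting.",
    "p3": "Specimens of Quercus rubra and Quercus alba were collected.",
}

# Assumed stand-in for the vocabulary of a name-recognition service.
KNOWN_NAMES = {"Quercus alba", "Quercus rubra"}


def build_name_index(pages, known_names):
    """Map each recognized name to the sorted list of pages where it occurs."""
    index = defaultdict(list)
    for page_id in sorted(pages):
        for name in sorted(known_names):
            if name in pages[page_id]:
                index[name].append(page_id)
    return dict(index)


def names_search(index, name):
    """Look up a name in the prebuilt index; unknown strings return nothing."""
    return index.get(name, [])


def full_text_search(pages, term):
    """Naive case-insensitive substring match over all page text."""
    return sorted(pid for pid, text in pages.items() if term.lower() in text.lower())


NAME_INDEX = build_name_index(PAGES, KNOWN_NAMES)
```

With this sample data, a full-text search for "alba" also returns the page about the town of Alba, while the names search for "Quercus alba" returns only the two pages where the name was recognized.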
Name Administrator Demo - Chris then demo'd a module that gives an administrator the ability to manually find and add names that are not in uBio's Namebank. They are working on even more cool tools.
Paginator Demo - The paginator is the nifty tool MoBot uses to provide page-level metadata. They use it as part of their QA process, but if we could get a 'first draft' of likely page-level metadata from the Penn State process, paginator might be the way we can confirm/correct that data. It would not add too much time to processing the book. Plans are to eventually make the paginator a web-based application (it is currently Windows only); this might be a way we could offer distributed editing by the community.
Improvements in the pipeline:
-ability to add piecesparts-level metadata (chapter, article) -- could we maybe make this level of information addressable? (DOI, handle)
-automate the addition of part-level metadata (this would involve NSF, math, etc.)
-making sure the XML from Penn State is in the NLM DTD (also map it to MODS?)
-interface needs a little tweaking
-entire data architecture needs modification to accommodate the 'piecesparts' (article/chapter) level metadata
-resolution of security issues to enable release of paginator outside the MoBot firewall
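As a way to picture the 'piecesparts' change to the data architecture listed above, here is a rough sketch of a part (article/chapter) that spans a range of pages within a scanned item and can carry an identifier once one is assigned. This is only an assumed shape for discussion, not the actual BHL or MoBot schema; every class and field name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    sequence: int      # position of the page within the scanned item
    label: str = ""    # printed page number, if any


@dataclass
class Part:
    title: str
    kind: str                         # e.g. "article" or "chapter"
    first_seq: int                    # first page sequence covered by the part
    last_seq: int                     # last page sequence covered by the part
    identifier: Optional[str] = None  # a DOI or handle, once one is assigned


@dataclass
class Item:
    title: str
    pages: List[Page] = field(default_factory=list)
    parts: List[Part] = field(default_factory=list)

    def pages_for(self, part: Part) -> List[Page]:
        """Return the pages that fall inside the part's page range."""
        return [p for p in self.pages if part.first_seq <= p.sequence <= part.last_seq]
```

Keeping parts as page ranges on the item, rather than copying pages into each part, means existing page-level metadata would not need to move.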
Should IA re-host or synch up the new improved data? Should IA have the paginator? Should the community (the eagle-eyed librarian/taxonomist/biologist part of the community that is) have the ability to edit metadata post-scanning? This might be a discussion topic at the OCA technical meeting. This is tied in with the larger question of the potential new IA architecture, the question of where the images reside in relation to the metadata, and the whole Fedora thing.
It is important that we build some user tools (2.0 stuff) into our end product, but the larger issue is staying flexible so that we can fit into whatever architecture EOL has, allowing them to serve up whatever type/level of literature they need. We might want to have RSS feeds available for the species pages that show the latest texts being digitized for that species, or the top n cited titles, etc.
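A rough sketch of what a per-species RSS feed could look like, using only the Python standard library. The function name, the example.org links, and the sample titles are placeholders, not an existing BHL or EOL service.

```python
import xml.etree.ElementTree as ET


def species_feed(species_name, items):
    """Build an RSS 2.0 document listing recently digitized texts for a species.

    `items` is a list of (title, link) pairs; all values are illustrative.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Recently digitized: " + species_name
    ET.SubElement(channel, "link").text = "http://example.org/species"  # placeholder URL
    ET.SubElement(channel, "description").text = (
        "Latest texts digitized that mention " + species_name
    )
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


feed_xml = species_feed(
    "Quercus alba",
    [
        ("Sample title A", "http://example.org/item/1"),
        ("Sample title B", "http://example.org/item/2"),
    ],
)
```

The same function could serve a 'top n cited titles' feed by passing a differently ordered list of items.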
Timeline/priorities/emerging tasks document (link to that here once it's up)
#1 priority is to finalize the process of getting scans into the current BHL portal, and functionality for the current portal, before we start any Fedora work (we already have a 400,000-page backlog!). One of the main problems we are facing is the slow pace of downloading - we need IA to eliminate the (ever-narrowing) bottleneck for institutions that are downloading their content.
#2 get a handle server (UIUC and Harvard are both running a non-DSpace handle server - we might need their help)
Other priorities, with ?
-OAI - must implement in a real way
-need a 'title bank' (a la TDWG's "pubbank"), i.e. an editable resolver for titles - who should own/operate this? Needs to be discussed among TDWG/BHL/EOL
-tagging: we can't forget about this, though we also shouldn't sweat over this feature for first iteration
-GET BETTER OCR - we will discuss this more in the afternoon session
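For the OAI item in the list above, a small sketch of how OAI-PMH requests are formed may help frame the work. The verbs `Identify` and `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH 2.0 protocol; the base URL and the helper name are hypothetical, and nothing here contacts a real server.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; a real portal would publish its own.
BASE_URL = "http://example.org/oai"


def oai_request(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# Ask the repository to describe itself.
identify_url = oai_request("Identify")

# Harvest records as unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
list_url = oai_request("ListRecords", metadataPrefix="oai_dc")
```

A harvester would fetch these URLs and follow any resumption tokens in the responses; that part is omitted here.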
Processes and priorities that might change as we move towards Fedora:
Data modeling, interface, and schemas that are particular to Fedora; the yearly meeting moving to March, which impacts Fedora proposal acceptance.