This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018.

Functional Requirements



Functional requirements for the BHL Metadata Repository (MR)

11:00-12:30
Chris Freeland, MoBot - Proposed functionality of BHL MR [user interface]
Mark McKinney, Luratech/CCS
David Remsen, uBio/MBL/WHOI - demo of embedded taxonomic intelligence

Presentations:

Chris' ppt, more Chris, CCS_2006.pps (Daniel's ppt), Klaus' ppt, David's ppt

Notes:

Please correct any mistakes on this page, or add to the notes if you find your brilliant insight/question/response is not recorded!

User Interface
Chris Freeland (MoBot) has worked with scientists and other users to learn how they use information on the web, and from this he began developing a prototype interface for literature. He demonstrated a possible interface for the BHL, with the following functionality:
Other desiderata: some form of relevance ranking for name search results, and a visual cue or color coding for format, e.g., whether a name was found in an illustration, a citation, or a description.
Check out Botanicus to see the first few functions in action.

Chris used Ajax for Google Maps-like navigation (zoom, pan, etc.).

He proposed keeping the OCR text and the page image as two separate files, so that the OCR can be edited (semantic markup, formatting, corrections, etc.) without touching the image. He has some experience with the distributed proofreading model.
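To make the image/OCR split concrete, here is a minimal Python sketch. It assumes a hypothetical `PageRecord` type that is not part of any BHL or Botanicus code. The image path is stored but never modified, while proofreaders' corrections go only into the OCR text layer. A simple history lets an edit be undone.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PageRecord:
    """Hypothetical page record: the scanned image file is never modified,
    while the OCR text is a separate, editable layer."""
    image_path: str
    ocr_text: str
    # (editor, text before the edit) pairs, newest last
    history: List[Tuple[str, str]] = field(default_factory=list)

    def correct(self, editor: str, old: str, new: str) -> None:
        """Apply one proofreader's correction to the OCR layer only."""
        if old not in self.ocr_text:
            raise ValueError(f"{old!r} not found in OCR text")
        self.history.append((editor, self.ocr_text))
        self.ocr_text = self.ocr_text.replace(old, new, 1)

    def revert(self) -> None:
        """Undo the most recent correction."""
        _, previous = self.history.pop()
        self.ocr_text = previous
```

For example, `PageRecord("page_0001.jp2", "Quercns alba L.")` can have "Quercns" corrected to "Quercus" while `image_path` stays untouched. Keeping per-edit history is one way to support many independent proofreaders, as in distributed proofreading.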

CCS/Luratech

Mark McKinney (Luratech) explained that Luratech develops tools for JPEG2000, while CCS works on content conversion and structure maps. The two companies have begun to collaborate, combining their strengths to provide some interesting results.

Daniel Lanz (CCS) gave an overview of docWorks software which can

Klaus Jung (Luratech) gave an overview of Luratech's PDF compression capabilities and its use of JPEG2000 (Part 6).

Taxonomic Intelligence - "Names are what puts the 'B' in BHL"
David Remsen (Woods Hole) gave a demonstration of FindIt using an SIL title that had been scanned previously by Internet Archive (n.b. pretty dirty OCR), and made a case for name-level searching/tracing being an integral part of what the BHL should do.
In biology, the name is the metadata, but names bring several problems: lexical variation, taxonomic changes (about 1% a year), spelling errors, and reconciling common and Latin names.
uBio created a taxonomic name recognition algorithm that stores recognized names in NameBank, including misspellings, synonyms, and vernacular names. NameBank is linked to ClassificationBank, which includes taxonomic hierarchies and synonyms. The tool is trainable. The names database currently holds more than 8 million names.
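The recognize-then-look-up idea can be sketched as below. This is a deliberately simplified stand-in, not uBio's algorithm. It assumes a hypothetical in-memory `NAME_BANK` dictionary in which canonical names, misspellings, and vernacular names all resolve to one canonical name. It also checks only pairs of adjacent words, whereas the real tool is trainable and works at a much larger scale.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical NameBank-style table: each known string (lowercased) maps to
# the canonical name it should resolve to.
NAME_BANK: Dict[str, str] = {
    "quercus alba": "Quercus alba",
    "quercus albus": "Quercus alba",  # misspelling
    "white oak": "Quercus alba",      # vernacular name
    "pinus strobus": "Pinus strobus",
    "white pine": "Pinus strobus",    # vernacular name
}


@dataclass(frozen=True)
class NameHit:
    start: int      # character offset where the match begins in the OCR text
    end: int        # character offset just past the match
    found: str      # the string as it appears in the text
    canonical: str  # the name it resolves to in the bank


def find_names(text: str, bank: Dict[str, str]) -> List[NameHit]:
    """Look up every pair of adjacent words (ignoring punctuation) in the bank."""
    words = list(re.finditer(r"[A-Za-z]+", text))
    hits = []
    for first, second in zip(words, words[1:]):
        key = f"{first.group().lower()} {second.group().lower()}"
        if key in bank:
            hits.append(NameHit(first.start(), second.end(),
                                text[first.start():second.end()], bank[key]))
    return hits
```

Run on "Bark of Quercus albus, the white oak, differs from Pinus strobus.", this resolves both "Quercus albus" and "white oak" to Quercus alba, and "Pinus strobus" to itself. Recording character offsets rather than editing the text keeps the recognized names separate from the OCR.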
If we do taxonomic discovery of names in parallel with scanning for BHL, it will iteratively help the [BHL] OCR. It will also grow the NameBank which can help drive taxonomic initiatives elsewhere (e.g., GBIF & Species2000).
LinkIt demo using an SIL title (click on uBio LinkIt under the uncorrected OCR text): on-the-fly recognition of names, which puts synonyms and misspellings in the index and cross-indexes with other taxon lists (ITIS, Species2000). If you go to uBio and upload text, you can browse your scanned text through a classification index or an alphabetical index of all names.
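Browsing by an alphabetical index of all names can be sketched as a simple grouping step. The input format is an assumption for illustration: a list of (canonical name, page number) pairs, as a name recognizer might emit them. uBio's actual index structure is not described in these notes.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def alpha_index(occurrences: List[Tuple[str, int]]) -> Dict[str, Dict[str, List[int]]]:
    """Group (canonical name, page number) pairs by initial letter, then by name."""
    index: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for name, page in occurrences:
        if not name:
            continue  # skip empty strings rather than failing on name[0]
        index[name[0].upper()][name].add(page)
    # Convert to plain dicts with letters, names, and page numbers in sorted order
    return {
        letter: {name: sorted(pages) for name, pages in sorted(names.items())}
        for letter, names in sorted(index.items())
    }
```

With input such as `[("Quercus alba", 12), ("Pinus strobus", 3), ("Quercus alba", 5), ("Picea abies", 3)]`, the "P" entry lists Picea abies and Pinus strobus with page 3, and the "Q" entry lists Quercus alba with pages 5 and 12. A classification-based index would instead group by the hierarchy held in something like ClassificationBank.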


Discussion:
Q: Where are the name indexes in relation to the texts? A: They are kept outside the text. You go to the index each time you view your document; since NameBank works iteratively, the taxonomic intelligence keeps improving as the bank grows.
Tom G. agreed that we must have this [taxonomic intelligence] component early in BHL development so that the BHL has immediate value to the taxonomic community.
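The arrangement described in this answer keeps the name index apart from the text and rebuilds it as NameBank grows. It might look like the following minimal sketch. The class name, the integer `bank_version`, and the toy substring recognizer are all hypothetical. The point is only that the OCR text is never rewritten, and a document's index is recomputed the first time it is viewed after the bank changes.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical recognizer signature: (text, known names) -> names found in text
Recognizer = Callable[[str, FrozenSet[str]], List[str]]


def substring_recognizer(text: str, bank: FrozenSet[str]) -> List[str]:
    """Toy recognizer: report every known name that occurs verbatim in the text."""
    return sorted(name for name in bank if name in text)


class StandoffNameIndex:
    """Stores name indexes outside the OCR text, keyed by name-bank version.
    Indexes for older bank versions are not evicted in this sketch."""

    def __init__(self, recognize: Recognizer) -> None:
        self.recognize = recognize
        self._cache: Dict[Tuple[str, int], List[str]] = {}

    def names_for(self, doc_id: str, text: str,
                  bank: FrozenSet[str], bank_version: int) -> List[str]:
        key = (doc_id, bank_version)
        if key not in self._cache:
            # First view since the bank changed: rebuild from the unchanged text
            self._cache[key] = self.recognize(text, bank)
        return list(self._cache[key])
```

Viewing the same document again after a name is added to the bank (with a new `bank_version`) picks up the new name without any change to the stored text.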

Q: Does NameBank use GUIDs? A: Yes, it generates its own GUIDs, since it gathers irregular (misspelled, vernacular) names. Eventually all the "good" names can get an additional proper taxonomic ID, and then all the different GUIDs (ITIS, GBIF, etc.) could be mapped; that is how LinkIt works.
Names would also need our BHL GUID for page-level linking out to other resources (GBIF, etc.).
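The GUID scheme described here gives every name string a GUID and later maps good names to external identifiers. It can be sketched as a small crosswalk. This is an illustration under assumptions, not uBio's or LinkIt's implementation. It mints deterministic UUIDv5 identifiers from the name string under a made-up namespace URL, and any ITIS or GBIF identifiers passed to it in practice would come from those services (none are real records here).

```python
import uuid
from typing import Dict

# Hypothetical namespace for minting name-string GUIDs (example.org is a placeholder)
NAME_NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/namebank")


def name_guid(name_string: str) -> uuid.UUID:
    """Mint a stable GUID for any name string, including misspelled or vernacular ones."""
    return uuid.uuid5(NAME_NS, name_string)


class GuidCrosswalk:
    """Maps variant-name GUIDs to an accepted name, and accepted names to external IDs."""

    def __init__(self) -> None:
        self._accepted: Dict[uuid.UUID, uuid.UUID] = {}
        self._external: Dict[uuid.UUID, Dict[str, str]] = {}

    def link_variant(self, variant: str, accepted: str) -> None:
        """Point a misspelling, synonym, or vernacular name at its accepted name."""
        self._accepted[name_guid(variant)] = name_guid(accepted)

    def add_external_id(self, accepted: str, source: str, identifier: str) -> None:
        """Record another system's identifier (e.g. from ITIS or GBIF) for an accepted name."""
        self._external.setdefault(name_guid(accepted), {})[source] = identifier

    def external_ids(self, name_string: str) -> Dict[str, str]:
        """Resolve any name string, variant or accepted, to the external IDs known for it."""
        guid = name_guid(name_string)
        guid = self._accepted.get(guid, guid)
        return dict(self._external.get(guid, {}))
```

Deriving the GUID from the string itself means every occurrence of the same (even misspelled) string gets the same identifier without a central lookup. A production system might instead assign registry-issued IDs. Page-level BHL GUIDs could be attached to the same name GUIDs in a similar table.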


Decisions/Action Items: