Notes and Action Items.
Friday 3rd December - TMB & Technical Meeting
BHL-E Schema Discussion – [Wolfgang Reporting from Presentation slides]
Standards Used:-
1. MODS – for bibliographic data
2. ODRL – for IPR
3. DwC – Darwin Core, for taxonomic data
4. MIX – (Metadata for Images in XML) for scans; chosen based on the requirement for more image detail.
- DwC is insufficient for storage but adequate for taxonomic information, i.e. taxon names at page level.
- Agreed: ABCD and DWC2 need further investigation offline.
Schema Structure diagram:-
- The level of detail is defined by bhlLevel; bhlSequence gives the order within the schema; bhlElement is nested via bhlSubElement.
- bhlGUID identifies the element itself ("what is my own number?"), and bhlParentGUID identifies where it belongs in the hierarchy.
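A minimal sketch of this nesting in Python's ElementTree may help. The bhl* names come from the slides, but whether they are attributes or child elements, and the GUID format, are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Sketch only: bhl* names are from the notes; attribute placement and
# the urn:uuid GUID format are assumptions.
item = ET.Element("bhlElement", {
    "bhlLevel": "item",
    "bhlSequence": "1",
    "bhlGUID": "urn:uuid:item-0001",        # "my own number"
})
page = ET.SubElement(item, "bhlSubElement", {
    "bhlLevel": "page",
    "bhlSequence": "1",
    "bhlGUID": "urn:uuid:page-0001",
    "bhlParentGUID": item.get("bhlGUID"),   # where the page belongs
})

print(ET.tostring(item, encoding="unicode"))
```

The parent/child GUID pair is what later allows one submitted file to be split back into separate elements.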
Item Structure:-
- bhlTaxon – defines taxon names using DwC terms, at both item level and page level.
- Integrates ODRL – bhlRights
- bhlFile – links to files and also for embedding files (optional)
- bhlDateCreated and bhlDateLastMod – taken from the requirements list.
- bhlOCRSource – extra information about the OCR, e.g. whether it was enhanced afterwards.
- Everything is put together in one file as opposed to a flat structure; it should be more readable rather than overly complex. – BS
- The data structure is quite monolithic; however, the GUID elements enable you to split things up. You can submit one metadata file with the scans and then split it up afterwards.
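Pulling the item-structure elements above together, a hypothetical item-level record might look like the following. The DwC namespace URI is the standard Darwin Core one; the nesting under itemInformation and all the values are assumed for illustration:

```python
import xml.etree.ElementTree as ET

DWC = "{http://rs.tdwg.org/dwc/terms/}"  # Darwin Core terms namespace

item = ET.Element("itemInformation", {"bhlGUID": "urn:uuid:item-0001"})

taxon = ET.SubElement(item, "bhlTaxon")  # taxon names via DwC terms
ET.SubElement(taxon, DWC + "scientificName").text = "Quercus robur"

ET.SubElement(item, "bhlRights").text = "(ODRL rights expression here)"
ET.SubElement(item, "bhlFile", {"href": "scan-0001.tif"})  # link, not embed
ET.SubElement(item, "bhlDateCreated").text = "2010-12-03"
ET.SubElement(item, "bhlDateLastMod").text = "2010-12-03"
ET.SubElement(item, "bhlOCRSource").text = "raw OCR, not enhanced"

print(ET.tostring(item, encoding="unicode"))
```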
- Q: Who is generating the object around the schema? – Walter
- A: It could be the schema-mapping tool, prior to Pre-Ingest.
- If using MODS, the bibliographic information is already packaged and then maps to itemInformation within the schema structure.
Questions from group:-
- Use Cases for a Journal structure….can this be done?
- Should we differentiate between automatically generated data and hand-checked data?
- This tool needs to be integrated; it will be useful for the Catalogue of Life.
- We need to know how the data is obtained, i.e. from the lowest taxonomic rank?
- How much information is tagged and marked up, i.e. the level of granularity provided by Content Providers.
- We need to determine level of information.
- Q (Jana): What about a tool for annotations (PDF)?
- Recommendation is that Content Providers should do this themselves.
- Suggestion: build the metadata into the PDF itself. It will be easy to provide a plug-in to write into the PDF.
- Agreed: We need to define the process that goes around the structure.
- Potential issue: can the item be separated from the metadata? The author may not want us to use it without citing.
- Answer: We can process the ODRL, i.e. offer downloads within the PDF.
- Adrian: The legacy is for the Portal to be used freely by all – sustainability. We need to think ahead: what happens next? Hence, for the future we need to plan an IPR function.
- Functions need to be included within the Schema structure so that we can plug and play into it at a later stage.
- There is the CP side and the technical side (data is not generated for the Pre-Ingest – AIT).
Various discussions and considerations from the group:-
- The Schema sets up our entire data model which is a key component within our deliverables.
- We should sample data with a few MODS records and move them through the structure, which can then be mapped into the schema.
- Validation of Use Cases is needed against the proposed Schema.
- The CP has to map it against this schema? Ideally they should give it to us already in this schema.
- Confirmed: this step is before the Pre-Ingest.
- We can’t force CPs to adopt our Schema from the start.
- The data is going to Europeana which will be based on this Schema structure.
- Data will now be delivered on the current structure from henceforth – Confirmed by AIT.
- The standard is already in place; if data is in MODS, it can easily be mapped into this schema structure.
- Q (Melita): The original understanding was that CPs could send data in any form; now they need to map it using this item structure?
- A (Wolfgang): We already have tools in place to do the mapping; bigger institutions shouldn't really have an issue using the tools, and smaller groups will have assistance.
- We start to ingest in January.
- Of course there will be concerns to begin with but after a few weeks we should be able to ingest information pretty quickly.
- Q: If we have to go to partners and set up tools for mapping, when exactly will we do this?
- Q (Boris): The CP–Pre-Ingest connection should be part of the Pre-Ingest, but now it seems to be divided and we have to provide them with an interface beforehand.
- Melita/Jana: We will have to ingest material into our repository quickly; training and tools need to be ready very soon.
- Even if the tools are ready, we will need documentation for end-users, a training plan, etc. for CPs.
- Wolfgang: Six CPs can already be mapped to the new schema easily. Integrating MODS into our schema is a straightforward process.
- Some will be delivering data already in MARC21, MODS, DWC
- Mapping to be delivered on NHM servers.
- For those that have data in their own format, the mapping will have to be done separately, or we can provide them with a tool.
- We have a small process flow.
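The MODS-to-schema mapping step described above could be sketched as follows. The MODS namespace URI is the real Library of Congress one, but `map_mods_to_item` and the `bhlTitle` element name are hypothetical, invented here for illustration:

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"  # official MODS v3 namespace

mods_xml = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Flora Britannica</title></titleInfo>
</mods>"""

def map_mods_to_item(mods_record: ET.Element) -> ET.Element:
    """Hypothetical pre-Pre-Ingest mapping: copy MODS bibliographic
    fields into the schema's itemInformation element."""
    item = ET.Element("itemInformation")
    title = mods_record.find(MODS + "titleInfo/" + MODS + "title")
    if title is not None:
        ET.SubElement(item, "bhlTitle").text = title.text  # assumed name
    return item

mapped = map_mods_to_item(ET.fromstring(mods_xml))
print(ET.tostring(mapped, encoding="unicode"))
```

A CP delivering MARC21 would need an extra MARC21-to-MODS conversion before this step; CPs with proprietary formats would need a separate mapping or a provided tool, as noted above.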
ACTION 1:
- Working Group to be put together: Group members should consist of Patricia, Wolfgang, Heimo and several others who should be involved in the decision making.
- Proposal by Xmas: agreed. Discussion to include pros, cons, etc.
- Pre-Ingest depends on the schema, hence a decision is to be made ASAP.
ACTION 2:
- Mapping before Xmas, if possible, will identify those institutions where we can't ingest easily.
- Wolfgang will do the mapping beforehand and will provide mappings for MODS and MARC21.