Notes and Action Items.
Friday 3rd December - TMB & Technical Meeting
BHL-E Schema Discussion – [Wolfgang Reporting from Presentation slides]
Standards Used:-
1. MODS – for bibliographic data
2. ODRL – for IPR
3. DwC – Darwin Core, for taxonomic data
4. MIX – (Metadata for Images in XML) for scans; chosen based on the requirement for more image detail.
- DwC is insufficient for storage but adequate for taxonomic information, i.e. taxon names at page level.
- Agreed: ABCD and DWC2 need further investigation offline.
Schema Structure diagram:-
- The level of detail is defined by bhlLevel; bhlSequence gives the order within the schema; bhlElement is nested via bhlSubElement.
- bhlGUID identifies the element itself ("what is my own number?"), and bhlParentGUID identifies where it belongs in the hierarchy.
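A minimal sketch of this nesting in Python's ElementTree may help. The bhl* names come from the slides, but whether they are attributes or child elements, and the GUID format, are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Sketch only: bhl* names are from the notes; attribute placement and
# the urn:uuid GUID format are assumptions.
item = ET.Element("bhlElement", {
    "bhlLevel": "item",
    "bhlSequence": "1",
    "bhlGUID": "urn:uuid:item-0001",        # "my own number"
})
page = ET.SubElement(item, "bhlSubElement", {
    "bhlLevel": "page",
    "bhlSequence": "1",
    "bhlGUID": "urn:uuid:page-0001",
    "bhlParentGUID": item.get("bhlGUID"),   # where the page belongs
})

print(ET.tostring(item, encoding="unicode"))
```

The parent/child GUID pair is what later allows one submitted file to be split back into separate elements.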
Item Structure:-
- bhlTaxon – defines taxon names using DwC terms, at both item level and page level.
- Integrates ODRL – bhlRights
- bhlFile – links to files and also for embedding files (optional)
- bhlDateCreated and bhlDateLastMod – taken from the requirements list.
- bhlOCRSource – extra information about the OCR, e.g. whether it was enhanced afterwards.
- Everything is put together in one file as opposed to a flat structure; it should be more readable rather than overly complex. – BS
- The data structure is quite monolithic; however, the GUID elements enable you to split things up. You can submit one metadata file with the scans and then split it up afterwards.
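Pulling the item-structure elements above together, a hypothetical item-level record might look like the following. The DwC namespace URI is the standard Darwin Core one; the nesting under itemInformation and all the values are assumed for illustration:

```python
import xml.etree.ElementTree as ET

DWC = "{http://rs.tdwg.org/dwc/terms/}"  # Darwin Core terms namespace

item = ET.Element("itemInformation", {"bhlGUID": "urn:uuid:item-0001"})

taxon = ET.SubElement(item, "bhlTaxon")  # taxon names via DwC terms
ET.SubElement(taxon, DWC + "scientificName").text = "Quercus robur"

ET.SubElement(item, "bhlRights").text = "(ODRL rights expression here)"
ET.SubElement(item, "bhlFile", {"href": "scan-0001.tif"})  # link, not embed
ET.SubElement(item, "bhlDateCreated").text = "2010-12-03"
ET.SubElement(item, "bhlDateLastMod").text = "2010-12-03"
ET.SubElement(item, "bhlOCRSource").text = "raw OCR, not enhanced"

print(ET.tostring(item, encoding="unicode"))
```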
- Q: Who is generating the object around the schema? – Walter
- A: It could be the schema-mapping tool, prior to Pre-Ingest.
- If using MODS, the bibliographic information is already packaged and then maps to itemInformation within the schema structure.
Questions from group:-
- Use Cases for a Journal structure….can this be done?
- Should we differentiate between automatically generated data and hand-checked data?
- This tool needs to be integrated; it will be useful for the Catalogue of Life.
- We need to know how the data is obtained, i.e. from the lowest taxonomic rank?
- How much information is tagged and marked up, i.e. the level of granularity provided by Content Providers.
- We need to determine level of information.
- Q (Jana): What about a tool for annotations (PDF)?
- Recommendation is that Content Providers should do this themselves.
- Suggestion: build the metadata into the PDF itself. It will be easy to provide a plug-in to write into the PDF.
- Agreed: We need to define the process that goes around the structure.
- Potential issue: can the item be separated from the metadata? The author may not want us to use it without citing.
- Answer: We can process the ODRL, i.e. offer downloads within the PDF.
- Adrian: The legacy is for the Portal to be used freely by all – sustainability. We need to think ahead: what happens next? Hence, for the future we need to plan an IPR function.
- Functions need to be included within the Schema structure so that we can plug and play into it at a later stage.
- There is the CP side and the technical side (data is not generated for the Pre-Ingest – AIT).
Various discussions and considerations from the group:-
- The Schema sets up our entire data model which is a key component within our deliverables.
- We should sample data with a few MODS records and move them through the structure, which can then be mapped into the schema.
- Validation of Use Cases is needed against the proposed Schema.
- The CP has to map it against this schema? Ideally they should give it to us already in this schema.
- Confirmed: this step is before the Pre-Ingest.
- We can’t force CPs to adopt our Schema from the start.
- The data is going to Europeana which will be based on this Schema structure.
- Data will now be delivered on the current structure from henceforth – Confirmed by AIT.
- The standard is already in place; if data is in MODS, it can easily be mapped into this schema structure.
- Q (Melita): The original understanding was that CPs could send data in any form; now they need to map it using this item structure?
- A (Wolfgang): We already have tools in place to do the mapping; bigger institutions shouldn't really have an issue using the tools, and smaller groups will have assistance.
- We start to ingest in January.
- Of course there will be concerns to begin with but after a few weeks we should be able to ingest information pretty quickly.
- Q: If we have to go to partners and set up tools for mapping, when exactly will we do this?
- Q (Boris): The CP–Pre-Ingest connection should be part of the Pre-Ingest, but now it seems to be divided and we have to provide them with an interface beforehand.
- Melita/Jana: We will have to ingest material into our repository quickly; training and tools need to be ready very soon.
- Even if the tools are ready, we will need documentation for end-users, a training plan, etc. for CPs.
- Wolfgang: Six CPs can already be mapped to the new schema easily. Integrating MODS into our schema is a straightforward process.
- Some will be delivering data already in MARC21, MODS, DWC
- Mapping to be delivered on NHM servers.
- For those that have data in their own format, the mapping will have to be done separately, or we can provide them with a tool.
- We have a small process flow.
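The MODS-to-schema mapping step described above could be sketched as follows. The MODS namespace URI is the real Library of Congress one, but `map_mods_to_item` and the `bhlTitle` element name are hypothetical, invented here for illustration:

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"  # official MODS v3 namespace

mods_xml = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Flora Britannica</title></titleInfo>
</mods>"""

def map_mods_to_item(mods_record: ET.Element) -> ET.Element:
    """Hypothetical pre-Pre-Ingest mapping: copy MODS bibliographic
    fields into the schema's itemInformation element."""
    item = ET.Element("itemInformation")
    title = mods_record.find(MODS + "titleInfo/" + MODS + "title")
    if title is not None:
        ET.SubElement(item, "bhlTitle").text = title.text  # assumed name
    return item

mapped = map_mods_to_item(ET.fromstring(mods_xml))
print(ET.tostring(mapped, encoding="unicode"))
```

A CP delivering MARC21 would need an extra MARC21-to-MODS conversion before this step; CPs with proprietary formats would need a separate mapping or a provided tool, as noted above.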
ACTION 1:
- Working Group to be put together: Group members should consist of Patricia, Wolfgang, Heimo and several others who should be involved in the decision making.
- Proposal by Xmas: agreed. Discussion to include pros, cons, etc.
- Pre-Ingest depends on the schema, hence a decision is to be made ASAP.
ACTION 2:
- Mapping before Xmas, if possible, will identify those institutions where we can't ingest easily.
- Wolfgang will do the mapping beforehand and will provide mappings for MODS and MARC21.