BHL-E_ADD-Interface-SIP
BHL Europe SIP
The Submission Information Package (SIP) is the payload used at the interface between
It is the base information package for
The SIP is a METS profile. It contains references to scanned digital data, OCR data, and structural information for displaying a table of contents and related items. The structural information is also used to create composed items. The descriptive metadata is represented as MARCXML, as MARC is a widespread standard in the library sector.
-
Jan 27, 2010 For easier handling of articles and images I propose a change to the SIP: allow an unlimited number of descriptive metadata sections, each of which needs to be referenced in the logical structural section. The logical section may therefore contain references to individual articles, all described separately. The same works for images of animals, which can now be identified separately. This results in a new level of information beneath the page.
-
Jan 28, 2010 I'm wondering how MODS (or MARCXML) will be able to store the required metadata for articles, images and pages. METS is definitely able to map the structure of a given item (item in the sense of a physical item). However, as we need at least four levels of detail (Monograph / Series => Volume => Article / Chapter => Page / Image), each with individual metadata requirements, I don't think that MODS (or MARCXML) will be able to describe all the required information. That's why we need to define our own BHL schema, which we might embed into METS. We could maybe even use MODS at the Monograph / Series level, but at least at the chapter / page level we need our own information (images, for example, occur very often as a use case). We cannot declare our METS schema (or even decide to use it) before it is clear what requirements we have. And with a pure MODS / METS combination we won't be able to map all the required data.
-
Jan 28, 2010 Yes, and we are currently trying to get some more information from the content providers. I like your thinking about the levels, and we need to harmonize these with FRBR. I think we can use the mets:structMap to describe multiple levels like issue/article/chapter/page/column/image and use MARC's 787 to generate at least some of these structMap divs.
-
Jan 29, 2010 I like the idea of outlining the structure in the mets:structMap. I've just sent a message with a new Google Doc to the metadata group; we should continue the discussion there.
Standards
The Library of Congress' Network Development and MARC Standards Office developed a framework for working with MARC data in an XML environment.
MARC XML could potentially be used as follows:
- for representing a complete MARC record in XML
- as an extension schema to METS (Metadata Encoding and Transmission Standard)
- to represent metadata for OAI harvesting
- for original resource description in XML syntax
- for metadata in XML that may be packaged with an electronic resource
METS (Metadata Encoding and Transmission Standard) is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.
METS, a Digital Library Federation initiative, attempts to provide an XML document format for encoding metadata necessary for both management of digital library objects within a repository and exchange of such objects between repositories (or between repositories and their users). Depending on its use, a METS document could be used in the role of Submission Information Package (SIP), Archival Information Package (AIP), or Dissemination Information Package (DIP) within the Open Archival Information System (OAIS) Reference Model.
A METS document consists of seven major sections:
- METS Header (metadata describing the METS document itself)
- Descriptive Metadata (may contain external, like MARC, or internally embedded descriptive metadata, or both)
- Administrative Metadata (provides information regarding how the files were created and stored, intellectual property rights, etc.; may be either external to the METS document, or encoded internally.)
- File Section (lists all files containing content which comprise the electronic versions of the digital object.)
- Structural Map (outlines a hierarchical structure for the digital library object, and links the elements of that structure to content files and metadata that pertain to each element.)
- Structural Links (allows METS creators to record the existence of hyperlinks between nodes in the hierarchy outlined in the Structural Map.)
- Behavior (this section can be used to associate executable behaviors with content in the METS object.)
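To make the seven sections concrete, the following Python sketch builds an empty METS skeleton with the standard library. The element names and namespace follow the METS schema; the helper name build_mets_skeleton and the sample OBJID are illustrative assumptions and not part of the BHL-Europe profile. The result only shows the section order and is not schema-valid as is, because the sections are left empty.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# The seven major sections, in the order the METS schema expects them.
SECTIONS = [
    "metsHdr",      # METS Header
    "dmdSec",       # Descriptive Metadata
    "amdSec",       # Administrative Metadata
    "fileSec",      # File Section
    "structMap",    # Structural Map
    "structLink",   # Structural Links
    "behaviorSec",  # Behavior
]


def build_mets_skeleton(objid):
    """Return a <mets:mets> element holding one empty child per section."""
    ET.register_namespace("mets", METS_NS)
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": objid})
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


# Hypothetical identifier, used only for illustration.
skeleton = build_mets_skeleton("oais:EXAMPLE:0001")
print(ET.tostring(skeleton, encoding="unicode"))
```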
Metadata Object Description Schema (MODS) is a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users. MODS is expressed using the XML (Extensible Markup Language) schema language. MODS was designed as a compromise between the complexity of the MARC format used by libraries and the extreme simplicity of Dublin Core metadata.
Best Practice and Implementation
The METS profile was chosen because it supplies all structural and descriptive information needed for BHL. It was also chosen by other projects as described here
Open source software providing a service-oriented architecture for managing and delivering digital content. Fedora is an integrated repository system that enables storage, access and management for virtually any kind of digital content. Supports METS ingest.
A digital repository system that captures, stores, indexes, preserves, and distributes digital research material. Supports METS for ingest and export, as well as METS AIPs.
METS PROFILE DESCRIPTION
URI
[not registered yet]
http://www.bhleurope.eu/ingest/sip/20091005.xml
Title
BHL Europe METS Document Profile for Submission Information Package (SIP)
Abstract
BHL-Europe is building an Open Archival Information System (OAIS) compliant web application. This implementation needs a Submission Information Package (SIP) for the ingest module.
METS profiles can contain all information needed for the use cases described by the consortium.
Because there are various pre-ingest tool providers and a standard is needed, this profile is intended to give content providers all the information they need to get the most features out of BHL-Europe's web application for their items.
This profile describes the SIP and may also be used for Dissemination Information Packages (DIPs) or Archival Information Packages (AIPs).
Creation Date
2009-10-05 --:-- CET
Contact Information
- Institution: AIT - Applied Information Technique Ltd
- Address: Klosterwiesegasse 32/I, 8010 Graz, Austria
- Phone: +43 3168 35359 0
- Mail: admin@ait.co.at
This document was prepared with the assistance of:
- Alexander Herzog, AIT
- Jürgen Pammer, AIT
- Walter Koch, AIT
- Lee Namba, ATOS
- Kai Stalmann, MfN
- Bernhard Scaife, NHM
Related Profile
No related profiles
Extension Schema
The following schemas will be used for the BHL-E SIP:
MARCXML: The MARC 21 XML Schema
DjVuXML-s
Rules of Description
MARCXML
The following fields are required for BHL-Items, as they make up the BHL Deduplication Subset (BDS):
- MARC Leader position 7 - Category: monograph or serial component
- 245$a$c - Title: the title of the BHL-Item
- 100, 700 - Creator: the creator of this BHL-Item
- 260$b - Publisher: the person or organisation responsible for publishing the original object
- 250$a$b - Edition of monographs: the edition of the book being scanned
- Date Created: date of creation of the digital object (part of the METS:ROOT element)
- Date Last Modified: date the digital object was last updated (part of the METS:ROOT element)
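A pre-ingest tool could check the MARC part of this list before building a SIP. The sketch below is one possible way to do that in Python, assuming the record is a MARCXML record in the MARC 21 slim namespace. The function names, the dictionary keys and the sample record are hypothetical, and the two date values are left out because they live in the METS document rather than in the MARC record. Since the list names no subfields for 100 and 700, the sketch takes all of them.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# key -> (MARC tags, subfield codes); None means "all subfields".
# The keys are illustrative labels, not names defined by the profile.
BDS_FIELDS = {
    "title": ({"245"}, {"a", "c"}),
    "creator": ({"100", "700"}, None),
    "publisher": ({"260"}, {"b"}),
    "edition": ({"250"}, {"a", "b"}),
}


def extract_bds(marcxml):
    """Collect the BDS values found in one MARCXML record."""
    root = ET.fromstring(marcxml)
    leader = root.findtext(".//marc:leader", default="", namespaces=MARC_NS)
    # Leader position 7 (zero-based index 7) carries the category.
    values = {"category": leader[7] if len(leader) > 7 else None}
    datafields = root.findall(".//marc:datafield", MARC_NS)
    for key, (tags, codes) in BDS_FIELDS.items():
        values[key] = [
            (sub.text or "").strip()
            for field in datafields
            if field.get("tag") in tags
            for sub in field.findall("marc:subfield", MARC_NS)
            if codes is None or sub.get("code") in codes
        ]
    return values


def missing_bds(values):
    """Return the keys for which no value was found."""
    return [key for key, value in values.items() if not value]


SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Darwin, Charles</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">On the origin of species</subfield></datafield>
  <datafield tag="260" ind1=" " ind2=" "><subfield code="b">John Murray</subfield></datafield>
</record>"""

bds = extract_bds(SAMPLE)
print(bds)
print("missing:", missing_bds(bds))
```

Whether a missing edition should block ingest is a policy decision this sketch does not make, since 250 only applies to monographs.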
DjVuXML
The OCR output of DjVuXML will be used for highlighting words in the bookreader and as source for fulltext search. Therefore, the DjVuXML has to provide word coordinates.
Controlled Vocabularies
- no controlled vocabulary has to be used.
- nevertheless, there will be several controlled vocabularies used within the MARC records.
Structural Requirements
mets Root Element
- Each SIP describes one and only one record that will show up in BHL-Europe (= BHL-Item). If you need to describe multiple items of a series, create one SIP per sub-item and link them to a compound item within the logical structure map.
- Each SIP must have a mets:objid with a URI like oais:<INSTITUTION>:<INTERNALIDENTIFIER>. These identifiers will be used for deduplication and will be replaced by a BHL-internal URI during ingest. The original mets:objid will be moved to the mets:metsHdr.
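The identifier format can be checked with a small regular expression. This is a minimal sketch: the profile does not say which characters are allowed in either part, so the pattern assumes that the institution part contains no colon or whitespace and that the internal identifier is any non-empty run of non-whitespace characters. The function name parse_objid and the sample identifiers are hypothetical.

```python
import re

# Assumed character rules; the profile only gives the overall shape
# oais:<INSTITUTION>:<INTERNALIDENTIFIER>.
OBJID_PATTERN = re.compile(r"^oais:(?P<institution>[^:\s]+):(?P<identifier>\S+)$")


def parse_objid(objid):
    """Split an identifier into (institution, internal identifier)."""
    match = OBJID_PATTERN.match(objid)
    if match is None:
        raise ValueError(f"not a valid SIP identifier: {objid!r}")
    return match.group("institution"), match.group("identifier")


print(parse_objid("oais:NHM:12345"))
```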
mets Header
-
mets Descriptive Meta Data
- Each SIP must have one mets:dmdSec/(mets:dmWrap|mets:dmRef)[@type="MARC" and @label="marcxml"] node. This information will be used for indexing the metadata. If multiple nodes of this type are present, then the first or the one with the status current will be used.
- For fulltext search and GnuBook highlighting a mets:dmdSec/(mets:dmWrap|mets:dmRef)[@type="OTHER" and @use="djvuxml"] must be available. If multiple nodes of this type are present, then the first or the one with the status current will be used.
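The rule for multiple nodes can be read in more than one way. The sketch below shows one interpretation: prefer a section whose status is current, and otherwise fall back to the first matching section. It assumes that the shorthand used above maps onto the METS schema as mets:mdWrap or mets:mdRef with the MDTYPE and LABEL attributes and a STATUS attribute on mets:dmdSec. The function name select_dmdsec and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def select_dmdsec(mets_root, mdtype, label=None):
    """Pick the dmdSec to use: a 'current' one if present, else the first."""
    candidates = []
    for sec in mets_root.findall("mets:dmdSec", NS):
        md = sec.find("mets:mdWrap", NS)
        if md is None:
            md = sec.find("mets:mdRef", NS)
        if md is None or md.get("MDTYPE") != mdtype:
            continue
        if label is not None and md.get("LABEL") != label:
            continue
        candidates.append(sec)
    for sec in candidates:
        if (sec.get("STATUS") or "").lower() == "current":
            return sec
    return candidates[0] if candidates else None


SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="dmd1" STATUS="superseded">
    <mets:mdWrap MDTYPE="MARC" LABEL="marcxml"/>
  </mets:dmdSec>
  <mets:dmdSec ID="dmd2" STATUS="current">
    <mets:mdWrap MDTYPE="MARC" LABEL="marcxml"/>
  </mets:dmdSec>
</mets:mets>"""

chosen = select_dmdsec(ET.fromstring(SAMPLE), "MARC", "marcxml")
print(chosen.get("ID"))
```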
mets Administrative Meta Data
- TODO: describe rights and image preservation metadata
mets File Section
- For GnuBook support a mets:fileSec/mets:fileGrp with binary or reference data of images (tiff, jpg, jp2) must be available. Depending on the use attribute, master, reference and thumbnail mets:fileGrp will be generated.
- For preservation a mets:fileSec/mets:fileGrp[@use="master"] with binary or reference data of images (tiff) must be available.
mets Structural Map
- For page number and page name support, a mets:structMap[@type="physical"] must be supplied with a root div containing all leaves. The contained leaves must be named by the orderlabel attribute and put in sequence with the order attribute, starting at 0.
- For logical item structure navigation support a mets:structMap[@type="logical"] must be supplied.
- The mets:structMap types should use one of the following values: frontcover, halftitlepage, titlepage, imprint, dedication, inspiration, foreword, preface, toc, lot, lof, introduction, chapter, part, afterword, bibliography, references, appendix, glossary, index, colophon, promotion, backcover
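The requirements on the physical structure map lend themselves to an automated check. The following Python sketch assumes that the leaves are the direct child divs of the root div, that their document order is the intended page sequence, and that the attributes are written as TYPE, ORDER and ORDERLABEL as in the METS schema. The function name check_physical_structmap and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def check_physical_structmap(mets_root):
    """Return a list of problems found in the physical structMap."""
    physical = None
    for smap in mets_root.findall("mets:structMap", NS):
        if (smap.get("TYPE") or "").lower() == "physical":
            physical = smap
            break
    if physical is None:
        return ["no physical structMap found"]
    root_div = physical.find("mets:div", NS)
    if root_div is None:
        return ["physical structMap has no root div"]
    problems = []
    # ORDER must count up from 0; every leaf needs a non-empty ORDERLABEL.
    for expected, leaf in enumerate(root_div.findall("mets:div", NS)):
        if leaf.get("ORDER") != str(expected):
            problems.append(f"leaf {expected}: ORDER is {leaf.get('ORDER')!r}")
        if not leaf.get("ORDERLABEL"):
            problems.append(f"leaf {expected}: ORDERLABEL is missing")
    return problems


GOOD = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="physical">
    <mets:div TYPE="book">
      <mets:div TYPE="page" ORDER="0" ORDERLABEL="Cover"/>
      <mets:div TYPE="page" ORDER="1" ORDERLABEL="i"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

print(check_physical_structmap(ET.fromstring(GOOD)))
```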
mets Structural Link
-
mets Behaviour
- [DRAFT] The behavioural elements are used to do automated image transformation
- [DRAFT] The behavioural elements are used to control incremental updates only
Technical Requirements of Content, Behavior and Metadata Files
Images
For the mets:fileGrp the following file types are accepted:
- Bitonal images must be 300dpi-600dpi TIFF.
- Grayscale images must be 300dpi-600dpi with 8-bit depth, as uncompressed TIFF or losslessly compressed (LZW or JP2000).
- Color images shall be between 300dpi and 400dpi with 24-bit color, as uncompressed TIFF or losslessly compressed (JP2000 or LZW).
- NOTE: For BHL, JP2000 files can be compressed by 15%.
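These limits can be expressed as a small rule table. The sketch below is an illustration only: the ImageInfo fields, the kind names and the format labels "tiff" and "jp2" are assumptions, and the values are expected to come from whatever tool reads the image headers. It checks resolution, bit depth and container format, and it does not verify whether the compression actually used is lossless.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    kind: str         # "bitonal", "grayscale" or "color"
    dpi: int
    bit_depth: int    # bits per pixel
    file_format: str  # "tiff" or "jp2"


# kind -> (min dpi, max dpi, required bit depth or None, accepted formats)
RULES = {
    "bitonal": (300, 600, None, {"tiff"}),
    "grayscale": (300, 600, 8, {"tiff", "jp2"}),
    "color": (300, 400, 24, {"tiff", "jp2"}),
}


def check_image(info):
    """Return a list of rule violations for one image."""
    if info.kind not in RULES:
        return [f"unknown image kind: {info.kind}"]
    low, high, depth, formats = RULES[info.kind]
    problems = []
    if not low <= info.dpi <= high:
        problems.append(f"{info.dpi}dpi is outside {low}-{high}dpi")
    if depth is not None and info.bit_depth != depth:
        problems.append(f"bit depth {info.bit_depth} should be {depth}")
    if info.file_format not in formats:
        problems.append(f"format {info.file_format} is not accepted")
    return problems


print(check_image(ImageInfo("color", 600, 24, "tiff")))
```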
Tools and Applications
- BHL Europe Portal (in development)
Examples
- sip.zip (includes the DjVU XML and a malformed SIP)