This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

BHL-E_ADD-Interface-SIP

BHL Europe SIP

The Submission Information Package (SIP) is the payload used at the interface between
It is the base information package for
The SIP is a METS profile. It contains references to scanned digital data, OCR data, and structural information for displaying a table of contents and related items. The structural information is also used to create compound items. The descriptive metadata is represented in MARCXML, as MARC is a widespread standard in the library sector.


- herzoga_ait Jan 27, 2010 For easier handling of articles and images, I propose a change to the SIP. It would allow an unlimited number of descriptive metadata records, each of which needs to be referenced in the logical structural section. The logical section may therefore contain references to individual articles, each described separately. The same works for images of animals, which can now be identified separately. This results in a new level of information beneath the page.

- wkoller Jan 28, 2010 I'm wondering how MODS (or MARCXML) will be able to store the required metadata for articles, images and pages. METS is definitely able to map the structure of a given item (item in the sense of a physical item). However, as we need at least four levels of detail (Monograph / Series => Volume => Article / Chapter => Page / Image), each with individual metadata requirements, I don't think that MODS (or MARCXML) will be able to describe all the required information. That's why we need to define our own BHL schema, which we might embed into METS, and maybe even use MODS for the Monograph / Series level; but at least for the chapter / page level we need our own information (and images, for example, occur very often as a use case). We cannot declare our METS schema (or even decide to use it) before it is clear what requirements we have. And with a pure MODS / METS combination we won't be able to map all the required data.

- herzoga_ait Jan 28, 2010 Yes, and we are currently trying to get some more information from the content providers. I like your thinking about the levels, and we need to harmonize these with FRBR. I think we can use the mets:structMap to describe multiple levels like issue/article/chapter/page/column/image and use MARC's 787 to generate at least some of these structMap divs.

- wkoller Jan 29, 2010 I like the idea of outlining the structure in the mets:structMap. I've just sent a message with a new Google Doc to the metadata group; we should continue the discussion there.

Standards

MARCXML - http://www.loc.gov/standards/marcxml/

The Library of Congress' Network Development and MARC Standards Office developed a framework for working with MARC data in an XML environment.
MARC XML could potentially be used as follows:

METS Profile - http://www.loc.gov/standards/mets/

METS (Metadata Encoding and Transmission Standard) is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.
METS, a Digital Library Federation initiative, attempts to provide an XML document format for encoding metadata necessary for both management of digital library objects within a repository and exchange of such objects between repositories (or between repositories and their users). Depending on its use, a METS document could be used in the role of Submission Information Package (SIP), Archival Information Package (AIP), or Dissemination Information Package (DIP) within the Open Archival Information System (OAIS) Reference Model.
A METS document consists of seven major sections:
  1. METS Header (metadata describing the METS document itself)
  2. Descriptive Metadata (may point to external descriptive metadata, such as a MARC record, contain internally embedded descriptive metadata, or both)
  3. Administrative Metadata (provides information regarding how the files were created and stored, intellectual property rights, etc.; may be either external to the METS document, or encoded internally.)
  4. File Section (lists all files containing content which comprise the electronic versions of the digital object.)
  5. Structural Map (outlines a hierarchical structure for the digital library object, and links the elements of that structure to content files and metadata that pertain to each element.)
  6. Structural Links (allows METS creators to record the existence of hyperlinks between nodes in the hierarchy outlined in the Structural Map.)
  7. Behavior (this section can be used to associate executable behaviors with content in the METS object.)
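To make the section order concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` that emits an empty METS skeleton with the seven top-level sections in the order the METS schema expects (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `structLink`, `behaviorSec`). The sections are bare placeholders, so the result is not schema-valid on its own (for example, `dmdSec` needs an `ID` and `structMap` needs a `div`); the helper names and the sample identifier are illustrative only.

```python
import xml.etree.ElementTree as ET

# METS namespace as published by the Library of Congress.
METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The seven top-level sections, in the order the METS schema expects them.
SECTION_ORDER = [
    "metsHdr",      # 1. METS Header
    "dmdSec",       # 2. Descriptive Metadata
    "amdSec",       # 3. Administrative Metadata
    "fileSec",      # 4. File Section
    "structMap",    # 5. Structural Map (the only mandatory section)
    "structLink",   # 6. Structural Links
    "behaviorSec",  # 7. Behavior
]


def q(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def build_skeleton(objid):
    """Build an empty METS document with one placeholder per top-level section."""
    root = ET.Element(q("mets"), {"OBJID": objid})
    for name in SECTION_ORDER:
        ET.SubElement(root, q(name))
    return root


if __name__ == "__main__":
    doc = build_skeleton("oais:EXAMPLE:0001")
    print(ET.tostring(doc, encoding="unicode"))
```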

MODS - http://www.loc.gov/standards/mods/

Metadata Object Description Schema (MODS) is a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users. MODS is expressed using the XML (Extensible Markup Language) schema language. MODS was designed as a compromise between the complexity of the MARC format used by libraries and the extreme simplicity of Dublin Core metadata.

Best Practice and Implementation

The METS profile was chosen because it supplies all the structural and descriptive information needed for BHL. It has also been chosen by other projects, including the following.

Fedora - http://www.fedora-commons.org

Open-source software providing a service-oriented architecture for managing and delivering digital content. Fedora is an integrated repository system that enables storage, access and management for virtually any kind of digital content. It supports METS ingest.

DSPACE - http://dspace.org

A digital repository system that captures, stores, indexes, preserves, and distributes digital research material. It supports METS for ingest and export, as well as METS-based AIPs.

Table of Contents

BHL Europe SIP
Standards
MARCXML - http://www.loc.gov/standards/marcxml/
METS Profile - http://www.loc.gov/standards/mets/
MODS - http://www.loc.gov/standards/mods/
Best Practice and Implementation
Fedora - http://www.fedora-commons.org
DSPACE - http://dspace.org
METS PROFILE DESCRIPTION
URI
Title
Abstract
Creation Date
Contact Information
Related Profile
Extension Schema
MARCXML: The MARC 21 XML Schema
DjVuXML-s
Rules of Description
MARCXML
DjVuXML
Controlled Vocabularies
Structural Requirements
mets Root Element
mets Header
mets Descriptive Meta Data
mets Administrative Meta Data
mets File Section
mets Structural Map
mets Structural Link
mets Behaviour
Technical Requirements of Content, Behavior and Metadata Files
Images
Tools and Applications
Examples

METS PROFILE DESCRIPTION

URI

[not registered yet] http://www.bhleurope.eu/ingest/sip/20091005.xml

Title

BHL Europe METS Document Profile for Submission Information Package (SIP)

Abstract

BHL-Europe is building a web application compliant with the Open Archival Information System (OAIS) reference model. This implementation requires a Submission Information Package (SIP) for the ingest module. A METS profile can carry all the information needed for the use cases described by the consortium.
Because there are several pre-ingest tool providers and a common standard is needed, this profile is intended to give content providers all the information they need to get the most out of BHL-Europe's web application for their items.
This profile describes the SIP; it may also be used for Dissemination Information Packages (DIPs) or Archival Information Packages (AIPs).

Creation Date

2009-10-05 --:-- CET

Contact Information


This document was prepared with the assistance of:

Related Profile

No related profiles

Extension Schema

The following schemas will be used for the BHL-Europe SIP:

MARCXML: The MARC 21 XML Schema


DjVuXML-s


Rules of Description

MARCXML

The following fields are required for BHL-Items as these describe the BHL Deduplication Subset (BDS):

DjVuXML

The DjVuXML OCR output will be used to highlight words in the book reader and as the source for full-text search. Therefore, the DjVuXML must provide word coordinates.
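A minimal sketch of how a pre-ingest check might confirm that word coordinates are present, assuming DjVuXML in the form written by DjVuLibre's `djvutoxml`, where each word is a `WORD` element whose `coords` attribute holds comma-separated integers with the bounding box first. The function name and the sample page are hypothetical, and the exact coordinate order (typically left, bottom, right, top) should be checked against the actual OCR output.

```python
import xml.etree.ElementTree as ET


def extract_words(djvuxml):
    """Return (word, bounding box) pairs; raise if any WORD lacks usable coords."""
    root = ET.fromstring(djvuxml)
    words = []
    for word in root.iter("WORD"):
        coords = word.get("coords")
        if coords is None:
            raise ValueError(f"WORD {word.text!r} has no coords attribute")
        values = [int(v) for v in coords.split(",")]
        if len(values) < 4:
            raise ValueError(f"WORD {word.text!r} has malformed coords {coords!r}")
        # Keep only the bounding box; a fifth value, if present, is ignored.
        words.append(((word.text or "").strip(), tuple(values[:4])))
    return words


# Hypothetical single-line page, shaped like djvutoxml output.
SAMPLE = """<DjVuXML><BODY><OBJECT><HIDDENTEXT><PAGECOLUMN><REGION>
<PARAGRAPH><LINE>
<WORD coords="120,310,260,280">Biodiversity</WORD>
<WORD coords="275,310,380,280">Heritage</WORD>
</LINE></PARAGRAPH>
</REGION></PAGECOLUMN></HIDDENTEXT></OBJECT></BODY></DjVuXML>"""
```

The highlighting front end would only need these boxes plus the page they belong to, so failing early on a missing `coords` keeps unusable OCR out of the index.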

Controlled Vocabularies


Structural Requirements

mets Root Element

  1. Each SIP describes exactly one record that will show up in BHL-Europe (a BHL item). To describe multiple items of a series, create one SIP per sub-item and link them to a compound item within the logical structural map.
  2. Each SIP must have a mets:objid containing a URI of the form oais:<INSTITUTION>:<INTERNALIDENTIFIER>. These identifiers will be used for deduplication and will be replaced by a BHL-internal URI during ingest; the original mets:objid will be moved to the mets:metsHdr.
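The identifier rules above could be sketched as follows. The METS schema spells the root attribute `OBJID`. Where exactly the original identifier goes inside `mets:metsHdr` is not specified here; this sketch assumes it is kept in a `mets:altRecordID` element (a real METS header element), while the `TYPE` value `provider-objid`, the function name and the replacement URI used in testing are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Provider identifier of the form oais:<INSTITUTION>:<INTERNALIDENTIFIER>.
# Assumption: the institution part contains no colon; the internal id may.
OBJID_PATTERN = re.compile(r"^oais:[^:]+:.+$")


def q(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def replace_objid(root, bhl_uri):
    """Validate the provider OBJID, store it in metsHdr and set the BHL URI.

    Returns the original provider identifier.
    """
    original = root.get("OBJID", "")
    if not OBJID_PATTERN.match(original):
        raise ValueError(f"OBJID {original!r} is not oais:<INSTITUTION>:<ID>")

    header = root.find(q("metsHdr"))
    if header is None:
        # metsHdr has to be the first child of the mets root element.
        header = ET.Element(q("metsHdr"))
        root.insert(0, header)

    # Hypothetical convention: keep the provider id as an altRecordID,
    # placed before any metsDocumentID to respect the schema's child order.
    alt = ET.Element(q("altRecordID"), {"TYPE": "provider-objid"})
    alt.text = original
    doc_id = header.find(q("metsDocumentID"))
    position = list(header).index(doc_id) if doc_id is not None else len(header)
    header.insert(position, alt)

    root.set("OBJID", bhl_uri)
    return original
```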

mets Header

-

mets Descriptive Meta Data

  1. Each SIP must have one mets:dmdSec/(mets:mdWrap|mets:mdRef)[@type="MARC" and @label="marcxml"] node. This information will be used to index the metadata. If several nodes of this type are present, either the first one or the one with status "current" will be used.
  2. For full-text search and GnuBook highlighting, a mets:dmdSec/(mets:mdWrap|mets:mdRef)[@type="OTHER" and @use="djvuxml"] node must be available. If several nodes of this type are present, either the first one or the one with status "current" will be used.
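One way an ingest module might pick the MARCXML record is sketched below. In the METS schema the attributes written above as @type and @label are `MDTYPE` and `LABEL` on `mdWrap`/`mdRef`, and `STATUS` sits on the enclosing `dmdSec`. Because the rule does not say which wins when both apply, the sketch assumes that a `dmdSec` with `STATUS="current"` takes precedence over document order; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def q(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def select_md(root, mdtype, label):
    """Return the mdWrap/mdRef chosen for indexing, or None if nothing matches."""
    candidates = []
    for dmd in root.findall(q("dmdSec")):
        for md in list(dmd):
            if (md.tag in (q("mdWrap"), q("mdRef"))
                    and md.get("MDTYPE") == mdtype
                    and md.get("LABEL") == label):
                candidates.append((dmd, md))
    if not candidates:
        return None
    # Assumption: a dmdSec marked STATUS="current" wins over document order.
    for dmd, md in candidates:
        if dmd.get("STATUS") == "current":
            return md
    return candidates[0][1]
```

Calling `select_md(root, "MARC", "marcxml")` on a parsed SIP returns the element whose MARCXML content (embedded or referenced) should be indexed.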

mets Administrative Meta Data

  1. TODO: describe rights and image preservation metadata

mets File Section

  1. For GnuBook support, a mets:fileSec/mets:fileGrp with image data (tiff, jpg, jp2), either embedded as binary data or referenced, must be available. Depending on the use attribute, master, reference and thumbnail mets:fileGrp elements will be generated.
  2. For preservation, a mets:fileSec/mets:fileGrp[@use="master"] with image data (tiff), either embedded as binary data or referenced, must be available.
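A hedged sketch of a pre-ingest check for the preservation requirement: it looks for a `fileGrp` with `USE="master"` and verifies that each of its files is either embedded (`FContent`) or referenced (`FLocat`). Identifying TIFF masters by the `MIMETYPE` attribute of `mets:file` is an assumption of this sketch, not a rule stated in the profile, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def q(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def check_master_group(root):
    """Return a list of problems with the master image group (empty list = OK)."""
    groups = [g for g in root.iter(q("fileGrp")) if g.get("USE") == "master"]
    if not groups:
        return ["no mets:fileGrp with USE='master'"]
    problems = []
    for group in groups:
        for f in group.iter(q("file")):
            # Each file must carry its image inline (FContent) or by reference (FLocat).
            if f.find(q("FContent")) is None and f.find(q("FLocat")) is None:
                problems.append(f"file {f.get('ID')} has neither FContent nor FLocat")
            # Assumption: TIFF masters are recognised by their MIMETYPE attribute.
            if f.get("MIMETYPE") != "image/tiff":
                problems.append(f"file {f.get('ID')} is {f.get('MIMETYPE')}, expected image/tiff")
    return problems
```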

mets Structural Map

  1. To support page numbers and page names, a mets:structMap[@type="physical"] must be supplied with a root div containing all leaves. Each leaf must be named with the orderlabel attribute and put in sequence with the order attribute, starting at 0.
  2. To support navigation of the logical item structure, a mets:structMap[@type="logical"] must be supplied.
  3. The type attribute of each mets:div in the structural maps should use one of the following values: frontcover, halftitlepage, titlepage, imprint, dedication, inspiration, foreword, preface, toc, lot, lof, introduction, chapter, part, afterword, bibliography, references, appendix, glossary, index, colophon, promotion, backcover
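The structural map rules can be illustrated with a small Python sketch: `physical_struct_map` builds a physical map whose leaves carry `ORDER` values starting at 0 and page names in `ORDERLABEL`, each pointing to its image through an `fptr`, and `unknown_types` reports logical division types outside the controlled list. The METS schema writes these attributes in upper case (`TYPE`, `ORDER`, `ORDERLABEL`); the function names, the `(file ID, label)` input format and the decision to skip untyped divs are assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Division types listed in the profile for the structural maps.
ALLOWED_TYPES = {
    "frontcover", "halftitlepage", "titlepage", "imprint", "dedication",
    "inspiration", "foreword", "preface", "toc", "lot", "lof", "introduction",
    "chapter", "part", "afterword", "bibliography", "references", "appendix",
    "glossary", "index", "colophon", "promotion", "backcover",
}


def q(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def physical_struct_map(pages):
    """Build a physical structMap from (file ID, page label) pairs in reading order."""
    smap = ET.Element(q("structMap"), {"TYPE": "physical"})
    root_div = ET.SubElement(smap, q("div"))
    for order, (file_id, label) in enumerate(pages):  # ORDER starts at 0
        leaf = ET.SubElement(root_div, q("div"),
                             {"ORDER": str(order), "ORDERLABEL": label})
        ET.SubElement(leaf, q("fptr"), {"FILEID": file_id})
    return smap


def unknown_types(logical_map):
    """Return the sorted TYPE values of divs that are not in the controlled list."""
    return sorted({d.get("TYPE") for d in logical_map.iter(q("div"))
                   if d.get("TYPE") is not None
                   and d.get("TYPE") not in ALLOWED_TYPES})
```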

mets Structural Link

-

mets Behaviour

  1. [DRAFT] The behavioural elements are used for automated image transformation.
  2. [DRAFT] The behavioural elements are used to control incremental updates only.

Technical Requirements of Content, Behavior and Metadata Files

Images

For the mets:fileGrp the following file types are accepted:

Tools and Applications


Examples