Archival_Storage
Archival Storage Ingest Script
The ingestSIP.php script scans the SIP folder and ingests new digital objects into Fedora.
When you add a new object, make sure its folder name does not start with a . (dot) or with INGESTED_BOOK_.
When the script finishes adding an object, it renames the object folder by prefixing the original folder name with INGESTED_BOOK_ (for example, OriginOfSpecies becomes INGESTED_BOOK_OriginOfSpecies).
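The naming rule can be sketched as follows. This is an illustration in Python, not the code of ingestSIP.php; the function names is_pending and mark_ingested are hypothetical, and treating the reserved prefixes as "skip this folder" is an assumption drawn from the renaming behaviour described above.

```python
from pathlib import Path

# Prefix added to a book folder once it has been ingested.
INGESTED_PREFIX = "INGESTED_BOOK_"


def is_pending(folder_name: str) -> bool:
    """Return True if the folder name is free of the reserved prefixes."""
    return not (folder_name.startswith(".") or folder_name.startswith(INGESTED_PREFIX))


def mark_ingested(book_dir: Path) -> Path:
    """Rename the book folder with the ingested prefix and return the new path."""
    target = book_dir.with_name(INGESTED_PREFIX + book_dir.name)
    book_dir.rename(target)
    return target
```

With this rule, a folder named OriginOfSpecies is pending, while .tmp and INGESTED_BOOK_OriginOfSpecies are not.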
Requirements
- PHP : The script is written in PHP, so PHP must be installed to run it.
- Tesseract OCR (optional) : This OCR engine is used to generate the OCR text of the pages.
- Kakadu : Used to convert images (TIFF to JPEG 2000, for example), create thumbnails, and so on.
Configuration
Before running this script, you must set three global parameters in the script file (ingestSIP.php) :
- BHL_NAMESPACE : Used to create the book PID.
- BHL_SIP_DIRECTORY : The folder where the script will find books.
- LOOP_INTERVAL : How long the script sleeps before checking for new books.
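How BHL_SIP_DIRECTORY and LOOP_INTERVAL might interact can be sketched as a simple polling loop. This is a Python illustration, not the PHP script itself: the function poll_sip_directory, its ingest callback, and the max_cycles and sleep parameters are hypothetical, and the unit of the interval (seconds) is an assumption. BHL_NAMESPACE would be used inside the ingest step and is left out here.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def poll_sip_directory(
    sip_dir: str,
    interval: float,
    ingest: Callable[[Path], None],
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Scan sip_dir, hand each candidate book folder to ingest, sleep, repeat."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for entry in sorted(Path(sip_dir).iterdir()):
            if not entry.is_dir():
                continue
            # Folders with a reserved prefix are not candidates.
            if entry.name.startswith((".", "INGESTED_BOOK_")):
                continue
            ingest(entry)
        cycles += 1
        sleep(interval)  # interval assumed to be in seconds
```

Leaving max_cycles at None makes the loop run until the process is stopped; the sleep parameter exists only so the loop can be exercised without waiting.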
Run It
To launch the script, run this command : "php ingestSIP.php".
SIP Hierarchy
Each SIP must follow a fixed structure, for example :
- a book named OriginOfSpecies must be in a folder also named OriginOfSpecies.
- this folder must contain a folder named pages, which will contain the page images (jp2 or tiff).
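A layout check along these lines could be run before ingesting a book. This is a Python sketch for illustration only; the function name list_page_images and the accepted file extensions (.jp2, .tif, .tiff) are assumptions, not taken from ingestSIP.php.

```python
from pathlib import Path
from typing import List

# Assumed extensions for "jp2 or tiff" page images.
PAGE_EXTENSIONS = {".jp2", ".tif", ".tiff"}


def list_page_images(book_dir: Path) -> List[Path]:
    """Return the page images found in <book_dir>/pages, sorted by path."""
    pages_dir = Path(book_dir) / "pages"
    if not pages_dir.is_dir():
        raise FileNotFoundError(f"missing pages folder: {pages_dir}")
    return sorted(
        p for p in pages_dir.iterdir()
        if p.is_file() and p.suffix.lower() in PAGE_EXTENSIONS
    )
```

For a book folder OriginOfSpecies, the function looks in OriginOfSpecies/pages and ignores files that are not page images.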
TODO :
Finish the script so that it handles the taxonomy datastreams and the other required datastreams.
Low Level Storage based on Akubra
The low level storage strategy depends on DataStream IDs, which are declared in ${FEDORA_HOME}/server/conf/akubra-llstore.xml.
Since Fedora Commons 3.4, Akubra is used as the default low level storage framework (which is configured in ${FEDORA_HOME}/server/conf/fedora.fcfg).
How to compile
- Download the Maven project from the BHL-E GitHub;
- Run "mvn package".
How to install
- Purge all objects in Fedora Repository, then shut down the server;
- Place akubra-mux-0.3.jar and bhle-llstore-0.0.1.jar in ${FEDORA_HOME}/tomcat/webapps/fedora/WEB-INF/lib;
- Replace akubra-llstore.xml with the one in the install package, and modify the store paths and DataStream IDs according to your needs;
- Restart the server.
How it works
A subclass of org.akubraproject.mux.AbstractMuxConnection overrides the getStore method to provide a BlobStore according to the keywords of DataStream IDs in akubra-llstore.xml. The filesystem storage is reused from akubra-fs (a simple filesystem implementation) and akubra-map (which wraps an existing BlobStore to provide a blob id mapping layer) without any modification. Therefore, all the path mappings for objects and datastreams are still based on MD5 mapping.
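The routing idea can be illustrated with a small Python sketch. It does not use the Akubra API and is not the code of bhle-llstore; the function select_store, the keywords, and the store paths are hypothetical, and matching a keyword as a substring of the DataStream ID is an assumption about how the keywords are compared.

```python
from typing import Dict


def select_store(datastream_id: str, keyword_to_store: Dict[str, str], default_store: str) -> str:
    """Pick the store whose keyword occurs in the datastream ID, else the default."""
    for keyword, store in keyword_to_store.items():
        if keyword in datastream_id:  # assumed substring match
            return store
    return default_store


# Hypothetical keyword-to-store mapping, standing in for akubra-llstore.xml.
stores = {"JP2": "/data/images", "OCR": "/data/text"}
print(select_store("JP2", stores, "/data/default"))
print(select_store("DC", stores, "/data/default"))
```

In the real setup this decision is made inside getStore, and each selected store still lays out its files using the unchanged MD5-based mapping.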