Content Management Harmonisation
Data Harmonisation
This page holds notes on harmonisation processes and related information.
Spreadsheet of upload review:
upload_status_20120213.xls
Harmonisation Status by Provider
Provider | Harmonisation status | Upload status
------------|---------------------------------------|------------------------------------
BNF | substantial harmonisation needed | all deliveries received, disk ready to return
CSIC | serials and monographs good to go (one needing metadata); PI completed | uploading constantly
DILIBRI | XML fixed and page data extracted from METS; ready to ingest (addendum: some metadata apparently still has other issues to resolve) |
GFBS | metadata harmonisation/transformation needed | no more content upload during project lifetime; next upload 01/2013
MNHM | needs to be harmonised |
LANDOOE | ingest working; PI completed | upload completed
MFN | no content to review | content upload during April
MIZPAS | review in progress [Tobias] | more content for upload after feedback
HNHM | PDF articles with article-level MARCXML; layout looks good; volume/serial-level metadata still needed (metadata only available for articles) |
MSN | will be aggregated by UGOE |
NAT | structure good (one or two items need XML), otherwise ready to ingest | MARCXML edits to be added on NAT side for upload of further content
NBGB | good to go (uploading) |
NHM | no content to review (submission via IA/BHL global) |
NHMW | no content to review | no CP
NMP | no page-level metadata, otherwise good to ingest | upload completed according to D2.7
PAS | review in progress [Tobias] | more content for upload after feedback
RBGE | Notes/FoB serials ready to ingest | upload completed
RBINS | review in progress | more content for upload after feedback
RENNES1 | test ingest in progress; PI completed | more content for upload after feedback
RMCA | new content needs to be harmonised; old monographs good; old serials need PI support (reorganising old serials for test) | more content for upload after feedback
UBBI | no content to review | will upload same content as already in Europeana
UBER | ready for test ingest | upload completed
UBFrankfurt | test ingest in progress | uploading constantly, feedback needed
UCPH | all content serial; some metadata missing; Svampe throwing XML issues; BotaniskTidsskrift structure good (inserted section), being tested; Friesia also good to test |
UGOE | TIFF data with METS metadata (transform XML to MODS and extract structural data for filenames to ingest) |
UHViikki | replaced the first underscore in filename identifiers with a hyphen (e.g. 300014_16_0010_page_2.tif -> 300014-16_0010_page_2.tif) and converted some MARCXML to .mrc to match metadata inside serial volumes; content should be good to ingest, but a clean test on integration is needed to confirm | uploading constantly, feedback needed
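The UHViikki filename fix could be scripted roughly as follows. This is a minimal sketch, assuming filenames follow the pattern `<identifier>_<4-digit sequence>_<page type>[_<printed page>].tif` seen in the examples on this page; `normalise_identifier` is a hypothetical helper, not existing project tooling. Underscores inside the identifier become hyphens, so the identifier no longer collides with the underscore used as the field separator.

```python
import re

# Assumed convention: <identifier>_<4-digit sequence>_<page type>[_<printed page>].tif
# The identifier is matched non-greedily, so it ends right before the first
# "_NNNN_<letters>" group; any underscores left inside it belong to the identifier.
FILENAME_RE = re.compile(
    r"^(?P<identifier>.+?)_(?P<sequence>\d{4})_(?P<page_type>[A-Za-z]+)"
    r"(?:_(?P<printed_page>[^_.]+))?\.tif$"
)


def normalise_identifier(filename: str) -> str:
    """Replace underscores inside the identifier part of a filename with hyphens."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    identifier = match.group("identifier").replace("_", "-")
    # Keep everything after the identifier exactly as it was.
    return identifier + filename[match.end("identifier"):]


if __name__ == "__main__":
    print(normalise_identifier("300014_16_0010_page_2.tif"))
    # prints 300014-16_0010_page_2.tif
```

Filenames whose identifier already contains no underscore (e.g. `12345_0001_index.tif`) are returned unchanged, so the function can safely be applied to a whole delivery.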
Report template for git issues
For examples, see:
```
/mnt/nfs-demeter/upload/providers/xxx
```
.... folders .... in
```
/mnt/nfs-demeter/upload/providers/xxx/yyy/
```
## Actions to take:
1. NONE
1. ...
## Summary:
* Bibliographic level:
* folder names & folder structure:
* series level:
* volume level:
* item level:
* file names:
* InternalIdentifier:
* FileSequenceNumber:
* PageType:
* PrintedPageNumber:
* metadata available:
* metadata in accepted format: which?
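To fill in the file-name fields of the template (InternalIdentifier, FileSequenceNumber, PageType, PrintedPageNumber), a page image filename can be split into those four parts. The following is a minimal Python illustration, assuming the pattern `<InternalIdentifier>_<FileSequenceNumber>_<PageType>[_<PrintedPageNumber>].tif` inferred from the example filenames on this page (four-digit sequence numbers, no underscore inside the identifier); `parse_filename` and `PageFile` are hypothetical names, not existing project tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention (inferred from the examples on this page, not a spec):
# <InternalIdentifier>_<FileSequenceNumber>_<PageType>[_<PrintedPageNumber>].tif
FILENAME_RE = re.compile(
    r"^(?P<identifier>[^_]+)_(?P<sequence>\d{4})_(?P<page_type>[A-Za-z]+)"
    r"(?:_(?P<printed_page>[^_.]+))?\.tif$"
)


class PageFile(NamedTuple):
    internal_identifier: str
    file_sequence_number: int
    page_type: str
    printed_page_number: Optional[str]  # absent for e.g. index or imprint pages


def parse_filename(filename: str) -> PageFile:
    """Split a page image filename into the four fields listed in the report template."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename}")
    return PageFile(
        internal_identifier=match.group("identifier"),
        file_sequence_number=int(match.group("sequence")),
        page_type=match.group("page_type"),
        printed_page_number=match.group("printed_page"),
    )


if __name__ == "__main__":
    print(parse_filename("12345_0002_page_1.tif"))
```

A filename that does not match, such as one whose identifier still contains an underscore, raises `ValueError`, which makes it easy to list the files that need attention before writing the report. The printed page number is kept as a string because it may be a roman numeral or similar.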
Note on the article-level folder structure:
(email from Wolfgang, 28.2.2012)
1. There is a naming convention for folder names when using article-level metadata (in order to maintain the correct order of articles).
2. Creating and referencing that metadata within OLEF is no problem. However, storing this structure on the filesystem is somewhat difficult.
I've discussed this with Heimo, and we concluded that using a continuous sequence for the whole volume would be the best solution.
We could then identify identical pages by their shared sequence number.
e.g.:
```
SerialFolder/
-- VolumeFolder/
---- 12345_0001_index.tif
---- 12345_0002/
------ 12345_0002_page_1.tif
------ 12345_0003_page_2.tif
------ 12345_0004_page_3.tif
------ 12345_0005_page_4.tif
------ 12345_0006_page_5.tif
------ 12345_0007_page_6.tif
---- 12345_0009/
------ 12345_0007_page_6.tif
------ 12345_0009_page_7.tif
------ 12345_0010_page_8.tif
------ 12345_0011_page_9.tif
---- 12345_0010_imprint.tif
...
```
This way, we could even reference content that belongs to an article but is located elsewhere, e.g. in the appendix.
So, to summarize: file sequence numbers have to be continuous and may be reused for the same page with the same content (which means the files are copied, i.e. duplicated). Folder sequence numbers just have to fit the ordering at the top level, which means the sequence numbers used in folder names may be reused for files residing in a sub-folder.
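The two rules in the summary lend themselves to a mechanical check before ingest. Below is a minimal Python sketch, assuming the `<id>_<4-digit sequence>_<page type>[_<printed page>].tif` naming used in the example tree; `check_volume` is a hypothetical helper. It judges "same page" by the page type and printed page number in the filename rather than by comparing file contents (a checksum comparison would be stricter).

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed filename convention: <id>_<4-digit sequence>_<page type>[_<printed page>].tif
FILE_RE = re.compile(r"^[^_]+_(\d{4})_([A-Za-z]+)(?:_([^_.]+))?\.tif$")


def check_volume(paths):
    """Return a list of problems with the file sequence numbers of one volume.

    paths: relative paths of the .tif files in the volume, including sub-folders.
    Checks that the numbers are continuous and that a number is only reused for
    the same page (same page type and printed page number).
    """
    pages_by_seq = defaultdict(set)
    problems = []
    for path in paths:
        name = PurePosixPath(path).name  # folder names are not checked here
        match = FILE_RE.match(name)
        if match is None:
            problems.append(f"unparseable file name: {path}")
            continue
        seq, page_type, printed_page = match.groups()
        pages_by_seq[int(seq)].add((page_type, printed_page))

    # Rule 1: sequence numbers must be continuous from the lowest to the highest.
    if pages_by_seq:
        expected = range(min(pages_by_seq), max(pages_by_seq) + 1)
        missing = [n for n in expected if n not in pages_by_seq]
        if missing:
            problems.append(f"gap in sequence: missing {missing}")

    # Rule 2: a reused number must always point to the same page.
    for seq, pages in sorted(pages_by_seq.items()):
        if len(pages) > 1:
            listed = sorted(pages, key=str)  # key=str avoids comparing None with str
            problems.append(f"sequence {seq:04d} reused for different pages: {listed}")
    return problems
```

An empty list means the volume passes both checks; duplicated files such as a page shared by two article folders are accepted because they collapse to one entry per sequence number.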
MARC metadata quick reference
Bibliographic level in MARC:
http://www.loc.gov/marc/bibliographic/lite/lbdleader.html
OLEF field description               | MARC 21 field tag
-------------------------------------|-----------------------
Titles                               | 200-249
- Title of series                    |
- Title of monograph                 |
- Title of volume                    |
- Title of article                   |
Authors of volume                    | 100
Authors of monograph                 | 100
Author(s)                            | 100
Place of publication                 | 260 subfield $a
Publisher                            | 260 subfield $b
Date of publication                  | 260 subfield $c
ISBN (if available)                  | 020 subfield $a
ISSN (if available)                  | 022 subfield $a
Language of publication              | 008/35-37 (character positions 35-37), ISO code
                                     | 546 subfield $a (ISO 639-2b)
Keywords                             | 600-699 (subject)
Abstract text                        | 520
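As an illustration of the mapping above, the following minimal Python sketch reads several of these fields from a MARCXML record using only the standard library. It assumes records in the standard MARC 21 slim namespace and takes 245 $a as the title (one tag within the 200-249 range), 100 $a as the author, 520 $a as the abstract and subfield $a of any 6XX field as a keyword; `extract_olef_fields` is a hypothetical helper, not part of OLEF.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# (output name, MARC tag, subfield code); choices of 245/100/520 $a are assumptions.
DATAFIELD_MAP = [
    ("title", "245", "a"),
    ("author", "100", "a"),
    ("place_of_publication", "260", "a"),
    ("publisher", "260", "b"),
    ("date_of_publication", "260", "c"),
    ("isbn", "020", "a"),
    ("issn", "022", "a"),
    ("abstract", "520", "a"),
]


def extract_olef_fields(record: ET.Element) -> dict:
    """Pull OLEF-relevant values out of one MARCXML <record> element."""
    result = {}
    for name, tag, code in DATAFIELD_MAP:
        path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
        values = [sf.text.strip() for sf in record.findall(path, MARC_NS) if sf.text]
        if values:
            result[name] = values

    # 008/35-37: language code at character positions 35-37 of the 008 control field.
    field_008 = record.find("marc:controlfield[@tag='008']", MARC_NS)
    if field_008 is not None and field_008.text and len(field_008.text) >= 38:
        result["language"] = field_008.text[35:38]

    # Keywords: subfield $a of any subject field in the 600-699 range.
    keywords = []
    for datafield in record.findall("marc:datafield", MARC_NS):
        if "600" <= datafield.get("tag", "") <= "699":
            keywords.extend(
                sf.text.strip()
                for sf in datafield.findall("marc:subfield[@code='a']", MARC_NS)
                if sf.text
            )
    if keywords:
        result["keywords"] = keywords
    return result
```

For a file containing a `<collection>` of records, parse it with `ET.parse(...)` and call the function on each element returned by `root.findall("marc:record", MARC_NS)`. Fields that are absent from a record are simply left out of the result, which makes missing metadata easy to spot during review.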
Additional page types from the University of Rennes 1
PageType_eng_fre.xls
Metadata database dump from PAS
tl_issues.zip
Providers marked in red are those for which the content analysis report states "interconnection successful"; these should be ready and ingested for the launch.
Account        | PI Status                             | Ingest Status
---------------|---------------------------------------|------------------------------------
mehrrath       | empty                                 | no action
nhmw           | empty                                 | no action
test           | overview of all items in PI           | no action
admin          | empty                                 | no action
nbgb           | issues with some items                | ingest complete, some issues
fr-rennes1     | metadata updated                      | ingest partly complete, some issues
uk-rbge        | some issues                           |
fi-uhviikki    | account not working                   |
de-ubfrankfurt | empty, tests in progress              |
de-dilibri     | issues with some items                | ingest complete
at-landooe     | ready for ingest                      | ingest complete, few issues
be-rmca        | issues with some items                | monographs complete, some issues
bhl-us         | ready for ingest                      |
es-csic        |                                       | ingest complete, some issues
cz-nmp         |                                       | ingest complete
de-mfn         | issues with some items                | ingest almost complete, some issues
nl-nat         | Tom to do more work, metadata issues  | tests in progress
pl-pas         | empty                                 |
pl-mizpas      |                                       | ingest complete
bhl-us-a       | ready for ingest                      |
bhl-us-b       | ready for ingest                      |
bhl-us-c       | ready for ingest                      |
bhl-us-d       | ready for ingest                      |
bhl-us-e       | ready for ingest                      |
bhl-us-f       | many items still running PI steps     |
bhl-us-g       | many items still running PI steps     |
bhl-us-h       | ready for ingest                      | ingest of many PI items complete
bhl-us-i       | many items still running PI steps     | ingest of PI items complete
bhl-us-j       | ready for ingest                      | ingest of PI items complete
dk-ucph        | tests on int in progress              |
?hnhm          |                                       |
?rbins         |                                       |
?bnf           |                                       |
?mnhn          |                                       |
?uber          |                                       |
?ub-bielefeld  |                                       |
?gfbs          |                                       |
de-ugoe        |                                       | ~430 books ingested