Page Level Metadata for Ingesting Content
See also
Uploading to IA
General Info:
Library Specific Metadata Information:
Mappings to the page types used in the BHL portal (Mike L.):
- Contents = Table of Contents
- Copyright = currently no exact mapping in BHL, we're just using Text
- Cover = Cover
- Foldout = Foldout
- Normal = Text
- Title = Title Page
- Title Page = Title Page
- Rarely used:
- Illustrations = Text (list of illustrations, not a picture)
- Index = Index (3 instances in BHL, another 400 or so in the CDL stuff)
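The mapping above can be written down as a simple lookup table. This is only a sketch: the names `SCANDATA_TO_BHL` and `to_bhl_page_type` are made up for illustration, and falling back to "Text" for unlisted page types is an assumption, not documented BHL behaviour.

```python
# Scandata pageType values mapped to BHL portal page types,
# transcribed from the list above.
SCANDATA_TO_BHL = {
    "Contents": "Table of Contents",
    "Copyright": "Text",      # no exact BHL equivalent at the moment
    "Cover": "Cover",
    "Foldout": "Foldout",
    "Normal": "Text",
    "Title": "Title Page",
    "Title Page": "Title Page",
    "Illustrations": "Text",  # a list of illustrations, not a picture
    "Index": "Index",
}


def to_bhl_page_type(scandata_type, default="Text"):
    """Return the BHL page type for a scandata pageType value.

    The "Text" fallback for unmapped values is an assumption of this sketch.
    """
    return SCANDATA_TO_BHL.get(scandata_type.strip(), default)
```

For example, `to_bhl_page_type("Contents")` gives "Table of Contents", and an unlisted value such as "Color Card" falls back to "Text".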
XSD Tags used by Walter Koch in BHL-E test (Walter K.)
- AIT XSD for ingest: scandata.xsd
- What I have used are the tags: <pageType>, <pageNumber> and <handSide>
- If a page shouldn't be shown in the reader, the element has to be set to: <pageType>Delete</pageType><addToAccessFormats>false</addToAccessFormats>
- <handSide> simply alternates between "LEFT" and "RIGHT"
- <pageType> can have different values: "Cover", "Normal", "Title", "Contents", "Chapter", "Index", "Illustrations", "Blank Tissue", "Color Card", "White Card", "Copyright", "Publish Note", "Delete"
- Online Reader starts with a page of type: "Title".
- <pageNumber> has to be a unique string within the whole XML object, otherwise the value will be ignored.
[Counterexample from Mike Lichtenberg: I have to contradict the comment about the page numbers needing to be unique within the whole of the XML. This example shows that is not the case: http://www.archive.org/download/mmoiressurlesq05cour/mmoiressurlesq05cour_scandata.xml. Notice that this is my earlier example of a book that contains multiple volumes (each volume is paginated separately). It is also my example of a scandata.xml that contains the extra "pageNumData" section, so I think this gives us a hint of what that section is for.]
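To check a scandata file against the points above (hidden pages, the first "Title" page, repeated page numbers), something like the following could be used. It is a rough sketch under assumptions: the element names <pageType>, <pageNumber>, <handSide> and <addToAccessFormats> come from the notes above, but the surrounding <book>/<pageData> layout, the `leafNum` attribute and the sample data are assumed for illustration, and `summarize_pages` is a made-up helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, trimmed-down sample; a real scandata.xml carries more elements.
SAMPLE = """<book>
  <pageData>
    <page leafNum="0">
      <pageType>Cover</pageType>
      <addToAccessFormats>true</addToAccessFormats>
      <handSide>RIGHT</handSide>
    </page>
    <page leafNum="1">
      <pageType>Delete</pageType>
      <addToAccessFormats>false</addToAccessFormats>
      <handSide>LEFT</handSide>
    </page>
    <page leafNum="2">
      <pageType>Title</pageType>
      <addToAccessFormats>true</addToAccessFormats>
      <handSide>RIGHT</handSide>
      <pageNumber>1</pageNumber>
    </page>
    <page leafNum="3">
      <pageType>Normal</pageType>
      <addToAccessFormats>true</addToAccessFormats>
      <handSide>LEFT</handSide>
      <pageNumber>1</pageNumber>
    </page>
  </pageData>
</book>"""


def summarize_pages(xml_text):
    """Report visible leaves, the first Title leaf and repeated page numbers."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        access = page.findtext("addToAccessFormats", default="true")
        pages.append({
            "leaf": page.get("leafNum"),  # attribute name is an assumption
            "type": page.findtext("pageType", default="Normal").strip(),
            "number": page.findtext("pageNumber"),
            "visible": access.strip().lower() == "true",
        })

    # Pages marked Delete or excluded from access formats are not shown.
    visible = [p for p in pages if p["visible"] and p["type"] != "Delete"]
    # The online reader is said to open at the first page of type "Title".
    first_title = next((p["leaf"] for p in visible if p["type"] == "Title"), None)
    # Page numbers that occur more than once (e.g. multi-volume books).
    counts = Counter(p["number"] for p in pages if p["number"])
    duplicates = sorted(n for n, c in counts.items() if c > 1)

    return {
        "visible_leaves": [p["leaf"] for p in visible],
        "first_title_leaf": first_title,
        "duplicate_page_numbers": duplicates,
    }
```

Calling `summarize_pages(SAMPLE)` lists leaves 0, 2 and 3 as visible, reports leaf 2 as the first Title page, and flags page number "1" as repeated. Whether repeated numbers are actually ignored by the reader is exactly the point disputed above, so the sketch only reports them.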
New Archive Uploader Beta notes from Keri (or: using the manual uploader, which you shouldn't be doing)
my test here: http://www.archive.org/details/gemeinnzzigena01borow_303
To start: click on "Share" button on archive.org.
for Title, do NOT put in the actual title, put in the identifier (either made up IA identifier, or barcode)
fill in rest of fields - keyword = subject (MARC 650)
hit upload & it will create the directory using the id you put in "Title" and will let you upload/edit
i uploaded marc.xml, scandata.xml and meta.xml (the meta.xml upload failed)
you can NOT overwrite meta.xml file at this stage (you will need to update this file, since it has the identifier as the title!)
after uploading everything, it is non-obvious but you have to go to the ITEM MANAGER to derive your files. link is at the top of the metadata editing form
in Item Manager, derive the files by putting * in the remove_derive form, then hit the derive button. alternately, it may work if you just hit the derive button.
to check to see if your job is running, go down to [show queued processes]
Issues: upload of a 2.7GB zip took almost 3 hours (second try); that was in the middle of the day, though.
you still can't overlay the meta_xml after publishing the files, you have to manually edit the metadata using their form, and you must edit to change the title, sponsor, etc.
not sure where to put item info in the metadata editing page. possibly add a custom field "volume" - first tried putting it in "coverage", which is wrong. their form names don't match the xml tags 1 to 1.