BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

BHL Harvesting, IA Updating Dilemma

Problem Statement:

We have reason to suspect a disconnect between BHL's harvesting methodology and the Internet Archive's (IA's) page-insertion and file-updating practices. Our specific concerns and examples are outlined below.


Below is a list of specific areas of concern regarding IA scan updating and BHL harvesting that we have encountered:



Below is a list of concerns we have regarding any changes that might be made to the harvesting process in order to rectify the above issues:



Examples:




This book was first scanned on April 21, 2009 and sent back on May 14 because of missing pages. The pages were inserted (or rescanned?) on June 13.

According to our notes, the original scan was missing plate VII (recto) and the explanation for plate VII (verso of plate t.p.). It was sent back for page insertion.
From the meta.xml file:
<updatedate>2009-04-21 17:46:10</updatedate>
<curation>[curator]dorothy@archive.org[/curator][date]20090502015434[/date][state]approved[/state]</curation>
However, the scandata.xml, jp2.tar, abbyy, djvu, and pdf files are all dated from June. It appears that the meta.xml file was not updated when the pages were inserted.
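
The check described above (comparing the meta.xml update date against the dates of the other files) could be sketched roughly as follows. This is only an illustration, not BHL's harvesting code: the `<metadata>` wrapper element, the function name `files_newer_than_meta`, and the June 13 file dates are assumptions, since this page only says the files are "dated from June". It also assumes all dates are in the same time zone.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# <updatedate> is copied from the example above; the <metadata> wrapper
# element is an assumption so that the fragment parses on its own.
META_XML = """<metadata>
  <updatedate>2009-04-21 17:46:10</updatedate>
</metadata>"""

# Placeholder dates (hypothetical): the page only says "dated from June".
FILE_DATES = {
    "scandata.xml": datetime(2009, 6, 13),
    "jp2.tar": datetime(2009, 6, 13),
}

def files_newer_than_meta(meta_xml, file_dates):
    """Return the names of files dated after meta.xml's <updatedate>."""
    root = ET.fromstring(meta_xml)
    updated = datetime.strptime(root.findtext("updatedate"), "%Y-%m-%d %H:%M:%S")
    return sorted(name for name, date in file_dates.items() if date > updated)

print(files_newer_than_meta(META_XML, FILE_DATES))
```

Any file name returned here is newer than the recorded update date, which is the situation this example describes.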

Here is the inserted page as seen in IA (turn the page and note that there is no t.p. for plate VIII):
http://www.archive.org/stream/diecrustaceendes00hell#page/n379/mode/2up

Here is where that page should be in the book in BHL, but isn’t (then keep going - note there is a t.p. for plate VIII):
http://biodiversitylibrary.org/page/13059124

This book was first scanned on March 30 and was sent back because of a missing page. The page was inserted on May 21.
In meta.xml we see:
<scandate>20090521024433</scandate>
<imagecount>158</imagecount>
<curation>[curator]dorothy@archive.org[/curator][date]20090527201134[/date][state]approved[/state][comment]199[/comment]</curation>
<updatedate>2009-05-20 22:57:20</updatedate>
<updater>Quinnisha</updater>
<missingpages/>
<repub_state>4</repub_state>
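
To compare the dates in a record like this one, the two timestamp formats and the bracketed curation string need to be normalized first. The following is a minimal sketch under stated assumptions: the `<metadata>` wrapper and the helper names `parse_curation` and `meta_timeline` are hypothetical, and it assumes all three timestamps share a time zone, which this page does not confirm.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Field values are copied from the example above; the <metadata> wrapper
# element is an assumption so that the fragment parses on its own.
META_XML = """<metadata>
<scandate>20090521024433</scandate>
<imagecount>158</imagecount>
<curation>[curator]dorothy@archive.org[/curator][date]20090527201134[/date][state]approved[/state][comment]199[/comment]</curation>
<updatedate>2009-05-20 22:57:20</updatedate>
</metadata>"""

def parse_curation(text):
    """Turn '[key]value[/key]' pairs into a dict."""
    return dict(re.findall(r"\[(\w+)\](.*?)\[/\1\]", text))

def meta_timeline(meta_xml):
    """Return (field name, datetime) pairs sorted from earliest to latest."""
    root = ET.fromstring(meta_xml)
    curation = parse_curation(root.findtext("curation"))
    events = {
        "scandate": datetime.strptime(root.findtext("scandate"), "%Y%m%d%H%M%S"),
        "updatedate": datetime.strptime(root.findtext("updatedate"), "%Y-%m-%d %H:%M:%S"),
        "curation": datetime.strptime(curation["date"], "%Y%m%d%H%M%S"),
    }
    return sorted(events.items(), key=lambda item: item[1])

for name, when in meta_timeline(META_XML):
    print(name, when.isoformat())
```

Listing the fields in chronological order makes it easier to see which of them reflect the page insertion and which predate it.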

You can see that page 34 was properly inserted in IA:
http://www.archive.org/details/essaideclassific31901labo
However, in BHL, although the inserted pages do display, the page numbers are now associated with the wrong scans:
http://biodiversitylibrary.org/page/12983647

All of these examples had page insertions after the initial scan.