Pagination Issues
A place to document various pagination problem use cases.
MOBOT Pagination Issue Documentation
We’ve recently discovered fairly widespread pagination problems for MOBOT titles in BHL. These problems are side effects of the recent change in the workflow that delivers MOBOT scans to the Internet Archive (IA). There doesn’t seem to be an easy fix, and there’s no telling when the problems will be corrected. The good news is that the scans and pagination for these titles can be corrected in Botanicus by Mike Blomberg, so we have at least one complete and correct copy online.
These problems can include pagination mistakes as well as missing pages.
Here’s an example of a pagination mistake (pages labeled wrong):
Botanicus:
http://www.botanicus.org/page/1161989 is page 99 of Bulletin of Miscellaneous Information (RBG, Kew). In the pagination bar on the left, page 99 and the surrounding pages are correctly labeled. An extra page has been inserted by mistake at page 102 (a problem that Mike Blomberg will be able to fix), but the pagination for the rest of the volume is correct.
BHL:
http://biodiversitylibrary.org/page/11831638 is page 99 of the same title in BHL. In the pagination bar on the left, the first page 99 points to an illustration rather than the actual page 99, and page 100 is also incorrectly labeled. The pagination then repeats (pages 99 and 100) for the correct pages. Page 102 also repeats, and the pagination is off by one page throughout the rest of the volume.
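A repeated label like the ones described above can be spotted mechanically if the labels for a volume are available as an ordered list. The following is only a minimal sketch: the function name and the label sequence are made up for illustration (the sequence is modeled loosely on the BHL example, not taken from the real data), and it assumes each page has at most one label. Some repeats are legitimate, so the output is a list of places to check by hand, not a list of confirmed errors.

```python
from collections import defaultdict


def find_repeated_labels(labels):
    """Return {label: [positions]} for every page label that occurs more than once."""
    positions = defaultdict(list)
    for index, label in enumerate(labels):
        if label:  # skip unnumbered pages (empty label)
            positions[label].append(index)
    return {label: idxs for label, idxs in positions.items() if len(idxs) > 1}


# Hypothetical label sequence: 99 and 100 appear twice, then 102 repeats.
bhl_labels = ["98", "99", "100", "99", "100", "101", "102", "102", "103"]
repeats = find_repeated_labels(bhl_labels)

for label, idxs in repeats.items():
    print(f"label {label} appears at sequence positions {idxs}")
```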
Here’s an example of missing pages:
Botanicus:
http://www.botanicus.org/page/1212359 is page 373 of vols. 9-10 of Notiser ur Sallskapets….
BHL:
http://www.biodiversitylibrary.org/page/13328758 is the same page in BHL. In the pagination bar on the left, the pagination looks correct. However, the number printed on the actual page shows that the pagination is off by two pages because of a problem earlier in the text. If you scroll down from page 373 to the next page, you’ll see a series of dots; these dots indicate missing pages. The next page that appears is page 62 of the next volume. Any time you see those dots when scrolling through a volume on BHL, they indicate missing pages.
Mike Lichtenberg's explanation of what's happening:
"Newly scanned Botanicus items do not get uploaded to IA until after they are marked “Pagination Complete”. (FYI, this was not always the case…
for a number of months things were uploaded as soon as they were scanned, rather than after the pagination process was completed… I’m pretty sure that’s why we have so many things out of sync on BHL.)
The problem is that there's no "before" and "after" picture of the data that can be obtained from IA. The data is as it is TODAY, and what it was yesterday is forgotten. In addition, there is no unique identifier or file name or anything assigned to a page at IA. All of that combined means that even by cross-referencing our data with what's at IA, there's no automated way of determining what changed (and therefore no way to automatically fix our pagination).
Normally, in such a situation, I would advocate for simply deleting the page data from BHL and replacing it with the new data from IA. However, there are two problems with that: 1) any detailed pagination that had been added to the item in BHL would be lost, and 2) it has been mandated to me (many times) that we need to preserve any Page IDs that we assign (so I can't just drop existing BHL records).
All of this combined means that fixing these becomes a very manual, very slow, very complicated process. I have to work directly in the database to reassign the database records correctly.
Since we now wait for pagination to be completed before uploading anything, any changes made to an item before that initial upload will show up everywhere.
Once pagination is complete and an item has been uploaded to IA, changes in Botanicus will be reflected at IA, but not BHL. The reason for this is a long story, related to my earlier explanation about why page inserts cause things to get out of sync, but the important thing to know is that after pagination is complete any changes need to be made in both Botanicus and BHL."
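To make the "no unique identifier" point in Mike's explanation concrete, here is a small sketch of what happens if stored page records are matched to the current IA page list purely by position. Everything in it is hypothetical: the Page IDs, the labels, and the function name are invented for illustration and are not the actual BHL or IA data model. The labels stand in for page content only so that the shift is visible. In reality the labels are the very thing that is wrong, which is why this kind of comparison can't be used as an automated fix.

```python
def pair_by_position(bhl_records, ia_labels):
    """Pair each stored (page_id, label) record with the IA label at the same index."""
    return [
        (page_id, old_label, new_label)
        for (page_id, old_label), new_label in zip(bhl_records, ia_labels)
    ]


# Hypothetical stored records: (Page ID, label) in sequence order.
bhl_records = [(1001, "99"), (1002, "100"), (1003, "101"), (1004, "102")]

# Hypothetical IA sequence after one page was inserted before page 101.
ia_labels = ["99", "100", "Illustration", "101", "102"]

pairs = pair_by_position(bhl_records, ia_labels)

# Every record after the insert now lines up with the wrong page.
shifted = [p for p in pairs if p[1] != p[2]]

# The last IA page has no stored record at all.
unmatched_ia_pages = len(ia_labels) - len(pairs)
```

In this toy case, Page IDs 1003 and 1004 end up attached to the wrong pages and the final IA page has no record. Since the existing Page IDs have to be preserved, each one has to be moved back to the right page by hand.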
A workflow is now in place: pagination problems are reported as Gemini issues. Initially, Michelle will be attached to each issue to determine whether it can be handled internally or whether Mike Lichtenberg needs to be added.