Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.
BHL Portal
To ingest materials from IA, need to know:
1. What has been scanned?
Is the RSS feed for the biodiversity collection sufficient & scalable?
Example:
Bulletin of the Natural History Museum (Volume 1)
http://www.archive.org/details/bulletinofnatura01entolond
ftp://ia340917.us.archive.org/1/items/bulletinofnatura01entolond
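The example FTP location appears to pack three pieces of information into one URL: the server name, a top-level directory (called the "cluster" elsewhere in these notes), and the item identifier. A minimal Python sketch of pulling them apart follows. The function name `split_item_url` is hypothetical, and the `/<cluster>/items/<identifier>` layout is assumed from this single example rather than from any IA documentation.

```python
from urllib.parse import urlparse


def split_item_url(url):
    """Split an item URL into (server, cluster, identifier).

    Assumes the layout seen in the example above:
    <scheme>://<server>.us.archive.org/<cluster>/items/<identifier>[/...]
    """
    parsed = urlparse(url)
    # The server name is the first label of the host, e.g. "ia340917".
    server = parsed.hostname.split(".")[0]
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 3 or parts[1] != "items":
        raise ValueError(f"unexpected item path: {parsed.path!r}")
    return server, parts[0], parts[2]


server, cluster, identifier = split_item_url(
    "ftp://ia340917.us.archive.org/1/items/bulletinofnatura01entolond"
)
print(server, cluster, identifier)
```

This only helps once an FTP location is already known; it does not answer how to discover the server for a volume that is listed only by identifier.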
For any scanned volume, need to know:
1. What is its identifier?
bulletinofnatura01entolond
-source: RSS
2. When was it scanned?
2007-03-29 02:27:49
-source: in
ftp://ia340917.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_meta.xml
3. What server is it on?
ia340917
-source: not in RSS? not sure how to get.
4. Where are its pages?
Low res JPG:
http://ia340917.us.archive.org/zipview.php?zip=/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_flippy.zip
JP2:
http://ia340917.us.archive.org/zipview.php?zip=/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_jp2.zip
4.5. What are their page numbers?
http://ia340917.us.archive.org/zipview.php?zip=/1/items/bulletinofnatura01entolond/scandata.zip&file=scandata.xml
4.6. Which page is the title page?
<bookplateleaf>
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_meta.xml
5. What title does it belong to?
<call_number> in _meta.xml
If no <call_number>, then in zquery.
zquery:
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_metasource.xml
245a from:
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_marc.xml
<title> from:
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_meta.xml
6. Where is its PDF?
ftp://ia340917.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond.pdf
7. Where is its MARCXML?
ftp://ia340917.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_marc.xml
8. Where is the OCR?
Page-level?
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_djvu.xml
Entire volume?
ftp://ia340917.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_djvu.txt
9. What institution does it belong to?
<contributor>
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_meta.xml
10. Who sponsored scanning?
<sponsor>
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_meta.xml
10.5. Where was it scanned?
<scanningcenter>
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_meta.xml
11. How was it described as a volume by IA scanners?
<volume>
ftp://ia340916.us.archive.org/1/items/bulletinofnatura01entolond/bulletinofnatura01entolond_meta.xml
12. What articles does it contain?
TBD
13. When was it added to BHL portal?
BHL responsibility
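The file locations listed above all follow from three values: server, cluster, and identifier. A rough Python sketch of deriving them, and of reading the _meta.xml elements named in the list, is given here. It is an illustration under assumptions, not an IA-supported interface: the function names are hypothetical, the URL patterns are copied from the single example on this page, the elements are assumed to be direct children of the _meta.xml root, and the sample XML values are invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# Suffixes appended to the identifier, as seen in the FTP locations above.
FTP_SUFFIXES = {
    "meta": "_meta.xml",
    "metasource": "_metasource.xml",
    "marc": "_marc.xml",
    "pdf": ".pdf",
    "ocr_pages": "_djvu.xml",
    "ocr_volume": "_djvu.txt",
}

# Targets served through zipview.php, as seen in the HTTP locations above.
ZIPVIEW_TARGETS = {
    "pages_jpg": "{id}_flippy.zip",
    "pages_jp2": "{id}_jp2.zip",
    "scandata": "scandata.zip&file=scandata.xml",
}

# Elements of _meta.xml that the list above relies on.
META_FIELDS = ("title", "volume", "call_number",
               "contributor", "sponsor", "scanningcenter")


def item_file_urls(server, cluster, identifier):
    """Build the per-item file locations from the observed URL patterns."""
    host = f"{server}.us.archive.org"
    item_dir = f"/{cluster}/items/{identifier}"
    urls = {name: f"ftp://{host}{item_dir}/{identifier}{suffix}"
            for name, suffix in FTP_SUFFIXES.items()}
    for name, target in ZIPVIEW_TARGETS.items():
        urls[name] = (f"http://{host}/zipview.php?zip={item_dir}/"
                      + target.format(id=identifier))
    return urls


def read_meta_fields(meta_xml):
    """Return the wanted _meta.xml elements; None where one is absent."""
    root = ET.fromstring(meta_xml)
    fields = {}
    for name in META_FIELDS:
        element = root.find(name)  # direct children of the root only
        has_text = element is not None and element.text
        fields[name] = element.text.strip() if has_text else None
    return fields


# Invented sample, NOT the real record; <call_number> is left out on purpose.
SAMPLE_META = """<metadata>
  <title>Example title</title>
  <volume>1</volume>
  <contributor>Example Library</contributor>
  <sponsor>Example Sponsor</sponsor>
  <scanningcenter>examplecenter</scanningcenter>
</metadata>"""

urls = item_file_urls("ia340917", "1", "bulletinofnatura01entolond")
print(urls["pdf"])
fields = read_meta_fields(SAMPLE_META)
if fields["call_number"] is None:
    # Matches the fallback described above: look in the zquery file instead.
    print("no <call_number>; fall back to", urls["metasource"])
```

Keeping every URL pattern in one table means that a change to zipview.php or to the directory layout would only need to be corrected in one place.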
Concerns:
Will zipview.php change? Should we be using some other way to get these behind-the-scenes bits of data for ingestion?
How do we find out the server (ia340917) & cluster (/1/) for a volume? Is this the wrong terminology?
Should we copy the OCR, PDF, JPG, and/or JP2 files locally to the BHL portal?
Persistence
Stability
Scalability
Would IA have concerns about us hitting files behind the scenes?