BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

Articlizing_08_29_08

August 29, 2008
Connie, Joe, Tom, Martin and Suzanne

Jim H. is ready to give funds to MCZ to articlize their materials.
Connie suggested a generalized approach that could be more beneficial than covering just a subset of MCZ publications.

IA will be hiring Xionan at the end of September to continue work on automated metadata extraction (articlizing). Linda F. is the contact at IA. The software is loaded at IA and ready to be tested. Someone needs to review and evaluate it and provide feedback to IA. That is the missing piece. Money from MCZ could go to a contractor or staff person to fill that role.

Smithsonian also has some grant money to deliver some specific publications, and money is available to apply to articlizing. SIL’s deliverable is to be 100% articlized. A vendor will be used to get that level of accuracy. The results can be given to BHL for ingesting if Chris wants, and to Xionana and IA as a training tool.

Earlier work on the Xionana tool required a lot of staff time. The results from the algorithm were difficult to read. IA had a PDF problem. The amount of information staff were asked to return after proofing was overwhelming. No one is sure about the current state of the system at IA.

NLM DTD should be the delivered format. Formerly, it was not used. We hope Linda can work out the NLM DTD requirement.

Smithsonian project vision: The index to the title will be rekeyed in citation format. (Do we know if NLM DTD has all the fields for a citation?) The vendor will take the scanned images and, using the article database built from the index, indicate the start, stop, and file name for each linked article. That database will be delivered for SIL use. BHL can use it as well if Chris wants. The vendor will take the large PDF deliverable from IA and separate it into articles with proper file names, linked to the article citation database.
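As a rough illustration of the start, stop, and file name idea above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ArticleEntry` fields, the `plan_chunks` helper, and the file naming pattern are hypothetical and do not come from the vendor, IA, or the NLM DTD. It only plans which pages would go into which article file; it does not split a real PDF.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ArticleEntry:
    """One row of a hypothetical article index (field names are assumptions)."""
    title: str
    start_page: int  # first page of the article in the scanned volume, 1-based
    stop_page: int   # last page of the article, inclusive


def article_file_name(volume_id: str, entry: ArticleEntry) -> str:
    # Hypothetical naming pattern: volume id plus zero-padded page range.
    return f"{volume_id}_p{entry.start_page:04d}-{entry.stop_page:04d}.pdf"


def plan_chunks(volume_id: str, entries: List[ArticleEntry],
                total_pages: int) -> List[Tuple[str, List[int]]]:
    """Return (file name, page numbers) pairs, ordered by start page."""
    plan = []
    for entry in sorted(entries, key=lambda e: e.start_page):
        # Reject entries whose start/stop fall outside the volume or are reversed.
        if not (1 <= entry.start_page <= entry.stop_page <= total_pages):
            raise ValueError(f"bad page range for {entry.title!r}")
        pages = list(range(entry.start_page, entry.stop_page + 1))
        plan.append((article_file_name(volume_id, entry), pages))
    return plan


if __name__ == "__main__":
    index = [
        ArticleEntry("Second article", 15, 30),
        ArticleEntry("First article", 1, 14),
    ]
    for name, pages in plan_chunks("vol12", index, total_pages=30):
        print(name, pages[0], pages[-1])
```

The resulting plan could then be handed to whatever PDF tool actually performs the split, with each file name stored alongside its citation record.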

A goal of BHL is to “chunk” PDFs so that they can be downloaded at the article level.

Creating a citation and resolving a citation are the critical issues. NLM DTD has other information that would be great to have, but our focus currently is to at least get this simple function working. The start and stop elements are critical for the chunk deliverable.

Jonathan L., Jim H., and others want to download and reuse the articles.

The Xionana deliverable won’t be exactly what Jim wants. Connie wants to create the item that Jonathan and Jim want in a way that would work for BHL as a whole.

Harvard does have a spreadsheet that is the start of an "index" that can be used. It has about 1400 items.

The next steps are to bring Chris into the conversation and to set up a time to talk with Linda.