BHL
Archive
This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018. This archive is searchable using the search box on the left, but the search may be limited in the results it can provide.

Transcriptions Task Group



How to include Transcriptions into BHL?


Several projects are generating text that improves on BHL's OCR, and we need to decide on the best way to integrate these new texts into the BHL corpus.

The main topics to wrap our discussion around are:
  1. What should be done with the existing transcriptions (as of Q4 2015) of BHL pages, documents and notes that several different initiatives have produced?
  2. How should we proceed with new transcriptions in the future?

A group of members from BHL Tech, Admin and Staff was charged with discussing how to incorporate transcriptions into BHL, through conference calls and email exchanges, and with coming up with recommendations and decisions on our next steps.

After an initial meeting on September 8th, 2015, Mike L., BHL's developer, was asked to review the suggested process and raise any questions needing clarification for an appropriate implementation. A larger group of colleagues was convened for a follow-up meeting, including Technical Advisory Group (TAG) members and representatives from Harvard, BHL-Australia and the Smithsonian, all of whom had practical experience from current, active projects transcribing BHL content. Finally, a longer email exchange resolved the remaining questions and followed up on the recommendations and outstanding action items. The topic was later discussed in (extended) Technical Group meetings, but the actual implementation of the changes to the BHL code was postponed in favor of higher-priority activities, such as moving the BHL Portal from the Missouri Botanical Garden to the Smithsonian, which took up most of the time available to our developer and other TAG members until after the end of 2015.

Below you will find, in reverse chronological order (most recent first), the documentation gathered from the discussions and decisions in the meetings, along with a copy of the email exchange among group members on the topic.


William Ulate
BHL Technical Director

Table of Contents

How to include Transcriptions into BHL?
Task Group Members:
3rd Transcription Group Meeting - BHL Australia + SMI Transcription Center proposals
2nd Transcription Group Meeting - Thoughts and Questions
Skype call for Discussing Inclusion of Transcriptions into BHL
March 2016


Task Group Members:


Trish Rose-Sandler (MBG); Mike Lichtenberg (MBG); Sheffield, Carolyn (SIL); Crowley, Bianca (SIL); Joseph deVeer (Harvard); Kalfatovic, Martin (SIL); Richard, Joel M (SIL); Kearney, Nicole (BHL-AU); Parilla, Lesley (SIL); Julia Blase (SIL)



From: Kalfatovic, Martin <KalfatovicM@si.edu>
Sent: Thursday, November 5, 2015 11:17 AM
To: 'Kearney, Nicole'; William Ulate
Subject: RE: Field book transcriptions on BHL?
Nicole, yes, what Julia said was correct, but it was a special case (the transcript was done years ago and already existed as a print item, so in some ways it was a supplemental volume to the field book).
It can be considered a separate entity; the problem with adding the transcript is that it becomes another item in BHL (i.e. BHL can't treat it as a manifestation of the original, for technical reasons).
Lesley Parilla is going to load a few so we can experiment with what the ramifications are for this (e.g. does it get a DOI?) but I don’t want to go into production on this.

Cheers,

Martin
Martin R. Kalfatovic
Associate Director, Digital Services Division || Program Director, Biodiversity Heritage Library
Smithsonian Libraries
10th Street & Constitution Ave., NW | Room 29E
MRC 154 PO Box 37012
Washington, DC 20013-7012
email: kalfatovicm@si.edu
tel: 202.633.1705 | Skype: martin.kalfatovic | VIAF ID: 32094717 (Personal) | ORCID ID: 0000-0002-4563-4627 (Personal)



From: Kearney, Nicole [mailto:nkearney@museum.vic.gov.au]
Sent: Wednesday, November 04, 2015 10:41 PM
To: William Ulate <william.ulate@mobot.org>; Kalfatovic, Martin <KalfatovicM@si.edu>
Subject: FW: Field book transcriptions on BHL?
Hi William and Martin,
Below is my correspondence with Julia (from Jan/Feb) about uploading field books and their transcriptions onto BHL.
Cheers, Nicole
Nicole Kearney
Coordinator | Biodiversity Heritage Library
Digital & Emerging Technologies, Museum Victoria
PO Box 666, Melbourne VIC 3001
61 3 8341 7779



From: Kearney, Nicole
Sent: Monday, 2 February 2015 12:07 PM
Subject: RE: Field book transcriptions on BHL?
Hi Julia,

Thank you for all this information and advice. I have read through the documents you sent and looked at the example on BHL. I have discussed these with the rest of the team here at Museum Victoria and with our colleagues at the Australian Museum.

We can certainly upload the original and the transcript as separate documents and then associate them as you’ve described (we will follow this route for now). However, it would be great to be able to view the original and the transcript side by side (as you can on the DigiVol portal and your transcription centre).

An ideal place for the transcription would be in the space occupied by the OCR field in BHL (in the right hand column), with the option of showing or hiding this field (as you can with the OCR). The transcribed text takes up much less space than the handwritten text so it would fit nicely in this side column. We have been very careful to maintain the original formatting in our transcription of the field diaries (line breaks and page breaks). This means that we could ensure that each page of transcript matches each page of handwritten text.

We would definitely like to continue this discussion and would support any changes to BHL that would facilitate the concurrent viewing of handwritten material and its transcribed content.

Thanks again, Nicole

P.S. We have scanned our field diaries as individual pages, not as double page spreads as specified in your guidelines. This is our standard procedure for all our scanning for BHL. We were therefore planning on uploading the field guides into MACAW as separate pages. Will this be a problem or is there a specific reason why you scan yours as double pages?

Nicole Kearney (Mon & Thurs)
Project Coordinator | Biodiversity Heritage Library
Digital & Emerging Technologies, Museum Victoria
PO Box 666, Melbourne VIC 3001
61 3 8341 7779



From: Blase, Julia [mailto:BlaseJ@si.edu]
Sent: Wednesday, 21 January 2015 2:30 AM
To: Kearney, Nicole
Subject: RE: Field book transcriptions on BHL?
Hi Nicole,

That is great news. Our staff also enjoys going through field book transcripts and getting to know the scientists all over again!
To answer the first part of the question, regarding uploading digitized field books into BHL – yes, they do need a MARC record to go through Macaw. However, as we do not create MARC records for our items either, we instead use a MARC metadata map for the item/title level metadata (title, author, date, abstract, series, etc.). We place the metadata into a CSV under the appropriate MARC headings and upload that CSV to Macaw.

After we do so, we upload the digitized high-res page images to each item-level Macaw record. Then we create the page-level metadata (page numbering, page type such as cover, text, illustration) in Macaw, and send the item off to BHL once its review is complete. Though the process involves those few extra steps, it has worked for us so far: http://biodiversitylibrary.org/browse/collection/smithsonianfieldbookcollection.

I have attached to this email a draft of our guidelines for importing field books to BHL via Macaw, with the metadata mapping and screenshots, if that clarifies the process at all. I have also attached a sample title-level (also known as the item level) CSV. You can see the element information below the relevant MARC column title. Please let me know if you have questions, or if it would be helpful to schedule a webinar where you can watch us walk through this process once or twice.
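The CSV step described above (item-level metadata placed under MARC column headings, one row per item) can be sketched as follows. This is a hypothetical illustration, not the actual Macaw import format: the function name build_marc_csv and the choice of MARC 21 tags (245 title, 100 author, 260 imprint/date, 520 summary, 490 series) are assumptions, and the attached sample CSV remains the authority on the real column headings.

```python
import csv
import io

# Hypothetical mapping of item-level fields to MARC 21 tags. The headings
# Macaw actually expects are not given here; these are standard MARC 21
# fields chosen only to illustrate the idea of a MARC metadata map.
MARC_MAP = {
    "title": "245",     # Title Statement
    "author": "100",    # Main Entry - Personal Name
    "date": "260",      # Publication, Distribution, etc. (Imprint)
    "abstract": "520",  # Summary, Etc.
    "series": "490",    # Series Statement
}


def build_marc_csv(records):
    """Return CSV text with MARC tags as column headings, one row per item.

    Fields missing from a record are written as empty cells.
    """
    fields = list(MARC_MAP)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([MARC_MAP[f] for f in fields])
    for record in records:
        writer.writerow([record.get(f, "") for f in fields])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical field book record, for illustration only.
    print(build_marc_csv([{"title": "Field book", "author": "Doe, Jane"}]))
```

Keeping the field-to-tag map in one dictionary means the headings can be corrected in a single place once the real Macaw template is checked.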

In terms of uploading the items with their transcripts…that might be a wider BHL partner question. The short answer is, yes, we have done so, and we can certainly show you how we did it, and help you learn to do the same. The transcripts in our test cases were actually completed by the original author’s assistant when he retired. See an example here: http://www.biodiversitylibrary.org/bibliography/97053#/summary.

We loaded the transcripts into Macaw as unique items and, once they appeared in BHL, associated the transcript record in BHL with the original record as a second “volume” in the series. However, as it is now, the transcripts are not associated explicitly with each page of text – for instance, one page of typewritten text might include three pages of information from the handwritten text. Martin had mentioned that the ideal situation would be for all transcripts to be directly associated with the handwritten pages from which they came. Therefore, we have not moved forward with incorporating more transcriptions into our records in BHL. I wonder if a policy decision needs to be made about this situation on a higher level, as more and more partners complete similar projects? What are your thoughts?
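The page-level association Martin describes, where one typewritten page may carry the content of several handwritten pages, amounts to inverting a many-to-one mapping so that each handwritten page image can point to the transcript text that covers it. The sketch below is a hypothetical illustration; the page identifiers and the functions index_by_handwritten_page and unmatched_pages are assumptions, not part of BHL or Macaw.

```python
from collections import defaultdict


def index_by_handwritten_page(transcript_pages):
    """Invert a transcript-page -> handwritten-pages mapping.

    transcript_pages maps a (hypothetical) transcript page ID to the list of
    handwritten page IDs whose content it contains. The result maps each
    handwritten page ID to the transcript pages covering it, in input order.
    """
    index = defaultdict(list)
    for transcript_id, page_ids in transcript_pages.items():
        for page_id in page_ids:
            index[page_id].append(transcript_id)
    return dict(index)


def unmatched_pages(all_page_ids, index):
    """List handwritten pages that have no transcript page attached."""
    return [page_id for page_id in all_page_ids if page_id not in index]


if __name__ == "__main__":
    # One typewritten page (T1) covering three handwritten pages, as in the
    # example above; IDs are made up for illustration.
    mapping = {"T1": ["H1", "H2", "H3"], "T2": ["H4"]}
    idx = index_by_handwritten_page(mapping)
    print(idx)
    print(unmatched_pages(["H1", "H2", "H3", "H4", "H5"], idx))
```

Storing the association per handwritten page, rather than per transcript item, is what would let a viewer show the matching transcript beside each page image.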

In any case, we are thrilled that you are ready to upload your field books into BHL, and happy to help as much as possible.

Best,

Julia Blase




From: Kearney, Nicole [mailto:nkearney@museum.vic.gov.au]
Sent: Monday, January 19, 2015 1:43 AM
To: Blase, Julia
Subject: Field book transcriptions on BHL?
Hi Julia,
Our volunteers have now transcribed a number of field books from our collection. I’ve been reading through the transcripts today and they’re a fascinating read. The curators are very excited about them and would, of course, like to make them available online.
We are therefore pondering the inevitable question – how do we incorporate all this wonderful work into BHL?
We haven’t yet uploaded the digitized field books onto BHL because they do not have a MARC record (this is because they’re part of our Archive Collection rather than our Library Collection). Our understanding is that you cannot upload an item into MACAW unless it has a MARC record. Our library is happy to create a MARC record for each diary, but I just wanted to check with you that they do indeed need one before I ask them to do this.
Ideally we’d like to upload the digitized field books onto BHL with their transcripts attached in some way. I’ve viewed the completed projects on the Smithsonian Transcription Centre, but I’m not aware of the transcripts being available on BHL (excuse my ignorance if they’re there), or of there being a way to link to the transcribed versions from within BHL (again my apologies if I’ve missed this).
If it isn’t possible to put the transcriptions onto BHL, is this something that you intend to do in the future? If so, we’d love to be involved in this conversation (as it’s certainly a conversation we’re engaging in here).
Any advice/information you could provide me with at this point would be greatly appreciated.
Thanking you in advance, Nicole
Nicole Kearney (Mon & Thurs)

Project Coordinator | Biodiversity Heritage Library
Digital & Emerging Technologies, Museum Victoria
PO Box 666, Melbourne VIC 3001
61 3 8341 7779



From: Kalfatovic, Martin <KalfatovicM@si.edu>
Sent: Thursday, November 5, 2015 11:13 AM
To: 'Kearney, Nicole'; William Ulate
Subject: RE: Other kinds of Frankenbooks?

Nicole, no, these would not be Frankenbooks.
We only apply the concept to digital manifestations that are created from multiple physical copies.
So, if scientist A compiled a bunch of her favorite articles and a library added that to their collection, that would be fair game for BHL (at the discretion of the contributing library).
The goal is to have the digital manifestation correlate to a physical copy (and I would bet we're not 100% pure on that matter).
Does that clarify?
Martin

From: Kearney, Nicole <nkearney@museum.vic.gov.au>
Sent: Wednesday, November 4, 2015 10:45 PM
To: William Ulate; Kalfatovic, Martin
Subject: Other kinds of Frankenbooks?

Your mention of Frankenbooks reminds me of another curly issue that has recently cropped up here at BHL-Au.
Our library collection includes bound volumes that are compilations of articles from different journals on particular topics, e.g. Ecology of Australian Snakes. Some of these volumes belonged to eminent scientists and include their hand-written notes bound between the articles. For example, our librarian recently brought me a particularly beautiful 1859 volume that was owned by Frederick Du Cane Godman. Between the articles are his annotations and stunning illustrations. Then, at our meeting at the Royal Botanic Garden yesterday we were shown a large collection of these “Collected Papers”.
The librarians have asked whether we can digitise these volumes, transcribe the handwritten sections, and put them up on BHL. I suggested that some of the articles would already exist on BHL in their own journals, but they maintained that the annotated compilations are historically significant in their own right (as the annotated volumes from Darwin’s library are).
Reading your email below, I wonder if these are the books your Collections Committee originally called Franken-books (the reference books made out of parts of books). Have you indeed come across such volumes before? Does BHL accept them? (If ever there was a case for including article-level metadata…)
Kind regards, Nicole
Nicole Kearney
Coordinator | Biodiversity Heritage Library
Digital & Emerging Technologies, Museum Victoria
PO Box 666, Melbourne VIC 3001
61 3 8341 7779




From: Kearney, Nicole <nkearney@museum.vic.gov.au>
Sent: Wednesday, November 4, 2015 9:36 PM
To: William Ulate
Cc: Kalfatovic, Martin
Subject: RE: Notes from yesterday's transcription meeting
Hi William and Martin,

Thank you for sending your meeting notes through. I'm feeling a little regretful about not pulling an all-nighter to attend your meeting.

I'm flattered that you've honoured our proposed stop-gap solution with such a great name. Our "Frankenbooks" suggestion was originally just that - a stop gap solution for displaying the transcriptions alongside the original texts within the constraints of existing BHL functionality. However, I do see the benefit for the user of having the original and the transcription appearing side-by-side with matching formatting/layout (as per attached pic).
I discussed the idea of using the OCR field for transcripts with Julia Blase in February – I will forward you this correspondence. This option would certainly be acceptable (providing the field can be labelled “Transcript” when used for transcriptions). It’s an intuitive place to put the transcriptions and would allow users the option of showing or hiding them as required. We’re happy to hear that Mike says this can be done easily!
It was Julia’s suggestion to upload the transcriptions as subsequent volumes of the original title (again see email correspondence from Julia). This was what BHL had done with a previous transcription and we have been following this model. It is interesting to note that this previous transcription was a historic document in its own right, produced on a typewriter by the author’s assistant after he retired: http://www.biodiversitylibrary.org/bibliography/97053#/summary. I’m not aware of more recent transcriptions on BHL, other than ours. Are there others?
I agree with Lesley and Martin that being able to search the transcriptions is highly desirable (and that searchability trumps tagging/formatting). I’m interested in Joe’s comments about retaining/losing mark-up for charts, tables, etc. We replaced the mark-up with the actual formatting in the display version of our transcriptions (again see attached), but this would be lost in the OCR field, so it is worth considering how to display/mark up tables, etc.

I will stop my evil preparation of Frankenbooks, although we may still consider these monsters for our own website. ;)

Thanks again for involving me in these discussions, Nicole

P.S. As for discussions in person, I generally work on this project on Mondays and Thursdays 9am to 5pm (AEDT – Australian Eastern Daylight Time, UTC/GMT +11 hours), but I could be available any day between 7am and 1am AEDT. If this doesn’t work for you, we can certainly continue to discuss this over email.

Nicole Kearney
Coordinator | Biodiversity Heritage Library
Digital & Emerging Technologies, Museum Victoria
PO Box 666, Melbourne VIC 3001
61 3 8341 7779
Original and transcription.jpg







From: William Ulate [mailto:william.ulate@mobot.org]
Sent: Thursday, 5 November 2015 2:21 AM
To: Kearney, Nicole <nkearney@museum.vic.gov.au>
Cc: Kalfatovic, Martin <KalfatovicM@si.edu>
Subject: Notes from yesterday's transcription meeting

Hi Nicole,

Please find attached our notes from yesterday's meeting on transcriptions.

As you can see, the group would like to ask whether you could pause uploading items constructed from transcriptions into BHL while we work out how to handle the PDF of transcriptions (i.e. transcription with layout).

Martin and I could meet with you if you would like to discuss this further, just let us know your availability please.

Kind regards,

William






3rd Transcription Group Meeting - BHL Australia + SMI Transcription Center proposals


From: William Ulate
Sent: Monday, November 2, 2015 2:49 PM
To: Trish Rose-Sandler; Mike Lichtenberg; Sheffield, Carolyn; Crowley, Bianca [CrowleyB@si.edu]; Joseph deVeer; Kalfatovic, Martin; Richard, Joel M; Kearney, Nicole; Parilla, Lesley
Subject: 3rd Transcription Group Meeting - BHL Australia + SMI Transcription Center proposals
When: Tuesday, November 3, 2015 9:00 AM-10:00 AM.
Where: Call 1.866.305.1460 Access code 3594388#

Folks,

Let's meet tomorrow, Tuesday, at 9 a.m. US Central time to discuss Nicole's proposals (see below) and take a careful look at the approach our Australian colleagues are following. Attached you can find our notes from the last meeting (thank you, Trish!). Talk to you all tomorrow!

Regards,

William.

===================================================================

Meeting Notes (Nov. 3rd. 2015): 2015_11_03 transcription discussion.docx



2nd Transcription Group Meeting - Thoughts and Questions


From: William Ulate
Sent: Wednesday, October 14, 2015 6:02 PM
To: Trish Rose-Sandler; Mike Lichtenberg; Sheffield, Carolyn; Crowley, Bianca [CrowleyB@si.edu]; deVeer, Joseph; Kalfatovic, Martin; Richard, Joel M; 'Kearney, Nicole'
Subject: 2nd Transcription Group Meeting - Thoughts and Questions
When: Thursday, October 15, 2015 1:00 PM-2:00 PM.
Where: Call 1.866.305.1460 Access code 3594388#


Meeting Notes (Oct. 15 2015):

Transcripton discussion 2015_10_15.docx

Skype call for Discussing Inclusion of Transcriptions into BHL

From: William Ulate
Sent: Friday, September 4, 2015 9:35 AM
To: Mike Lichtenberg; Kalfatovic, Martin; Sheffield, Carolyn; Blase, Julia; Trish Rose-Sandler; deVeer, Joseph
Subject: Skype call for Discussing Inclusion of Transcriptions into BHL
When: Tuesday, September 8, 2015 2:00 PM-3:00 PM.
Where: Skype call from mobot.cbi


Hi all,

We would like to invite you to a call on September 8th at 2pm Central time to discuss "How to include Transcriptions into BHL". Several projects are generating text that improves on BHL's OCR, and we would like to talk about the best way to integrate these texts into the BHL corpus.

The draft agenda of the meeting includes the following two topics:
  1. What to do with the existing transcriptions that several current initiatives are producing; and
  2. How to proceed in the future with new transcriptions?

I will try to establish the Skype call from the account of MOBOT's Center for Biodiversity Informatics (mobot.cbi).

Thank you!

William.
========================================================================================
Meeting Notes (Sept.8th 2015)

Transcriptions Meeting - Sept 8th 2015.docx
===================================================================