This is a read-only archive of the BHL Staff Wiki as it appeared on Sept 21, 2018.

face to face meeting May 12 2015

Meeting minutes from Purposeful Gaming Face to Face meeting
May 12th 2015 St Louis MO

Attended: MOBOT (Trish Rose-Sandler, Mike Lichtenberg, Mike Blomberg, Doug Holland), Cornell (Marty Schlabach, Holly Mistlebauer), Harvard (Joe DeVeer, Patrick Randall), NYBG (Susan Lynch), Smithsonian Libraries (Grace Costantino joined via Skype after 2:30pm)

8:30-9:30 meet/greet/eat, Digitization updates (all)
Introduce yourself, institution and role on the project
Cornell
Total scanned=27k pages 685 catalogs (Subcontract agreement – 210 catalogs 70k pages)
Waiting for 4th shipment. 153 items in IA but not sure how much in BHL. Shipment 3 will require some local catalog records.
Sending out to external vendor. Took a while to ramp up Macaw but speed is really good. Collections assistant and Holly have been using it. Holly does final review before going to IA. Successfully going into subcollection in IA. Gets ingested into seed and nursery collection in BHL.
Macaw issues – figuring out permissions for different functions. Documentation is very good. Macaw still doesn’t have ability to do structuring at article level. Can do article start and end but not author and title. Joel and Mike looked into adding that as part of Macaw and sent around some preliminary questions about what is needed last summer.
Working with the Office of Sponsored Programs to reallocate grant funds from all digitization to some staffing costs. Complications and slowdowns due to duplication made this necessary.
NAL counts – 14,748. They are in single collection in IA https://archive.org/details/usda-nurseryandseedcatalog. Marty initiated contact to coordinate and reduce duplication with our partners and them. They have since become an affiliate of BHL. Other ideas for collaboration – sharing catalog records and pursuing permissions.
Outreach – Heirloom Gardening seminar at Genesee Country Museum. Created handouts – bookmarks. On June 5th will talk at Mann Library reunion weekend on agro-biodiversity. Creating notecards for the event. Also doing a talk at the CBHL annual meeting at Seed Savers in Iowa. Will have folks do transcription and play games.
Mann created postcards with imagery that were Mother’s Day cards.
Using External scanning vendor for most of stuff (Trigonix) but now have 2 in-house scanning facilities available. Also acquired an overhead scanner.
Played with CartoDB – online free GIS system to plug in data. Seed and nursery companies in NY. http://cdb.io/1ItFgbE
Talking to copyright advisors at Cornell (Peter Hirtle) about digitizing non-US trade catalogs but haven’t come to an agreement yet. Hirtle believes we should be honoring the copyright law of the country where an item is published. If there’s no author it’s more to do with date of publication. Balancing risk but risk is small. Try to come up with a table for monographs, serials, and non-personal author. Published in US between 1923 and the 60s - an item is copyrighted only if there is a statement on it. Has to have been renewed.

NYBG
Total scanned=25k pages scanned but not sure how many items (Subcontract agreement was 15k)
Scanning completed by Jan 2015. Another NEH scanned seed and nursery catalogs we could upload to BHL. Waiting for NAL to complete its scans so as not to duplicate what they are doing. Holly’s spreadsheet contains NAL scanning up to 1922 and Susan could use this to verify. Could simplify by just looking at firm level instead of issue level.
Concerns – for web users it would be easier if all issues (years) were combined under a single title record. Much easier to search and appears that way in IA. This has to do with different ways of cataloging (NAL does all issues as separate monographs). NYBG and Cornell do it the same way but right now issues from different scanners for the same title are not merged.
Marty wondered if we could merge the individual monographs from NAL easily?
Several BHL members are in process of applying for IMLS NLG whose purpose is to reach out to new content providers and get them into BHL. Includes Enhancements to Macaw tool, ingest between IA and BHL

MOBOT
Total scanned= 22k pages, not sure how many items (Subcontract agreement – 8k pages). That is the majority of MOBOT collections. Copyright concerns – wanted to do a lot post 1923. Jason found loophole which focuses on date of publication, no individual authors for these. Corporate authors require specific years from date of publication.

9:30-11am Transcriptions – status of completion, walk through transcription process and QA process, consistency of transcriptions, transcribe-a-thon (Joe)
Harvard’s job was to identify and implement a transcription tool for creating content for the game. They had already digitized 30 field notebooks of William Brewster (one of the best known amateur ornithologists in the US) diaries, observations, photos, etc. through the previous Connecting Content project. Committed to doing 2k pages from Brewster. Categories – diaries (more personal), journals (bird observations, narrative and species list for locations and months, reported on other species too). Lived in Cambridge so most observations based in New England and Maine. Selected 5 journals and 5 diaries for transcription. Used 2 tools: ALA’s DigiVol and FromThePage. Looked at 7 tools originally and chose the 2 that we thought were best for our needs. We also needed 2 digital outputs anyway. Also uploaded seed and nursery catalogs to FromThePage.
DigiVol: Brewster materials, 2818 pages transcribed and vetted. Sent all files to Mike L. It took us about 1 year.
FromThePage is not as complete - Brewster journals 60% complete, diaries 95% done.
15 seed catalogs went up at the end of March in FromThePage. Uploaded catalogs from NAL because they had the least amount of problems with images displaying and loading. 854 pages. 110 are complete, or 13%. These are daunting to complete due to the amount of text.
Comparing 2 tools

DigiVol
easiest to use and can interact with users.
Validation involves looking for glaring omissions, deviations from guidelines, some people wanted to add margin formatting. Great mechanism for feedback to transcribers.
Had about 20-30 volunteers total but active about 3-5. ALA has existing community of volunteers to draw from.
Administrator can display regional maps, choose an image for each expedition and paragraph describing author. Can add links to lots of things eg. Tutorial, bird lists of names. You can take existing ALA tutorial and modify.
Asked volunteers to transcribe verbatim with line breaks. Our guidelines didn’t ask for as many brackets as ALA did (e.g. they wanted abbreviations spelled out in brackets).
No fees or costs. Developers very responsive. But upload process more involved (only accepts JPEGs so had to convert). Brewster’s journals are all in BHL now.
Exporting: single csv file or zip files
Items stay in DigiVol until Adm deletes them

FromThePage
Much more barebones than DigiVol
Images served directly from IA
Content created through Dashboard/Collections.
On main page can add images and description of project.
Biggest drawback for administrators - hard to track validations. Have to track them outside the tool.
Best feature – has url upload directly from IA much like BHL does. Saves a lot of time.
Takes about 4-5 mins to import a book from IA

Mechanical Turk (Patrick)
Mechanical Turk, part of Amazon Web Services: requesters publish a project and international workers do the work. Paid-for-work model. Requesters make a web service account, which walks through the process of developing a project. Pick a template, describe it, set price and qualifications. Certain people get Master status. Can test them if you want before hiring. Paid by assignment. 1 page is 1 assignment regardless of number of words on a page. Have to fund the account. Don’t pay until work is accepted. Set time for project to expire and different tasks. Dashboard to edit HTML to get the UI you want. Spent quite a bit of time making it look like FromThePage and added a tutorial. Publish it - .csv files. Upload it. To add another batch it’s another .csv file. Excel spreadsheet with links to pages. Hourly rate – $1/hr. No labor laws or tax laws. We put in $100. 76-page catalog. Required 3 transcriptions per page, so 228 page assignments. Took 76 hrs for all pages to be transcribed (3 pages per hr). Amazon gives you some stats – avg time to complete (19 mins per page). What are the ethical considerations? How much should we pay? How to determine a fair cost per hour?
Of 228 pages only 2 were rejected. Quality comparable to FromThePage. Uploaded Brewster journals and time to complete was much slower than for seed catalogs (handwritten text is much harder for people for whom English is not their first language).
Cost effective if alternative is paying staff to transcribe but not if alternative is having volunteers to do it
Hard to do QA.
If we want to scale this up, recommend having text inputs that could go to another site.
Amazon gets 10% fee of every paid task.
Feedback from workers – they can let us know when it’s not working. Do we know who the workers are? Indian, not sure who else.
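The figures reported for the Mechanical Turk pilot can be checked with a short calculation. This is only a sketch: it assumes (the minutes do not say so) that the per-page reward was set as the $1/hr rate multiplied by the 19-minute average, and that the 10% Amazon fee is added on top of worker payments. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurkBatch:
    """Rough cost model for one batch; all names here are illustrative."""
    pages: int                         # pages in the catalog
    transcriptions_per_page: int       # redundant transcriptions requested
    avg_minutes_per_assignment: float  # average time reported by Amazon
    hourly_rate_usd: float             # target hourly rate
    fee_rate: float                    # Amazon's cut of each paid task

    @property
    def assignments(self) -> int:
        # 1 page = 1 assignment, repeated once per requested transcription
        return self.pages * self.transcriptions_per_page

    @property
    def reward_per_assignment(self) -> float:
        # ASSUMPTION: reward derived from hourly rate and average time
        return self.hourly_rate_usd * self.avg_minutes_per_assignment / 60

    @property
    def worker_payments(self) -> float:
        return self.assignments * self.reward_per_assignment

    @property
    def total_cost(self) -> float:
        # ASSUMPTION: fee is charged on top of what workers are paid
        return self.worker_payments * (1 + self.fee_rate)


def throughput_per_hour(assignments: int, elapsed_hours: float) -> float:
    """Assignments completed per elapsed hour across all workers."""
    return assignments / elapsed_hours


batch = TurkBatch(
    pages=76,
    transcriptions_per_page=3,
    avg_minutes_per_assignment=19,
    hourly_rate_usd=1.00,
    fee_rate=0.10,
)

if __name__ == "__main__":
    print(f"assignments:     {batch.assignments}")
    print(f"worker payments: ${batch.worker_payments:.2f}")
    print(f"total with fee:  ${batch.total_cost:.2f}")
    print(f"pages per hour:  {throughput_per_hour(batch.assignments, 76):.1f}")
```

Under these assumptions the batch comes to 228 assignments, roughly $72 in worker payments and roughly $79 including the fee, which is consistent with the $100 put into the account. Changing `hourly_rate_usd` shows how quickly the total grows if a higher rate is judged fairer.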

11-12:30 Lunch at Sassafras Café at the Garden (group photo)
12:30-1:30 Smorball/Beanstalk testing (all)
1:30-2:30 call with Tiltfactor (introductions, feedback on games, timeline)
Everyone on project team introduced themselves.
Trish asked Tiltfactor to explain how the game knows which are the correct answers to type. Players can put anything in where words disagree in the OCR output. It’s a clever way of making players believe the game knows the correct word when it only knows part of the word.
Here’s a visual to help explain
[Image: graphic for algorithm.png]
As soon as a certain number of people agree we retire the word.
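The retirement rule described above can be sketched roughly as follows. This is not Tiltfactor's implementation: the minutes do not give the number of agreeing players or how answers are compared, so the threshold of 3, the exact-match comparison after trimming whitespace, and all names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

AGREEMENT_THRESHOLD = 3  # ASSUMPTION: the real number is not in the minutes


class WordConsensus:
    """Collects player answers for disputed OCR words and retires a word
    once enough players have typed the same thing."""

    def __init__(self, threshold: int = AGREEMENT_THRESHOLD) -> None:
        self.threshold = threshold
        # word id -> tally of each distinct answer typed by players
        self.votes: Dict[str, Counter] = defaultdict(Counter)
        # word id -> accepted transcription
        self.retired: Dict[str, str] = {}

    def submit(self, word_id: str, typed: str) -> Optional[str]:
        """Record one answer; return the accepted text once the word is
        retired, otherwise None."""
        if word_id in self.retired:
            return self.retired[word_id]
        answer = typed.strip()
        if not answer:
            return None  # ignore empty input
        self.votes[word_id][answer] += 1
        if self.votes[word_id][answer] >= self.threshold:
            self.retired[word_id] = answer
            return answer
        return None


if __name__ == "__main__":
    consensus = WordConsensus()
    for guess in ["nursery", "nursary", "nursery", "nursery"]:
        result = consensus.submit("page12-word7", guess)
    print(result)
```

Because the game accepts any input for a disputed word, the tally is what decides the final text; a stray answer such as "nursary" simply never reaches the threshold.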

Feedback on Smorball and Beanstalk
Team commented Beanstalk is a bit repetitive. Tiltfactor agreed. Looking to add small animations – subtle little visual variations, wind effects.
Smorball – lots going on - hard to figure out what to do with helmet and use it.
Can we do a shortcut for Pass button? Yes.
Of the 3 lines coming in on Smorball, the bottom line is covered up by the Pass button.
Beanstalk text is hard to read – they are working on enlarging
Smorball – non-gamers not sure where to start playing. They will add a pulsing stadium to guide you.
If we add zombies does it feel too much like Plants vs Zombies ? Hadn’t considered that - they know the designers and could talk to them.
Robot/zombie/pirates

Timeline –
Smorball getting close to being done. Beanstalk getting close but still need some work.
They want to do more testing.
Beanstalk Go ahead and push out to BHL community this week.
Do we need more data ? Good for now. 100 pages right now.
No official release date until they finalize art change but probably late May or early June
Tiltfactor will start drafting press release and pre-tweeting. Let’s do a joint press release and get IMLS to do a release. We’ll create a joint Google doc and write 1 press release for everyone. Include Tiltfactor, IMLS, MBG, Harvard (don’t forget NYBG & Cornell).
Harvard is concerned about end of May since it’s graduation and lots of communication is going on. Is this an issue? Probably not because our audience will be global.
Mobile versions of games? Testing now but coming slower
Smorball will be difficult on mobile. But Beanstalk OK.

Rollouts of web vs mobile. These won’t be in the app store - won’t be on phones but could play on iPad (browser based). Older iPads have been a problem.
Continuing feedback – SurveyMonkey – link directly on the game

2:30-3:30 Game release plan/Communications (Patrick and all)

Patrick is working with Kate in Harvard Libraries communications and she works with public affairs. She wants to do pitches rather than press releases, which means she sends to specific individuals - more effective for getting different outlets to pick up the story. Patrick will do on-campus places (e.g. metaLAB, Harvard press). External media – Slate Vault, Buzzfeed, Boston Globe, WashPost, HuffPost, NPR’s All Tech Considered, gaming community (PC Mag, Wired). Library and academic sources – ALA scope blog, Ariadne, Chron of Higher Ed. Kate is writing the pitches. How to coordinate who we contact? Let’s put the targeted media outlets in the Google Doc. Follow the same process we did for Zooniverse with a spreadsheet where we listed contacts and the person contacting them. Need to have it go out at the same time. How to coordinate with communications in each partner institution? Have them get in touch with Patrick at Harvard.
General social media posts – store stock language for tweet and Facebook posts that we include in the Google doc, could do a library of images for posts.
Hashtags for games? Need to create.
Ask Tiltfactor if they have stats on players.
What is mechanism for support? Trish review contract agreement for what they are responsible for.
How do users ask questions or give feedback on live game? Patrick will check with Mary, Tiltfactor
Would be good to provide a monthly update and progress report via social media.
Game will run from June-October. Then we will need time to evaluate results before the project ends in late November. We aren’t sure if the game will run beyond the life of the project.
Patrick will set up Google doc
How will users find the games? Social media, Tiltfactor’s website, link off BHL? Would we want to put a link from BHL main page for a short period of time? Could do something with AddThis banners. Easy to turn on and off. Customize who sees those.
Should both games be promoted to all audiences? And let users determine their preferences? Someone will ask Mary about that


3:30-4pm CCLA Crowdsourcing Workshop summary and wrapup of the day (Trish)
CCLA stands for Crowd Consortium of Libraries and Archives. Group that was started in 2014 as a result of IMLS money given to Mary Flanagan from Tiltfactor to enable conversations around CS. They have held several regional meetings (UCLA, Boston) and webinars. The event at Univ of Maryland was the culmination of all those conversations.
60 attendees at 2 day meeting at Univ of Maryland. Attendees were a mix of librarians, software developers, archivists, and funders (NEH, IMLS, NIH, Gates Foundation). Mellon, Knight and Sloan were supposed to come but canceled at the last minute. Mostly US based but some international representatives (OPENGLAM, Tanzania water quality monitoring, Univ of Victoria).
Structure was a great mix of panels and small group breakout sessions. The panels focused on case studies, challenges, trends and gaps within CS. BHL was asked to participate on one of the panels called “Dispatches from the field”. Great opportunity to talk about all the crowdsourcing activities we’ve been doing both within our projects and our day to day activities (Gemini, Macaw, etc).

Big takeaways –
Funders really welcome projects that include CS
Don’t need to argue what you create will be a model but could just be starting a community conversation around an issue.
NEH wants to see how cs can engage diverse publics in accessing same primary sources that academic researchers usually have access to.
NIH has funded CS for some time. CS is a component of data science and multidisciplinary fields. If you can tie humanities into health they welcome that too [BHL books on plants as medicine]
IMLS – is interested in CS in both NLG and 21st century librarians (Laura Bush education & training)
Next steps: Funding for CCLA workshops has run out but there was strong interest in keeping the group going and sharing ideas, collaborating on grants, putting together a CS bootcamp.