April 24, 2018

Biodiversity Heritage Library Launches Crowdsourcing Games

The Purposeful Gaming and BHL project recently launched its first two browser-based video games, Smorball and Beanstalk. Both are designed to offer players a fun online diversion while helping the Biodiversity Heritage Library (BHL) enable full-text searching of digitized materials. Funded by a grant from the Institute of Museum and Library Services (IMLS), which was awarded in December 2013, the project is exploring how games might be used to entice people to participate in crowdsourcing efforts at libraries and museums.

For example, scanning documents and rare books and posting the PDFs or image files online can help make those resources easier to access and available to a broader audience. But the text on those pages can’t be searched or data mined unless the content is converted to machine-encoded text. Optical character recognition (OCR) software can make it easy to extract text from scanned documents and books, but those programs aren’t perfect, and work best with consistent layouts, spacing, and modern fonts. Current OCR technology doesn’t work at all for some content in BHL’s digital collections, such as handwritten field journals from the 19th century, or catalogs with unusual typesetting and layouts.

“The OCR outputs for those kinds of things are garbage,” Patrick Randall, project assistant for Purposeful Gaming, Ernst Mayr Library, Museum of Comparative Zoology, Harvard University, told LJ. “Some of it, you can get partial OCR, but a lot of it has to be totally transcribed by individuals, by typing it all out.”

BHL, an international consortium of 20 natural history and botanical libraries, headquartered at the Smithsonian, is currently involved with multiple crowdsourcing projects that enable volunteers to help with tasks ranging from image tagging to transcription of digitized field journals.

BHL and its member institutions raise awareness and recruit volunteers for these projects through blog posts, social media, listservs, and conference presentations. And when it comes to intensive work such as transcription, “we’ve been pretty surprised at the reception and the willingness of people to jump on board and volunteer their time to help out,” Randall said.

These new games could greatly accelerate the vetting and proofing of these transcriptions. In the simple Beanstalk game, players read a short snippet of original digitized text from BHL and then type the text at their own pace to make a beanstalk grow. In Smorball, players are asked to transcribe as quickly and accurately as possible to compete in a cross-country tournament of the fictional sport Smorball.

In both games, these player inputs are then used to double-check an existing transcription of the digitized text. Initially, the project is using well-proofed digitized transcriptions to study player accuracy, performance, and other trends.

“For the inputs that have been used in the games, there have been a set group of texts that were really closely vetted, so that we know exactly what each word says, exactly how everything was supposed to be spelled,” Randall said. “So when we put those materials into the games for testing, when someone enters the wrong word, or even the wrong character within a word, we can know that it’s wrong.”
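As a rough illustration of how a game could flag a wrong word or character against a closely vetted text, the sketch below compares a player's input to the reference transcription one character at a time. The function and class names (`check_input`, `CheckResult`) are hypothetical, and this is not the games' actual code; a simple positional comparison also assumes the player did not skip or insert characters, which a real system would likely handle with an edit-distance alignment.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    correct: bool               # True if the input matches the vetted text exactly
    wrong_positions: list[int]  # character indexes where the input differs


def check_input(player_text: str, vetted_text: str) -> CheckResult:
    """Hypothetical check of a player's typed text against a vetted transcription.

    Compares character by character; any position past the end of the
    shorter string counts as wrong, so missing or extra characters are flagged.
    """
    length = max(len(player_text), len(vetted_text))
    wrong = [
        i
        for i in range(length)
        if i >= len(player_text)
        or i >= len(vetted_text)
        or player_text[i] != vetted_text[i]
    ]
    return CheckResult(correct=not wrong, wrong_positions=wrong)


if __name__ == "__main__":
    print(check_input("Quercus alba", "Quercus alba"))
    print(check_input("Quercas", "Quercus"))  # flags index 5
```

Because the reference text is known to be correct, a single mismatched character is enough to mark the entry as wrong, which is what lets the project measure player accuracy.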

But eventually, these and other online games could be used to help BHL and other institutions clean up partial OCR; reconcile differences in human transcriptions of digitized documents; and/or verify the accuracy of machine-encoded words, phrases, or passages that have been flagged for potential problems. To hedge against individual mistakes and prevent deliberate vandalism of the projects, a game will have several players complete the same word or phrase until a consensus emerges regarding the correct spelling or formatting.
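The article does not say how the games decide that a consensus has emerged. The sketch below shows one common approach, simple majority voting, under assumed thresholds: a transcription is accepted only once at least `min_votes` players have typed it identically and it makes up at least `min_share` of all entries. The function name and both threshold values are hypothetical illustrations, not BHL's or Tiltfactor's actual rules.

```python
from collections import Counter
from typing import Optional


def consensus(
    entries: list[str], min_votes: int = 3, min_share: float = 0.6
) -> Optional[str]:
    """Return the agreed transcription of a word or phrase, or None if undecided.

    Assumed rule (hypothetical): the most common entry wins once it has
    at least `min_votes` identical submissions AND accounts for at least
    `min_share` of all submissions, so one careless or malicious player
    cannot decide the outcome alone.
    """
    if not entries:
        return None
    # Light normalization: ignore leading/trailing whitespace only,
    # since capitalization and punctuation may matter in the source text.
    counts = Counter(entry.strip() for entry in entries)
    text, votes = counts.most_common(1)[0]
    if votes >= min_votes and votes / len(entries) >= min_share:
        return text
    return None  # keep sending this snippet to more players


if __name__ == "__main__":
    print(consensus(["Quercus", "Quercus", "Quercas", "Quercus"]))  # Quercus
    print(consensus(["Quercus", "Quercas"]))                        # None
```

Requiring both a minimum number of agreeing players and a minimum share means a snippet with conflicting answers simply stays open for more players rather than being settled by whoever answered first.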

Purposeful Gaming and BHL are being led by the Missouri Botanical Garden’s Center for Biodiversity Informatics, in partnership with Harvard University, Cornell University, and the New York Botanical Garden. Tiltfactor, an interdisciplinary studio at Dartmouth College “dedicated to designing and studying games for social impact,” was brought into the project to create both of the games.

Tiltfactor founder and director Mary Flanagan explained the potential that crowdsourcing has to help projects at BHL and other institutions, noting in a prepared statement that “cultural heritage institutions are increasingly benefiting from human computation approaches that have been used in revolutionary ways by scientific researchers. Engaging citizens to work together as decoders of our heritage is a natural progression, as preserving these records directly benefits the public. Integrating the task of transcription with the engagement of computer games gives an extra layer of incentive to motivate the public to contribute.”

In LJ’s June feature “Wisdom of the Crowd,” which took a broader look at the crowdsourcing trend, BHL outreach and communication manager Grace Costantino noted that the effectiveness of games as tools to get people involved with these crowdsourcing projects is still being studied. With BHL’s other crowdsourcing initiatives, the consortium works to keep volunteers engaged partly by keeping lines of communication open via social media and email, and by ensuring that they can choose from a variety of different projects so that regular contributors don’t become bored. The project will continue through November 30, giving BHL a clearer sense of the ways in which games can be used as a component of these other ongoing crowdsourcing efforts.

About Matt Enis

Matt Enis (menis@mediasourceinc.com, @matthewenis on Twitter, matthewenis.com) is Senior Editor, Technology for Library Journal.
