November 29, 2015

Project Gutenberg Meets WorldCat

Last week, I interviewed OCLC’s Bruce Washburn about an OCLC Research project called oclcBot, a program that takes book records in the Internet Archive’s Open Library and matches their ISBNs with the corresponding OCLC numbers. I recently found out how another massive book site makes use of OCLC records, though in a more low-tech way.

Project Gutenberg is home to some 33,000 public-domain ebooks, and has become a go-to destination for new e-reader owners looking for free reading materials. Librarians have often directed Kindle owners to Project Gutenberg to soften the blow of having no OverDrive ebooks available for the popular device (although Kindle OverDrive ebooks will finally become available later this year). The Colorado Library Consortium (CLiC), in collaboration with other Colorado libraries, created a set of MARC records last December for popular Project Gutenberg content (with direct links to the downloadable ebooks and audiobooks) that libraries could easily put into their own catalogs.

But some librarians may not be aware of the human factor behind all that digitized text. Project Gutenberg texts are often scanned by volunteers and run through optical character recognition (OCR) software. Human proofreaders are an integral part of the process, making countless small corrections to a text before it is posted.

One Project Gutenberg affiliate that does this work is Distributed Proofreaders, a group of hundreds of volunteers that has finished proofreading more than 20,000 works in just over a decade.

On occasion, just one missing page or ink-smudged passage can become a stumbling block to making a public-domain work available at all. Such problems are crowdsourced on the Distributed Proofreaders wiki, where the “Missing Pages” page offers a fascinating look at the huge amount of work that goes into Project Gutenberg’s corpus, as proofreaders post their requests to the community.

So how do these volunteers use OCLC records? The same way everyone else does: to find specific copies of books. The wiki provides WorldCat links to many “Missing Pages” books to help locate new copies to scan. There may even be a few at your library.

About David Rapp

David Rapp was formerly Associate Editor, LJ.
