October 31, 2014

Major Maine Libraries, Public and Academic, Collaborate on Print Archiving Project

Eight of Maine’s largest libraries, both public and academic, are about halfway through a major and distinctive project for the shared management and archiving of their print collections and the integration of digital editions into a statewide catalog.

The drivers of the Maine Shared Collections Strategy (MSCS)—lack of available space, budget cuts, low usage per cost and availability of electronic resources—are common throughout the library world. The recent Print Archive Network meeting held at ALA Midwinter Conference in Seattle highlighted the growing number of shared print projects being established throughout North America.

Nevertheless, MSCS has some distinguishing characteristics, according to those involved in the project:

  • The collaboration between public and academic libraries on a shared print project, which arises in large part from a unique history of trust and decades of collaboration among Maine libraries;
  • The collaboration of public universities and private colleges, as well as the state library;
  • The utilization of large-scale digital monograph collections, including both HathiTrust and Internet Archive, for the shared print data analysis, with a goal of integrating print-on-demand (POD) and ebook-on-demand into the group’s resources;
  • A primary focus on print monographs rather than journals;
  • An emphasis on the retention and preservation of titles, rather than the more typical emphasis on deselection.

MSCS comprises Colby College, Bates College, Bowdoin College, the University of Maine, the University of Southern Maine, the Maine State Library, Bangor Public Library, Portland Public Library, and Maine InfoNet, the state’s consortium. It started in June 2011 with a three-year Institute of Museum and Library Services (IMLS) grant of $821,065, and the participating libraries are contributing matching funds via in-kind services of salaries and fringe benefits for their staff on MSCS committees.

Matthew Revitt, the MSCS program manager, said he expects the broader library community’s interest in the project will be strong.

“MSCS will define a sustainable business model–including finances, collection analysis, governance structure, and a memorandum of understanding–that includes diverse partners with different needs and diverse funding streams that can be adapted by other multi-type library shared print projects,” Revitt said.

Documentation, models, policies, and procedures will ultimately be available for other libraries and consortia to download and adapt as they address the management of their own legacy collections.

The immediate goal is to analyze the collections and produce equitable criteria for retention and preservation decisions about legacy print titles, as well as about titles that could be de-accessioned because they are either preserved or available in a large-scale digital collection.

“These decisions will then be exposed in our library catalogs so that other libraries can use our retention commitments as a factor when analyzing their own collections,” Revitt said.

Challenges of academic-public collaboration

Even though Revitt said the project is scalable, he added that the history of collaboration among Maine libraries along with a robust statewide interlibrary loan system were important factors.

“It would be easier for us to rely on another library’s retention and preservation commitment than for those libraries without such a history,” he said.

One challenge MSCS will face is how to sustain the project after the grant ends.

“It is clear that the libraries that can weed their collections because someone else is keeping titles are gaining a valuable service,” said Barbara McDade, the director of the Bangor PL. “However, what is the pricing model we can use to sustain the collection builders to entice them to continue to hold the titles? Also, what is the optimum time for promising to keep an item and how is that decided?”

The variegated nature of the collaboration in Maine is a striking characteristic, according to Clem Guthro, director of the Colby College Libraries.

“I think one value of the MSCS project is showing that public and private academics can build true collaboration, and I think it shows that public libraries can also participate with academic partners,” he said.

Having public and academic partners also presents problems. For example, resource lending periods vary. Faculty members at Colby, Bates and Bowdoin can borrow materials from each other’s libraries for one year, but the lending periods at the public and university libraries are much shorter. Therefore, Colby, Bates and Bowdoin are still going to retain copies of certain items even if a public library has committed to retaining them.

“I think there are some different expectations of outcomes amongst the partners especially difference between the publics and the academics which will need to be reconciled when we get to see the comparative data,” Guthro said.

However, comparing circulation data across multitype libraries also presents some challenges. For example, any overview of the libraries’ data set is going to have to be in both Library of Congress Classification (LC) and Dewey Decimal Classification (DDC) in order to allow the libraries to analyze subject strengths in the classification scheme used for their materials and to make group comparisons across both LC and DDC.

Leveraging HathiTrust and Internet Archive

For its collection analysis, MSCS, with the help of New Hampshire-based Sustainable Collection Services (SCS), will use the HathiTrust API to compare its holdings against Hathi’s 5.5 million digitized books (including item-level data) to see where there is overlap. It will also compare MSCS data to the Internet Archive (IA). SCS has begun this investigation by downloading bib records for the estimated 1.5 million IA titles, according to a blog post by Rick Lugg, an SCS partner.

“Some MSCS participating library representatives have expressed a willingness to rely on digital copies as surrogates, but until we get data back from SCS we will not know the extent of the overlap,” Revitt said. “Another factor will be the circulation rates for the items in HathiTrust, if the rates are low, then libraries may be more willing to rely on a digital copy. We envisage that for some subject areas for example, in the humanities, faculty at the academic libraries are still going to prefer to have a physical copy.”

According to the Ithaka S+R Library Survey 2010, 84 percent of U.S. library directors said they would be more likely to withdraw their print book collections if their library could provide guaranteed on-demand access to print versions through a sharing network such as HathiTrust, which was recognized as a Trusted Digital Repository by the Center for Research Libraries in 2011.

At least one participating MSCS library would need to become a HathiTrust member before the group could rely on digital surrogates. MSCS has sent holdings data for the participating libraries to HathiTrust and is waiting to see whether individual or consortial membership (for MSCS libraries) would be more cost-efficient. The stumbling block for membership so far has been the inability to meet HathiTrust’s authentication requirement to implement Shibboleth, the software that provides web single sign-on across or within organizational boundaries.

Sara Amato, the MSCS technical services librarian, also recently began investigating options for integrating HathiTrust records into the union catalog, MaineCat.

“In the initial data that we received from OCLC, it was surprising to see how low the overlap was between our holdings and Hathi,” Guthro said. “We had assumed that the overlap would be quite high. We will need to see the circulation data from our collections to compare with the Hathi overlap before we can say if this will make a difference in our decision progress, though I am expecting that it will for some materials.”

SCS has foreseen some possible issues with comparing data to the IA. For example, because the IA’s API is not designed for large-scale batch queries, SCS must obtain the full set of Open Library data (of which IA is a subset) and then parse the Open Library records to identify the IA titles. These are large files: the Open Library Editions file, for example, contains 25 million lines. About 6.8 million of these appear to have OCLC numbers, according to Lugg.
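The parsing step described above might look roughly like the following Python sketch. It is illustrative only and is not SCS's actual process. It assumes each line of the Editions file is tab-separated (record type, key, revision, last-modified date, JSON record) and that an edition linked to an IA item carries an `ocaid` field, with OCLC numbers in `oclc_numbers`; the function name `ia_editions` and the sample records are hypothetical.

```python
import json
from typing import Iterable, Iterator


def ia_editions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield edition records that appear to point at an Internet Archive item."""
    for line in lines:
        # Assumed layout: type, key, revision, last_modified, JSON (tab-separated).
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5 or parts[0] != "/type/edition":
            continue
        try:
            record = json.loads(parts[4])
        except json.JSONDecodeError:
            continue  # skip damaged lines rather than abort a very long pass
        if record.get("ocaid"):  # assumed field holding the IA identifier
            yield {
                "key": record.get("key"),
                "ocaid": record["ocaid"],
                "oclc_numbers": record.get("oclc_numbers", []),
            }


def make_line(rec_type: str, key: str, record: dict) -> str:
    """Build one made-up dump line for the demonstration below."""
    return "\t".join([rec_type, key, "1", "2012-01-01T00:00:00", json.dumps(record)])


# Three invented records: only the first is an edition with an IA identifier.
sample = [
    make_line("/type/edition", "/books/OL1M",
              {"key": "/books/OL1M", "ocaid": "exampleitem00", "oclc_numbers": ["12345"]}),
    make_line("/type/edition", "/books/OL2M", {"key": "/books/OL2M"}),
    make_line("/type/author", "/authors/OL1A", {"key": "/authors/OL1A"}),
]
matches = list(ia_editions(sample))
```

Because the function reads one line at a time, it could be pointed at an open file handle instead of a list, which matters for a file of this size.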

“As of January 2, the IA ‘Texts’ division contains 3.7 million items, not all of which are books. It will require some digging to verify the various relationships and the quality of the data,” Lugg said. “SCS must identify items which appear both in IA and in HathiTrust to minimize duplication of counts.”
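The double-counting concern Lugg raises can be illustrated with simple set arithmetic. The Python sketch below assumes, purely for illustration, that titles are matched on OCLC number and that each source has already been reduced to a set of those numbers; the function name `digital_overlap` and the sample values are hypothetical, and real matching would be considerably messier.

```python
from __future__ import annotations


def digital_overlap(local: set[str], hathi: set[str], ia: set[str]) -> dict[str, int]:
    """Count local titles by where a digital copy was found, counting each title once."""
    in_hathi = local & hathi  # local titles also held by HathiTrust
    in_ia = local & ia        # local titles also held by the Internet Archive
    return {
        "hathi_only": len(in_hathi - in_ia),
        "ia_only": len(in_ia - in_hathi),
        "both": len(in_hathi & in_ia),
        "any_digital": len(in_hathi | in_ia),  # union, so shared titles count once
        "no_digital": len(local - hathi - ia),
    }


# Invented OCLC numbers for demonstration only.
local_holdings = {"1", "2", "3", "4", "5"}
hathi_holdings = {"1", "2", "9"}
ia_holdings = {"2", "3"}
counts = digital_overlap(local_holdings, hathi_holdings, ia_holdings)
```

Adding the HathiTrust and IA match counts together would report four digital matches here, while the union reports three, because title "2" is held by both.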

Collection analysis issues

The collection analysis that underpins so much of the project has posed other hurdles as well.

MSCS participating libraries must be able to analyze collections data from five different local ILSes across eight partner libraries with 2.9 million bibliographic records (this figure excludes government documents, non-book materials, videos, and bound journals, but does include special collections books).

However, before any real analysis could take place, the group realized that only one library out of eight had recently completed an OCLC reclamation. (An OCLC reclamation project synchronizes a library’s holdings in its local system with those marked in WorldCat.)

“In hindsight that process should have been completed before the grant was awarded, or at least written into the grant activities,” said Deborah Rollins, the head of the collection services department at Fogler Library at the University of Maine. “On the other hand the grant allowed the group to hire a systems librarian who was instrumental in facilitating the reclamation for the remaining seven libraries.”

Post-reclamation data for most partner libraries took about a year to gather, much longer than first anticipated, and only became available in September 2012. At the same time, it had become clear that the collection analysis tool that the group had decided to subscribe to in November 2011, OCLC’s WorldCat Collection Analysis (WCA), would not meet the project’s needs. For example, WCA could not manipulate and report out large batch files of title and other data for the eight libraries individually and as a group, according to Revitt.

In October 2012, the group began investigating other collection analysis tools on the market and finalized the contract with SCS in February 2013.

“SCS are unique in their ability to provide tailored reports combining local circulation and item data with OCLC WorldCat library catalog holdings and HathiTrust Digital Library holdings,” Revitt said. “SCS’s special and unique relationship with OCLC will enable us to pull library data en masse from the OCLC WorldCat.”

In addition, the matching of MSCS holdings to WorldCat will use FRBR work sets, which group the various editions of the same work, to help make more fine-grained retention decisions. SCS is scheduled to deliver collection summaries by March 31, 2013.

“I think that long term success will be judged by patron satisfaction–faculty, students, general public–that we have provided long-term stewardship of print collections and that they can also get digital or print copies from the large scale digital collections,” Guthro said.

This article was featured in Library Journal's Academic Newswire enewsletter.


Michael Kelley (mkelley@mediasourceinc.com) is the former Editor-in-Chief, Library Journal.
