January 15, 2017

What Governmental Big Data May Mean For Libraries

Figure: simple Venn diagram of open government, open data, and government data (Attribution-ShareAlike License)

On May 9, President Obama signed an open data executive order and released an open data policy. Less than two weeks later, on May 22, Data.gov responded by launching a new data catalog built on CKAN, an open source data management system. According to the site, the catalog will enable central implementation of the Open Data Policy by harvesting the data inventories that federal agencies will create under the directive. “We also released new tools on Project Open Data that will help agencies easily meet the requirement of the policy, while laying the foundation for the new Data.gov infrastructure across government,” the statement continued.

LJ caught up with members of the library and data-driven research communities to see what this may mean for their missions.

Debbie Rabina, an associate professor who teaches government information in Pratt Institute’s library science program, told LJ that the memo “was received enthusiastically by the library community. Social media postings are full of words such as ‘overdue,’ ‘awesome,’ ‘welcome.’” According to Rabina, the most welcome sections were those stating that government data must be open and machine-readable by default, along with the definitions of Information Life Cycle and Open Data.

Data as Collection Development

Rabina said, “The challenges that we face as librarians are mostly with regard to our ability to manage this information. There are large potentials and large pitfalls. Concerns about managing privacy were addressed directly, my concerns are more in the areas of expertise for access and collection management. The kind of expertise needed to manage data are just beginning to emerge from within our ranks. I would like to see LIS education graduate students who have not only the technological skills, but, more important, the policy perspective that views data as a collection.”

James A. Jacobs, co-founder of Free Government Information and former Data Services Librarian at the University of California San Diego, agreed with that policy perspective, telling LJ, “the EO does not provide for preservation or create any user-centered services for the information. This is where libraries can step in. The EO provides a huge new advantage to libraries wishing to enhance the value they provide to their user communities. Libraries will be able to identify, select, and acquire large datasets of valuable information content without cost or copyright restrictions. They will then be able to add value to this content by preserving the content and providing user-focused services.”

James R. Jacobs, U.S. Government Information Librarian at Stanford, agreed that libraries should step forward as stakeholders and participate in the process of building out the agencies’ policies and infrastructure. “Libraries are a key target audience. We should take full advantage of this opportunity to steer this policy in a way that helps libraries and the communities that they serve,” he suggested. “We’ve been doing a lot of work on data curation, and we could be vital partners in terms of metadata standards, metadata creation, preservation, and managing the whole information life cycle, which libraries are really good at.”

Jeanne Holm, Evangelist, Data.gov, U.S. General Services Administration, emphatically agrees that librarians should participate in the process. In particular, “I would encourage anyone to make a comment on the Project Open Data site or Stack Exchange.” At Project Open Data, “we’re trying to open source the improvements on that policy,” she says, while partner site Stack Exchange hosts a conversation “around everything from what does this mean for a policy impact to how do I structure our weather data?” In particular, she seeks input from librarians on “how do we structure taxonomies?”

Cautious Optimism

Jacobs of Stanford shared the other Jacobs’ concern about long-term preservation. He was also concerned, as is Joshua Tauberer, about the use of open licenses. Said Jacobs, “Open licenses presume access is closed by default. That’s obviously a problem because most government info is in the public domain. I think more work needs to be done. I’d love to see some sort of statement that says you can use this data but you have to put it out in the same terms that you found it. I worry that commercial entities will take public domain data and put a rope around it and make it not in the public domain anymore. That’s something you saw for example in Westlaw and Nexis; they took public domain data, and they have very expensive databases that people have to pay to access.”

Jacobs of Stanford also pointed out that “Data.gov is not OAIS [Open Archival Information System] compliant at this time; they don’t necessarily have data management policies in place. The other place I could see being a repository is the GPO’s system. It was supposed to go through the trusted digital repository process. They did the first pass and were ready to do the next step, but it was postponed because of the sequester.”

He was, however, particularly pleased with several aspects: the timeframe for implementation; the requirement that every agency maintain agency.gov/data and /publications pages on its website; and the GitHub repository where agencies can share best practices, standards, and infrastructure.

Keith Curry Lance of RSL Research Group sounded a note of caution. “The devil is in the details when such policies are put into effect. Notably, this executive order is careful not to conflict with any existing laws concerning access to specific data or, more broadly, the authority of executive agency heads. … A critical aspect of open data policy is time–how long does an agency take to respond to requests for information? The executive order makes it clear that federal departments and agencies are to respond by a reasonable deadline. When actual open data policies are written by federal agencies, we will find out how they define reasonable.” He also pointed out that since the provision is aimed at newly created datasets, not conversion of pre-existing ones, “those interested primarily in historical data may find the results of the order less than satisfactory.”

And Richard Pearce-Moses, Director, Master of Archival Studies at the College of Information and Mathematical Sciences, Clayton State University, GA, told LJ, “I very much look for a success to President Obama’s open data policy. At the same [time], I don’t see money to help agencies actually implement these goals and, especially at the state level, we have a long way to go.”

About Meredith Schwartz

Meredith Schwartz (mschwartz@mediasourceinc.com) is Executive Editor of Library Journal.