November 16, 2017

How the W3C Has Come To Love Library Linked Data

(Editor’s note: Since the initial publication of this article, the W3C has published permanent links to the final report and the supporting reports.)

The number of influential libraries publishing their metadata on the web as linked open data, the heart of the Semantic Web, is growing at a dizzying rate. To further this trend, the World Wide Web Consortium (W3C), a major nonlibrary organization that maintains the technologies undergirding the Semantic Web (or the Web of Data), will release a new report devoted to library linked data (LLD) in September.

Even though this topic remains uncharted ether for the majority, many librarians at major institutions have recognized that a key to the bibliographic future lies in migrating their data out of library silos and into an open, global pool of shared data. Interrelating library metadata with nonlibrary resources in the linked data cloud is seen as the most promising way to ensure that library data remains accessible and reusable in a web-based, integrated information universe.

“Traditional vehicles for surfacing content in managed collections—the OPAC, conventional use of controlled vocabularies, and authority control—are rapidly fading in importance,” said Stuart Weibel, a former senior research scientist at OCLC Research. “Some would say they’ve been useless for a long time. What advances have been made have all been in the vein of ‘webulating’ catalogs and data,” he said, citing WorldCat, which celebrated its 40th anniversary on August 26, as an example.

“But these advances, important as they are, have not sufficiently integrated the contents of the library catalog into the search stream of the web,” Weibel said. “If libraries are to retain their role as curators of the intellectual products of society, their assets must be part of that search stream.”

Library standards such as MARC and Z39.50 were designed solely for the library community, in the 1960s and 1970s respectively. This legacy has complicated efforts to join that wider search stream, and it led burgeoning web entities such as DBpedia, which offers a Semantic Web mirror of Wikipedia, to bypass library data at the outset.

So, in order to help library metadata play well with nonlibrary datasets (and vice versa), libraries have begun to reconceptualize metadata and publish it on the web using linked data technologies: the Resource Description Framework (RDF) data model, vocabularies built on it such as OWL and SKOS, and the SPARQL query language.

If library metadata is formatted and linked as RDF, library content will surface more prominently in web search results and be accessible via ubiquitous web protocols like HTTP, rather than library-centric protocols like Z39.50.
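To make the idea concrete, here is a minimal sketch, using the Python rdflib library, of publishing one catalog record as RDF; the record URI and values are invented for illustration, and a real deployment would point the subject link at an authority service such as id.loc.gov:

# A minimal sketch of one catalog record as linked data, via rdflib.
# All URIs and literal values below are hypothetical.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, RDF

g = Graph()

# An HTTP URI identifies the book, so any web client can look it up.
book = URIRef("http://example.org/bib/12345")

g.add((book, RDF.type, DCTERMS.BibliographicResource))
g.add((book, DCTERMS.title, Literal("Moby-Dick")))
g.add((book, DCTERMS.creator, Literal("Melville, Herman, 1819-1891")))
# Pointing at a subject URI, rather than a bare string, ties the record
# into the wider web of data (in practice an authority-service URI).
g.add((book, DCTERMS.subject, URIRef("http://example.org/subjects/whaling")))

# Serialize as Turtle, ready to be served over plain HTTP.
print(g.serialize(format="turtle"))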

Web community interested in library data

In May 2010, several prestigious international institutions, all members of the W3C, chartered the W3C’s Library Linked Data Incubator Group (LLD XG), whose mission was to produce a report that would help increase the “global interoperability of library data on the Web.”

“To me, libraries are like the beautiful churches one finds in big cities — surrounded by skyscrapers, still beautiful as ever, but a bit in the shadow, and competing with the bustling city around them for the attention of parishioners,” said Tom Baker, a cochair of the group and the chief information officer of the Dublin Core Metadata Initiative (DCMI). “Students and even faculty today are more likely to start an information search on Google than in the library.”

The LLD XG’s charter expires today, August 31; the group held its final meeting last week and will probably publish its final report in mid-September. The group, which has 51 members from 23 organizations, comprises a mix of veteran library data experts, software developers, and Semantic Web specialists.

“I think the main importance of the report from the W3C group is less its content than the fact that the W3C, a decidedly nonlibrary organization, commissioned it,” said Karen Coyle, a member of the group and a prominent consultant and blogger on library technology. “Basically it shows interest in library data from the web community. I actually wish that the committee had fewer librarians on it and more ‘others’ like scientists or web developers,” she said.

“The report, in my mind, has been incredibly important,” said Ross Singer, a member of the group and the interoperability and open standards champion at Talis, “if for no other reason than figuring out the problem we’re trying to solve and thinking about the real world complications of actually getting there.”

“We have a very firmly entrenched system of creating and sharing data with a large and, for the vendors, consistently profitable ecosystem based on codified rules and standards,” Singer said. “To a degree, we have to throw a lot of this out the window and start over … the rules would all be different. For a profession that is very slow to evolve, this would be a massive disruption to the status quo,” he said. “As such, it is important to think seriously about how we approach such a sea change.”

Recommendations from the W3C

The report is still being finalized, but the draft recommends that libraries:

— mint Uniform Resource Identifiers (URIs) as globally unique, web-compatible identifiers for the resources (any kind of object or concept) they manage and for the metadata elements they use to describe those resources. This ensures that diverse descriptions from other knowledge domains all refer to the same thing, and it would help eliminate metadata redundancies (a brief sketch of such identity links follows this list);

— develop library data standards that are compatible with linked data, since Semantic Web technologies “represent a fundamentally different way to conceptualize and interpret data from the data formats of the 20th century”;

— use their expertise in metadata management to become full partners with the groups developing Semantic Web standards;

— identify “low hanging fruits,” such as authority files and controlled lists, that would allow libraries to expand their presence in the linked data cloud quickly;

— foster a discussion about open data and rights. “A mixture of rights within linked data space will complicate re-use of metadata, so there is an incentive to have rights agreements on a national or international scale”;

— explore using libraries’ ethos of quality control in the curation and long-term preservation of linked data datasets and vocabularies.
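On the first recommendation, a minimal sketch may help. Using Python's rdflib library, and with purely hypothetical URIs, an identity link such as owl:sameAs is how a library-minted identifier and a description elsewhere on the web come to refer to the same thing:

# A sketch of the URI recommendation: mint an identifier for an entity
# the library describes, then assert that another community's URI names
# the same thing. The URIs here are illustrative, not verified records.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()

# A library-minted HTTP URI for an author authority record.
author = URIRef("http://example.org/authority/person/melville-herman")

# owl:sameAs declares that two URIs denote the same real-world entity,
# so a library description and a DBpedia description can be merged.
g.add((author, OWL.sameAs, URIRef("http://dbpedia.org/resource/Herman_Melville")))

print(g.serialize(format="turtle"))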

“In my opinion, this report can be considered a success if it makes clear that adopting linked data technologies does not mean throwing out the legacy of library science,” Baker said, “but to express — translate — that legacy into a new data language and, in so doing, make it more widely and richly accessible and reusable — relevant not just for use in libraries, but by data providers on the wider web.”

Jeff Young, a member of the group and a software architect at OCLC, said the perspectives of libraries and the W3C (along with those of the Internet Engineering Task Force, or IETF) have begun to merge, noting the growing adoption of SKOS as the vocabulary of choice for representing controlled vocabularies on the web.
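To illustrate Young's point, here is a self-contained sketch, again with rdflib and with invented URIs and labels, of a subject term expressed in SKOS and then queried with SPARQL:

# A controlled-vocabulary term expressed in SKOS; URI and labels invented.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS

g = Graph()
concept = URIRef("http://example.org/subjects/whaling")

g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("Whaling", lang="en")))
g.add((concept, SKOS.altLabel, Literal("Whale fishing", lang="en")))
g.add((concept, SKOS.broader, URIRef("http://example.org/subjects/fisheries")))

# SPARQL, the RDF query language, retrieves preferred labels graph-wide.
query = "SELECT ?label WHERE { ?c a skos:Concept ; skos:prefLabel ?label }"
for row in g.query(query, initNs={"skos": SKOS}):
    print(row.label)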

“As the shared understanding expands, the benefits of publishing linked data are becoming clearer,” Young said. “Writing this report has been a chance for both communities to get together and examine the leverage potential from both directions,” he said.

Similarly, Antoine Isaac, a cochair of the group and the scientific coordinator for Europeana, praised the group’s “ethos of working together around data, which should come side by side with the technical aspects of linked data.”

Library linked data projects growing exponentially

Isaac has been working on a pilot project at Europeana, which has already made 3.5 million object records available as linked open data. This is just one of a cascade of major projects that have arrived on this landscape since the LLD XG’s inception, making its report all the more pertinent.

To help keep track of these projects, the LLD XG has set up a Library Linked Data group, hosted by the Comprehensive Knowledge Archive Network (CKAN), to gather information on relevant library linked datasets. A number of significant developments have occurred in just the past few months:

— The Library of Congress (LC) this month offered its flagship Name Authority File as linked open data;

— LC also acknowledged the need for a more robust model for data interchange, like RDF, when it announced in May its Bibliographic Framework Transition Initiative, which envisions moving away from MARC;

— The British Library this month made available a major linked dataset representing the British National Bibliography (which included a schema and data model);

— LC and the two other U.S. national libraries, the National Agricultural Library (NAL) and the National Library of Medicine (NLM), announced on June 13 that they plan to adopt the Resource Description and Access (RDA) cataloging code;

— The RDA controlled vocabularies, which can be modeled in RDF, were officially published this month on the Open Metadata Registry;

— In June, Yahoo, Google, and Bing, in a rare show of unity, launched schema.org, a shared vocabulary for embedding structured data in web pages for search engine optimization;

— The Cambridge University Library released a dataset of 1.3 million records, along with a toolkit for converting MARC21 to linked data (a rough sketch of such a conversion appears after this list);

— Sudoc, the French academic union catalog of 10 million bibliographic records maintained by ABES (l’Agence Bibliographique de l’Enseignement Supérieur), was released as linked data in July;

— The Virtual International Authority File (VIAF), which merges authority records from over a dozen national and regional agencies, revised its approach to linked data, and it also now links its records to DBpedia whenever possible;

— An LOD-LAM (Linked Open Data in Libraries, Archives and Museums) movement has begun to emerge: an international meeting was held in San Francisco in June, and the first in a series of LOD-LAM meetups around the world takes place September 16 in Washington, D.C. (sponsored by the Smithsonian Institution);

— The Association for Library Collection & Technical Services (ALCTS) and the Library & Information Technology Association (LITA) formed a Library Linked Data Interest Group at ALA’s annual convention in June.
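For a sense of what moving MARC21 data into linked data involves, here is a rough sketch using the Python pymarc and rdflib libraries; the file name, URI pattern, and field mappings are assumptions for illustration, not logic drawn from the Cambridge toolkit itself:

# Read binary MARC21 records and emit RDF triples; illustrative only.
from pymarc import MARCReader
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()

with open("records.mrc", "rb") as fh:  # hypothetical input file
    for i, record in enumerate(MARCReader(fh)):
        resource = URIRef(f"http://example.org/bib/{i}")  # minted URI
        # MARC field 245, subfield $a: the title proper (stripping
        # trailing ISBD punctuation).
        for field in record.get_fields("245"):
            for title in field.get_subfields("a"):
                g.add((resource, DCTERMS.title, Literal(title.strip(" /:"))))
        # MARC field 100, subfield $a: the main personal author.
        for field in record.get_fields("100"):
            for name in field.get_subfields("a"):
                g.add((resource, DCTERMS.creator, Literal(name.strip(" ,"))))

g.serialize(destination="records.ttl", format="turtle")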

“Where just 15 months ago, it felt like our task was to raise awareness and overcome resistance to a new way of doing things, it increasingly feels like we are pushing on an open door,” cochair Baker said.

The vendor perspective

Part of the LLD XG report’s purpose is to build on these existing initiatives and help provide a roadmap for the future. To this end, the report also offers a detailed inventory of available value vocabularies, metadata element sets, and relevant technologies, based on its collected use cases.

“Taking the time to do this sort of inventory and functional requirements gathering, and using the results as the basis for recommended next steps as well as a concise discussion of ‘why’ to pursue library linked data, helps chart a path forward for our community in this space,” said Corey A. Harper, a metadata services librarian at New York University.

The draft report notes that library developers and vendors would benefit from not being tied to library-specific formats, because if they support linked data, they will be able to market their products outside the library world.

But vendors “need an expectation of development-costs recovery before beginning work on new products,” according to the report.

“We see and believe in the potential of LLD,” said Carl Grant, the chief librarian for Ex Libris. “At the gut level it feels like there is great potential benefit to this approach. I really like that this working group is part of the W3C, a clear advantage in getting us to think beyond our traditional library boundaries,” he said.

Ex Libris is designing its new system, Alma, with the capabilities necessary to support the LLD model, but Grant said there still remained huge barriers, such as finding ways to translate between vocabularies, to the model’s overall adoption.

“Given the global economic environment, this makes progress a real challenge as many will be reluctant to invest in something that seems rather esoteric at this point in time,” Grant said. “I personally believe that this data model requires substantial work to [garner] support on the part of vendors, and until we have more demonstrable benefits, adoption will remain with the innovators and thus, few in number. This is not to say that it isn’t important, because it clearly shows signs of being very important,” he said.

The schema.org project is seen by some as a massive boost, but others fear that it will remove the “open” from “linked open data” and that it may neglect the scholarly, noncommercial requirements of libraries.

“Like many others, I view the emergence of schema.org with a jaundiced eye… another example of large, powerful companies wielding their influence without regard to an open standards process,” said Weibel. “…It is unfortunate that the agreement has turned away from the open standard of RDFa and towards an alternative, internally created approach authored within Google. But they don’t call it free enterprise for nothing,” he said.

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

About Michael Kelley

Michael Kelley (mkelley@mediasourceinc.com) is the former Editor-in-Chief, Library Journal.
