November 24, 2014

Metadata and Copyright | Peer to Peer Review

Most of us are aware of the basics of U.S. copyright law, including the categories of copyrightable and non-copyrightable works. Some materials are explicitly exempted from copyright in this country, a key example being U.S. Federal documents. (Although if that sounds to you like a clearly distinguishable category, you should ask your local government documents librarian to fill you in on the complexities of defining “U.S. Federal document.”) Another exempted category is that of facts and compilations of facts that have no creative component. This was determined in the famous Supreme Court ruling of Feist v. Rural Telephone, in which the Court interpreted the constitutional wording of “to promote the Progress of Science and useful Arts” as implying some level of creativity.

“The constitutional requirement necessitates independent creation plus a modicum of creativity. Since facts do not owe their origin to an act of authorship, they are not original, and thus are not copyrightable.” (Feist, p. 1)

As you might imagine, “modicum of creativity” is itself very difficult to define, much like the determination of Fair Use based on “substantiality of the portion used” and the effect on the market for the work.

This question of facts versus creativity comes up in the discussion of ownership and copyrightability of library catalog data. This has been a topic of discussion in the U.S. library world since 1983, when OCLC announced its intention to claim rights in the OCLC MARC record database. Can anyone claim copyright in bibliographic records, either singly or as a collection?

There are two interlocking issues at play here. One is that of ownership: if anyone owns the intellectual content of a bibliographic record, who is that owner? The other is the question of whether bibliographic data, singly or in a database, meets the threshold of creativity that is required by copyright law.

Ownership was one of the key issues in the OCLC WorldCat Record Use Policy. Given the fact that many if not most records in OCLC have been created by one institution and re-edited by others, the ownership of any one piece of bibliographic data is impossible to determine. OCLC’s policy recognized this and essentially declared the bibliographic data to be a shared resource with no specific intellectual property rights on individual records.

“While, on behalf of its members, OCLC claims copyright rights in WorldCat as a compilation, it does not claim copyright ownership of individual records.”

Much bibliographic data is copied from the item in hand or from publisher-provided data, and it is also shared both through services like OCLC and through individual catalogs that allow downloads. Determining ownership, in the intellectual property sense, of any one instance of bibliographic data is probably impossible. Ownership in the “possession” sense then becomes the only sensible focus for a rights assessment. Copyright in the database, while declared by OCLC, is not a foregone conclusion. Databases themselves must have some modicum of originality, as is described by the Copyright Office:

“What remains is a thin layer of copyright protection for qualifying databases. In order to qualify, they must exhibit some modicum of creativity in the selection, arrangement, or coordination of the data. The protection is thin in that only the creative elements (selection, arrangement, or coordination of data) are protected by copyright.”

Recently, the Digital Public Library of America (DPLA) addressed copyright in metadata with its draft Policy Statement on Metadata, which was discussed at the board meeting of Feb. 14. The first statement of the draft policy document is to my mind the most controversial:

01. The Vast Majority of Metadata is Not Subject to Copyright Restrictions.

The DPLA believes that the vast majority of metadata is not subject to copyright, because it either expresses only objective facts (which are not original) or constitutes expression so limited by the number of ways the underlying ideas can be expressed that such expression has merged with those ideas.

Given that the DPLA is located at the Berkman Center for Internet and Society at the Harvard Law School, this statement carries a fair amount of weight and has legal credibility. The policy, however, does allow (in Point 02) for cases in which a contributor claims copyright in their metadata, and states that any such contributions must be made under a Creative Commons 0 (CC0) license, which in effect dedicates the data to the public domain.

As members of the board mentioned at their meeting, CC0 is a fall-back should Point 01 fail. If Point 01 is true, the fall-back is not needed. That this fall-back was included is in essence an admission that the copyrightability of descriptive metadata of the type created by libraries and by publishers (including publishers whose business is the creation and publication of metadata) has not been clearly established.

This sentiment also informed the creation of the Open Knowledge Foundation’s Principles on Open Bibliographic Data. That group, however, did not declare metadata to be outside of copyright, in part because of the massively international scope of its intentions. Because copyright law in some countries recognizes a stronger protection for databases than U.S. law, the Open Bibliographic Data Principles rely on the DPLA’s fall-back, which is that of asking the owners of bibliographic data to allow open use (CC0) of their data with no restrictions. This is also the license selected by Europeana for metadata and previews of Europeana resources. It is the use of the CC0 license that makes collaborative projects between DPLA and Europeana possible.

There is a problem, however, with declaring metadata, per se, to be outside of copyright: it presupposes that metadata cannot be a creative act or cannot contain a bit of creativity. This is not true, at least for some definitions of “metadata,” and perhaps the DPLA needs to define its use of the term to support the claim of non-copyrightability. Metadata today can be quite rich in content and can contain abstracts, reviews, cover art, and perhaps even a sample of the work for reading or listening. It also is not clear to me whether a classified arrangement or a choice of subject headings is simply a matter of fact. One could argue that subject description falls under “expression that has merged with ideas,” although I admit that I am not at all clear on the meaning of this aspect of copyright law.

I can imagine that declaring library catalog data to be void of creativity will grate on the self-esteem of many catalogers. While some library metadata consists of the rote recording of facts about an object, it would be hard to explain the necessity of a 1,500-page set of cataloging rules to produce that outcome. Catalogers that I meet are proud of their ability to interpret those rules for the more complex cases that come into their hands. That different catalogers make different decisions (much to the consternation of downstream users of the catalog data for their own cataloging) is evidence that at least some of the cataloging process is not purely factual.

This does not mean that libraries should not make their cataloging metadata available openly for re-use. The CC0 license is one way to do that, because it is the easiest license for users of the data to work with. However, any license terms need to be carried with data as it is used and re-used, at least in theory. In practice, metadata that is released “into the wild” will be re-used without regard to ownership, because that is the only practical way that the wide world of users can approach data re-use. Even without declaring a license or stating that metadata is not subject to copyright, releasing the data in an easy-to-use form will have that same effect: a de facto gift to the world.

While I have some problems with Point 01 of the DPLA statement, the other three points are inarguable:

02. The DPLA’s Partners Share the DPLA’s Commitment.

03. The DPLA Asserts No Rights Over its Database of Metadata and Waives All Claims for Infringement Thereof.

04. Free and Unencumbered Access to Metadata.

The power of this policy is that it not only makes an important point but also means that if you wish to be part of the larger DPLA community, you must make your metadata freely available. That may mean that some portions of an organization’s metadata cannot enter the DPLA because they cannot be shared. Yet users of the DPLA will have a uniform rights environment where metadata is concerned, and they can therefore incorporate that metadata into their own projects without legal barriers. I hope that this sets a precedent for the library world.

This article was featured in Library Journal's Academic Newswire enewsletter.

About Karen Coyle

Karen Coyle (kcoyle@kcoyle.net) is a librarian with over thirty years of experience with library technology. She now consults in a variety of areas relating to digital libraries. As a consultant she works primarily on metadata development and technology planning. She is currently investigating the possibilities offered by the semantic web and linked data technology.

Comments

  1. Great article.

    A note about some other factors that apply in Europe:

    In Europe, databases are subject to the ‘Sui Generis Database Right’ as well as copyright. This ‘Database right’ was introduced with the intention of protecting collections of data that were not well protected under copyright (with the underlying intention of making investment in such collections a better deal for the investor). There is a nice summary of the Database right from a UK perspective at http://www.out-law.com/page-5698 – it turns out that when taken to court, the database right is not as extensive in its protection as some hoped, but it is still possible that it would apply to a resource such as a library catalogue or a union of library catalogue data.

    This database right is not something that Creative Commons licences deal with generally – although CC0 is an exception to this as it waives copyright *and related* rights to the extent allowed by law and this covers both copyright in a database (or content) and database rights.

    This has led to the creation of some licences which are specifically geared to databases – the Open Data Commons licences (http://opendatacommons.org/licenses/). These are:

    Public Domain Dedication and License (PDDL) — “Public Domain for data/databases”
    Attribution License (ODC-By) — “Attribution for data/databases”
    Open Database License (ODC-ODbL) — “Attribution Share-Alike for data/databases”

    The ODC-PDDL is equivalent to CC0, and some libraries in the UK at least have preferred to use ODC-PDDL over CC0, but as far as I can tell it makes no difference – they both achieve the same thing.

    The University of Cambridge library has released data under both the PDDL and ODC-By – the latter used for records that originate from OCLC WorldCat – more details at http://data.lib.cam.ac.uk/datasets.php