October 31, 2014

Research at Risk

By Thomas Mann

When we undermine cataloging, we undermine scholarship

Studies abound showing that researchers don’t use library subject headings. They guess at keywords. They don’t grasp Boolean or word proximity search techniques. Many are apparently content with whatever results they find quickly. They just don’t know what they’re missing. Fast information-finding trumps systematic scholarship.

Many library managers seem to think the library profession should simply capitulate and accept this situation. In their view, we should abandon Library of Congress Subject Headings (LCSH) in our OPACs and scan in the table of contents of each book – or wait for Google Print to digitize "everything." These managers are willing to go with the expedience of simply throwing more keywords into the hopper. They think this eliminates the need for categorization, linkages, and browse displays that show options beyond whatever keywords happen to be typed into a blank search box.

I wish those library managers had some of my experiences, both as a researcher and as a frequent bibliographic instruction teacher. I completed a Ph.D. program in English before I became a librarian. I thought I was a pretty good researcher. Only when the system was explained in a cataloging class, years later, did I realize how much I’d been missing for so long. There was a better way to do research than just looking for whatever words one could think of in the catalog, hoping to find at least one good book, and then browsing around in its general area in the stacks. Library users, even sophisticated scholars, often think that is all there is to library "research." Nobody ever teaches them how many of their real problems can be solved by good cataloging.

Now, every time I teach a research orientation class, students respond to LCSH just as I did. People are hungry to know how to do research more efficiently. Until we explain the differences between LCSH and uncontrolled terms, however, students cannot "see" anything beyond their "default" keyword horizon. They literally don’t know what they’re missing, or even how to ask for it. They’ll never see beyond that initial horizon if our instruction consists only of advice on how to "critically evaluate" web sites they find through Google.

Finding "Cockney"

One researcher, for example, was interested in linguistic studies of the Cockney dialect. He simply typed "Cockney" as a keyword into our catalog. He found some titles – mainly works of fiction, many of them at the juvenile level – but missed most of the linguistic studies. The latter have such titles as The Muvver Tongue, Bernard Shaw’s Phonetics, Vulgar Tongue, Die Londoner Vulgarsprache in Thackeray’s Yellowplush Papers, Sources of London English, Ueber den Ursprung der Neuenglischen Schriftsprache, Fraffly Well Spoken, Zur Sprache Londons vor Chaucer, and Idiolects in Dickens. The proper LC subject heading English language – Dialects – England – London rounds up in one categorical grouping all such works scattered by variant keywords – and variant languages. Equally important, the browse display of the same heading in an online catalog places this precoordinated string within an intelligible context of scores of related options, of which the following is but a greatly truncated sample:

English language – Dialects – Africa, West
English language – Dialects – Alabama
English language – Dialects – Australia
[Narrower Term cross reference to Australianisms]
English language – Dialects – Bahamas
English language – Dialects – Canada
English language – Dialects – Cartography
[See cross reference to English language – Dialects – Maps]
English language – Dialects – England – Bibliography
English language – Dialects – England – Cheshire
English language – Dialects – England – Essex
English language – Dialects – England – London
English language – Dialects – England – London – Dictionaries
English language – Dialects – England – London – Glossaries, vocabularies, etc.
English language – Dialects – England – Northern – Maps
English language – Dialects – England – Phonology
English language – Dialects – England – Wessex
English language – Dialects – Hawaii – Bibliography
English language – Dialects – Illinois – Chicago – Dictionaries
English language – Dialects – Thailand
English language – Dialects – United States

Such OPAC browse displays of connections, contexts, and cross references to variant terms cannot be even approximated by Google’s "relevance ranking" software. The display of such browse lists in online catalogs is one of the most substantive advances in library science in the last generation. The switch from card to online catalogs has entailed not just additional keyword, wildcard-truncation, and Boolean search capabilities but the creation of these display menus of previously hidden overview information. Library cataloging now maps out, more efficiently than ever before, the relationships that are essential to in-depth scholarship.
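For readers who want to see the mechanics, a browse display can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `browse`, its `before` and `after` parameters, and the plain alphabetical sort are all hypothetical, and real catalogs follow more elaborate filing rules. The headings are taken from the sample list above.

```python
import bisect

# A handful of headings from the sample display, kept in sorted order.
# Plain string sorting is an assumption; real catalogs use filing rules.
HEADINGS = sorted([
    "English language – Dialects – England – Bibliography",
    "English language – Dialects – England – Cheshire",
    "English language – Dialects – England – Essex",
    "English language – Dialects – England – London",
    "English language – Dialects – England – London – Dictionaries",
    "English language – Dialects – England – London – Glossaries, vocabularies, etc.",
    "English language – Dialects – England – Phonology",
    "English language – Dialects – England – Wessex",
])


def browse(headings, entry, before=1, after=3):
    """Return the headings filed around the point where `entry` would sit.

    `entry` need not be a complete heading: a partial string still lands
    the reader in the right neighborhood of the list.
    """
    position = bisect.bisect_left(headings, entry)
    start = max(0, position - before)
    return headings[start:position + after]


if __name__ == "__main__":
    # Even a partial entry opens a window of neighboring options.
    for heading in browse(HEADINGS, "English language – Dialects – England – Lon"):
        print(heading)
```

The point of the sketch is that the reader supplies only a rough starting position; the neighboring headings, including subdivisions such as Dictionaries that were never typed, come into view without being requested.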

Google’s relevance ranking of keywords is not the same as the conceptual categorization of sources accomplished by LCSH. The difference is between finding "something quickly" – isolated, unstructured, and disconnected information – and gaining a systematic overview of the conceptual field.

A recent survey of historians attests that scholars really want "comprehensive searches" – the very opposite of merely "some" information provided "quickly." Having to specify all relevant keywords in advance is not the same as having the capability to recognize, in a systematic way, whole groups and ranges of relevant sources whose terminology cannot be anticipated. LCSH enables scholars to find the crucial sources they didn’t know how to request. The larger the collection of books, the more – not less – scholars require such a system, even if they don’t have the technical vocabulary to articulate the need for recognition mechanisms in addition to prior specification search techniques. Because they do not have that vocabulary, surveying them in user studies tends not to elicit this crucial point.
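The contrast between prior specification and recognition can be made concrete with the Cockney example. The following Python sketch is a toy model, not a description of any real catalog system: the `Record` class and both search functions are hypothetical names, and the one fiction record is an invented stand-in for the juvenile titles a keyword search turns up. The other titles and the subject heading come from the example above.

```python
from dataclasses import dataclass

LONDON_DIALECTS = "English language – Dialects – England – London"


@dataclass(frozen=True)
class Record:
    title: str
    subjects: tuple  # controlled headings assigned by a cataloger


CATALOG = [
    Record("The Muvver Tongue", (LONDON_DIALECTS,)),
    Record("Vulgar Tongue", (LONDON_DIALECTS,)),
    Record("Sources of London English", (LONDON_DIALECTS,)),
    Record("Zur Sprache Londons vor Chaucer", (LONDON_DIALECTS,)),
    Record("Fraffly Well Spoken", (LONDON_DIALECTS,)),
    # Hypothetical record standing in for the fiction a keyword search finds.
    Record("A Cockney Holiday", ("Hypothetical juvenile fiction heading",)),
]


def keyword_search(catalog, word):
    """Match only records whose title contains the exact word typed."""
    needle = word.casefold()
    return [r.title for r in catalog if needle in r.title.casefold()]


def subject_search(catalog, heading):
    """Match every record a cataloger filed under the given heading."""
    return [r.title for r in catalog if heading in r.subjects]


if __name__ == "__main__":
    print(keyword_search(CATALOG, "Cockney"))
    print(subject_search(CATALOG, LONDON_DIALECTS))
```

In this toy catalog the keyword search returns only the invented fiction title, while the heading gathers all five linguistic studies, none of which contains the word "Cockney" and one of which is in German.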

No random pickings

The first problem LCSH solves is that of synonyms and variant language terms. LCSH provides the mechanism that enables researchers to recognize what they cannot specify. A second problem, equally important, that cataloging and classification processes solve is that of efficiently segregating relevant uses of desired terms into groups of manageable size, separated from irrelevant uses of the same words in undesired contexts.

Google and other Internet search services fail miserably at keeping relevant uses of a term separate from irrelevant ones. Yet some library theorists are asserting that a full-text database of millions of books would make cataloging and classification less necessary, even unnecessary. With blank Google search boxes backed by mere relevance ranking of specified keywords, researchers can at best retrieve only random pickings jumbled within undefined horizons.

Information vs. scholarship

Suppose, as a second example, a student wishes to research the foreign policy of Millard Fillmore. If she types "President Fillmore Foreign Policy" in Google, one of the top hits (among 72,500) is the online Encarta encyclopedia. That source has a single paragraph on the topic. When I did the search recently, this hit was followed by another providing a chronological listing of "Social issues" from 1850 to 1860, with a prominent link to a site for "Thousands of term papers…$8.99 for immediate access." Next came a page by a fifth-grader named "Caroline," highlighted by a crayon drawing of Fillmore. Then there was a copy of a brief speech by Fillmore on foreign policy from the Encyclopaedia Britannica web site. That last was followed by a "teaser" link to the first 75 words of that encyclopedia’s full 718-word article on Fillmore, with a come-on to "Get the full article with a FREE trial." These were among the top ten "most relevant" sites, as determined by Google’s software. (Note that in Google the ranking of results can change from one minute to the next.)

What would a searcher find in a library catalog, in contrast? Simply entering Fillmore’s name as a subject would produce a structured browse display, including:

Fillmore, Millard, 1800–1874
Fillmore, Millard, 1800–1874 – Autographs
Fillmore, Millard, 1800–1874 – Bibliography
Fillmore, Millard, 1800–1874 – Birthplace
Fillmore, Millard, 1800–1874 – Cartoons, satire, etc.
Fillmore, Millard, 1800–1874 – Correspondence
Fillmore, Millard, 1800–1874 – Family

In my work as a reference librarian, I find that almost no researchers actively look for published bibliographies. Such things are beyond their "default" horizons. If we teach them to exploit browse displays, however, then the catalog’s menu brings this important option (Bibliography) to their attention even when it is not sought. In this case, the LC subject heading leads to John E. Crawford’s Millard Fillmore: A Bibliography (Greenwood, 2002), a 328-page compilation. Included within it is a 39-page section specifically on "Foreign Affairs." The breakdown of this overview of 335 scholarly sources maps out a full range of studies on the topic, with subdivisions on General works, Africa and the Middle East, Australia, Austria, Britain, Canada and British North America, Caribbean, Central America, China, Cuba and Spain, Europe (General works), France, Germany, Hawaii and the Pacific, India, Italy and the Papal States, Japan, Mexico, Netherlands, Russia, South America, and Switzerland. This bibliography is a scholarly, systematic, and substantive overview of relevant sources on foreign policy under President Fillmore – and its citations are annotated! It is exactly the kind of resource historians need for the comprehensive searches that are so crucial to scholarship.

It is noteworthy that references to Crawford’s bibliography are indeed included within the 72,500 Google hits, but where it shows up, or how far down the list, is anybody’s guess. Here is the crucial point: Google’s software cannot bring it to the immediate attention of anyone who is not specifically looking for it. LC cataloging can, and does. Moreover, it is much simpler to teach the use of browse displays than it is to teach any "information literacy" technique that could magically locate Crawford’s bibliography among the mountains of chaff.

Keyword search algorithms, no matter how sophisticated their "relevance ranking" capabilities, cannot turn exactly specified words into conceptual categories. They cannot provide the linkages and webs of relationships to other terms (in a variety of languages, too), nor map out in any systematic manner the range of unanticipated aspects of a subject. Keyword searches cannot segregate the desired terms in relevant contexts distinct from the same terms used in irrelevant contexts.

In contrast, LC cataloging and classification – done by professional librarians rather than computer programs – accomplish exactly these functions that are so critical to scholarship. The search mechanisms created by librarians enable systematic searching, not merely desultory information seeking.

Beware of bean counters

Academics must monitor what librarians are planning. All of us, librarians on the inside and academics on the outside, must beware of bean counters. They see only the cost of individual books while blindly ignoring the operation of the overall access systems in which the books are situated. Scholarship requires substantial onsite book collections accessible through high-quality cataloging and classification. The "same" books are much less discoverable by recognition mechanisms when they are stored offsite or replaced by digital copies searchable by Google interfaces.

High-quality scholarship requires the capacity to recognize relevant sources in ways that cannot be matched or replaced by search methods requiring prior specification of all relevant keywords. It also requires the capacity to segregate the right keywords into conceptual groupings apart from their appearances in the wrong contexts. Internet search software does not solve either problem; in fact, it greatly exacerbates both. It undermines the very possibility of accomplishing substantive scholarship.

Not for "digital age" managers

Proposed drastic changes to the basic "DNA" structure of academic libraries should not be left to committees of library managers alone. Much more input is needed from the scholars who use those libraries in ways that are not discussed at "digital age" library conferences. Those scholars must be given the vocabulary to express their gut feelings about the importance of recognition searching in ways that information professionals will not automatically dismiss as "outdated." Professional organizations in all subject disciplines urgently need to speak up formally on these matters. Ground-level librarians should copy and distribute this article to faculty library committees. Scholarship is too important to be entrusted exclusively to library administrators, especially those who seek to cut support for their own cataloging and classification systems in exchange for keyword search mechanisms inadequate to the needs of serious researchers.


Author Information
Thomas Mann is a member of AFSCME 2910 at the Library of Congress (LC) and author of Library Research Models and the Oxford Guide to Library Research (Oxford Univ., 1998; new edition due October 2005). The views expressed in this paper should not be construed as official views of LC.
