November 21, 2017

Cross-Database Search: One-Stop Shopping

You know you want it. Or you know someone who does. One search box and a button to search a variety of sources, with results collated for easy review. Go ahead, give in—after all, isn’t it true that only librarians like to search? Everyone else likes to find.

Why should we make our users hunt down the best resource for a given information need and learn how to use its particular options for searching? Why not provide them with a simple way to get started? In the past, we might argue that such a wide-ranging search service was too difficult or impossible to build. It remains difficult, certainly, but such a service can no longer be called impossible, as these examples show.

Cross-database search services

Some early adopters of this type of technology use commercial applications, while others have built their systems from scratch. Unfortunately, because most of these services search licensed commercial databases, the curious are often locked out. However, in a message to the Web4Lib electronic discussion (“Cross-Database Search Tools Summary”), I listed staff contacts for some of these services, and a few are publicly searchable.

Searchlight. The California Digital Library (CDL) has offered its Searchlight tool since January 2000. Based on the Database Advisor service of the University of California (UC)–San Diego, Searchlight offers one-stop searching of abstracting and indexing databases, library catalogs, and web sites, as well as other types of resources. After selecting which “flavor” of Searchlight they wish to search (either “Sciences and Engineering” or “Social Sciences and Humanities”), users type in their search. An intermediary screen describes what is happening and counts down the time remaining before results are returned (one minute by default, though the user can adjust this).

The user’s search words are sent to a wide variety of databases (well over 100), with the results organized by resource type (books, journal indexes, electronic journals, e-texts and documents, reference resources, and web directories). The number of hits is noted beside each resource, and clicking on that number will automatically take users to the results in that particular database when possible. Alternatively, they can click on a link to go to the resource and search it directly. Anyone can try it out, but those not part of the UC community won’t see results for licensed databases.
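To make the broadcast-and-wait pattern concrete, here is a minimal Python sketch of it. This is not Searchlight's code: the three search functions, their hit counts, and the resource-type labels are hypothetical stand-ins, and the deadline is shortened from a minute to half a second so the example runs quickly. It sends one query to every target in parallel, waits until the deadline, and groups whatever hit counts came back by resource type.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical connectors; a real one would query a remote database.
def search_catalog(query):
    return 12

def search_journal_index(query):
    return 340

def search_slow_database(query):
    time.sleep(1.5)  # simulates a target that misses the deadline
    return 5

# Target name -> (resource type, connector function). All names are illustrative.
TARGETS = {
    "Library catalog": ("books", search_catalog),
    "Journal index": ("journal indexes", search_journal_index),
    "Slow database": ("journal indexes", search_slow_database),
}

def broadcast(query, targets, timeout_seconds):
    """Send the query to every target at once; return hit counts grouped by type.

    A target that fails or misses the deadline is reported with None.
    """
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {name: pool.submit(fn, query) for name, (_, fn) in targets.items()}
    done, _ = wait(list(futures.values()), timeout=timeout_seconds)
    pool.shutdown(wait=False)  # do not block on stragglers

    grouped = {}
    for name, (resource_type, _) in targets.items():
        future = futures[name]
        finished_ok = future in done and future.exception() is None
        hits = future.result() if finished_ok else None
        grouped.setdefault(resource_type, []).append((name, hits))
    return grouped

if __name__ == "__main__":
    for resource_type, rows in broadcast("digital libraries", TARGETS, 0.5).items():
        print(resource_type, rows)
```

The design choice worth noticing is the fixed deadline: the user gets an answer after a known wait, at the cost of silently missing any database that responds too slowly.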

The CDL is planning to gather user feedback this fall on how it works for its own needs and then consider where to take the service. Possible future directions may include subject-focused cross-database search tools (for example, one-stop searching of all the best resources in a particular discipline), or a tool optimized for the needs of undergraduates to find “a few good things” on a topic.

NLM Gateway. This cross-database search operates across a variety of databases from the National Library of Medicine (NLM). A web page describing the service states, “One target audience for the Gateway is the Internet user who is new to NLM’s online resources and does not know what information is available there or how best to search for it.”

Flashpoint. The Research Library of the Los Alamos National Laboratory wrote an in-house Perl program to search a set of databases simultaneously. In the article “Flashpoint @ LANL.gov: A Simple Smart Search Interface,” the authors describe how their system underwent several design iterations in response to user feedback, testing, and analysis of failed searches. It presently searches nine bibliographic databases and one full-text database.

King County Library Search. The King County Library System in Washington State uses the commercial product WebFeat to offer one-stop searching of its library catalog, web site, and ProQuest databases. The system was released in November 2000 for user testing, so not all of the planned databases are covered yet. Access is limited to library cardholders.

Multi-SEARCH. The University of Arizona Library uses OCLC’s SiteSearch software to search multiple databases. SiteSearch uses the Z39.50 protocol to search databases that are compliant with that standard—in this case, three state catalogs and the OCLC FirstSearch databases.

Software for cross-database searching

Several sites use the WebFeat product to search multiple databases. Other products that offer similar capabilities include Fretwell-Downing’s Zportal, MetaLib from Ex Libris, Copernic Aggregator, Endeavor ENCompass, and OCLC’s SiteSearch. Several other site developers have written their own software but then must maintain it as resources (search targets) change.

More libraries than those noted above are developing their own cross-database search services, including OhioLink and the National University of Mexico (UNAM). It’s clear that there is a widely perceived need for one-stop searching of bibliographic databases, though it is still too early to have much data on which features are essential.

One key challenge for software of this type is how to package up the search and process the results. Unless the database supports the Z39.50 search protocol, it can be daunting to deal with the particular needs of a proprietary database. Even if sending the search is straightforward, the results may have to be extracted via a somewhat primitive technique called “screen scraping.” Screen scraping is the process of collecting needed information by clues such as the location of the information on the screen. The problem is that the slightest change in screen displays can break your process. Some applications are limited to Z39.50 databases, while others (such as Searchlight) encompass other databases as well. In general, the more databases a search service covers, the more challenges it will face.
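The fragility of screen scraping is easy to show. The following Python sketch assumes a hypothetical results page whose wording, “Your search found N records,” is the only clue available. Both page snippets and the pattern are invented for illustration and do not come from any real database.

```python
import re

# Hypothetical results page from a database that offers no search protocol.
PAGE_V1 = "<html><body><p>Your search found 1,204 records.</p></body></html>"

# The same page after a minor redesign: same information, different wording.
PAGE_V2 = "<html><body><p><b>1,204</b> matching records</p></body></html>"

# The scraper depends entirely on the exact phrasing of the first layout.
HIT_PATTERN = re.compile(r"Your search found ([\d,]+) records")

def scrape_hit_count(page):
    """Return the hit count found in the page, or None if the clue is missing."""
    match = HIT_PATTERN.search(page)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

if __name__ == "__main__":
    print(scrape_hit_count(PAGE_V1))  # the layout the scraper was written for
    print(scrape_hit_count(PAGE_V2))  # the redesigned layout
```

The first call returns 1204. The second returns None even though the number is still on the page, which is exactly the kind of breakage that forces developers to keep maintaining their code as search targets change.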

Some early experience indicates that simply broadcasting the search and getting back results from separate databases is a start but not what most users really want or expect. Most users likely want such features as deduping (dropping duplicate records from different databases), merging and ranking (instead of keeping the results separated by the source), and methods for trimming down or sorting the results set.
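As a rough illustration of deduping, merging, and ranking, consider the Python sketch below. It rests on loud assumptions: records are plain dictionaries with a title and a year, two records count as duplicates when their normalized title and year match, and a record found in more databases ranks higher. None of this reflects how any of the products named here actually work, and the sample records are made up.

```python
import re

def match_key(record):
    """Build a crude duplicate-detection key: normalized title plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return (" ".join(title.split()), record["year"])

def merge_results(result_sets):
    """Merge per-database result lists into one deduplicated, ranked list.

    result_sets maps a source name to a list of records. Each merged entry
    remembers every source it appeared in, and entries found in more sources
    sort first (an assumed, very simple ranking rule).
    """
    merged = {}
    for source, records in result_sets.items():
        for record in records:
            entry = merged.setdefault(
                match_key(record),
                {"title": record["title"], "year": record["year"], "sources": []},
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return sorted(merged.values(), key=lambda e: len(e["sources"]), reverse=True)

if __name__ == "__main__":
    # Invented sample data: the same book appears in two databases.
    sample = {
        "Catalog": [
            {"title": "Metadata Basics", "year": 1999},
            {"title": "Digital Libraries", "year": 2000},
        ],
        "Journal index": [{"title": "Digital libraries.", "year": 2000}],
    }
    for entry in merge_results(sample):
        print(entry["title"], entry["year"], entry["sources"])
```

Even this toy version hints at the accuracy problem: a matching rule this loose will merge distinct editions that share a title and year, while a stricter one will miss duplicates that differ by a subtitle or a typo.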

Unfortunately, most of these features are likely to be somewhat difficult to implement and probably extremely difficult to implement with much accuracy. But increasingly, librarians serving user groups from the general public to academic researchers are realizing that it is a goal well worth pursuing.


Author Information
Roy Tennant (roy.tennant@ucop.edu) is Manager, eScholarship Web & Services Design, California Digital Library. He is founder and manager of the electronic discussion lists Web4Lib and Current Cites.