A new public library ranking, coming this fall, will enable constructive comparison and advocacy efforts for all
[Why These Measures Matter: This article explains the conceptualization and design of the LJ Index of Public Library Service 2009, including detail on the individual output measures. –Ed.]
As published critics of Hennen’s American Public Library Ratings (HAPLR), we propose a new ranking system that focuses more transparently on ranking libraries based on their performance. These annual rankings are intended to contribute to self-evaluation and peer comparison, prompt questions about the statistics and how to improve them, and provide an advocacy tool for the individual library.
Toward these ends, the proposed new system scrutinizes only those statistics that describe library service outputs, such as visits, circulation, public Internet computer usage, and program attendance. It excludes resource inputs, such as staffing levels, collection size, and revenues and expenditures. Inputs, we believe, do not measure library performance. That is why we emphasize outputs, which indicate some of the services people receive from libraries.
Caution is warranted with every ranking, however, even this one, so take the time to understand what is being ranked and how limits of the data affect the meaning of the ratings. See the sidebar on “Gaining Needed Data” (below) to get a sense of the kinds of numbers we would like to have to create a more comprehensive ranking and what it is likely to take to get them.
The time is ripe for an improved ranking system, one that makes the most of the data available while advocating for better data over time. That is the aim of the LJ Index, sponsored by Bibliostat, which will debut later this year following the release of the 2006 federal Public Library Statistics Cooperative (PLSC) data.
It is true that both of the authors here have expressed philosophical and methodological reservations about the meaningfulness of such ratings, including the limitations mentioned above. However, we believe that the reasons to offer an alternative rating system outweigh the reasons not to.
About the data
The only national data set on which to base public library rankings is the one established in the late 1980s, and published annually since the early 1990s, by the National Center for Education Statistics (NCES). In October 2007, the project most widely known as the Federal-State Cooperative System (FSCS) for Public Library Data was moved to the Institute of Museum and Library Services (IMLS) and is now known as PLSC.
Annual data about the nation’s public libraries provide many indicators of service quantity: library visits, circulation transactions, use of public Internet computers, reference transactions, and program attendance. Per capita versions of these statistics provide rudimentary indicators of how much “repeat” business libraries do. How many times a year does the average community resident visit the library? Borrow a book, audiobook, or DVD? Ask a question? Attend a library event? In the absence of direct national data on service quality—customer satisfaction, collection adequacy for specific purposes, effectiveness in meeting community needs, and perceived value to customers—these are the best and most comprehensive data available for use in a ranking system. Nevertheless, it behooves the sensible user of library rankings to remember that no national data directly measure such important dimensions of library performance as collection quality, library accessibility and convenience, and customer satisfaction. For this very practical reason—not to mention many methodological ones—it is not possible to design a truly comprehensive ranking system. As a consequence, any ranking system must be regarded as only one tool in the evaluation “toolbox” of local library decision-makers (directors, managers, trustees) and public officials. Most libraries gather a wealth of other local data that should also be considered when evaluating a library’s performance. As the LJ Index expands, we hope to add qualitative elements to enrich it.
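The per capita indicators described above are simple ratios of an annual total to the population of the legal service area. The following is a minimal sketch of that arithmetic, not the LJ Index calculation itself; the record layout, field names, and all figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LibraryYear:
    """One library's annual totals (hypothetical layout, made-up numbers)."""
    name: str
    population: int          # population of legal service area
    visits: int
    circulation: int
    program_attendance: int


def per_capita(record: LibraryYear) -> dict:
    """Divide each annual output total by the service-area population."""
    if record.population <= 0:
        raise ValueError("population of legal service area must be positive")
    return {
        "visits_per_capita": record.visits / record.population,
        "circulation_per_capita": record.circulation / record.population,
        "program_attendance_per_capita":
            record.program_attendance / record.population,
    }


# A made-up library serving 20,000 residents.
example = LibraryYear("Hypothetical Town Library", 20_000, 110_000, 180_000, 6_000)
ratios = per_capita(example)
for label, value in ratios.items():
    print(f"{label}: {value:.2f}")
```

Read this way, a visits-per-capita figure answers the question posed above: how many times a year the average community resident visits the library.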
The problem with ranking inputs
There are two major reasons we propose to issue rankings based on outputs. First and foremost, input data present many comparability issues. Depending on a library’s size, it may or may not have a payroll that includes everyone who works in the library. In small towns and mid-sized cities, the library staff may be supported substantially by other local government employees. Similarly, a complete picture of a library’s fiscal status may or may not be provided by the revenues and expenditures it can report.
For instance, many public libraries owe at least some of the databases to which they provide access to consortial expenditures by state and/or regional library agencies. Expenses covered under one library’s budget (e.g., utilities, telecommunications) may be paid for by the city or county to which a supposed peer library belongs. And data on collection size alone, in the absence of data on collection age, could create a misleading impression about the resources available at any particular library.
The second, and perhaps more important, reason for focusing on service outputs instead of resource inputs is the potential political catch-22 presented by high rankings on the latter. Few potential rankings users would welcome the news that their libraries topped rankings on staffing, collection size, or—least of all—funding. While such rankings should be something to brag about in an ideal world, in these tight economic times, they could invite cuts on the rationale that the library would still have “nothing to complain about,” or that maintaining outputs despite input cuts (a doubtful eventuality) would represent an improvement in the library’s “efficiency.” For these reasons, we chose to leave input measures out of the LJ Index.
A composite measure
A single composite measure such as the LJ Index can help us judge how libraries fare on multiple service statistics at once. While the available data provide a somewhat limited view of the true breadth and complexity of public library services, employing those data in a carefully designed ranking system avoids the problem of “throwing the baby out with the bath water.” The limitation of national data on public libraries is a reason to interpret rankings based on such data cautiously, not a reason to waste that data by not using it to its fullest. Perhaps increased use of data in this way will provide an additional incentive for expanding national data collection efforts to incorporate some of the missing variables.
Comparison with peers
When individual libraries compare their own data to others, they have more flexibility in selecting the most appropriate peer libraries. The data might include attributes such as population of legal service area, service responses, community demographics, and other relevant factors. In developing national library rankings, it is only practical to use one or two attributes for identifying peer libraries. Otherwise, the rankings become too complicated. The most basic attribute to use is population of legal service area. The NCES (in the future, IMLS) data and past Hennen ratings have employed ten ranges of population of legal service area, beginning with 500,000 and over and ending with under one thousand. Unfortunately, this familiar approach has certain problems. For example, the top population range includes fewer than 100 libraries, while the smallest includes more than 4400 libraries. If nothing else, the wildly varying numbers of libraries in the familiar ranges make the “competition” in the ranking “contest” far tougher for some libraries than others. Alternative population ranges and additional variables may provide a stronger foundation for peer groups. These issues are still being discussed.
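To see how uneven peer groups arise, it can help to assign libraries to population ranges and count how many land in each. The sketch below does this with hypothetical cut points: the article names only the top range (500,000 and over) and the bottom range (under 1,000), so the intermediate boundaries and the sample populations here are assumptions for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical lower bounds for ten ranges. Only the first (1,000) and the
# last (500,000) correspond to ranges named in the article.
LOWER_BOUNDS = [1_000, 2_500, 5_000, 10_000, 25_000,
                50_000, 100_000, 250_000, 500_000]


def peer_group(population: int) -> str:
    """Return a label for the population range that contains `population`."""
    index = bisect_right(LOWER_BOUNDS, population)
    if index == 0:
        return "under 1,000"
    low = LOWER_BOUNDS[index - 1]
    if index == len(LOWER_BOUNDS):
        return f"{low:,} and over"
    return f"{low:,} to {LOWER_BOUNDS[index] - 1:,}"


# Made-up service-area populations.
populations = [800, 950, 600, 3_000, 12_000, 750_000]
group_sizes = Counter(peer_group(p) for p in populations)
for label, count in group_sizes.items():
    print(f"{label}: {count} libraries")
```

Counting group sizes like this before ranking makes it easy to spot ranges where the "competition" is much larger or smaller than in others.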
Fewer, more useful variables
A dramatic distinction between the LJ Index and HAPLR is the number of variables involved and how these are used. The Hennen system begins with eight variables—five input indicators and three output indicators. HAPLR then recombines these in different ways to produce 15 statistical ratios. The LJ Index will employ only four variables—all per capita ratios—and will not recombine these in any way. This relatively small set of index variables is justified on both statistical and philosophical grounds.
Statistically, four per capita ratios—visits, circulation, use of public Internet computers, and program attendance—correlate with each other strongly, positively, and significantly. (Reference transactions did not make the cut—which is a topic for an entirely separate article.) Correlating these statistical ratios with one another suggests that, as a group, the data coalesce to reflect a more general dimension of library performance. We call this dimension public library services.
In Table 1, each cell presents three figures: a correlation coefficient, which indicates the degree of association between the two variables; the statistical significance of that association; and the number of public libraries for which data were available on both correlated variables. Understand that 1.000 is a perfect correlation—that is, the correlation of a variable with itself. Also, the p (significance) statistic, .000, indicates that there is no measurable chance that random samples of libraries nationwide would yield meaningfully different results. This level of significance is not surprising for two reasons: 1) this analysis employs data for the full universe of public libraries, and 2) the sheer number of cases, 9200-plus, ensures such a high level of statistical significance.
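For readers who want to see what a correlation coefficient is computing, here is a minimal sketch of the Pearson coefficient in plain Python. It assumes the coefficients in Table 1 are Pearson correlations, which the article does not state, and the two short data series are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / sqrt(ss_x * ss_y)


# Made-up per capita figures for five hypothetical libraries.
visits = [2.1, 4.0, 5.5, 7.2, 9.8]
circulation = [3.0, 6.5, 7.0, 11.1, 14.0]

r_self = pearson(visits, visits)        # a variable correlated with itself
r_pair = pearson(visits, circulation)   # positive: both rise together
print(f"self: {r_self:.3f}, visits vs. circulation: {r_pair:.3f}")
```

The self-correlation case corresponds to the 1.000 entries on the diagonal of the table.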
The advisability of using this set of four output variables in a composite measure of library service output is further supported by the results of another statistical technique, factor analysis, a procedure that looks for commonalities among these variables from another statistical angle. In lay terms, our factor analysis found further evidence that the four variables hang together statistically, suggesting that they do tap the library services dimension of public library operations. Together, these variables explain over three-quarters of their shared variation. The factor loading of .921 for visits per capita indicates that this statistic is most strongly related to that more general dimension. Of the four statistics, it is not surprising that program attendance has the weakest factor loading, at .819. For most libraries, program attendees are a fairly small subset of users. For that reason, it is notable that participation in library programs registers such a strong relationship to the rest of these variables. See Table 2.
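The figures reported for the factor analysis can be cross-checked with a little arithmetic. The sketch below assumes a principal-components style extraction, in which the eigenvalue of a factor equals the sum of its squared loadings and the percent of variance is the eigenvalue divided by the number of variables; the article does not state the extraction method, so treat this as an illustration rather than a description of the authors' procedure.

```python
# Factor loadings as reported in Table 2.
loadings = {
    "Visits per capita": 0.921,
    "Circulation per capita": 0.906,
    "Program attendance per capita": 0.819,
    "Users of electronic services per capita": 0.835,
}
reported_eigenvalue = 3.036

# Under the stated assumption, squared loadings sum to the eigenvalue
# (up to rounding of the published three-decimal loadings).
sum_of_squares = sum(value ** 2 for value in loadings.values())

# Each standardized variable contributes one unit of variance, so the
# share explained is the eigenvalue over the number of variables.
percent_of_variance = 100 * reported_eigenvalue / len(loadings)

print(f"sum of squared loadings: {sum_of_squares:.3f}")
print(f"percent of variance: {percent_of_variance:.1f}%")
```

Both results agree with the table to within rounding, which is where the "over three-quarters" figure in the text comes from.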
Additionally, the Hennen system has two distinctive, and somewhat problematic, characteristics: 1) it weights some variables more highly than others, and 2) it employs the literal rank of a library within its peer group on each index variable in calculating its composite ranking. The HAPLR weighting scheme is an issue largely because it is entirely arbitrary, having been based on an informal poll via an email list. The weightings also add to redundancy in HAPLR scoring. Regarding the second issue, HAPLR translates each library’s 15 statistical variables into 15 rankings and then performs further calculations. Unfortunately, these calculations violate conventional statistical rules and result in distorted final scores. In creating the LJ Index, we plan to avoid these difficulties by sticking to solid statistical principles. The result will be a sounder statistical index with greater analytical utility.
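One way to see why building a composite from literal ranks can distort results is to compare ranks with a score that keeps the size of the differences. The sketch below contrasts ranks with standard scores (z-scores) on made-up data. Standard scores are used here only as a familiar point of comparison; the article does not specify how the LJ Index will be calculated, and the library names and values are hypothetical.

```python
from statistics import mean, pstdev

# Made-up circulation-per-capita values for three hypothetical peers.
circ_per_capita = {"Library A": 10.0, "Library B": 9.9, "Library C": 2.0}


def literal_ranks(values):
    """Rank 1 = highest value. Ties are not handled in this sketch."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def standard_scores(values):
    """z-scores: distance from the group mean in standard deviations."""
    data = list(values.values())
    center = mean(data)
    spread = pstdev(data)
    if spread == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return {name: (value - center) / spread for name, value in values.items()}


ranks = literal_ranks(circ_per_capita)
scores = standard_scores(circ_per_capita)
for name in circ_per_capita:
    print(f"{name}: rank {ranks[name]}, z-score {scores[name]:+.2f}")
```

In this example the ranks place A, B, and C one step apart, even though A and B are nearly identical and C is far behind. Any further arithmetic on the ranks inherits that loss of information.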
How to use the ranking
The greatest difference between the HAPLR ratings and the LJ Index is that the LJ Index will rank public libraries specifically on the quantity of public library service they provide on a per capita basis. Knowing that will hopefully put LJ Index users in a better position to understand what their rankings mean—and what they don’t mean. Given standardized rankings on a statistically justified set of related output statistics, LJ Index users should be inspired to ask serious questions about why their libraries rank high, low, or somewhere in-between. Questions might include:
- What does this ranking say about our library that is actionable?
- How might we use this ranking effectively and responsibly without making exaggerated claims?
- How precise are the data we contribute to this ranking system, and how might we make them more complete and accurate?
- What additional data might we be willing to collect and report, if it could enrich this ranking system?
- What context is provided to our ranking by the peer group in which we are placed?
- What unique characteristics of our library and/or the way we collect and report statistics about its services might explain this ranking?
- What characteristics of our peer libraries and their data collection and reporting practices (to the extent that the latter are knowable) might explain this ranking?
- What else do we need to know about our peer libraries that might give us a fuller picture of how we compare to them in other relevant and meaningful ways?
- What questions do we have about how this ranking system works and how those procedures impact our library’s ranking?
- What other data do we collect locally that should also be considered when evaluating our library?
It is no mystery why people, library folks included, like to see the results of rankings. Every year, U.S. News & World Report publishes rankings of schools, colleges, hospitals, and other public institutions. Each year, the Places Rated Almanac ranks communities nationwide on a variety of characteristics believed to represent the desirability of living one place versus another. What usually happens with these rankings? Colleges and universities take their own rankings very seriously, using them to solicit grants and contributions—either by touting the institution’s strengths or appealing for help in addressing weaknesses, especially relative to other institutions. If nothing else, they provide claims that can be very useful for marketing and advocacy purposes. Communities and institutions that make the top ten in such rankings usually announce the fact with as much fanfare as possible. Frequently, an article appears in the local newspaper. Perhaps there is a celebration. Only occasionally do others ask serious questions about the rankings that might generate truly illuminating and actionable information.
We hope that our proposed system of public library rankings on service output will provide bragging rights, cause for celebration, and inspiration to ask thoughtful questions about what a high (or low) ranking might or might not mean. We hope it will offer a better, alternative strategy for ranking public libraries, as well as an appeal for deeper reflection on what such rankings mean and a commonsense recognition of their obvious limitations as well as their undoubted usefulness.
Public library rankings can be one of many valuable tools in evaluating and advocating for local libraries, as long as all of us—those of us who create them and those of us who use them—understand what we are doing and why and strive to improve the rankings as much as possible over time.
Table 1
|Variable|Visits per capita|Circulation per capita|Program attendance per capita|Users of electronic services per capita|
|Visits per capita|1.000| | | |
|Circulation per capita|.837**|1.000| | |
|Program attendance per capita|.664**|.637**|1.000| |
|Users of electronic services per capita|.684**|.662**|.572**|1.000|
** Correlation is significant at the .01 level (2-tailed).
Table 2
|Variable|Factor loading|
|Visits per capita|.921|
|Circulation per capita|.906|
|Program attendance per capita|.819|
|Users of electronic services per capita|.835|
Initial eigenvalue = 3.036, percent of variance = 75.9%
Keith Curry Lance is a consultant with the RSL Research Group in suburban Denver. He was the longtime director of the Library Research Service of the Colorado State Library and the University of Denver and a founding member of the Steering Committee of the Federal-State Cooperative System (FSCS) for Public Library Data. Ray Lyons is an independent consultant and statistical programmer in Cleveland. He recently received his MLIS from the Kent State University School of Library and Information Science, OH. He also has an MPA (public administration) with a specialty in quantitative methods.
The LJ Index, coming this fall, is made possible through a partnership between LJ and Baker & Taylor’s Bibliostat Connect, a web-based tool that gives you access to comparative statistics about public libraries nationwide. The Bibliostat sponsorship will enable the LJ Index data, once available and analyzed, to be loaded and available for dynamic searching via Bibliostat Connect.
GAINING NEEDED DATA
To create a more comprehensive ranking of public libraries, we need additional data. Output data elements that would make this ranking system more complete include more precise and diverse measures of digital service output, data on in-library use of materials, and a breakdown of circulation by format. The need for such improvements in the available data about public libraries is not news. Indeed, most of these issues have been under discussion at local, state, and national levels for many years. Proposing solutions to these data-collection challenges, however, is far easier than delivering on them. Our hope is that the willingness of LJ to participate in the development, testing, and analysis of new measures through the LJ Index project will help to generate progress on at least some of these fronts. Other organizations—such as state library agencies and associations, regional library cooperatives, and individual libraries—are invited to volunteer to join in testing proposed new data elements. Only after serious efforts have been made to implement new data collection strategies will new, more challenging data elements likely be added to the federal database of public library statistics.
NATIONAL RATINGS BASICS
By Ray Lyons
What should national ratings measure?
There is no science of devising national ratings of colleges, business schools, hospitals, libraries, and other institutions. As long as rankings are developed using proper statistical methods, which attributes get measured is up to the designers of the ratings. Of course, the selection should reflect recognized theory and practice of professions relevant to the ratings. Also, ratings designers should provide a rationale for choosing particular attributes over others. It is vital that ratings use measures that correspond closely with attributes that the rankings claim to rate.
Like beauty, quality and excellence are often in the eye of the beholder. So, devising ratings can be subjective and arbitrary. Consider, for instance, how product ratings in publications like Consumer Reports are done. Judges devise lists of attributes they believe to be important for a given product, say digital cameras. Keeping the typical consumer in mind, judges might choose features like ease of use, size, price, video capability, appearance, and so forth. But professional photographers may find these ratings unsatisfactory, since professionals evaluate cameras on different attributes like lens quality, speed, durability, brand reputation, or other advanced features.
Each set of ratings, then, takes a specific perspective and cannot satisfy all possible viewpoints on performance excellence or quality. For this reason, ratings will never be fully conclusive.
What do public library national ratings measure?
Since standard library statistics report only library resources (inputs) and service utilization (outputs), national ratings can reflect only certain aspects of library performance. Standard statistics are not informative enough to serve as measures of more sophisticated concepts such as greatness, goodness, excellence, or quality. It is quite a complicated matter to assess such multifaceted concepts as these. Measuring excellence, for example, requires compiling a complete inventory of attributes that excellent libraries must possess, developing and validating measures to assess each attribute, surveying libraries directly, and analyzing results. This type of intensive research goes far beyond an examination of national statistical data.
Even so, performing well on key statistical indicators can be viewed as a prerequisite for library greatness, excellence, and quality. In this respect library ratings are analogous to qualifying rounds in a competition or contestant screenings that television game shows use. Libraries scoring highly in national ratings have, in a sense, passed a preliminary round in a theoretical longer term competition. National ratings cannot proclaim that these libraries are unequivocally the greatest or most excellent. But highly rated libraries can be recognized as being among the strongest contenders if a competition based on verified measures of greatness, excellence, and quality were feasible.
How should library statistics be evaluated?
There are no tried-and-true standards for evaluating library statistics. Historically, the library profession has struggled with questions like: How many volumes should a library have in order to be excellent? How many hours must a library be open to provide maximally accessible services, and do so efficiently? What levels of circulation indicate excellent, good, or poor performance? Answers to these types of questions depend on circumstances unique to each local public library.
Librarians understand that the variety and uniqueness of local services and programming are not captured in standard statistical categories. Each library book has unique significance and value that cannot be represented in counts of library volumes. The same is true for library services, which can vary widely in complexity, sophistication, value, effort, contribution to community quality of life, and so on. The diversity of everyday library products and services cannot be expressed in standardized statistics. Nor will aggregate comparisons like national ratings reveal this diversity. Instead, standard statistical categories summarize (we could even say homogenize!) operational information about libraries. In the process, much of the richness and detail is lost.
Nevertheless, this standardization is essential for monitoring key aspects of library performance. On a daily basis, these statistics help libraries to understand their operations, services, resource utilization, and user community. The statistics provide information necessary for management, planning, marketing, and advocacy on local, regional, and national bases.
Using comparative library statistics to formulate national rankings requires that we accept certain reasonable compromises. For one thing, we need a way around the lack of objective criteria for evaluating library statistics. The only way that national ratings like the LJ Index can proceed with comparisons is by applying the simple rule that, in general, higher statistical indicators indicate better performance.
What do library statistics mean?
Statistical indicators like visit counts, staffing levels, materials expenditures, and so forth can be interpreted on a continuum from concrete to abstract. For instance, measures of library expenditures quite directly indicate money spent. More abstractly, these measures can be interpreted as the extent to which the library is supported by its community. Or they may be viewed as the relative power or strength of the library to perform its mission. Each of these interpretations has merits and problems. For instance, in the body of this article we describe why we avoid using input measures in the LJ Index.
Other library statistics also have alternative interpretations. Circulation can be seen simply as productivity or work units, or as demand for services and a reflection of the library’s ability to attract customers. Or, each circulation can be viewed as a vote of confidence indicating a match between library materials and patron wants. Again, these interpretations will each have pros and cons.
How valid and reliable should national ratings be?
If the ratings are portrayed as scientifically precise, then the answer is highly reliable and valid. If they are portrayed as estimations, even as a sort of contest, the requirements can be more lenient. In either case, clear explanations about accuracy, validity, and reliability of rating results need to be disclosed.
Despite careful review of Public Library Statistics Cooperative (PLSC) data by the Institute of Museum and Library Services, some of the data may be inaccurate due to reporting errors, sampling effects, definitional difficulties, or other reasons. So, ratings based on the data will not be absolutely precise. This potential imprecision must be kept in mind when interpreting ratings results.
How should library ratings be used?
In the body of this article appears a set of questions that we encourage librarians to review concerning use of the LJ Index. We will have more to say on this topic when the ratings are released later this year. Presently, it is important for us all to recognize that quantitative data can have a mystique that makes them seem factual and credible. The same issues that librarians face in assessing information sources in general pertain to evaluating statistical data: authenticity, accuracy, context, reliability, completeness, and so forth. With the LJ Index it is our objective to be explicit about what the ratings can and cannot indicate about library performance. We invite libraries to join in a dialog with us about using ratings productively.