For years I’ve been struggling to reconcile two mutually incompatible truths:
- Libraries are all about preserving access to valuable information.
- Libraries (even in the aggregate) have no hope of preserving access to anything more than a vanishingly tiny sliver of the valuable information that is created in the world.
Now, I recognize that the second statement above will seem like an exaggeration. “Sure,” I can hear some readers thinking, “no single library can keep and take care of more than a tiny sliver of the world’s useful information, but there are hundreds of thousands of libraries in the world. Surely if we all just coordinated our efforts….”
For those who are thinking this way, I recommend a glance at a 2010 report titled “How Much Information?,” which was published by the University of California, San Diego, as the result of a collaborative project undertaken by a group of corporate and academic partners. Here’s a pull quote from the paper’s executive summary:
In 2008, the world’s servers processed 9.57 zettabytes of information, almost 10 to the 22nd power, or ten million million gigabytes. This was 12 gigabytes of information daily for the average worker, or about three terabytes of information per worker per year.
A couple of things to bear in mind while looking at those numbers: first, 2008 was five years ago, and the pace of information production has not slowed down; second, these calculations include only the subset of the world’s information that passes through enterprise servers—i.e., computer systems run by companies, offices, universities, etc.
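The quoted figures hang together, as a quick back-of-envelope check shows (the 250 working days per year is my assumption, not a figure from the report):

```python
# Sanity-check the figures quoted from the UCSD "How Much Information?"
# report. All values are approximate; 250 workdays/year is an assumption.

ZB = 10**21  # bytes in a zettabyte (decimal)
TB = 10**12  # bytes in a terabyte
GB = 10**9   # bytes in a gigabyte

total = 9.57 * ZB  # information processed by the world's servers in 2008

# "almost 10 to the 22nd power" bytes:
assert 9 * 10**21 < total < 10**22

# "ten million million gigabytes" = 10^13 GB:
print(total / GB)  # ≈ 9.57e12, i.e. nearly ten million million gigabytes

# "12 gigabytes of information daily ... about three terabytes per year":
yearly = 12 * GB * 250  # assuming ~250 working days per year
print(yearly / TB)      # 3.0 terabytes, matching the quote
```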
Let’s say, for the sake of argument, that only one byte in every hundred thousand of this information is actually valuable. That would mean (assuming I’ve sorted my decimal places correctly, and if I haven’t I hope a commenter will correct me) that for libraries to preserve access to the world’s output of useful information, we would have to capture, organize, and preserve nearly 100 million gigabytes of information each year. Even if preservation on that scale were feasible as a distributed worldwide project, the prospect of managing and coordinating that project is overwhelming.
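For anyone who wants to check my decimal places, the arithmetic fits in a few lines (the one-in-a-hundred-thousand ratio is, of course, an arbitrary assumption made for the sake of argument):

```python
# Hypothetical sliver calculation: suppose only one byte in every
# hundred thousand is worth preserving (an arbitrary assumed ratio).

total_bytes = 9.57 * 10**21       # world server throughput, 2008 (UCSD report)
valuable_fraction = 1 / 100_000   # assumption, not a measured figure

valuable_gb = total_bytes * valuable_fraction / 10**9

print(f"{valuable_gb:,.0f} GB per year")  # ≈ 95,700,000 GB: nearly 100 million
```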
So here we are, back at the uncomfortable truth: we can’t handle more than a tiny, tiny sliver of the world’s information output. What does this imply?
One thing I believe it implies is that we might want to raise an eyebrow at the Library of Congress’s (LC) decision, a few years ago, to become the permanent archive of Twitter.
In a white paper released in January 2013, LC laid out its rationale for taking this step. Sort of. In the paper, LC explained “Why the Twitter Collection Is Important to the Nation’s Library” thus:
As society turns to social media as a primary method of communication and creative expression, social media is supplementing and in some cases supplanting letters, journals, serial publications and other sources routinely collected by research libraries.
Archiving and preserving outlets such as Twitter will enable future researchers’ access to a fuller picture of today’s cultural norms, dialogue, trends and events to inform scholarship, the legislative process, new works of authorship, education and other purposes.
Well, no argument there. The problem is that this statement answers an easy question (“Why is a Twitter archive worthwhile?”) but avoids the harder question, which is, “Why is a Twitter archive more worthwhile than the other projects that will not be undertaken because limited resources are being directed to the Twitter archive?”
When I ask myself why we should archive Twitter, I come up with several possible answers, none of which is very satisfying to me.
One possible answer is: because we can. (But think of what we can’t archive because we’re archiving Twitter.)
Here’s another: because we really don’t know what will and won’t be useful in the future. Why exclude such a rich trove of content from our national archive when much of what looks useless today could turn out to be tremendously useful tomorrow? (But we could also say that about all the non-Twitter outputs that we’re not archiving.)
And another: because, taken in the aggregate, the Twitter stream reveals interesting and useful information about how and when certain topics become important in public discourse—despite the apparent banality of its individual components. (But is Twitter really the best source of this information, and therefore the source most worthy of LC’s limited resources?)
And another: because what’s happening now with Twitter will have serious ramifications for what happens with the future of public discourse. (This may well be true, and if so it suggests that Twitter is an important topic of study. It does not, however, suggest that every tweet needs to be archived—or, more to the point, that it needs more urgently to be archived than the other things we could be archiving with those resources.)
Please understand: I’m one of the many people who think it’s remarkably cool that LC is creating a Twitter archive. But I would also think it was cool if the Library of Congress created a comprehensive clawhammer banjo collection or archived every commercially released reggae recording. The bottom line is that coolness isn’t the same thing as importance, and importance is a less relevant property than relative importance. In other words, the urgent question isn’t whether this project has value (of course it does) but rather “where on the priority list of valuable projects should this one fall?”
Of course, without knowing how much LC is spending in terms of labor, overhead, and capital equipment on this project and without knowing what other opportunities are waiting in the wings for the privilege currently being accorded the Twitter archive, it’s very hard to form a reasonable opinion about whether LC is using its limited resources wisely in this case. On August 7, I sent an email to LC’s director of communications, asking these two questions:
- Can you tell me how many full-time equivalent positions (or maybe how many person-hours) per week are dedicated to managing the Twitter archive?
- Can you give me an idea of the server capacity that is currently dedicated to managing this archive? What percentage (an estimate is fine) of LC’s total server capacity is absorbed by this content?
So far I’ve gotten no response. If and when I do, I’ll pass the answers along via the comments section. In the meantime, I’ll remain somewhat skeptical. Archiving Twitter is clearly a sexy and headline-grabbing move, as well as one that will probably offer real value to future scholars and researchers. I just wish I were more confident that it will offer more value than some of the other things LC could have done with the same resources.