August 16, 2017

LC Launches Web Cultures, Webcomics Archives

“Digital Data” via xkcd, one of the comics captured in LC’s Webcomics Web Archive

The American Folklife Center at the Library of Congress (LC) announced June 15 the creation of two new born-digital collections: the Web Cultures Web Archive (WCWA), which will feature memes, GIFs, and image macros that surface in online pop culture, and the Webcomics Web Archive (WWA), which will collect comics created for an online audience.

Both are part of LC’s mission to “preserve and present American folklife”—most recently the ephemera of the Internet. WCWA’s goal, to document the creation and sharing of web culture, means that it features such online phenomena as Lolspeak and Leet, emoji, reaction GIFs, memes, and digital urban legends—along with sites like Urban Dictionary, the Internet Meme Database, Boing Boing, Slashdot, Giphy, Metafilter, Emojipedia, Cute Overload, the LOLCat Bible Translation Project, and Equestria Daily, which bills itself as the major fan site for bronies—adult fans of “My Little Pony.” WWA archives born-digital comics such as Hyperbole and a Half, xkcd, and Hark! A Vagrant, as well as less well-known offerings.

ARCHIVING CULTURE

LC’s American Folklife Center was created by Congress in 1976, broadening the scope of the Archive of Folk Culture, originally founded as the Archive of American Folk Song in 1928. Along with the new name, it expanded its mandate to collect not only music but spoken word recordings, stories and mythology, crafts, artwork, dance, architecture, and records of community life. Now, as LC works to make these analog archives accessible in digital format, they are being joined by born-digital material.

LC has been archiving websites since around 2000, grouping them by theme, event, or subject area. “Like many national libraries we started with election archiving,” noted LC Digital Library project manager Abbie Grotke, “and we continued to do that, so we have a lot of government and campaign websites.”

Many collections are grouped around particular subjects, such as the events of September 11, 2001; the 2003 Iraq War; or the 2002 Winter Olympic Games. In 2010, LC began capturing sites that fall under broader categories, such as political commentary, media, religious organizations, advocacy groups, educational and research institutions, creative expressions, and blogs. LC’s web archiving team, part of its Technology Policy office, works in collaboration with LC staff and outside subject specialists to identify websites to be crawled for the Web Archive.

For WCWA, said Nicole Saylor, head of the American Folklife Center Archive, “We rely on nominations from folklorists, people studying digital culture, Internet enthusiasts…. It really took a collective curation effort.” Scholars like Trevor Blank, author of Folklore and the Internet: Vernacular Expression in a Digital World (Univ. Press of Colorado), and Robert Glenn Howard, director of digital studies and professor of communication, religious studies, and folklore in the Department of Communication Arts at the University of Wisconsin–Madison, help LC’s team identify websites that play a part in today’s Internet culture.

While LC librarians have been interested in documenting online culture for several decades, noted Saylor, much of the impetus behind the development of the new collections comes from having the infrastructure at LC to collect the material. “We have a big box of 419 scam email that Judith Gray, our reference coordinator, printed out diligently,” she told LJ. “They’re hilarious and wonderful, and they’re the analog version of what we’re doing now. We didn’t have the capacity then to do anything but print them out.”

Saylor was first inspired to develop WCWA after listening to presentations at the American Folklore Society’s annual meeting, watching new scholars give presentations on web phenomena like the Slender Man, “and making connections between the stuff that they’ve always studied versus this modern occurrence of it on the Internet.” A number of WCWA nominations emerged from the 2014 CURATEcamp digital curation unconference, which was cohosted in Washington, DC, by LC and the Catholic University of America’s Department of Library and Information Sciences. The unconference focused on the theme of web cultures, and many public sector folklorists in attendance became nominating partners.

The Webcomics collection stemmed from an agreement in 2011 between LC and the Small Press Expo (SPX), a convention celebrating artists, writers, and publishers of comic art. SPX is home to the Ignatz Awards, which recognize outstanding achievements in comics and cartooning; LC committed to capturing the finalists for the Outstanding Online Comic category.

The process of archiving the webcomic sites turned out to be relatively simple, explained Megan Halsband, reference librarian in LC’s Serial and Government Publications Division. “When we discovered that it actually wasn’t terribly burdensome—that the image files weren’t actually all that hard to capture and we could actually do this work through the web archiving tool that we had—we expanded it to…the Webcomics collection,” Halsband told LJ. In addition to the Ignatz Award nominees, WWA includes Eisner and Harvey Award–winning material, as well as long-running comics.

Currently LC has collected about 50 sites in each of the two archives. Decisions on what to exclude are generally made along curatorial lines (a suggestion to archive all the reviews on Amazon, for instance, was nixed) or because a site would be too technically difficult to collect, rather than on any ideas of “appropriate” content, added Saylor. The material is not always safe for work, and may even be considered offensive, but the archivists don’t make judgment calls on that basis. “We don’t know what’s going to be of future interest to researchers,” noted Halsband.

via GIPHY, one of the sites captured in LC’s Web Cultures Web Archive

NUTS AND BOLTS

Recommendations for sites to archive are entered into the DigiBoard, LC’s in-house curatorial tool that manages the archival workflow process. DigiBoard, developed by LC in 2009, automates and streamlines collection activities and generates metadata, as well as managing the permissions process. Seed nominations are reviewed and, if approved, LC secures permissions from the site owners.

Permissions are critical, Halsband told LJ. “We have to get permission to crawl the site as well as to display it and archive it in our collection, so we require permission for both of those activities. That does make it complicated because unless I can actually communicate [with] a creator or a site owner, we can’t crawl the site.” While most site owners are delighted to have their work included in the Web Archives, others don’t bother answering LC’s request (“Maybe they think we’re junk mail,” offered Grotke) or simply don’t have current email contact information attached to the domain. “We sent paper letters in one case,” recalled Saylor. (LC doesn’t need permission to crawl or display and archive government websites, which are in the public domain.)
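The nomination and permissions steps described above can be pictured as a simple gate in front of the crawler. The Python sketch below is illustrative only and is not DigiBoard's actual data model: the `Nomination` class, its field names, and the `ready_to_crawl` function are all hypothetical, and the sketch assumes that both permissions must be on file before a non-government site is queued.

```python
from dataclasses import dataclass


@dataclass
class Nomination:
    """A nominated seed URL plus its review and permission status (hypothetical model)."""
    url: str
    approved: bool = False        # passed curatorial review
    is_government: bool = False   # public-domain government site, no permission needed
    may_crawl: bool = False       # owner granted permission to crawl
    may_display: bool = False     # owner granted permission to display and archive


def ready_to_crawl(n: Nomination) -> bool:
    """Return True when a nomination can be handed to the crawler."""
    if not n.approved:
        return False
    if n.is_government:
        return True
    # Assumption: both permissions are required before crawling starts.
    return n.may_crawl and n.may_display
```

Under this model, an approved nomination whose owner never answers the permission request simply stays out of the crawl queue.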

Once permissions, if needed, are secured, LC contracts with the Internet Archive (IA) to crawl and capture selected websites with its Heritrix web crawler—the same tool IA uses for its Wayback Machine. A site is archived in snapshot form; the archives represent how a website looked at a specific point in time, and are not meant to serve as a mirror site but rather to provide data about what people were clicking on, and when. To that end, a site—usually containing some combination of text, images, audio, videos, and PDFs—is generally captured more than once.
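Because a site is captured more than once, a reader of the archive is effectively asking what the site looked like as of a given date. The Python sketch below shows one way to answer that from a list of capture dates. It is an illustration under stated assumptions, not a description of how LC or IA index their captures, and the function name `capture_as_of` is hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional


def capture_as_of(captures: list[date], when: date) -> Optional[date]:
    """Return the most recent capture dated on or before `when`, or None."""
    ordered = sorted(captures)
    # bisect_right gives the count of captures dated on or before `when`.
    i = bisect_right(ordered, when)
    return ordered[i - 1] if i else None
```

With quarterly captures, a request for a mid-May date would resolve to the April snapshot, which is why an archive of this kind reflects points in time rather than a live mirror.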

“Some of these have been challenging to archive in terms of the complexity, particularly in the web cultures collection, like the meme sites and all of these media-rich websites,” Grotke told LJ. “We have a couple of different strategies that we’re using to collect some of this content. We have a very deep quarterly crawl that we run to get some of those really large sites, and we have some RSS feed crawling that we do if the site publishes [an RSS feed].”
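The RSS strategy Grotke mentions amounts to reading a site's feed and picking out links that have not been captured yet. The Python sketch below illustrates the idea with the standard library's XML parser. The sample feed, the example.com URLs, and the `new_links` function are made up for illustration, and this is not the tooling LC or IA actually runs.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed standing in for a site's published feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Comic</title>
    <item><title>Strip 2</title><link>https://example.com/comic/2</link></item>
    <item><title>Strip 1</title><link>https://example.com/comic/1</link></item>
  </channel>
</rss>"""


def new_links(feed_xml: str, already_seen: set[str]) -> list[str]:
    """Return item links from the feed that are not in `already_seen`, in feed order."""
    root = ET.fromstring(feed_xml)
    links = [item.findtext("link", default="").strip() for item in root.iter("item")]
    return [url for url in links if url and url not in already_seen]
```

A feed-driven pass like this only sees what the site chooses to publish in its feed, which is presumably why it is paired with the deeper quarterly crawl.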

Sites for WCWA are often crawled at the highest level, capturing everything under that particular domain. Webcomics can be more difficult to separate out, as they are often hosted on larger sites with other content unrelated to comics. “We’ve gone in and tried to collect the comics part only,” said Halsband, “and it sometimes ends up becoming a little bit tricky because we end up getting a lot of content from the rest of the site…. That’s part of the hazard of web archiving, because it’s strictly reliant on you telling the program what URL you want it to capture.”
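The scoping problem Halsband describes comes down to a rule that decides, for each discovered link, whether it belongs to the seed. The Python sketch below shows a simple host-and-path-prefix rule. It is a minimal illustration and not Heritrix's actual scoping logic; the `in_scope` function and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(url: str, seed: str) -> bool:
    """True if `url` is on the seed's host and at or under the seed's path."""
    u, s = urlsplit(url), urlsplit(seed)
    if u.hostname != s.hostname:
        return False
    base = s.path.rstrip("/")
    # Match the seed path itself or anything beneath it, but not look-alike
    # siblings such as /comics-old when the seed is /comics.
    return u.path == base or u.path.startswith(base + "/")
```

With a seed of https://example.com/comics, a link to https://example.com/blog/1 is skipped, while a seed of https://example.com takes in the whole domain. A comic whose images live outside the /comics path would be missed by the narrow seed, and widening the seed to catch them pulls in the rest of the site.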

STATE OF THE ARCHIVE

Current access to LC’s web archives is relatively simple, with item records and attached descriptive data in LC’s MODS (Metadata Object Description Schema) format. At some point, said Grotke, she would like to see more included, such as full text search or derivative datasets, which would help users dig deeper into the archive.

“Right now our access is pretty simple,” she added. “We’re working to make more of our data available for research use.”

The two new collections complement each other well, noted Saylor and Halsband, with contemporary web culture influencing comic art and vice versa. Capturing both categories will offer a window into early 21st century popular culture as it evolves. “It may not be for 25 years that they really become valuable, or 50 years,” said Grotke. “Some of these sites are still online, and only as they start to disappear or change drastically will the value be shown.”

WWA is also a useful look at how particular artists’ work changes over the years, either through fame—Halsband points to Gene Luen Yang, named Ambassador for Young People’s Literature by LC in 2016, whose American Born Chinese began as a webcomic and went on to be published in print—or simply through practice and experience. “I’d call it digital ephemera,” said Halsband, “where we’re collecting some of the sketchbooks and doodles and other things that these artists are creating that we might not otherwise have, because they’re not sketching in a sketchbook that they then…donate to a library along with all their original artwork. It’s kind of the 21st century equivalent of some of the artists’ sketchbooks [LC has] from the 19th century.”

The new archives have generated a lot of interest, said Grotke, and they’re just the beginning; in addition to the 30 or so collections listed on LC’s website, there are many more in production.

Halsband hopes to develop a literature and criticism web archive collection at some point, she said, as well as looking at other aspects of both webcomics and web culture—“looking at fandom and cosplay and other areas that are of interest to folklorists and popular culture historians and enthusiasts…because I think there’s a lot of potential overlap there.” She is also interested in seeing what other major archiving libraries do with similar material. The British Library recently announced a project where it will crowdsource a webcomics archive as part of its UK Web Archive, she noted. “That’s something that’s really interesting to me, how they’re going about doing that, what they’re going to do.”

“These two collections are getting people interested in looking at what the library has, that we’re not just old books,” added Saylor. “It’s a way in, for people to say, ‘Oh, they have this? Really?’ and then come back and maybe take a second look at…and they can find something that’s relevant to whatever it is that they’re interested in.”

Lisa Peet is Associate Editor, News for Library Journal.
