February 27, 2017

Documenting the Now Builds Social Media Archive

Ferguson vigil at McGill University, Montreal, November 25, 2014
Credit: Gerry Lauzon

Partners from three universities across the country have joined forces on a new project, Documenting the Now: Supporting the Scholarly Use and Preservation of Social Media Content, that will collect, archive, and provide access to Twitter feeds chronicling historically significant current events, particularly around issues of social justice.

The project, DocNow for short, was awarded a two-year, $517,000 grant from the Andrew W. Mellon Foundation in January. The three principals are Bergis Jules, university and political papers archivist at the University of California at Riverside (UCR); Ed Summers, lead developer at the University of Maryland’s Maryland Institute for Technology in the Humanities (MITH); and Chris Freeland, associate university librarian at the Washington University in St. Louis (WUSTL) libraries. They will work with a project team and an advisory board from a range of institutions across the country.

The seeds of DocNow were sown during the 2014 Society of American Archivists (SAA) annual meeting, which convened in Washington DC on August 10, one day after the shooting of Michael Brown in Ferguson, MO. During the weeklong gathering, many of the attending librarians and archivists watched the protests in Ferguson unfold on social media—particularly through powerful Instagram, YouTube, and Vine images being shared in real time on Twitter.

“We saw that the Twitter conversations were dictating how the major media would cover the event,” recalled Jules. “A lot of times the Twitter conversations would correct [and] change the narrative of whatever the mass media was trying to push. We saw it as this really powerful tool.” While at the meeting, Jules and Summers began discussing ways that archivists could preserve this material.

In the following months, Jules and Summers continued the conversation with Meredith Evans, at the time associate university librarian at WUSTL, where the Documenting Ferguson repository was set up in the weeks after Brown’s death. Over the course of the following year, the three decided to partner on the project that would become DocNow. When Evans left WUSTL to become the director of the Jimmy Carter Presidential Library and Museum in November 2015, Freeland stepped in as co-principal investigator, along with Jules and Summers. (Evans currently sits on the 20-member advisory board.) With the help of the project and design team—which includes Desiree Jones-Smith as project coordinator at WUSTL, UX designer Alexandra Dolan-Mescal, developer Francis Kayiwa, and information architect Dan Chudnov—DocNow is investigating its next steps.

A CONSCIENTIOUS COLLABORATION

During the SAA meeting, it quickly became clear to Jules and Summers that this was a pivotal moment in American history, and that much of the documentation was happening on social media. “That week there was just so much conversation about what was going on, policing in America, and treatment of poor communities,” said Jules. “And most of that conversation was being spurred by what people were seeing on Twitter.” Discussion among the archivists, naturally, turned toward preservation, and Jules and Summers began talking in earnest about what was involved in preserving social media material.

The two decided to see how much they could collect over the following few weeks, using the keyword “Ferguson” with Twitter’s API. “So we created this dataset of 13 million tweets,” Summers recalled. “That was the initial impetus for us working together.”

At the time, students at the University of Maryland were mobilizing and holding town hall meetings, said Summers, and asked him to help put together a visual backdrop for an event using digital media from the database. A series of teach-in events on campus was developed around the archive, which became the Digital Humanities Incubator, a practice-oriented series of workshops introducing scholars and students to digital tools by working with social media data. “That’s what showed us that there was an appetite for doing this kind of work on campus, and convinced our director that seeking funding to do a project would be a useful thing to do,” said Summers.

WUSTL was already encouraging people to submit digital images around the Ferguson unrest, having launched Documenting Ferguson in September 2014. Jules, Summers, and Evans felt their respective institutions would partner well, and began to assemble the advisory board to help with the project. “We handpicked people, including scholars, journalists, archivists, librarians, who we thought would help guide the project in a really significant way,” said Jules.

WUSTL will serve as administrative lead on the project. The university’s involvement and its connection to the St. Louis/Ferguson community are a crucial component of the partnership; it is also involved in other Ferguson-related efforts, including the Regional Collecting Initiative on Ferguson, a cooperation among museums and archives in the St. Louis area that are collecting materials related to Ferguson.

DocNow will assemble its core team and advisory board in St. Louis in August. At the same time, it will be testing software on Amazon’s cloud servers and putting together user experience data. A white paper examining the various ethical and copyright issues is in the works as well, to be written by Jules and WUSTL copyright and digital access librarian Micah Zeller, who is a member of the advisory board.

PRESERVATION, CONVERSATION

DocNow’s scope is twofold: On the technical side, it will eventually produce an open source web-based application that will enable the collection, analysis, and preservation of Twitter messages, including links to the digital resources they point to. It will also serve as a platform to cultivate conversation among scholars, archivists, journalists, and human rights activists examining the ethical aspects of social media collection, including issues of privacy and copyright. The project development team and advisory board will take a holistic approach to the two goals, using the prototyping process to drive the ethics conversation, which in turn will inform the application’s development.

While there are a number of tools for gathering and preserving social media data, DocNow wants to keep the technical threshold low in order to appeal to as broad a user base as possible—journalists, reporters, historians, researchers, archivists, librarians, and students. “We’re interested in building something that anyone can use, can pull up on their desktop and start collecting hashtags, can start maybe creating graphs around,” Jules told LJ. But DocNow’s primary audience, he believes, will be “archivists, librarians, people who are interested in preserving digital content, and researchers and historians who want to analyze that content.” One of Mellon’s stipulations is that the tool be open source and available for anyone to install and use. To that end, data will be placed in a Hydra Fedora Commons repository.

Part of the tool’s functionality will be to allow archivists and researchers to easily start up a data collection around an event for a certain period of time, and to provide views into that data that will be useful to scholars. The project will incorporate some existing tools, such as George Washington University’s Social Feed Manager, which collects from social media platforms such as Flickr and Tumblr in addition to Twitter, and Rhizome’s WebRecorder project, which has the capabilities to capture and play back live video streams—both of which were funded by Mellon at the same time as DocNow. Brian Dietz, digital program librarian for special collections at North Carolina State University (NCSU), who worked on NCSU’s social media archives toolkit project, is on the advisory board.

The difference between DocNow and many of the other social media collection tools in development, explained Summers, is that it will collect Twitter data exclusively, at least to start with, and will be heavily focused on usability; a UX designer is written into the project’s grant. “I think a lot of the time, in particular with preservation activities…you think about all the backend digital preservation workflows and how data’s going to be represented, all these kinds of things, before thinking about the user experience,” Summers told LJ. “We’re trying to invert that in our project by making a tool that’s useful to people and then figuring out the backend technical features: how data is represented, what the schema looks like, etc.”

ETHICAL ARCHIVING

The ethical aspect of DocNow revolves as much around security as the usual concerns of intellectual property rights that come into play when collecting social media feeds. Jules recalled seeing members of the police force tweeting during the Baltimore protests saying that they would be using Twitter photos and posts to prosecute people. “What does that mean if you’re building a collection of tweets, of digital content around Baltimore uprising? Do you have to take into account that maybe one day the Baltimore police might come in and say, ‘Hey, we know you have this collection of five million #baltimoreuprising tweets. We’re working on a case, we need to go in and see that collection because we think there might be an image of someone in there doing something wrong.’ These are things we have to think about.”

One solution could be to build in a feature that sends a message to a user when their tweet is collected, providing a link to information about the project and allowing them to opt out—or not, since a posted tweet is already public information. Another solution might be developing a way to anonymize a Twitter user’s name, photo, or geotag when a tweet is picked up by a researcher. “One of the deep issues,” Evans noted, is “what is private and what is not? Once you post something on social media, yes, you have your independent rights and intellectual freedom, but it’s also made public…. We’ve come up with a term, ‘go viral’—what does that mean to that content if we are collecting it and preserving or reusing it?”
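The anonymizing idea described above can be pictured with a short sketch. This is only an illustration, not DocNow's design: the function name `anonymize_tweet`, the salt, and the field names (`user`, `screen_name`, `name`, `profile_image_url`, `geo`, `coordinates`, `place`) are assumptions chosen to resemble a typical tweet record.

```python
import copy
import hashlib


def anonymize_tweet(tweet, salt):
    """Return a copy of a tweet record with name, photo, and geotag masked.

    The field names used here are assumptions for illustration only.
    """
    masked = copy.deepcopy(tweet)  # leave the collected original untouched

    user = masked.get("user")
    if user is not None:
        handle = user.get("screen_name", "")
        # A salted hash gives each account a stable pseudonym, so a
        # researcher can still group tweets by author.
        digest = hashlib.sha256((salt + handle).encode("utf-8")).hexdigest()
        pseudonym = "user_" + digest[:12]
        user["screen_name"] = pseudonym
        user["name"] = pseudonym
        user.pop("profile_image_url", None)  # drop the profile photo link

    # Drop location fields if they are present.
    for field in ("geo", "coordinates", "place"):
        masked.pop(field, None)

    return masked
```

A sketch like this leaves the tweet text alone, so @mentions, embedded images, and distinctive wording could still identify a person. That gap is part of why the question is an ethical one and not only a technical one.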

In addition, while Twitter’s API (application programming interface) greatly simplifies access to its data, its terms of service are very specific, and areas of conflict could arise, particularly when it comes to the privacy of individuals whose feeds are being collected. Some of the issues that DocNow plans to address, Freeland told LJ, include: “Should we delete tweets that have been deleted by the person that posted them subsequently? What do we do about incriminating photos that are in collections? Personal identifying information that’s in the tweets? Those are the ethical issues that we’ll be wrangling with as part of this project.”

SOCIAL MEDIA, SOCIAL JUSTICE

From Ferguson to the Arab Spring, Twitter in particular has been the platform of choice for organizing meetings and protests. And while the DocNow application will be usable in any context, Freeland added, social justice “is the research question that our scholars are most interested in answering.”

“Social media, in a lot of ways, has democratized how people consume information, how people share information, how people put information out there,” explained Jules. “We’re far away from those days of the big three or four media companies and people sitting in front of the TV watching the news. Right now you can be part of the story.”

By way of example, Jules pointed out two divergent stories told by hashtags in the wake of Freddie Gray’s death in Baltimore, #BaltimoreUprising vs. #BaltimoreRiots. Each hashtag defined the narrative differently, “and on top of that you could see the media dipping in and deciding which story to tell.”

The #BlackLivesMatter hashtag, which originated after the 2013 acquittal of George Zimmerman in the shooting death of African American teenager Trayvon Martin, saw widespread use as Brown’s death was followed by those of more young black men across the country. Unified cultural identities within the virtual community, such as Black Twitter, have grown continuously more active around social justice issues. And the collection of artifacts around these subjects is gaining traction, with projects such as the modern civil rights archive at Chicago’s Newberry Library, the Maryland Historical Society-led Preserve the Baltimore Uprising 2015 Archive Project, and the National Council on Public History’s “Interpreting the History of Race Riots and Racialized Mass Violence in the Age of ‘Black Lives Matter’” website, which is working to give a modern context to the history of racial uprisings in America.

The DocNow project represents a similar convergence of mass critical information and the means to collect it. “We were all on social media, and we all were in academic settings where we knew people were doing research on these things and documenting different types of social movements in a unique way,” Evans told LJ. “We wanted people to be able to be at their desktop, or on their tablet, and download this data and view it…or just organize it in a different way, an easier way to do the research.”

About Lisa Peet

Lisa Peet is Associate Editor, News for Library Journal.
