March 17, 2018


Going Geosocial: Big Data Research

Professor Matthew Zook, Co-Editor of the journal Big Data & Society and Professor in the Department of Geography at the University of Kentucky, weighed in on the subject of supporting big data research for the third and final piece in our webcast series “Libraries & the Changing Scholarly Environment,” presented by Electronic Resources & Libraries (ER&L) and sponsored by SAGE.

Zook discussed urban planning and the ways people use and understand cities. Readers will get an understanding of how to use “geosocial” media from his presentation, “Using Big Geosocial Media Data to Study Cities.”

As large datasets have become part of work and research in the social sciences and humanities, scholars are rethinking their assumptions, values and methods. Research librarians can play a key role in this shift by helping to address ethical, privacy and access issues.

This webcast centers on how researchers and librarians can use large social media datasets as an analytical tool. Zook wants to help researchers mine social platforms to gather and map information, be it “user generated,” “crowd sourced,” or what is sometimes called “volunteered geographic information.”

Geosocial media is any piece of social media, whether from Twitter, Facebook, Instagram, or another platform, so long as it is tied to a physical location with a latitude and a longitude. Working with it is a two-step process: accessing the data, typically through an API, and analyzing the data to map it. The results are then illustrated by plotting the data as points that show its spatial distribution.

To identify trends, Zook and his team aggregated tweets and mapped them into spatial units. As an example, they searched for people who used the terms “hipster” or “bro” in their tweets. They then pinned the results on a map of New York City to show where each group is concentrated. The results show a geographic circle, which they called a “bronut,” and a “hipster” core.
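The keyword step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual pipeline: the `Post` record, the `geotagged_matches` helper, and the sample coordinates are all hypothetical, and a real study would pull its records from a platform API rather than a hand-written list.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    """Hypothetical record: post text plus optional coordinates."""
    text: str
    lat: Optional[float]
    lon: Optional[float]


def geotagged_matches(posts, term):
    """Return posts that carry coordinates and mention `term` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [
        p for p in posts
        if p.lat is not None and p.lon is not None and pattern.search(p.text)
    ]


# Made-up sample records, for illustration only.
sample = [
    Post("Total hipster coffee shop", 40.71, -73.95),
    Post("my bro just got here", 40.76, -73.98),
    Post("Hipster playlist on repeat", None, None),     # no location: skipped
    Post("brother from another city", 40.70, -74.00),   # "bro" is not a whole word here
]

hipster_points = [(p.lat, p.lon) for p in geotagged_matches(sample, "hipster")]
bro_points = [(p.lat, p.lon) for p in geotagged_matches(sample, "bro")]
```

Each list of (latitude, longitude) pairs could then be handed to whatever mapping tool is in use. Matching on whole words is a design choice made for this sketch; the source does not say how the terms were matched.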

Zook and his team conducted a second study around regional cuisine. This time, the data sets contain the words “grits” and “oats,” and the search is conducted on Twitter. Both terms receive cartographic representations. Initially, the individual tweets provide too many points on the map. The reader can see only a depiction of the population, with more tweets in places where there are more people. Next, transparent dots are added to allow “overplotting,” an improvement over mere population mapping.

Grouping tweets into spatial units is considered “aggregating up.” When the tweets are grouped by county, the viewer will see clusters. There are drawbacks to this approach, particularly when it comes to mapping larger counties. Zook and his team start with identically sized cells, but they’re still using raw counts, i.e., the total number of tweets.

One of the most important considerations is the size of the spatial units, which will have a direct effect on the kinds of patterns displayed in the results.
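To make the point about cell size concrete, here is a minimal sketch of aggregating points into identically sized grid cells and taking raw counts. The `bin_points` helper, the cell sizes, and the coordinates are assumptions chosen for illustration. Cells are defined in degrees, which is a simplification: such cells do not cover equal areas on the ground.

```python
import math
from collections import Counter


def bin_points(points, cell_size_deg):
    """Count points per square grid cell, `cell_size_deg` degrees on a side."""
    counts = Counter()
    for lat, lon in points:
        # Integer cell index along each axis; floor keeps negative longitudes consistent.
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        counts[cell] += 1
    return counts


# Made-up (latitude, longitude) pairs, for illustration only.
points = [(40.71, -73.95), (40.72, -73.96), (40.83, -73.87), (40.61, -74.08)]

fine = bin_points(points, 0.1)    # small cells: the points fall into three cells
coarse = bin_points(points, 1.0)  # large cells: the same points fall into two cells
```

The same four points produce a different picture at each cell size, which is why the choice of spatial unit shapes the patterns a map appears to show. These are still raw counts, so busier places will tend to dominate either way.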

In the case of the data set containing the word “grits,” for instance, Zook states, “We have the benefit of time stamps, so we’re able to learn that ‘grits’ is used more on Saturdays and Sundays, probably because it’s time consuming to make grits, it’s often more of a weekend dish.”
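A day-of-week tally like the one Zook describes can be sketched from time stamps alone. The `weekday_counts` helper and the sample time stamps below are hypothetical, and the sketch assumes each time stamp is an ISO 8601 string already expressed in the poster's local time.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_counts(timestamps):
    """Count ISO 8601 time stamps by the day of the week they fall on."""
    counts = Counter()
    for ts in timestamps:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        counts[DAY_NAMES[datetime.fromisoformat(ts).weekday()]] += 1
    return counts


# Made-up time stamps, for illustration only.
stamps = [
    "2018-03-17T09:30:00",
    "2018-03-18T10:15:00",
    "2018-03-17T11:00:00",
    "2018-03-14T07:45:00",
]

by_day = weekday_counts(stamps)
weekend_share = (by_day["Saturday"] + by_day["Sunday"]) / len(stamps)
```

Comparing the weekend share for one term against another, or against all tweets, would indicate whether a weekend pattern is specific to that term.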

These studies help researchers identify important variables. “This approach isn’t limited to any one topic; terms can be compared in similar ways to help us find information that we hadn’t even thought to look for,” he said.

Sponsored by SAGE Publishing
