User-generated content (UGC)—which includes tweets, reviews, Facebook posts, and Wikipedia articles—now plays a key role in the average person’s Internet experience. UGC is also becoming an indispensable resource for helping researchers make sense of big data. In his Wednesday keynote address “The Mining and Application of Diverse Cultural Perspectives in User-Generated Content” at the Electronic Resources and Libraries (ER&L) conference in Austin this week, Brent Hecht, assistant professor of computer science and engineering at the University of Minnesota, will discuss how “UGC reflects the cultural diversity of its contributors to a previously unidentified extent and that this diversity has important implications for Web users and existing UGC-based technologies.”
Prior to the event, LJ spoke with Hecht about the intersection of geography and computer science, the influence of UGC, and why librarians are needed to help patrons navigate popular UGC resources such as Wikipedia.
LJ: You have an M.A. in geography, and a Ph.D. in computer science. How do those fields intersect?
Brent Hecht: I believe I was the first computer science/geography double major at my college in 2005, but I doubt that’s the case anymore. It used to be hard to explain, but these days, all you have to say is “Google Maps.”
I’m in a subfield of computer science called Human-Computer Interaction… it includes everything from [improving Google Maps] to understanding how information flows across space via social networks, to developing cool technologies that, for instance, let you take a picture of a publicly displayed local map and then use that for navigation instead of Google Maps—so it’s sort of in the augmented reality space. There are also people working really hard at the very challenging problem of figuring out when someone types “London” into the search bar, do they mean London, England, or London, Ontario, or any of the other Londons that are out there? There’s quite a wide-ranging set of usage questions at the intersection of those two fields.
How would you describe the role that user-generated content now plays in the average person’s Internet experience?
It’s all over the place. A ridiculous percentage of search queries have Wikipedia results in the top three—in Bing and in Google. Wikipedia is the sixth most popular website in the world. There’s also Amazon customer reviews… Twitter for news and social connections, Facebook obviously, YouTube—YouTube gets over an hour of video [uploaded] every second—I could go on forever.
A project in 2010 looked at how local user-generated content is. There was some disagreement in the literature.
What we pointed out in that paper was that it used to be when people wanted to find out about a city, they would go to that city’s webpage. Now, typically, most folks go to the Wikipedia article about the city. The importance of it really can’t be overstated. A community of people with no credentials (classic user-generated content in context) is defining the way that people understand the spaces around them.
By and large the [facts] are accurate. This is sort of the miracle of Wikipedia—if you get a large enough group of people together, they will be able to find mistakes, for the most part.
Where I think the model has broken down so far is in areas of coverage. The English Wikipedia, for instance, covers very extensively cultural and geographic topics that English speakers are interested in. Same deal with the German Wikipedia, the Spanish Wikipedia, the French Wikipedia, and so on…. There’s this perception that the English Wikipedia is so big that it’s a superset of all the other language editions. That is actually not the case…. If you read an English-language article about a concept that also has articles in other language editions, you are, on average, missing out on about 28 percent of the content you would get if you could read all of the language editions. That’s based on a dataset of [the largest] 25 language editions…. That’s a new reason why we need librarians: to help understand the cultural context of the information we’re reading, and help us gain information from other cultural contexts.
Are there certain topics or areas of coverage that the English-language Wikipedia is considered better for than others? One criticism seems to be that it can be heavy on pop culture and light on other topics, for example.
There’s a reasonable hypothesis that the opposite is actually true. We’re actually starting a research project here [at the University of Minnesota], and one graduate student in our lab has begun to look at articles according to quadrants defined by a popularity axis and a quality axis. So the articles in the upper right quadrant would be both highly popular and high quality. That’s ideal. And then in the lower left-hand quadrant, you get not very popular, low quality. But the other two quadrants are interesting. The high popularity, low quality quadrant, to me, is most interesting. What are people accessing a lot but getting low quality information from?
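As a rough illustration of the quadrant scheme Hecht describes, the Python sketch below sorts a handful of made-up articles by popularity and quality. It is not the lab's method: the `Article` fields, the use of page views as the popularity measure, the 0-to-1 quality score, and the choice of the median as the dividing line on each axis are all assumptions made for this example, and the sample data is invented.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Article:
    title: str
    page_views: int   # assumed popularity measure (hypothetical)
    quality: float    # assumed quality score from 0.0 to 1.0 (hypothetical)


def quadrant(article, views_cutoff, quality_cutoff):
    """Name the quadrant an article falls into, given a cutoff on each axis."""
    popular = article.page_views >= views_cutoff
    good = article.quality >= quality_cutoff
    if popular and good:
        return "high popularity, high quality"
    if popular:
        return "high popularity, low quality"
    if good:
        return "low popularity, high quality"
    return "low popularity, low quality"


def group_by_quadrant(articles):
    """Split articles into quadrants, using the median on each axis as the cutoff."""
    views_cutoff = median(a.page_views for a in articles)
    quality_cutoff = median(a.quality for a in articles)
    groups = {}
    for a in articles:
        name = quadrant(a, views_cutoff, quality_cutoff)
        groups.setdefault(name, []).append(a.title)
    return groups


# Invented sample data, for illustration only.
sample = [
    Article("Article A", 90_000, 0.9),
    Article("Article B", 80_000, 0.2),
    Article("Article C", 500, 0.8),
    Article("Article D", 300, 0.1),
]
groups = group_by_quadrant(sample)
for name, titles in groups.items():
    print(f"{name}: {titles}")
```

In this toy setup, the "high popularity, low quality" group is the one Hecht singles out as most interesting. A real study would need defensible measures for both axes, which this sketch does not attempt to supply.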