December 19, 2014

Place Your Bets, People: Semantic Speech Recognition and the Future of Libraries | Peer to Peer Review

Some years ago, when cellphones were still mostly the province of celebrities and hardcore business travelers, I was walking through an airport and saw a well-groomed and prosperous-looking man engaged in animated conversation with, as far as I could tell, himself. He certainly didn’t seem to be conversing with anyone nearby, anyway. As I (carefully) got closer and continued to watch him talking and gesticulating into the empty social space around him I thought to myself, “That’s interesting; he doesn’t look crazy…”

But then as I passed him, I noticed that there was something stuck in his ear, and I realized he was using one of those newfangled Bluetooth devices that I had started hearing about.

Today, of course, we no longer think twice when we see a business-suited person striding purposefully down the street and talking animatedly to no one. When you witness someone having an out-loud conversation with an invisible interlocutor, you automatically assume “Bluetooth device.”

There’s a big conceptual gap between conversing with someone through your phone, though, and conversing with your phone. Apple took the first mass-market leap across that gap a year or so ago with the introduction of an iPhone-native voice recognition application called Siri. Now there’s a growing buzz around a new smartphone app called Google Voice Search, which some people are saying is vastly better than Siri. If, in fact, it’s that good—and my own experience suggests that it is—then I have to wonder: will Google Voice Search usher us into a culture in which we no longer think twice about someone sitting in her office carrying on a vocal conversation with her computer? Semantic speech recognition has been a standard trope of sci-fi movies for decades, but it has yet to become a common feature of home and office life. That may well be about to change.

Let’s squint our eyes and try to look forward five years. I don’t normally consider myself a betting man, but the fact is that we’re all betting people. Every day we allocate time, energy, and other resources to certain activities and processes based on the belief that things are going to be a certain way in the foreseeable future. Each of those allocations of time and energy constitutes a bet.

So, who’s willing to bet against semantic speech recognition software coming into full maturity within the next two years? Not me. There’s been too much progress already, and it offers too rich an array of solutions to too many problems in too many contexts for me to be willing to assume that its progress won’t continue and quickly accelerate. If it does, what implications would such development have for the future roles of librarians?

I suggest that it’s a relatively small conceptual jump from asking your phone “How close am I to a Mexican restaurant with at least a three-star rating?” to asking it “Please find me five peer-reviewed articles on demographic trends in Europe from no fewer than three journals, each with an impact factor not lower than 11, and email them to me as .pdf files.” And I would suggest that the jump from there to “What are the best journals in microbiology?” or even “Are there any important and relevant articles missing from my works-cited list?” is also relatively small.
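At bottom, the spoken request above encodes a small constraint-satisfaction problem: a result count, a peer-review filter, a minimum number of distinct journals, and an impact-factor floor. As a purely illustrative sketch (every name and data structure here is hypothetical; no real discovery service exposes this interface), the parsed intent of such a query might reduce to something like:

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    journal: str
    impact_factor: float
    peer_reviewed: bool

def select_articles(candidates, n=5, min_journals=3, min_impact=11.0):
    """Pick up to n peer-reviewed articles, drawn from at least
    min_journals distinct journals, each meeting the impact-factor bar."""
    eligible = [a for a in candidates
                if a.peer_reviewed and a.impact_factor >= min_impact]
    # Greedily favor journal diversity first, then fill remaining slots.
    chosen, journals = [], set()
    for a in eligible:
        if a.journal not in journals:
            chosen.append(a)
            journals.add(a.journal)
        if len(chosen) == n:
            break
    if len(journals) < min_journals:
        return []  # the diversity constraint cannot be satisfied
    for a in eligible:
        if len(chosen) == n:
            break
        if a not in chosen:
            chosen.append(a)
    return chosen
```

The point of the sketch is only that nothing in the request is beyond mechanical evaluation once the speech has been parsed into constraints; the hard part is the parsing, not the filtering.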

Now, maybe you disagree that the jump between “find me some articles” and “analyze my citations for completeness” is a relatively small one. And maybe you feel that even in a research environment characterized by advanced semantic speech recognition, what the human librarian offers—social perceptivity; a finely-honed sensitivity to the subtleties of marginal relevance; an awareness of sources not yet fully mapped by the open Web—is still going to constitute a unique and essential value proposition for the foreseeable future.

And maybe you’re right. But what if those value-adds that humans alone can offer represent surplus value to our patrons? The fact that a service is unique does not automatically make it valuable, and the fact that it’s valuable does not guarantee that it will be seen by those to whom we’re trying to sell it as more valuable than the alternatives. And make no mistake: we librarians are selling something. We are offering our services in exchange for our patrons’ increasingly scarce time and attention. By offering them research support we are asking them, in effect, to make a bet—the bet that their investment of time, energy, and inconvenience will pay off more richly than the much smaller investment required to ask a question of their phones or computers. It’s the same bet we’re asking them to make when we encourage them to “start their research with the library website” rather than with Google or (shudder) Wikipedia.

Nor should we forget a major problem with human-based library research support: it’s not scalable. One librarian can’t supply one-on-one service to thousands of patrons. A computer powered by semantic speech recognition, on the other hand, can, and its results don’t have to be just as good as the results you get from talking to a librarian—they only have to seem good enough to obviate talking to a librarian.

Does that scare you? It should. It does me. The question is: does it scare us enough to start thinking in radically different ways about how we support our patrons in their work?

This article was featured in Library Journal's Academic Newswire enewsletter.

About Rick Anderson

Rick Anderson (rick.anderson@utah.edu) is Associate Dean for Scholarly Resources and Collections at the University of Utah’s Marriott Library. He serves on numerous editorial and advisory boards and is a regular contributor to the Scholarly Kitchen blog. His book, Buying and Contracting for Resources and Services: A How-to-Do-It Manual for Librarians, was published in 2004 by Neal-Schuman.

Comments

  1. It’s the formulation of the research question that might require some coaching. When you speak to a device, there is no sense of the universe of possibility other than perhaps your own surroundings. Talk about the opposite of browsing! I watched a video by a Google employee the other day attempting to teach research techniques using Google Scholar. The presenter invited users to envision what the answer to their question might look like if they were to find it in print, and then type in their query to try to find that imagined result. OK, so now they speak the query. I, for one, am happy to try any and all new methods of satisfying research needs, though I think Rick’s example of limiting the results and setting the bar very high for those results is important to note.

    • Rick Anderson says:

      Those are really good points, Bob. Of course, part of the problem with our traditional approach to helping people phrase research questions is that it doesn’t scale. At my institution, we have a student-to-librarian ratio of about 700 to 1, so the vast majority of students don’t have a librarian helping them formulate those queries anyway, and if all those students wanted a librarian to help them the result would be chaos. I think that’s one of the things that will tend to drive our users to tools like Siri, which won’t be as good as what a librarian can do but, unlike librarians, will be available at the necessary scale.

  2. Let’s not forget that one of the challenges for semantic interpretation is to overcome the garbage-in, garbage-out problem. Recognition of what was said tends to work reasonably well for people with “normal” accents in widely used languages. For every person who says the system works well, you will comfortably find a number who say that it does not. Crunching more and more training data at the speech recognition level is not going to solve that problem. It needs a radical overhaul of the underlying speech technology, which has remained largely unchanged for many years. If you were fortunate enough to see the recent video of the Microsoft Chinese language translator, you might notice that the speech recognition accuracy in English was not great, particularly towards the end of the demonstration.

    • Rick Anderson says:

      True enough — a system that doesn’t work well will tend not to get used, given other alternatives. But once again, we need to bear in mind not just the quality of the alternatives, but also their availability. Human librarians may be much better at parsing meaning and interpreting exotic accents than semantic speech recognition software is. But human librarians are also much less widely and easily available. Given the choice between a pretty-good system that lives in one’s pocket and a much better system that requires one to travel to a library, an awful lot of people are going to settle for pretty good.