October 25, 2014

Behavior Data vs. Patron Privacy: Productive Discomfort | Peer to Peer Review

Someone is gathering every crumb you drop / These mindless decisions and moments you long forgot / Keep them all!
—Vienna Teng, “The Hymn of Acxiom”

I’ve finally dumped Gmail forever.

Though the process took quite some time—moving mailing-list subscriptions, changing profiles on websites that knew me by my Gmail address, extracting the messages I needed to keep, and similar chores—the relief of a little more freedom from Google’s privacy-invasive data mining has been well worth the trouble for me. I want as little as possible to do with a company that allegedly thinks trawling and keeping behavior-profile data from college students’ school-mandated, school-purchased email accounts without notice or consent is in some way ethical.

I bring this up because of a strong tension I noticed at the recent Library Technology Conference between library notions of privacy and academic libraries’ salutary desire to use various forms of patron behavior data to improve websites and other services. How much are we willing to snoop to get better at what we do? How do we gauge potential (not actual, let us pray) harm to patrons? When we do decide that snooping is worth the risks, how do we protect our patrons from data breaches (making the news at too many higher education institutions of late) and reidentification attacks? How do we avoid participating in today’s sinister commercial and political nightmare of greedy, thoughtless, not-always-disclosed physical and digital surveillance? Does performing surveillance in our much-trusted libraries not legitimize the other surveillance regimes?

We cannot assume that the data we could and sometimes do gather about our patrons would be of no interest to the powerful or punitive. We know better, so we protect circulation records and computer-use histories as best we know how and interpose our proxy servers and sign-in pages between snoopy electronic publishers and our patrons’ identities. We saw last year in the Aaron Swartz case the worst that can happen when we decline to interpose ourselves, and we also have good reason to be wary of privacy violations by providers of electronic content. An odd twist in the Georgia State e-reserves opinion lends point to these concerns: some of the infringement claims were dismissed by the judge because access logs showed none of the students had actually downloaded the allegedly infringed-upon material. If this segment of the opinion holds up on appeal, it would seem to offer publishers holding copyrights in works used in higher education classrooms tremendous incentive to examine data on student reading and demand that institutions and their libraries gather and keep that data for them, in order to find grounds to sue us more.

At the same time, I certainly don’t want to paint data-gathering librarians with the same brush as Google, much less monumental consumer-behavior profiler Acxiom. Librarianship already has professional ethics commitments regarding privacy that apply to data, which is a good start. I’m fond of principles III and VI of the Code of Ethics of the American Library Association, myself. Our data collection motives are also rather purer, our data easier to tie to obvious patron benefits: sets of aggregate data, from COUNTER usage statistics to website access logs, are profoundly helpful for service refinement, website usability improvement, and collection development. Academic librarians don’t share these data (except distilled into harmless charts or tables), or aggregate them with other libraries’ data (except very carefully indeed), or mine them for individual identities, or keep them forever just in case, or willingly turn them over to businesses or government. If only other data gatherers regularly behaved like libraries!

What we don’t seem to have yet is a professionwide sense of how to apply our ethical commitment to privacy to digital information behavior data, such as we can gather from website access logs, proxy server logs, or web trackers placed in our websites or OPACs. We don’t to my knowledge have best-practice documents, charts and checklists, sample policies, or the rest of the mundane apparatus that helps us navigate other ethics questions without stopping in the middle of our busy days for ponderous pondering. (If I’m wrong about this, I would love to know more; please leave a comment correcting me.) I can only begin to imagine what this apparatus will come to look like, and I certainly can’t prescribe it from on high. It needs to be the fruit of a collective discussion. Fortunately, events like Library Technology Conference are starting that discussion.

At lunch on the second day of the conference, after my session on patron computer privacy, a student at the library school where I teach asked me whether I approved of the systematic catalog usage tracking one presenter discussed at a session we had both attended. A level don’t-you-dare-equivocate stare accompanied the question, an expression I dearly love to see on student faces because it demonstrates so clearly their willingness and ability to think critically about anything I or anyone else tells them. I sighed and said, “I wish they weren’t using Google Analytics.” That was easy to say; Google has repeatedly shown with Google Buzz, Google Plus, and various of its data-mining efforts that its notion of privacy does not measure up to library standards, so, convenient though Google’s tools undoubtedly are, privacy-conscious academic libraries should avoid them. (In my session, an attendee pointed out Piwik as a self-hosted, and therefore less invasive, Google Analytics alternative. Businesses and consortia that host library websites as a service would do well to offer Piwik to their clients.)

After that, though, I had to stop and think. I eventually said, “With the way they’re scrubbing data, it seems mostly okay to me, but I’d want to know more about their data-disposal schedule, and…I’d want them to feel uncomfortable about holding that data.”

It’s the last piece of that answer that I still stand behind. I want academic librarianship to feel uncomfortable about accumulating patron information behavior data, even anonymized, even in aggregate. I want that discomfort to cause us not to collect patron information behavior data at all without a clear need for it, to collect the scantiest data possible when it is needed, to guard that data well, and to throw it away like a hot potato as quickly as feasible to keep ourselves and others from the temptation to abuse it. I want us to endure the uncomfortable process of writing data retention and data privacy policies that treat patron privacy as a dominant concern. Data discomfort is productive, just as the tension at Library Technology Conference was. Productive data discomfort will help libraries remain an excellent example of consciously ethical privacy practices…an example much of the rest of society desperately needs just now.
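The "scrub it, keep it briefly, throw it away" posture described above can be made concrete in code. The sketch below is purely illustrative, not any library's actual practice: the Apache-style log format, the /24 and /48 truncation widths, and the 30-day retention window are all assumptions chosen for the example. It anonymizes client IP addresses in access-log lines by zeroing the host bits, and silently discards any entry older than the retention window.

```python
import ipaddress
import re
from datetime import datetime, timedelta, timezone

# Hypothetical Apache "combined"-style log prefix: client IP, identd, user, [timestamp].
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\]')

# Illustrative disposal schedule, not a recommendation.
RETENTION = timedelta(days=30)

def anonymize_ip(raw: str) -> str:
    """Zero the host bits: keep a /24 prefix for IPv4, a /48 for IPv6."""
    ip = ipaddress.ip_address(raw)
    prefix = 24 if ip.version == 4 else 48
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net.network_address)

def scrub(lines, now=None):
    """Yield log lines with anonymized client IPs, dropping expired entries."""
    now = now or datetime.now(timezone.utc)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # unparseable line: safest to drop it entirely
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if now - ts > RETENTION:
            continue  # past the retention window: throw it away
        yield line.replace(m.group("ip"), anonymize_ip(m.group("ip")), 1)
```

Run as a scheduled job over raw logs, something like this keeps aggregate usage data useful for website-improvement work while making reidentification of individual patrons much harder; the real policy questions (how wide to truncate, how long to retain) are exactly the ones a written data-retention policy should answer.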

This doesn’t mean we won’t ever collect data. This doesn’t mean we won’t ever keep data. This doesn’t mean we won’t ever use data. With luck, it means we will be careful enough about data collection, retention, and use to protect our patrons and keep their trust in us intact. No library patron should have to walk away from a library for the same reason I walked away from Gmail.

I also believe that as the privacy watchdog within our institutions, academic librarianship needs to cast a critical, privacy-minded eye over the student analytics movement. InBloom, a would-be K-12 student profiler/tracker whose products and services I find decidedly creepy and intrusive, has been defeated for now by teachers, parents, and librarians, but I am still seeing course management systems and student records systems in higher education discussing or even implementing tracking measures without much heed paid to student privacy. That such dubious features may not presently work well and can be ignored—I saw no use whatever in the so-called analytics that turned up in the most recent upgrade of my campus’s course management system—does not exempt us from questioning the collection and retention of student behavior data. Ideally, we should do so before tools based on that data develop enough to be both seductive and dangerous.

Oh, and because quite a few people ask whenever I tweet about leaving Gmail: I’ve moved my professional nonwork email (mailing lists and so on) to an email account on my own web domain; the actual mailserver is managed by the company I pay to host that domain. It’s working great so far.

This article was featured in Library Journal's Academic Newswire enewsletter.

About Dorothea Salo

Dorothea Salo is a Faculty Associate in the School of Library and Information Studies at the University of Wisconsin-Madison, where she teaches digital curation, database design, XML and linked data, and organization of information.


Comments

  1. You might want to change the email contact info on your bio page; it still shows a Gmail address!

  2. This is great – and I hope we get to work defining how to ethically use information to improve what we do without betraying our principles. If we don’t figure this out, why should we expect others to do the right thing?

  3. There is a difference between privacy and anonymity. We preserve the latter for everyone. But in order to assess effectively, we have to both collect AND look at the data. In a university context, where I work, data can be examined in a very granular way. Consider how our academic advisors identify and contact at-risk or low-performing students, or how we work collaboratively to refer reference questions. Neither of those works without sharing identifiable information.

    What legitimizes our data collection and use is our commitment to storing and maintaining it securely, accessing it on a need-to-know basis, and refusing to share its details beyond our institutions unless compelled by law. I understand the urge to dump Gmail (although I haven’t yet, shame), but it would be unfair to label our very legitimate data gathering and analysis as ethically compromised. Data gathering and analysis enable us to improve library services, understand collection use and make better purchasing decisions, and more. Our funders expect us to do precisely that.

  4. I have never had Gmail, or a Google account, or anything like that, though I have a Yahoo account I use rarely – for which, I admit, I gave some false information, just as I do for Facebook – I have no intention of giving anyone my real birth date; instead, I pick the birth date of a long-deceased childhood friend – easy to remember, but not mine. Mostly, for professional purposes where I don’t want to use my job email, I use an alumni account I was permitted to retain from my alma mater. I’ve always been a strong privacy advocate. I use anonymous mail and web surfing regularly, have ‘phony’ accounts for truly private matters, etc. It’s not that I’m doing anything terribly exciting – but my business is my business. My political views, religious views, etc., which I may wish to discuss with some others – well, I’m not about to put those out on Twitter, Facebook, or Google associated with my professional identity. I never, ever give out my mobile phone number to these organizations – or anyone outside of work and my husband. Do I sound paranoid? Well, there’s a good reason.

    You see, I’ve had a stalker, a violent ex-boyfriend – who found me again a decade after I thought he was out of my life. That’s when I stopped allowing ANY pictures of myself online – and there are none. This has done me some damage professionally – I speak at conferences, but refuse to allow myself to be photographed. My LinkedIn profile has no photo. My name is suitably generic, after marrying. I left my home country. Oh, did I mention that the ex is a skilled mathematician, and pretty brilliant with computers? This trend to put one’s picture and identifying information accurately and completely online is foolish and dangerous. But it makes it hard to build a profile for work!

  5. Tony Greiner says:

    Libraries that use the Alma ILS also have to recognize that there is no way to turn off Alma’s email feature, which sends patrons an email when they check out a book, when they return it, and at various other times.

    Since we now know that the NSA and CIA are reading these emails, we are ‘outing’ readers’ reading habits by using this and similar systems.

  6. You voice a concern many of us discuss in information behavior research circles. Access versus ownership of/to metadata, tacit versus explicit consent, context and intent: these discussions are so often community and culturally situated. Our IRB processes, while sometimes awkward for LIS social science research, do force us to concretely weigh the pros and cons, expected outcomes for the greater good or the negative impacts for the individual participant. I agree with you: these ethical discussions are often thin when we get to such things as program, course, or tool (e.g., CMS, databases, library website) evaluation methods. Institutions often explicitly state an exemption for internal evaluation of services we provide, and oftener still do not require consent in any form beyond the tacit (by being a student, patron, participant, user etc. of XYZ, then you agree that any metadata generated that is not explicitly identifiable is fair game). The other aspect is user behavior: I’ve read studies in my own research area that show in online environments, we are less concerned about privacy when our metadata produces predictive analytics that are helpful in our everyday lives: Amazon, music sites, library catalogs suggesting materials, ads that appear in browsers based on current hobbies etc., discounts/coupons locally and based on grocery list: these are not necessarily seen as nefarious because they are perceived as helpful not predatory. The pivot, for me at least, is informed consent; and if there are low-barrier opt-out options. Information communication technology literacies and everyday life information seeking converge at that point. In economics it’s called the “high cost of effort”. And for users–to the advantage of those who analyze metadata at the micro and macro levels with varying levels of good or ill intentions–one could say it’s caveat emptor. I pose a question, too: was there ever a time when metadata was really “ours” to control? 
In the past, patrons would check out a library book and sign a circ card that lived in the pocket for all to see; our social security numbers were used as identification in institutions (schools, hospitals, governmental bodies, employers, etc.); our names, addresses, and phone were in telephone books by default. So, perhaps, ethical metadata use is a continuing conversation. And good for you for championing the dialogue!

  7. This is a very thoughtfully written article and I appreciate your taking time to address this issue. I agree with most of the ideas expressed, yet in real life, I wonder if patrons/students even expect to have privacy or understand that the value of privacy as a right is a tenet of libraries. In a popular sense, it seems that most people I meet have given up on the notion of digital privacy, and they simply do not care. Basically, most people I talk with these days treat digital security and privacy as an afterthought. When it comes to their identity, people seem to practice security by obscurity, by being that one in a billion using Google or Facebook, etc. Plus, many seem to value the “I get it for free” trade-off as opposed to paying for a service that would protect a once-valued right, such as privacy.

    One idea I like very much for libraries is hosting their own content on low-cost platforms. For instance, with a Raspberry Pi running arkOS (a project built around content ownership), a library can run WordPress, Ghost blogs, or OpenCloud web services very cheaply, without the question of hosting or content ownership. I am surprised, as I talk with librarians, at how willing they are to trade their library’s content ownership for the ease of products such as Google Analytics or other hosted arrangements. Hopefully, this will change as more ARM-based, low-cost, single-board servers hit the market in the years to come. That technological change will let libraries host their own data centers basically from a physical desktop. The hardware might thus be the catalyst that gives libraries the motive to host and own software that protects the content and privacy provided to customers.
