April 19, 2018

Data Curation’s Dirty Little Secret | Peer to Peer Review

When the Research Data Services group I helped inaugurate worked out a response process for data-management-plan assistance requests, we were careful to respect the disciplinary expertise among our members. After all, even in late 2010 it was a truism that the barrier skill for helping researchers manage data was disciplinary expertise. “In practice,” wrote Alma Swan and Sheridan Brown in 2008, “data scientists need a wide range of skills: domain expertise and computing skills are prerequisites…”

Data curation’s dirty little secret is that this isn’t always true. It isn’t even often true.

Swan and Brown’s own evidence directly contradicted their words. They wrote, quite truthfully, that, aside from domain experts who teach themselves digital data management and analysis techniques, another typical data scientist or data manager “originat[ed] as a computer scientist who has acquired domain knowledge over time.” Domain knowledge, then, is not a prerequisite exactly; it can be learned on the job. Since this is true for computer scientists, why wouldn’t it be true for information professionals as well?

Researchers themselves are the authority for the claim that disciplinary knowledge is required for proper data management, in Swan and Brown as in many successor reports and articles. I must say, I don’t find researchers a reliable source on this point. If researchers knew what skills and techniques are necessary to manage and work with digital data, wouldn’t they be doing it better than they are? Would they even need help with data-management planning? Would they be leaving data management to wet-behind-the-ears graduate students at the very bottom of the lab hierarchy, as I have so often witnessed them doing? Would they be dumping the digital equivalent of moldy boxes from spiderwebby garages on librarians’ desks to the extent that they are?

That said, some researchers do believe fiercely in the indispensability of disciplinary knowledge. The last-but-one data-management brownbag that Research Data Services sponsored prominently featured work that two groups of my digital-curation students did to help the Living Environments Laboratory (LEL) store, describe, track, and search/browse the individual images and other digital materials from which virtual-reality scenarios are built. I had to bite my tongue hard when a researcher in attendance incredulously questioned the speaker about my students’ lack of disciplinary expertise. Surely they couldn’t have done that work? The work they had demonstrably done? Myths die hard… and while they live, they cause librarians needless headaches.

I have sent a round dozen groups of students out to solve digital-data problems in the three years I’ve been teaching digital curation. In addition to the LEL researchers, my students have helped a linguist, an art historian, student artists, a demographer, a radio station with media-archiving issues, and more. I’ve also sent interns and practicum students into a campus microscopy lab, our local Forest Service research outpost, and our local Geological Survey office. I match disciplinary expertise when I can, but I usually can’t. It’s never mattered. They do fine. They’ve all done fine.

For my own part, I’ve taught basic data management to engineers, physicists, biologists, historians, clinicians, and computer scientists, and I’ve critiqued data-management plans from even more disciplines than that. My own disciplinary background is in literary analysis and historical linguistics. I can count the questions and situations I haven’t been able to resolve singlehandedly without moving from my left hand to my right. The number I failed to resolve at all? One, that I remember—a confusing workflow in instrument biology, and it was my own fault for not calling in someone else to resolve my confusion before responding.

Are disciplinary differences irrelevant to research-data management? Well, no, but the salient disciplinary differences I’ve seen come in around idiosyncratic research processes and tools. I confess to considerable skepticism, for example, about the possibility of an electronic laboratory notebook software package that will work across the entire breadth of a campus’s research initiatives. Lab notebooks are tightly tied to idiosyncratic, ungeneralizable, often project-specific processes, and my experience with researchers suggests that they expect digital notebooks to conform to their processes equally tightly, and will brook no impedance. I hope I’m wrong—an 80/20 solution seems vaguely within the realm of possibility, perhaps—but we’ll just have to see.

For the advising and consulting around data management that libraries would like to do, of course disciplinary knowledge is useful! No question about it. If nothing else, a little disciplinary knowledge helps convince researchers that librarians are useful people to talk to. (I find that a tiny bit of research before a scheduled meeting allows me to fake it convincingly.) No matter how often researchers claim it is, however, “useful” is not the same thing as “needful.” As libraries work through how we will help researchers with data management, we can take comfort, I hope, in the mythbusting I’ve just done. We don’t have to have all the disciplinary knowledge scattered across campus within our library walls before we start to help.

I once chatted with the inimitable Diane Hillmann at ALA about scholarly communication and data curation. When the disciplinary-expertise canard came up, she said judiciously, “They all think they’re special snowflakes. They’re not.” I’ve never forgotten that. I believe my students and I have abundantly proven it, and I believe academic libraries can—and should—go right on proving it.

About Dorothea Salo

Dorothea Salo is a Faculty Associate in the School of Library and Information Studies at the University of Wisconsin-Madison, where she teaches digital curation, database design, XML and linked data, and organization of information.



  1. As someone in the data curation field with subject expertise (chemistry), I can say that my subject expertise hasn’t been nearly as useful for me as other types of knowledge, namely an understanding of the research process and the politics of academia.

    For example, knowing the research process tells me to focus on the “wet-behind-the-ears graduate students” and knowing the politics helps me sell data curation to their PIs.

    I admit that I have it a little easier navigating the politics because I have a PhD. It opens some doors for me, but once those doors are open, I work hard to win people over on data curation.

    So do I need a PhD to do my job? Certainly not. But the things I learned while doing my PhD, like the process of doing research and working in academia, are vital to the way I approach data curation at my university. The great thing is that these are skills that can be learned in many different ways.

  2. In some senses, I think researchers actually benefit from working with someone from outside of their field on data curation. The outside perspective of a non-expert can be useful in helping researchers realize some of the assumptions they make based on their specific disciplinary focus. With the increasing interest in interdisciplinary collaborations, data description in particular can be improved by working with an information professional who is not an expert in the field, and thus can help the researcher see the bigger picture outside of their own disciplinary focus.