November 19, 2017

University of Michigan Launches “Deep Blue Data” Repository

The University of Michigan Library (UM) has launched Deep Blue Data, an open repository for sharing and archiving large datasets generated by UM researchers. Soft-launched in February and officially announced in September, Deep Blue Data complements the university’s long-running institutional repository, Deep Blue, and is part of a suite of research data services that the library has been developing for UM faculty and students.

“It’s been a four-year journey, building these research data services,” said Elaine L. Westbrooks, UM associate university librarian for research and service lead for Deep Blue Data. Westbrooks said that when she came to UM four years ago, many of the university’s administrators were looking to the library for help, particularly with new data retention mandates from funding bodies such as the National Science Foundation.

“We started to build a plan for services, and we wanted to include the tool—which would [become] Deep Blue Data—but we also felt that it was really critical to develop the suite of services,” including library assistance with data management planning, data discovery and access, data organization and management, standards for metadata and documentation, data sharing and publication, data preservation, and data visualization, Westbrooks explained. The library’s 50 subject specialists (or “informationists” in medical and health sciences fields) are the key liaisons who inform and assist departments and individual researchers with these services, she added.

During the past decade, DSpace-based Deep Blue had grown to become one of the largest institutional repositories in the world, but Westbrooks and staff decided that it was not optimal for managing computer-readable datasets, large datasets, audio files, images, and other content. So Deep Blue Data was built separately on a Hydra/Fedora framework. Eventually, the library plans to harmonize the two repositories—using Hydra/Fedora—enabling researchers to store all of their scholarly assets in one place.

The repository is already hosting deposits from statistics; linguistics; epidemiology; sociology; pathology; molecular, cellular, and developmental biology; school of information; and school of engineering researchers.

“I think this is a really good sign,” Westbrooks said. “I’ve been pleased to see the breadth of departments that are represented.”

Although the system is equipped to work with very large datasets, researchers such as astronomers or geneticists who generate petabyte-sized datasets tend to have their own systems. Westbrooks said she expects Deep Blue Data primarily to be handling datasets of 100 gigabytes or smaller.

For now, having separate repositories has led to internal discussions about what criteria—other than size—should be used for placing content into each, “and we’ve had numerous cases where someone submitted [content] to Deep Blue, and it should go into Deep Blue Data,” Westbrooks noted. These transfers are being handled individually, and the subject specialists are available for consultation with faculty if questions arise.

While both repositories are self-service, the subject specialists often facilitate the deposit process and monitor deposits to both. “We don’t want to be the ones that deposit everything, but we do dedicate time to working with faculty members, or giving them the information that they need to successfully make deposits.”

Faculty members or researchers might have questions regarding metadata, for example, while making a self-service deposit. Those questions are often funneled through the subject specialist, which gives a librarian another opportunity to decide which repository would be more appropriate for a specific set of content, and to discuss Deep Blue Data when needed, on an individual basis.

The library also leverages the relationships that its subject specialists have with departments and faculty to raise awareness about new services such as Deep Blue Data.

“Step one has been a very robust professional development program. Our subject specialists are constantly getting training on [data] visualization, medical data, social science data, text mining, data mining. We have a lot of expertise in the library and we are constantly holding workshops so that our [department] liaisons are very knowledgeable…. They’re able to have really good conversations with faculty.”

But the “marker of success” for the project, Westbrooks said, will be word of mouth that extends beyond person-to-person discussions between subject specialists and faculty.

“The word about Deep Blue Data isn’t only going to come from the library,” she said. “We’ll have our Office of Research promoting it, we’ll have Advanced Research Computing promoting it, and we’ll have faculty and staff throughout [UM’s] 19 schools and colleges promoting it.”

Westbrooks is confident that the new repository will be a success—not only due to prevailing trends or the library’s past success with Deep Blue.

“When we started building this tool and developing this service, it [was] always…in collaboration with other units,” she said. “So, campus IT. Let’s be sure we talk to the medical school. Let’s talk to ICPSR [the Interuniversity Consortium for Political and Social Research]. What are they doing, and how can we collaborate? Let’s talk to Advanced Research Computing. Let’s talk to the University of Michigan Office of Research, [etc.] Keeping those conversations going has always been core” to the development of Research Data Services and Deep Blue Data. “We knew that we didn’t want to go alone on this, and we knew we wouldn’t be successful if we went about this in a unilateral way.”

About Matt Enis

Matt Enis (menis@mediasourceinc.com; @matthewenis on Twitter) is Senior Editor, Technology for Library Journal.

Comments

  1. It is extremely difficult to develop and provide a high-quality product or service without conducting at least some basic market research. Some people have a strong aversion to the word “research” because they believe that the word implies a highly sophisticated set of techniques that only highly trained people can use. Some people also believe that, too often, research generates lots of useless data that is in lots of written reports that rarely are ever read, much less used in the real world. This is a major misunderstanding.