Vision
Associate Dean for Research, Professor of Library Science
Purdue University Libraries
504 W. State Street ADMIN
W. Lafayette, IN 47907-0156 USA
V: 765.494.2900; F: 765.494.0156
Curation facilitates deposition and preservation of data through community-based policies and application of metadata standards and ontologies to ensure the integrity of long-term access. It encompasses a set of essential protocols and systems that facilitate access, dissemination and archiving of e-research. These protocols include management policies and tools that provide descriptive analysis of digital collections and objects to augment discovery, administration, use, reuse, and preservation. Research takes the form of investigating policies to guide and facilitate the work of data generators in submitting and exposing data, applying enhanced metadata schemas to describe digital preservation and access, designing systems to facilitate long-term discovery of collections of objects, and developing middleware to address problems of interaction between systems and applications.
Because curation involves both essential activities and critical systems to facilitate access, dissemination and archiving of e-research, it is useful to look at the technical issues and the framework in which they are considered. Curation can be described as protocols and tools that provide descriptive analysis of digital collections and objects to augment discovery, management, use, reuse, and preservation. These protocols can take the form of schemas to describe digital objects, systems to facilitate discovery of collections of objects, and middleware to resolve problems of interaction between systems and applications. At the same time, curation also provides policies and consultation to facilitate the work of data providers in submitting and exposing data, as well as to enhance capabilities to navigate to data. Storage and handling of data are important and necessary components of local, regional and national solutions to the “data deluge” problem: it is critical to ensure that data can reside somewhere and can be reached. But where storage and handling usually seek to remove the human element by automating processes, curation seeks to account for the human element in discovery and access.
In fact, descriptive-level curation applied to data that researchers are willing to share at earlier stages (i.e., raw, processed, or analyzed data rather than just final results) may further scholarly communication and data mining in the future.
Many individual researchers and their groups, as well as agencies and consortiums, currently provide various types of curation. For instance:
- Inter-university Consortium for Political and Social Research (ICPSR) in the social sciences
- The National Center for Atmospheric Research (NCAR) in atmospheric sciences
- The National Center for Biotechnology Information (NCBI) in molecular biology
- National Virtual Observatory (NVO) in astronomy
It is especially to these latter communities that the D2C2 gives attention. The aim of the D2C2 is to address curation issues and work on problems related to unorganized, disparate, heterogeneous and distributed data, data workflows and environments. It will work closely with other agencies, centers, and groups doing related work so that practices and standards can be shared, reviewed and evaluated. Just as the D2C2 can build on the efforts of others who have been working in this area for the past few years, applications and solutions developed in the D2C2 are expected to benefit others. Because it is unlikely that a single “one size fits all” solution in this area will ever be agreed upon, the D2C2 aims to research and develop insights, applications and systems that support the distributed nature of curation.
One initiative in particular includes design and development of a distributed institutional repository to serve as a platform for investigation of data issues and development of applications to help solve data problems that arise in a variety of research domains. Purdue e-Scholar is the branded name for a distributed institutional repository (DIR) system at Purdue University that supports curation for management of and access to digital objects of e-research, including data and documents in various forms, formats and locations. The distributed framework interoperates with disparate information systems and repositories through an Open Archives Initiative (OAI) architecture, providing an institutional or regional context for collecting, tracking and disseminating e-research.
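As a rough illustration of how a harvester might interoperate with a repository in an OAI architecture, the sketch below assumes the OAI Protocol for Metadata Harvesting (OAI-PMH) with unqualified Dublin Core records. The base URL, the sample response, and the function names are hypothetical and are not taken from Purdue e-Scholar; the example parses an inline sample rather than contacting a live server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Extract the identifier and titles of each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records

# Hypothetical response, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:dataset-1</identifier>
        <datestamp>2007-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical soil moisture data set</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # The endpoint below is a placeholder, not a real repository.
    print(list_records_url("https://repository.example.org/oai"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["titles"])
```

Because each participating repository exposes its metadata through the same small set of requests, a central service could in principle aggregate descriptions from many locations while the data and documents themselves stay where they are.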
The mission of the Center is to investigate and resolve curation issues in facilitating access to, preservation of, and archiving of data and data sets in complex and distributed environments.
Involvement by colleges, departments and centers is recognized through research collaborations and board participation by the Colleges of Agriculture, Engineering and Science; Discovery Park’s Cyber Center; and ITaP. Specific departments with which the Libraries collaborate on research include Biology, Agronomy, Chemical Engineering, and Plant Sciences, as noted below. Recent unfunded collaborations have included those with the departments of Food Sciences, Health Sciences, and Electrical and Computer Engineering, and with the Discovery Learning Center. In addition, we are about to pursue collaborations with the Colleges of Liberal Arts and Technology, the departments of Chemistry and Earth and Atmospheric Sciences, and the Birck Nanotechnology Center.
The director of the D2C2 reports to the Dean of Libraries and coordinates with the board. Staff of the center include a research systems administrator and two data research scientists, who report to the director, as well as various collaborating Libraries faculty who will coordinate research projects with the director. Direction will be provided by an advisory board composed of Randy Woodson (College of Agriculture), Leah Jamieson (College of Engineering), James L. Mullins (Libraries), Jeffrey Vitter (College of Science), Gerry McCartney (ITaP), Santae Kim (Depts. of Mechanical and Chemical Engineering), and Ahmed Elmagarmid (Discovery Park’s Cyber Center).
Initial funding will be provided through release time and cost share by the Libraries for an initial period of up to one year for the director, a systems administrator, clerical/administrative support, and two research scientists. Purdue University Libraries seek to position the D2C2 to be a leader in data curation as recognized by the NSF and the Association of Research Libraries. In addition to the funding collaborations noted above, the center will be competitive to lead and participate in grants from NIH, NSF and related federal sources; from IMLS, NARA and related library- and archive-based sources; and from foundations. The baseline for outcomes will be proposals submitted and awarded, but the hallmark of success will be how the Center’s research benefits curation and archiving of data at Purdue University and contributes to the field regionally and nationally.