Published in

Elsevier, Procedia Computer Science, vol. 9, pp. 1614-1619, 2012

DOI: 10.1016/j.procs.2012.04.177

Kurator: A Kepler Package for Data Curation Workflows

This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Data curation is critical for scientific data digitization, sharing, integration, and use. This paper presents Kurator, a software package for automating data curation pipelines in the Kepler scientific workflow system. Several curation tools and services are integrated into this package as actors, enabling the construction of workflows that perform and document various data curation tasks. The integration of Google cloud services (e.g., Google spreadsheets) allows workflow steps to invoke human experts outside the workflow, which greatly simplifies the complex data handling in distributed, multi-user curation workflows. The Kepler platform provides the package's modeling, execution, and management capabilities, including a collection-oriented model of computation (COMAD) and provenance tracking and browsing. These features not only allow workflows to be easily modeled, maintained, and evolved, but also facilitate quality assurance and quality control (QA/QC) of curation results through examination of the provenance information recorded during workflow execution. The effectiveness of the Kurator package is demonstrated through a workflow that curates data from natural science collections.
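To make the described architecture more concrete, here is a minimal Python sketch of a curation pipeline built from actors that records provenance. It assumes a flat list of specimen records rather than COMAD's nested collections. Each actor either transforms a record or passes it through unchanged, and every transformation is logged as a provenance event that can later be traced back for QA/QC. This is not Kepler's or Kurator's API: `Record`, `Pipeline`, `lineage`, the two actors, the field names, and the name-fix lookup table are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Fields = Dict[str, str]
# An actor returns updated fields, or None to pass the record through untouched.
Actor = Callable[[Fields], Optional[Fields]]


@dataclass
class Record:
    """A hypothetical specimen record flowing through the pipeline."""
    rid: str
    fields: Fields


@dataclass
class ProvenanceEvent:
    actor: str
    input_id: str
    output_id: str
    changes: Dict[str, Tuple[Optional[str], str]]  # field -> (old, new)


@dataclass
class Pipeline:
    actors: List[Tuple[str, Actor]]
    log: List[ProvenanceEvent] = field(default_factory=list)

    def run(self, collection: List[Record]) -> List[Record]:
        for name, fn in self.actors:
            out: List[Record] = []
            for rec in collection:
                new_fields = fn(dict(rec.fields))  # actor works on a copy
                if new_fields is None:  # out of this actor's scope: pass through
                    out.append(rec)
                    continue
                changes = {k: (rec.fields.get(k), v) for k, v in new_fields.items()
                           if rec.fields.get(k) != v}
                new_rec = Record(f"{rec.rid}.{name}", new_fields)
                self.log.append(ProvenanceEvent(name, rec.rid, new_rec.rid, changes))
                out.append(new_rec)
            collection = out
        return collection

    def lineage(self, rid: str) -> List[ProvenanceEvent]:
        """Walk provenance backward from a record id to its original input."""
        by_output = {e.output_id: e for e in self.log}
        chain: List[ProvenanceEvent] = []
        while rid in by_output:
            event = by_output[rid]
            chain.append(event)
            rid = event.input_id
        return list(reversed(chain))


# Hypothetical lookup table standing in for a taxonomic name service.
NAME_FIXES = {"Quercus albba": "Quercus alba"}


def validate_name(f: Fields) -> Optional[Fields]:
    fixed = NAME_FIXES.get(f.get("scientificName", ""))
    if fixed is None:
        return None
    f["scientificName"] = fixed
    return f


def flag_missing_coords(f: Fields) -> Optional[Fields]:
    # Stand-in for routing a record to a human expert for review.
    if f.get("decimalLatitude"):
        return None
    f["curationStatus"] = "needs expert review"
    return f


pipe = Pipeline([("validate_name", validate_name), ("flag_coords", flag_missing_coords)])
records = [
    Record("r1", {"scientificName": "Quercus albba", "decimalLatitude": ""}),
    Record("r2", {"scientificName": "Acer rubrum", "decimalLatitude": "42.3"}),
]
result = pipe.run(records)
print([r.rid for r in result])  # ['r1.validate_name.flag_coords', 'r2']
print([e.actor for e in pipe.lineage(result[0].rid)])  # ['validate_name', 'flag_coords']
```

Passing records outside an actor's scope through unchanged loosely mirrors how COMAD actors leave data outside their scope untouched. Here the second actor only flags a record for review. In Kurator, that hand-off to human experts goes through Google cloud services such as Google spreadsheets.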