A general concept for consistent documentation of computational analyses

Ebert, Peter; Müller, Fabian; Nordström, Karl; Lengauer, Thomas; Schulz, Marcel H.

Published in

Oxford University Press, Database, (2015), 2015

DOI: 10.1093/database/bav050

Journal article published in 2015 by Peter Ebert, Fabian Müller, Karl Nordström, Thomas Lengauer, Marcel H. Schulz

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

The ever-growing amount of data in the field of life sciences demands standardized ways of high-throughput computational analysis. This standardization requires a thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However, due to the heterogeneity in software setups and the high rate of change during tool development, reproducibility is hard to achieve. One reason is that there is no common agreement in the research community on how to document computational studies. In many cases, simple flat files or other unstructured text documents are provided by researchers as documentation, which are often missing software dependencies, versions and sufficient documentation to understand the workflow and parameter settings. As a solution we suggest a simple and modest approach for documenting and verifying computational analysis pipelines. We propose a two-part scheme that defines a computational analysis using a Process and an Analysis metadata document, which jointly describe all necessary details to reproduce the results. In this design we separate the metadata specifying the process from the metadata describing an actual analysis run, thereby reducing the effort of manual documentation to an absolute minimum. Our approach is independent of a specific software environment, results in human readable XML documents that can easily be shared with other researchers and allows an automated validation to ensure consistency of the metadata. Because our approach has been designed with little to no assumptions concerning the workflow of an analysis, we expect it to be applicable in a wide range of computational research fields.
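To make the two-part scheme concrete, the following minimal sketch pairs a Process document with an Analysis document and checks them for consistency. It is an illustration only: the element and attribute names (`process`, `tool`, `parameter`, `value`, `process_version`), the tool name `aligner`, its version and the function `validate` are assumptions made for this example, not the schema or tooling defined in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical Process document: describes the pipeline itself
# (tools, their versions, and the parameters each tool expects).
PROCESS_XML = """
<process name="peak_calling" version="1.0">
  <tool name="aligner" version="0.7.12">
    <parameter name="threads"/>
    <parameter name="reference"/>
  </tool>
</process>
"""

# Hypothetical Analysis document: describes one concrete run of that
# process by supplying a value for every declared parameter.
ANALYSIS_XML = """
<analysis process="peak_calling" process_version="1.0">
  <tool name="aligner">
    <value parameter="threads">8</value>
    <value parameter="reference">hg19</value>
  </tool>
</analysis>
"""


def validate(process_xml, analysis_xml):
    """Return a list of inconsistencies between the two documents."""
    process = ET.fromstring(process_xml)
    analysis = ET.fromstring(analysis_xml)
    problems = []

    # The Analysis document must point at this exact Process document.
    if analysis.get("process") != process.get("name"):
        problems.append("analysis refers to a different process")
    if analysis.get("process_version") != process.get("version"):
        problems.append("process version mismatch")

    # Parameters declared per tool in the Process document.
    declared = {
        tool.get("name"): {p.get("name") for p in tool.findall("parameter")}
        for tool in process.findall("tool")
    }
    # Parameters that actually received a value in the Analysis document.
    provided = {
        tool.get("name"): {v.get("parameter") for v in tool.findall("value")}
        for tool in analysis.findall("tool")
    }

    for tool_name, params in declared.items():
        given = provided.get(tool_name, set())
        for missing in sorted(params - given):
            problems.append(f"{tool_name}: no value for parameter '{missing}'")
        for extra in sorted(given - params):
            problems.append(f"{tool_name}: undeclared parameter '{extra}'")
    for tool_name in sorted(set(provided) - set(declared)):
        problems.append(f"undeclared tool '{tool_name}'")
    return problems
```

In this sketch the Process document is written once per pipeline, while each run only needs a short Analysis document, which mirrors the separation described in the abstract. An empty list returned by `validate` means the pair is consistent under these assumed rules.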
