Ten simple rules for writing Dockerfiles for reproducible data science

Nüst, Daniel; Sochat, Vanessa; Marwick, Ben; Eglen, Stephen J.; Head, Tim; Hirst, Tony; Evans, Benjamin D.

Published in

Public Library of Science, PLoS Computational Biology, 16(11), p. e1008316, 2020

DOI: 10.1371/journal.pcbi.1008316

This paper is made freely available by the publisher.

Abstract

Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow’s reproducibility can be greatly affected by the choices that are made with respect to building containers. In many cases, the build process for the container’s image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows.
