Published in

Oxford University Press (OUP), GigaScience, 9(12), 2020

DOI: 10.1093/gigascience/giaa106

File-based localization of numerical perturbations in data analysis pipelines

Journal article published in 2020 by Ali Salari, Gregory Kiar, Lindsay Lewis, Alan C. Evans, Tristan Glatard
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Background: Data analysis pipelines are known to be affected by computational conditions, presumably owing to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear.

Method: We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation.

Results: By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings.
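The file-based comparison described in the Method can be illustrated with a minimal sketch. This is not Spot's implementation and does not use the ReproZip API. It assumes a hypothetical provenance graph given as a dictionary mapping each process name to its input and output files, plus one digest table per computational condition. It also assumes one possible reading of "creates a difference": a process is flagged when all of its inputs are identical across conditions but at least one of its outputs differs. All function, process, and file names below are illustrative.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_files(digests_a, digests_b):
    """Return the names of files whose digests differ between two conditions.

    A file present in only one condition counts as differing.
    """
    names = set(digests_a) | set(digests_b)
    return {n for n in names if digests_a.get(n) != digests_b.get(n)}


def processes_creating_differences(processes, digests_a, digests_b):
    """Return the processes whose inputs match but whose outputs differ.

    `processes` is a hypothetical provenance graph:
    {process_name: (input_files, output_files)}.
    Processes that read an already-differing file are treated as
    propagating a difference, not creating one (an assumption of this sketch).
    """
    diff = differing_files(digests_a, digests_b)
    flagged = []
    for name, (inputs, outputs) in processes.items():
        inputs_match = not (set(inputs) & diff)
        outputs_differ = bool(set(outputs) & diff)
        if inputs_match and outputs_differ:
            flagged.append(name)
    return sorted(flagged)


# Toy two-step pipeline with made-up digests for two conditions.
toy_processes = {
    "step_a": (["raw.dat"], ["a.out"]),
    "step_b": (["a.out"], ["b.out"]),
}
condition_1 = {"raw.dat": "h0", "a.out": "h1", "b.out": "h2"}
condition_2 = {"raw.dat": "h0", "a.out": "h1x", "b.out": "h2x"}

print(processes_creating_differences(toy_processes, condition_1, condition_2))
# prints ['step_a']
```

In this toy case `step_b` also produces a differing output, but only because its input already differs, so the sketch attributes the difference to `step_a`. In practice the digest tables would be built by calling `file_digest` on each file recorded in the provenance trace of each condition.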