Published in

University of California, Los Angeles, Journal of Statistical Software, 64(11)

DOI: 10.18637/jss.v064.i11

BatchJobs and BatchExperiments: Abstraction Mechanisms for Using R in Batch Environments

Journal article published in 2015 by Bernd Bischl, Michel Lang, Olaf Mersmann, Jörg Rahnenführer, Claus Weihs
This paper is made freely available by the publisher.

Abstract

Empirical analysis of statistical algorithms often demands time-consuming experiments. We present two R packages which greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control any batch cluster from within R. It is structured around cluster versions of the well-known higher-order functions Map, Reduce and Filter from functional programming. Computations are performed asynchronously and all job states are persistently stored in a database, which can be queried at any point in time. The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends package BatchJobs by letting the user define an array of jobs of the kind "apply algorithm A to problem instance P and store results". It is possible to associate statistical designs with parameters of problems and algorithms and therefore to systematically study their influence on the results. The packages' main features are: (a) Convenient usage: All relevant batch system operations are either handled internally or mapped to simple R functions. (b) Portability: Both packages use a clear and well-defined interface to the batch system which makes them applicable in most high-performance computing environments. (c) Reproducibility: Every computational part has an associated seed to ensure reproducibility even when the underlying batch system changes. (d) Abstraction and good software design: The code layers for algorithms, experiment definitions and execution are cleanly separated and enable the writing of readable and maintainable code.
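The workflow described in the abstract can be illustrated with a small, self-contained sketch. It is written in Python rather than R and does not use the BatchJobs or BatchExperiments API. Every name in it (`PROBLEMS`, `ALGORITHMS`, `define_jobs`, `run_jobs`, `reduce_results`, `run_experiment`) is hypothetical. The jobs run sequentially in one process instead of asynchronously on a cluster, and an in-memory SQLite table stands in for the persistent job database. What it shows is only the general idea: a grid of "apply algorithm A to problem instance P" jobs, one seed per job, queryable job states, and a Filter/Reduce step over the stored results.

```python
import functools
import itertools
import random
import sqlite3


def trimmed_mean(xs, trim):
    """Mean after dropping the fraction `trim` from each tail."""
    k = int(len(xs) * trim)
    core = sorted(xs)[k:len(xs) - k]
    return sum(core) / len(core)


def trimmed_midrange(xs, trim):
    """Midpoint of the smallest and largest value after trimming."""
    k = int(len(xs) * trim)
    core = sorted(xs)[k:len(xs) - k]
    return (core[0] + core[-1]) / 2


# Hypothetical problems (instance generators) and algorithms (estimators).
PROBLEMS = {
    "gauss": lambda rng: [rng.gauss(0, 1) for _ in range(20)],
    "uniform": lambda rng: [rng.uniform(-1, 1) for _ in range(20)],
}
ALGORITHMS = {"mean": trimmed_mean, "midrange": trimmed_midrange}
TRIMS = (0.0, 0.25)  # design over one algorithm parameter


def define_jobs(con, base_seed):
    """Insert one pending job per (problem, algorithm, parameter) combination."""
    grid = itertools.product(sorted(PROBLEMS), sorted(ALGORITHMS), TRIMS)
    for offset, (problem, algorithm, param) in enumerate(grid):
        con.execute(
            "INSERT INTO jobs (problem, algorithm, param, seed, status) "
            "VALUES (?, ?, ?, ?, 'pending')",
            (problem, algorithm, param, base_seed + offset),
        )


def run_jobs(con):
    """Stand-in for the batch system: run pending jobs and record their state."""
    pending = con.execute(
        "SELECT id, problem, algorithm, param, seed FROM jobs "
        "WHERE status = 'pending'"
    ).fetchall()
    for job_id, problem, algorithm, param, seed in pending:
        try:
            # The job's own seed fully determines its problem instance.
            instance = PROBLEMS[problem](random.Random(seed))
            result = ALGORITHMS[algorithm](instance, param)
            con.execute(
                "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
                (result, job_id),
            )
        except Exception:
            con.execute("UPDATE jobs SET status = 'error' WHERE id = ?", (job_id,))


def reduce_results(rows):
    """Filter the finished jobs, then reduce to a mean result per algorithm."""
    done = filter(lambda row: row[4] == "done", rows)

    def step(acc, row):
        total, count = acc.get(row[1], (0.0, 0))
        acc[row[1]] = (total + row[5], count + 1)
        return acc

    sums = functools.reduce(step, done, {})
    return {algo: total / count for algo, (total, count) in sums.items()}


def run_experiment(base_seed=1):
    """Define, run and read back all jobs; returns one tuple per job."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, problem TEXT, "
        "algorithm TEXT, param REAL, seed INTEGER, status TEXT, result REAL)"
    )
    define_jobs(con, base_seed)
    run_jobs(con)
    return con.execute(
        "SELECT problem, algorithm, param, seed, status, result "
        "FROM jobs ORDER BY id"
    ).fetchall()
```

Calling `run_experiment()` twice with the same `base_seed` yields identical rows, because each job draws its problem instance from its own seeded generator instead of shared global state. `reduce_results(run_experiment())` then summarizes the finished jobs per algorithm. In the packages themselves, the corresponding steps are exposed as R functions and the jobs are dispatched to a real batch system.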