Dissemin is shutting down on January 1st, 2025

Published in

World Scientific Publishing, Parallel Processing Letters, 03(25), p. 1541003

DOI: 10.1142/s0129626415410030

Links

Tools

Export citation

Search in Google Scholar

Online Task Resource Consumption Prediction for Scientific Workflows

Journal article published in 2015 by Rafael Ferreira da Silva ORCID, Gideon Juve, Mats Rynge, Ewa Deelman, Miron Livny
This paper is available in a repository.
This paper is available in a repository.

Full text: Download

Green circle
Preprint: archiving allowed
Green circle
Postprint: archiving allowed
Red circle
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Estimates of task runtime, disk space usage, and memory consumption, are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable workflow executions. Such algorithms often assume that accurate estimates are available, but such estimates are difficult to generate in practice. In this work, we first profile five real scientific workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption based on the size of the tasks’ input data. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets using a clustering technique. Task estimates are generated based on the ratio parameter/input data size if they are correlated, or based on the probability distribution function of the parameter. We then propose an online estimation process based on the MAPE-K loop, where task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process results in much more accurate predictions than an offline approach, where all task requirements are estimated prior to workflow execution.