Online Task Resource Consumption Prediction for Scientific Workflows

da Silva, Rafael Ferreira; Juve, Gideon; Rynge, Mats; Deelman, Ewa; Livny, Miron

Published in

World Scientific Publishing, Parallel Processing Letters, 03(25), p. 1541003

DOI: 10.1142/s0129626415410030

Journal article published in 2015 by Rafael Ferreira da Silva, Gideon Juve, Mats Rynge, Ewa Deelman, Miron Livny

This paper is available in a repository.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Estimates of task runtime, disk space usage, and memory consumption are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable workflow executions. Such algorithms often assume that accurate estimates are available, but such estimates are difficult to generate in practice. In this work, we first profile five real scientific workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption based on the size of the tasks’ input data. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets using a clustering technique. Task estimates are generated from the ratio of the parameter to the input data size if the two are correlated, and otherwise from the probability distribution function of the parameter. We then propose an online estimation process based on the MAPE-K loop, in which task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process yields much more accurate predictions than an offline approach, in which all task requirements are estimated prior to workflow execution.
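The estimation rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class name `OnlineEstimator`, the 0.8 correlation threshold, and the use of the Pearson coefficient are all assumptions made for illustration, since the abstract does not name a correlation measure or a cutoff. The clustering step is omitted, and the mean of the observed values stands in for an estimate drawn from the parameter's probability distribution.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when it is undefined."""
    if len(xs) < 2:
        return 0.0
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


class OnlineEstimator:
    """Hypothetical estimator for one parameter (e.g. runtime) of one task type."""

    def __init__(self, threshold=0.8):  # threshold value is an assumption
        self.threshold = threshold
        self.input_sizes = []
        self.values = []

    def observe(self, input_size, value):
        """Monitoring step: record a finished task's input size and measured value."""
        self.input_sizes.append(input_size)
        self.values.append(value)

    def estimate(self, input_size):
        """Estimate the parameter for a new task from all observations so far."""
        if not self.values:
            return None  # nothing observed yet
        if pearson(self.input_sizes, self.values) >= self.threshold:
            # Correlated: scale the mean parameter/input-size ratio.
            ratios = [v / s for s, v in zip(self.input_sizes, self.values) if s > 0]
            if ratios:
                return statistics.fmean(ratios) * input_size
        # Not correlated: simplified stand-in for a distribution-based estimate.
        return statistics.fmean(self.values)
```

In an online setting, `observe` would be called each time a task finishes, so that later calls to `estimate` for the same task type draw on more data, which mirrors the monitor-and-update cycle the abstract attributes to the MAPE-K loop.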
