Data-intensive science: The Terapixel and MODISAzure projects

Agarwal, Deborah A.; Cheah, You-Wei; Fay, Dan; Fay, Jonathan; Guo, Dean; Hey, Tony; Humphrey, Marty; Jackson, Keith R.; Li, Jie; Poulain, Christophe; Ryu, Youngryel; van Ingen, Catharine

Published in

SAGE Publications, International Journal of High Performance Computing Applications, 25(3), p. 304-316, 2011

DOI: 10.1177/1094342011414746

Journal article published in 2011 by Deborah A. Agarwal, You-Wei Cheah, Dan Fay, Jonathan Fay, Dean Guo, Tony Hey, Marty Humphrey, Keith R. Jackson, Jie Li, Christophe Poulain, Youngryel Ryu, Catharine van Ingen

This paper is available in a repository.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

We live in an era in which scientific discovery is increasingly driven by data exploration of massive datasets. Scientists today are envisioning diverse data analyses and computations that scale from the desktop to supercomputers, yet often have difficulty designing and constructing software architectures to accommodate the heterogeneous and often inconsistent data at scale. Moreover, scientific data and computational resource needs can vary widely over time. The needs grow as the science collaboration broadens or as additional data is accumulated; the computational demand can have large transients in response to seasonal field campaigns or new instrumentation breakthroughs. Cloud computing can offer a scalable, economic, on-demand model that is well matched to some of these evolving science needs. This paper presents two of our experiences over the last year — the Terapixel Project, using workflow, high-performance computing and non-structured query language data processing to render the largest astronomical image for the WorldWide Telescope, and MODISAzure, a science pipeline for image processing, deployed using the Azure Cloud infrastructure.
