Published in

Elsevier, Future Generation Computer Systems, (36), p. 418-429

DOI: 10.1016/j.future.2013.09.023

CAMP: Community access MODIS pipeline

This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving forbidden
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

The Moderate Resolution Imaging Spectroradiometer (MODIS) instrument’s land and atmosphere data are important to many scientific analyses that study processes at both local and global scales. The MODIS instruments aboard the Terra and Aqua satellites image the entire Earth’s surface every one to two days in 36 spectral bands. MODIS data complement many ground-based observations and are critical for studying global phenomena such as gross photosynthesis and evapotranspiration. However, procuring and processing the data can be challenging and cumbersome because of the volume and size of the data and the scale of the analyses. For example, the very first step in MODIS data processing is to ensure that all products are in the same resolution and coordinate system. This reprojection step uses a complex inverse gridding algorithm and requires downloading tens of thousands of files for a single year, which is often infeasible on a scientist’s desktop. Thus, the use of large-scale resource environments such as high performance computing (HPC) systems is becoming crucial for processing MODIS data. However, HPC environments have traditionally been used for tightly coupled applications and present several challenges for managing data-intensive pipelines. We have developed a data-processing pipeline that downloads the MODIS swath products and reprojects the data to a sinusoidal system on an HPC system. The 10-year archive of reprojected data generated by the pipeline is made available through a web portal. In this paper, we detail a system architecture (CAMP) that manages the lifecycle of MODIS data, including procurement, storage, processing and dissemination. Our system architecture was developed in the context of the MODIS reprojection pipeline but is extensible to other analyses of MODIS data. Additionally, our work provides a framework and valuable experiences for future developments and deployments of data-intensive pipelines from other scientific domains on HPC systems.
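
To make the target coordinate system concrete, the following minimal sketch shows the forward sinusoidal map projection and the tile lookup commonly used with the MODIS sinusoidal grid. It is not the inverse gridding algorithm used by the pipeline, which the abstract does not detail. The sphere radius, the 36 x 18 tile layout, and the function names `to_sinusoidal` and `tile_index` are assumptions introduced here for illustration.

```python
import math

# Assumption: sphere radius (metres) of the standard MODIS sinusoidal grid.
R = 6371007.181
# Assumption: the grid is split into 36 x 18 tiles, each 10 degrees wide at the equator.
TILE_SIZE = 2 * math.pi * R / 36


def to_sinusoidal(lat_deg, lon_deg):
    """Forward sinusoidal projection: degrees -> (x, y) in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = R * lon * math.cos(lat)  # east-west distance shrinks with cos(latitude)
    y = R * lat                  # north-south distance is proportional to latitude
    return x, y


def tile_index(x, y):
    """Map projected metres to (horizontal, vertical) tile numbers.

    Tiles are counted from the upper-left corner of the grid, so the
    vertical index grows southward.
    """
    h = math.floor((x + math.pi * R) / TILE_SIZE)
    v = math.floor((math.pi * R / 2 - y) / TILE_SIZE)
    # Clamp so the eastern and southern grid edges fall in the last tile.
    return min(h, 35), min(v, 17)


if __name__ == "__main__":
    x, y = to_sinusoidal(45.0, -100.0)
    print(tile_index(x, y))
```

A real reprojection of swath data must also resample irregularly spaced swath pixels onto the regular output grid, which is where most of the computational cost described above arises.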