Estimating extremely large amounts of missing precipitation data

Aguilera, Héctor; Guardiola-Albert, Carolina; Serrano-Hidalgo, Carmen

Published in

IWA Publishing, Journal of Hydroinformatics, 3(22), p. 578-592, 2020

DOI: 10.2166/hydro.2020.127

Journal article published in 2020 by Héctor Aguilera, Carolina Guardiola-Albert, Carmen Serrano-Hidalgo

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

Accurate estimation of missing daily precipitation data remains a difficult task. A wide variety of methods exists for infilling missing values, but the percentage of gaps is one of the main factors limiting their applicability. The present study compares three techniques for filling in large amounts of missing daily precipitation data: spatio-temporal kriging (STK), multiple imputation by chained equations through predictive mean matching (PMM), and the random forest (RF) machine learning algorithm. To our knowledge, this is the first time that extreme missingness (>90%) has been considered. Different percentages of missing data and missing patterns are tested in a large dataset drawn from 112 rain gauges in the period 1975–2017. The results show that both STK and RF can handle extreme missingness, while PMM requires larger observed sample sizes. STK is the most robust method, suitable for chronological missing patterns. RF is efficient under random missing patterns. Model evaluation is usually based on performance and error measures. However, this study outlines the risk of just relying on these measures without checking for consistency. The RF algorithm overestimated daily precipitation outside the validation period in some cases due to the overdetection of rainy days under time-dependent missing patterns.
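The two kinds of missing pattern and the consistency check mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that a "random" pattern removes days uniformly at random and that a "chronological" pattern removes one contiguous block at the start of the record. It also assumes that consistency can be probed by comparing the share of rainy days in infilled versus observed values. All function names and the wet-day threshold are hypothetical.

```python
import random


def random_mask(n_days, missing_frac, seed=0):
    """Flag days as missing uniformly at random (assumed 'random' pattern)."""
    rng = random.Random(seed)
    n_missing = round(n_days * missing_frac)
    missing = set(rng.sample(range(n_days), n_missing))
    return [i in missing for i in range(n_days)]


def chronological_mask(n_days, missing_frac):
    """Flag one contiguous block at the start of the record as missing
    (assumed 'chronological' pattern)."""
    n_missing = round(n_days * missing_frac)
    return [i < n_missing for i in range(n_days)]


def wet_day_fraction(values, threshold=0.0):
    """Share of days whose precipitation exceeds the threshold (assumed in mm)."""
    return sum(v > threshold for v in values) / len(values)


def wet_day_gap(observed, infilled, threshold=0.0):
    """Positive result: the infilled values contain proportionally more
    rainy days than the observed ones, a possible sign of overdetection."""
    return wet_day_fraction(infilled, threshold) - wet_day_fraction(observed, threshold)
```

A large positive `wet_day_gap` alongside good error scores on a validation subset would be one hint of the inconsistency the abstract warns about, although the study itself may have used different diagnostics.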
