Published in

American Geophysical Union, Water Resources Research, 50(12), p. 9484-9513, 2014

DOI: 10.1002/2014WR016062

Model selection on solid ground: Rigorous comparison of nine ways to evaluate Bayesian model evidence

Journal article published in 2014 by Anneli Schöniger, Luis Samaniego, Thomas Wöhling, Wolfgang Nowak
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving restricted
Data provided by SHERPA/RoMEO

Abstract

Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. Computing this integral is highly challenging because its dimensionality equals the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) exact and fast analytical solutions are limited by strong assumptions; (2) numerical evaluation quickly becomes infeasible for expensive models; (3) approximations known as information criteria (ICs), such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively), yield contradictory results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simple synthetic example where, for some scenarios, an exact analytical solution exists. In more challenging scenarios, we use brute-force Monte Carlo integration as a reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible.
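To illustrate the brute-force Monte Carlo approach mentioned in the abstract: BME can be estimated by drawing parameter samples from the prior and averaging the likelihood of the observed data over those samples. The sketch below (not code from the paper; the one-parameter conjugate Gaussian setup and all function names are illustrative assumptions) compares the Monte Carlo estimate against the closed-form evidence that exists in this special case.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate normal N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bme_monte_carlo(y_obs, prior_var, noise_var, n_samples, seed=0):
    """Brute-force Monte Carlo estimate of BME: sample the parameter
    theta from its prior N(0, prior_var) and average the likelihood
    N(y_obs; theta, noise_var) over all samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(0.0, math.sqrt(prior_var))
        total += normal_pdf(y_obs, theta, noise_var)
    return total / n_samples

def bme_exact(y_obs, prior_var, noise_var):
    """Analytical BME for the conjugate Gaussian case: the marginal
    of y is N(0, prior_var + noise_var)."""
    return normal_pdf(y_obs, 0.0, prior_var + noise_var)

if __name__ == "__main__":
    mc = bme_monte_carlo(y_obs=1.0, prior_var=1.0, noise_var=0.5, n_samples=200_000)
    exact = bme_exact(y_obs=1.0, prior_var=1.0, noise_var=0.5)
    print(f"Monte Carlo: {mc:.4f}, exact: {exact:.4f}")
```

This toy case is exactly the kind of setting where an analytical solution exists for benchmarking; in realistic models the likelihood evaluation is an expensive simulation and the parameter space is high-dimensional, which is why the abstract notes that numerical evaluation quickly becomes infeasible.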