Data-driven methods for imputing national-level incidence in global burden of disease studies

McDonald, Scott A.; Devleesschauwer, Brecht; Speybroeck, Niko; Hens, Niel; Praet, Nicolas; Torgerson, Paul R.; Havelaar, Arie H.; Wu, Felicia; Tremblay, Marlène; Amene, Ermias W.; Döpfer, Dörte

Published in

World Health Organization, Bulletin of the World Health Organization, 93(4), pp. 228-236, 2015

DOI: 10.2471/blt.14.139972

Journal article published in 2015 by Scott A. McDonald, Brecht Devleesschauwer, Niko Speybroeck, Niel Hens, Nicolas Praet, Paul R. Torgerson, Arie H. Havelaar, Felicia Wu, Marlène Tremblay, Ermias W. Amene, Dörte Döpfer

This paper is made freely available by the publisher.

Preprint: archiving forbidden

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

Objective: To develop transparent and reproducible methods for imputing missing data on disease incidence at the national level for the year 2005.

Methods: We compared several models for imputing missing country-level incidence rates for two foodborne diseases: congenital toxoplasmosis and aflatoxin-related hepatocellular carcinoma. Missing values were assumed to be missing at random. Predictor variables were selected using least absolute shrinkage and selection operator (LASSO) regression. We compared the predictive performance of naive extrapolation approaches and Bayesian random and mixed-effects regression models. Leave-one-out cross-validation was used to evaluate model accuracy.

Findings: The predictive accuracy of the Bayesian mixed-effects models was significantly better than that of the naive extrapolation method for one of the two disease models. However, Bayesian mixed-effects models produced wider prediction intervals for both data sets.

Conclusion: Several approaches are available for imputing missing data at the national level. Strengths of a hierarchical regression approach for this type of task are the ability to derive estimates from other similar countries, transparency, computational efficiency and ease of interpretation. The inclusion of informative covariates may improve model performance, but results should be appraised carefully.
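As a rough illustration of the leave-one-out cross-validation named in the abstract, the Python sketch below holds out each country in turn, predicts its value from the remaining countries, and averages the absolute errors. Everything in it is an assumption made for illustration: the toy data are invented and not taken from the paper, the function names are hypothetical, and a simple regional mean stands in for "deriving estimates from other similar countries". It does not reproduce the Bayesian random or mixed-effects models the authors fitted.

```python
from statistics import mean, median

# Hypothetical toy data: (country, region, log incidence rate).
# Values are made up for illustration and are NOT taken from the paper.
OBSERVED = [
    ("A1", "north", 1.0),
    ("A2", "north", 1.2),
    ("A3", "north", 0.9),
    ("B1", "south", 3.0),
    ("B2", "south", 3.3),
    ("B3", "south", 2.8),
]


def predict_global_median(train, region):
    """Naive extrapolation: ignore the region and use the overall median."""
    return median([value for _, _, value in train])


def predict_region_mean(train, region):
    """Borrow from similar countries: mean of the same region.

    Falls back to the overall mean when the region has no observed data.
    """
    same_region = [value for _, r, value in train if r == region]
    if same_region:
        return mean(same_region)
    return mean([value for _, _, value in train])


def loo_mae(data, predictor):
    """Leave-one-out cross-validation, scored by mean absolute error."""
    errors = []
    for i, (_, region, value) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every country except the held-out one
        errors.append(abs(predictor(train, region) - value))
    return mean(errors)


if __name__ == "__main__":
    print("global median MAE:", round(loo_mae(OBSERVED, predict_global_median), 3))
    print("region mean MAE:  ", round(loo_mae(OBSERVED, predict_region_mean), 3))
```

On these invented data the regional predictor has the lower error only because the two regions were constructed to differ; with real incidence data the ranking of methods could go either way, which is the kind of comparison the cross-validation is meant to settle.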
