SAGE Publications, Scandinavian Journal of Public Health, 4(47), p. 469-473, 2019
Aim: We aim to compare four weighting methods for adjusting for non-response in a survey on drinking habits and to examine whether these methods can remedy the under-coverage of survey estimates of alcohol use relative to sales statistics. Method: Data from a general population survey of Finns aged 15–79 years in 2016 (n=2285, response rate 60%) were used. Outcome measures were the annual volume of drinking and the prevalence of hazardous drinking. A wide range of sociodemographic and regional variables from registers were available to model the non-response. Response propensities were modelled using logistic regression and random forest models to derive two sets of refined weights in addition to design weights and basic post-stratification weights. Results: Estimated annual consumption changed from 2.43 litres of 100% alcohol using design weights to 2.36–2.44 litres when using the other three weights, and the estimated prevalence of hazardous drinkers changed from 11.4% to 11.4–11.8%, correspondingly. Weights derived by the random forest method generally provided smaller estimates than the logistic regression-based weights. Conclusions: The use of complex non-response weights derived from logistic regression or random forest models is not likely to provide much added value over simpler weights in surveys on alcohol use. Surveys may not catch heavy drinkers and are therefore prone to under-reporting of alcohol use at the population level. Also, factors other than sociodemographic characteristics are likely to influence participation decisions.
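
The propensity-based weighting described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the data, the two age cells, and all function names are hypothetical. Response propensity is estimated here as the observed response rate within each cell, which is what a logistic regression with a single categorical covariate would also fit. A model with many register variables, or a random forest, would replace that one step. Each respondent's design weight is then divided by the estimated propensity.

```python
from collections import defaultdict


def cell_propensities(cells, responded):
    """Estimate response propensity per cell as the observed response rate."""
    sampled = defaultdict(int)
    answered = defaultdict(int)
    for cell, resp in zip(cells, responded):
        sampled[cell] += 1
        if resp:
            answered[cell] += 1
    return {cell: answered[cell] / sampled[cell] for cell in sampled}


def nonresponse_weights(design_weights, cells, responded):
    """Design weight divided by estimated propensity; None for non-respondents."""
    propensity = cell_propensities(cells, responded)
    return [
        d / propensity[cell] if resp else None
        for d, cell, resp in zip(design_weights, cells, responded)
    ]


def weighted_mean(values, weights):
    """Weighted mean over respondents only (entries with a non-None weight)."""
    pairs = [(v, w) for v, w in zip(values, weights) if w is not None]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


# Hypothetical toy sample: 8 sampled persons, equal design weights.
cells = ["young"] * 4 + ["old"] * 4
responded = [True, True, False, False, True, True, True, True]
litres = [4.0, 2.0, None, None, 1.0, 2.0, 3.0, 2.0]  # unknown for non-respondents
design = [1.0] * 8

adjusted = nonresponse_weights(design, cells, responded)
respondent_design = [d if r else None for d, r in zip(design, responded)]

print(weighted_mean(litres, respondent_design))  # design weights only
print(weighted_mean(litres, adjusted))           # non-response adjusted
```

In this toy sample the under-represented cell drinks more, so the adjustment raises the estimate. As the abstract's conclusions note, such weights can only correct for differences captured by the modelled variables.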