Testing the performance of the two fold FCS algorithm for multiple imputation of longitudinal clinical records

This paper is available in a repository.

Abstract

Multiple imputation is increasingly regarded as the standard method to account for partially observed data, but most methods have been based on cross-sectional imputation algorithms. Recently, a new multiple-imputation method, the two fold fully conditional specification (FCS) method, was developed to impute missing data in longitudinal datasets with nonmonotone missing data. (See Nevalainen J., Kenward M.G., and Virtanen S.M. 2009. Missing values in longitudinal dietary data: A multiple imputation approach based on a fully conditional specification. Statistics in Medicine 28: 3657-3669.) This method imputes missing data at a given time point based on measurements recorded at the previous and next time points. Until now, the method has been tested only on a relatively small dataset and under very specific conditions. We have implemented the two fold FCS algorithm in Stata, and in this study we further challenge and evaluate the performance of the algorithm under different scenarios. In simulation studies, we generated 1,000 datasets similar in structure to the longitudinal clinical records (The Health Improvement Network primary care database) to which we will apply the two fold FCS algorithm. Initially, these generated datasets included complete records. We then introduced different levels and patterns of partially observed data and applied the algorithm to generate multiply imputed datasets. The results of our initial multiple imputations demonstrated that the algorithm provided acceptable results when the substantive model was linear and data were imputed over a limited time period for continuous variables such as weight and blood pressure. Using an exponential substantive model introduced some bias, but estimates were still within acceptable ranges.
We will present results for simulation studies that include situations where categorical and continuous variables change over a 10-year period (for example, smokers become ex-smokers, weight increases or decreases) and large proportions of data are unobserved. We also explore how the algorithm deals with interactions and whether initiating it to run forward or backward in time has any impact on the final data distribution.
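
The windowing idea described in the abstract (impute a value at one time point from the measurements at the previous and next time points, sweeping forward or backward in time) can be illustrated with a minimal Python sketch. This is not the authors' Stata implementation. The function names (`fit_line`, `neighbour_mean`, `two_fold_sketch`), the single continuous variable, the use of the neighbours' mean as the only predictor, and the simulated "weight" data are all assumptions made for illustration. A proper multiple-imputation procedure would also draw the regression parameters from their posterior distribution and would be run several times to produce several completed datasets.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return a, b, sd


def neighbour_mean(row, t):
    """Mean of the values at t-1 and t+1 (whichever exist in the row)."""
    vals = [row[s] for s in (t - 1, t + 1) if 0 <= s < len(row)]
    return statistics.fmean(vals)


def two_fold_sketch(data, n_among=5, backward=False, seed=0):
    """Return one completed copy of `data` (rows = patients, columns = time points).

    Missing values are None. Assumes at least two time points and at least
    one observed value per patient. With a single variable, the within-time
    step reduces to one regression per time point.
    """
    rng = random.Random(seed)
    n_times = len(data[0])
    missing = [[v is None for v in row] for row in data]
    # Starting values: fill each gap with the patient's own observed mean.
    filled = []
    for row in data:
        start = statistics.fmean([v for v in row if v is not None])
        filled.append([start if v is None else v for v in row])
    order = range(n_times - 1, -1, -1) if backward else range(n_times)
    for _ in range(n_among):  # among-time iterations
        for t in order:
            # Fit on patients whose value at time t was actually observed.
            xs = [neighbour_mean(r, t) for r, m in zip(filled, missing) if not m[t]]
            ys = [r[t] for r, m in zip(filled, missing) if not m[t]]
            if len(xs) < 3:
                continue  # too few observed values to fit a line
            a, b, sd = fit_line(xs, ys)
            # Replace only the missing entries with a stochastic draw.
            for r, m in zip(filled, missing):
                if m[t]:
                    r[t] = a + b * neighbour_mean(r, t) + rng.gauss(0.0, sd)
    return filled


# Hypothetical simulated data: 200 patients, 10 yearly weight measurements.
sim = random.Random(1)
complete = []
for _ in range(200):
    base = sim.gauss(75.0, 10.0)
    complete.append([base + 0.5 * t + sim.gauss(0.0, 1.0) for t in range(10)])
# Remove about 30% of values completely at random, keeping time 0 observed.
observed = [[v if t == 0 or sim.random() > 0.3 else None
             for t, v in enumerate(row)] for row in complete]
completed = two_fold_sketch(observed, n_among=5, backward=False)
```

Calling `two_fold_sketch(observed, backward=True)` sweeps from the last time point to the first, which is one way to compare the forward and backward runs mentioned above on simulated data.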