Two-stage method to remove population- and individual-level outliers from longitudinal data in a primary care database.

Welch, Catherine; Petersen, I.; Walters, K.; Morris, R. W.; Nazareth, I.; Kalaitzaki, E.; White, I. R.; Marston, L.; Carpenter, J.

Published in

Wiley, Pharmacoepidemiology & Drug Safety, 7(21), p. 725-732, 2011

DOI: 10.1002/pds.2270

Abstract

PURPOSE: In the UK, primary care databases include repeated measurements of health indicators at the individual level. As these databases encompass a large population, some individuals have extreme values, but some values may also be recorded incorrectly. The challenge for researchers is to distinguish between records that are due to incorrect recording and those which represent true but extreme values. This study evaluated different methods to identify outliers.

METHODS: Ten percent of practices were selected at random to evaluate the recording of 513,367 height measurements. Population-level outliers were identified using boundaries defined using Health Survey for England data. Individual-level outliers were identified by fitting a random-effects model with subject-specific slopes for height measurements adjusted for age and sex. Any height measurements with a patient-level standardised residual more extreme than ±10 were identified as an outlier and excluded. The model was subsequently refitted twice after removing outliers at each stage. This method was compared with existing methods of removing outliers.

RESULTS: Most outliers were identified at the population level using the boundaries defined using Health Survey for England (1550 of 1643). Once these were removed from the database, fitting the random-effects model to the remaining data successfully identified only 75 further outliers. This method was more efficient at identifying true outliers compared with existing methods.

CONCLUSIONS: We propose a new, two-stage approach in identifying outliers in longitudinal data and show that it can successfully identify outliers at both population and individual level.

Copyright © 2011 John Wiley & Sons, Ltd.
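The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bounds `LOWER_CM` and `UPPER_CM` are hypothetical placeholders, since the Health Survey for England boundaries are not given in the abstract. The second stage also swaps in a much simpler technique than the paper's random-effects model: each height is compared with the patient's own median and standardised by a pooled robust scale, with no adjustment for age or sex and no subject-specific slope. Only the ±10 cut-off and the idea of refitting after each removal are taken from the abstract; the function names and sample records are invented for illustration.

```python
from statistics import median

# Hypothetical plausibility bounds (cm). The paper derives its bounds from
# Health Survey for England data; those values are not given in the abstract.
LOWER_CM, UPPER_CM = 120.0, 215.0
THRESHOLD = 10.0  # standardised-residual cut-off quoted in the abstract
MAX_PASSES = 3    # one initial fit plus two refits


def remove_population_outliers(records):
    """Stage 1: drop heights outside fixed population-level bounds."""
    kept, dropped = [], []
    for rec in records:
        (kept if LOWER_CM <= rec[1] <= UPPER_CM else dropped).append(rec)
    return kept, dropped


def remove_individual_outliers(records):
    """Stage 2 (simplified stand-in for the random-effects model).

    Residual = height minus the patient's own median height, standardised
    by a pooled robust scale. Refits after each round of removals.
    """
    kept, dropped = list(records), []
    for _ in range(MAX_PASSES):
        if not kept:
            break
        heights = {}
        for pid, h in kept:
            heights.setdefault(pid, []).append(h)
        centre = {pid: median(hs) for pid, hs in heights.items()}
        # Median absolute residual, rescaled so it is comparable with a
        # standard deviation when residuals are roughly normal.
        scale = 1.4826 * median(abs(h - centre[pid]) for pid, h in kept)
        if scale == 0:
            break  # no within-patient variation to standardise against
        flagged = [(pid, h) for pid, h in kept
                   if abs(h - centre[pid]) / scale > THRESHOLD]
        if not flagged:
            break
        dropped.extend(flagged)
        kept = [rec for rec in kept if rec not in flagged]
    return kept, dropped


# Invented sample data: (patient id, height in cm).
records = [
    ("p1", 170.0), ("p1", 171.0), ("p1", 169.5), ("p1", 170.5),
    ("p2", 160.0), ("p2", 161.0), ("p2", 159.0), ("p2", 16.0),
    ("p3", 180.0), ("p3", 181.0), ("p3", 179.0), ("p3", 150.0),
    ("p4", 155.0), ("p4", 156.0), ("p4", 154.5),
]

stage1_kept, pop_dropped = remove_population_outliers(records)
clean, ind_dropped = remove_individual_outliers(stage1_kept)
print("population-level outliers:", pop_dropped)
print("individual-level outliers:", ind_dropped)
```

In this invented data, the 16.0 cm value for `p2` falls outside the population bounds and is caught in stage 1, while the 150.0 cm value for `p3` is plausible for the population but far from that patient's other measurements, so only stage 2 flags it. Running stage 1 first matters because gross errors would otherwise distort the scale used to standardise the stage 2 residuals.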
