Published in

Swansea University, International Journal of Population Data Science, 5(5), 2020

DOI: 10.23889/ijpds.v5i5.1631

Evaluating Machine-Learning Models for Predicting Hospital Transfers in Administrative Data: A Study of Admissions for Myocardial Infarction

Journal article published in 2020 by Derrick Lopez, Juan Lu, Frank Sanfilippo, Tom Briffa, Joe Hung, Lee Nedkoff
This paper is made freely available by the publisher.

Abstract

Introduction: Hospital administrative data is a valuable source to measure myocardial infarction (MI) rates. However, admission counts are susceptible to over-inflation if the patient is transferred multiple times during a single episode of care, and variables denoting transfers may not be reliable. To obtain an accurate number of events, hospital transfers need to be correctly identified.

Objectives and Approach: We assessed multivariable logistic regression and various machine-learning models to predict transfers in hospital administrative data. Using Western Australian linked hospital data, we identified records from 2000-2016 with a principal discharge diagnosis of MI. Our standard method to compare against was a 24-hour look-back to identify a transfer using just admission and separation dates from the current and previous records for the same patient. Multivariable logistic regression and decision trees with various boosting algorithms were used to predict if a single record was a transfer, using variables recorded in the admission (e.g. age, sex, type of hospital, admitted from, emergency/elective admission). The performance of each model was calculated using metrics including area under the curve (AUC).

Results: Records in the training, validation and testing samples had similar characteristics: mean age=68.9 years, 66% were male and 58% admitted to tertiary hospitals. Gradient Boosting Decision Tree (AUC=0.887; 95% CI: 0.886-0.887) outperformed multivariable logistic regression (AUC=0.875; 95% CI: 0.869-0.881) and random forest models (AUC=0.859; 95% CI: 0.853-0.865).

Conclusion / Implications: Multivariable logistic regression and machine-learning models are able to identify transfers in a single record from existing variables. They can be used in unlinked hospital administrative data where records belonging to the same patient cannot be identified.
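
The 24-hour look-back used as the comparison standard can be illustrated with a minimal Python sketch. This is not the authors' code: the field names (`patient_id`, `admitted`, `separated`), the function name `flag_transfers`, the use of timestamps rather than calendar dates, and the treatment of overlapping stays as transfers are all assumptions made for illustration, and the sample records are invented.

```python
from datetime import datetime, timedelta


def flag_transfers(records, window=timedelta(hours=24)):
    """Flag records admitted within `window` of the same patient's previous separation.

    `records` is a list of dicts with hypothetical keys 'patient_id',
    'admitted' and 'separated'. Returns a list of booleans in input order.
    """
    # Visit records grouped by patient and ordered by admission time.
    order = sorted(
        range(len(records)),
        key=lambda i: (records[i]["patient_id"], records[i]["admitted"]),
    )
    flags = [False] * len(records)
    previous = None
    for i in order:
        current = records[i]
        if previous is not None and previous["patient_id"] == current["patient_id"]:
            # Look back to the previous record for the same patient.
            gap = current["admitted"] - previous["separated"]
            # Assumption: overlapping stays (negative gap) also count as transfers.
            flags[i] = gap <= window
        previous = current
    return flags


# Invented example: patient A is moved between hospitals, then readmitted months later.
records = [
    {"patient_id": "A", "admitted": datetime(2016, 3, 1, 8), "separated": datetime(2016, 3, 2, 10)},
    {"patient_id": "A", "admitted": datetime(2016, 3, 2, 12), "separated": datetime(2016, 3, 6, 9)},
    {"patient_id": "A", "admitted": datetime(2016, 5, 1, 7), "separated": datetime(2016, 5, 4, 15)},
    {"patient_id": "B", "admitted": datetime(2016, 3, 2, 11), "separated": datetime(2016, 3, 5, 16)},
]

flags = flag_transfers(records)
# Counting only non-transfer records avoids inflating the number of events.
episodes = flags.count(False)
```

Note that this look-back needs a patient identifier to pair consecutive records, which is exactly what unlinked data lacks; the models described in the abstract instead classify each record from its own variables.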