Dissemin is shutting down on January 1st, 2025

Published in

JMIR Publications, JMIR Medical Informatics, 4(9), p. e25000, 2021

DOI: 10.2196/25000

Links

Tools

Export citation

Search in Google Scholar

Mortality Prediction of Patients with Cardiovascular Disease Using Medical Claims Data under Artificial Intelligence Architectures: Validation Study (Preprint)

This paper is made freely available by the publisher.
This paper is made freely available by the publisher.

Full text: Download

Green circle
Preprint: archiving allowed
Green circle
Postprint: archiving allowed
Green circle
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Background Cardiovascular disease (CVD) is the greatest health problem in Australia, which kills more people than any other disease and incurs enormous costs for the health care system. In this study, we present a benchmark comparison of various artificial intelligence (AI) architectures for predicting the mortality rate of patients with CVD using structured medical claims data. Compared with other research in the clinical literature, our models are more efficient because we use a smaller number of features, and this study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit. Objective This study aims to support health clinicians in accurately predicting mortality among patients with CVD using only claims data before a clinic visit. Methods The data set was obtained from the Medicare Benefits Scheme and Pharmaceutical Benefits Scheme service information in the period between 2004 and 2014, released by the Department of Health Australia in 2016. It included 346,201 records, corresponding to 346,201 patients. A total of five AI algorithms, including four classical machine learning algorithms (logistic regression [LR], random forest [RF], extra trees [ET], and gradient boosting trees [GBT]) and a deep learning algorithm, which is a densely connected neural network (DNN), were developed and compared in this study. In addition, because of the minority of deceased patients in the data set, a separate experiment using the Synthetic Minority Oversampling Technique (SMOTE) was conducted to enrich the data. Results Regarding model performance, in terms of discrimination, GBT and RF were the models with the highest area under the receiver operating characteristic curve (97.8% and 97.7%, respectively), followed by ET (96.8%) and LR (96.4%), whereas DNN was the least discriminative (95.3%). In terms of reliability, LR predictions were the least calibrated compared with the other four algorithms. In this study, despite increasing the training time, SMOTE was proven to further improve the model performance of LR, whereas other algorithms, especially GBT and DNN, worked well with class imbalanced data. Conclusions Compared with other research in the clinical literature involving AI models using claims data to predict patient health outcomes, our models are more efficient because we use a smaller number of features but still achieve high performance. This study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit.