Oxford University Press, European Heart Journal – Digital Health, 4(2), p. 576-585, 2021
Abstract

Aims
This study aims to assess whether information derived from the raw 12-lead electrocardiogram (ECG), combined with clinical information, is predictive of atrial fibrillation (AF) development.

Methods and results
We use a subset of the Telehealth Network of Minas Gerais (TNMG) database consisting of patients who had repeated 12-lead ECG measurements between 2010 and 2017, comprising 1 130 404 recordings from 415 389 unique patients. The median age at recording was 58 years (interquartile range 46–69), and 38% of the patients were male. Recordings were assigned to train-validation and test sets in an 80:20 split stratified by class, age, and gender. A random forest classifier was trained to predict, for a given recording, the risk of AF development within 5 years. We use features obtained from different modalities, namely demographics, clinical information, engineered features, and features from deep representation learning. The model combining features from all modalities performed best on the test set, with an area under the receiver operating characteristic curve (AUROC) of 0.909, against 0.839 for the best single-modality model.

Conclusion
Our study has important clinical implications for AF management. It is the first study to integrate feature engineering, deep learning, and electronic medical record (EMR) system metadata to create a risk prediction tool for the management of patients at risk of AF. The best model, which includes features from all modalities, demonstrates that human knowledge in electrophysiology combined with deep learning outperforms any single-modality approach. The high performance obtained suggests that structural changes in the 12-lead ECG are associated with existing or impending AF.
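The 80:20 stratified split and the AUROC metric mentioned in the methods can be illustrated with a minimal sketch. It is not the study's code: the record fields (`af`, `age_band`, `sex`), the two age bands, and the helper names `stratified_split` and `auroc` are assumptions made for illustration, and the toy data are synthetic. The random forest itself is not reproduced here.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_fraction=0.2, seed=0):
    """Split records into (train, test), keeping the ratio within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)  # group records by their stratum key
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def auroc(labels, scores):
    """AUROC as the probability that a random positive scores above a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy records (hypothetical fields): 10 per (class, age band, sex) stratum.
records = [
    {"af": af, "age_band": band, "sex": sex}
    for af in (0, 1)
    for band in ("<60", ">=60")
    for sex in ("F", "M")
    for _ in range(10)
]
train, test = stratified_split(
    records, key=lambda r: (r["af"], r["age_band"], r["sex"])
)
print(len(train), len(test))

# Four hand-made scores: three of the four positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Read this way, an AUROC of 0.909 means that a randomly chosen recording from a patient who goes on to develop AF receives a higher risk score than a randomly chosen recording from one who does not about 91% of the time. The pairwise loop is quadratic in the number of recordings, so a dataset of the size reported above would call for a rank-based implementation instead.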