Strategies for building robust prediction models using data unavailable at prediction time

Yang, Haoyu; Tourani, Roshan; Zhu, Ying; Kumar, Vipin; Melton, Genevieve B.; Steinbach, Michael; Simon, Gyorgy

Published in

Oxford University Press (OUP), JAMIA: A Scholarly Journal of Informatics in Health and Biomedicine, 1(29), p. 72-79, 2021

DOI: 10.1093/jamia/ocab229

Journal article published in 2021 by Haoyu Yang, Roshan Tourani, Ying Zhu, Vipin Kumar, Genevieve B. Melton, Michael Steinbach, Gyorgy Simon

This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Objective

Hospital-acquired infections (HAIs) are associated with significant morbidity, mortality, and prolonged hospital length of stay. Risk prediction models based on pre- and intraoperative data have been proposed to assess the risk of HAIs at the end of surgery, but the performance of these models lags behind that of HAI detection models based on postoperative data. Postoperative data are more predictive than pre- or intraoperative data because they are closer in time to the outcomes, but they are unavailable when the risk models are applied (at the end of surgery). The objective is to study whether such data, which are temporally unavailable at prediction time (TUP) and thus cannot directly enter the model, can be used to improve the performance of the risk model.

Materials and Methods

An extensive array of 12 methods based on logistic/linear regression and deep learning was used to incorporate the TUP data through a variety of intermediate representations of the data. Because the different HAI outcomes have a hierarchical structure, a comparison of single-task and multi-task learning frameworks is also presented.

Results and Discussion

The use of TUP data was always advantageous: baseline methods, which cannot utilize TUP data, never achieved the top performance. The relative performance of the different models varied across the outcomes. Regarding the intermediate representation, we found that its complexity was key and that incorporating label information was helpful.

Conclusions

Using TUP data significantly improved predictive performance irrespective of model complexity.
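The abstract does not spell out the 12 methods, so the following is only a minimal sketch of one way TUP data could enter training through an intermediate representation. It is not necessarily one of the methods the authors evaluated. It assumes a two-stage setup: a "teacher" logistic regression is fit on the postoperative (TUP) features and the labels, and a "student" linear regression then learns to reproduce the teacher's risk score from the pre- and intraoperative features alone. All names (`X_avail`, `X_tup`, `fit_logistic`, `fit_linear`, `predict_risk`) and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def with_bias(X):
    # Prepend a column of ones so the first weight acts as an intercept.
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.1, n_iter=500):
    # Plain batch gradient descent on the logistic loss (illustrative only).
    Xb = with_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def fit_linear(X, target):
    # Ordinary least squares from features to a continuous target.
    w, *_ = np.linalg.lstsq(with_bias(X), target, rcond=None)
    return w

def predict_risk(X_avail_new, w_student):
    # Uses only features that exist at prediction time.
    return sigmoid(with_bias(X_avail_new) @ w_student)

# Synthetic stand-in data (assumption): 3 available features, 2 TUP features.
rng = np.random.default_rng(0)
n = 500
X_avail = rng.normal(size=(n, 3))                        # pre-/intraoperative
X_tup = 0.5 * X_avail[:, :2] + rng.normal(size=(n, 2))   # postoperative (TUP)
true_logit = 1.5 * X_tup[:, 0] - 1.0 * X_tup[:, 1] + 0.5 * X_avail[:, 2]
y = (rng.random(n) < sigmoid(true_logit)).astype(float)

# Stage 1: teacher sees TUP data and labels; its logit is the
# intermediate representation (it carries label information).
w_teacher = fit_logistic(X_tup, y)
teacher_score = with_bias(X_tup) @ w_teacher

# Stage 2: student maps available data to the teacher's score.
w_student = fit_linear(X_avail, teacher_score)

# Prediction time: only available features are needed.
risk = predict_risk(X_avail[:5], w_student)
```

In this sketch the TUP features are touched only during training. `predict_risk` takes the available features and the student weights, so it can be applied at the end of surgery. Whether such a scheme improves on a baseline depends on the data; the sketch makes no claim about that.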
