American Heart Association, Circulation: Cardiovascular Quality and Outcomes, 14(8), 2021
DOI: 10.1161/circoutcomes.121.007858
Full text: Unavailable
Background: There are many clinical prediction models (CPMs) available to inform treatment decisions for patients with cardiovascular disease. However, the extent to which they have been externally tested, and how well they generally perform, have not been broadly evaluated.
Methods: A SCOPUS citation search was run on March 22, 2017 to identify external validations of cardiovascular CPMs in the Tufts Predictive Analytics and Comparative Effectiveness CPM Registry. We assessed the extent of external validation and the heterogeneity of performance across databases, and explored factors associated with model performance, including a global assessment of the clinical relatedness between the derivation and validation data.
Results: We identified 2030 external validations of 1382 CPMs. Eight hundred seven (58%) of the CPMs in the Registry have never been externally validated. On average, there were 1.5 validations per CPM (range, 0–94). The median external validation area under the receiver operating characteristic curve was 0.73 (25th–75th percentile [interquartile range (IQR)], 0.66–0.79), representing a median percent decrease in discrimination of −11.1% (IQR, −32.4% to +2.7%) compared with performance on derivation data. Of the validations reporting area under the receiver operating characteristic curve, 81% (n=1333) showed discrimination below that reported in the derivation dataset. Some measure of CPM calibration was reported in 53% (n=983) of the validations. For CPMs evaluated more than once, there was typically a large range of performance. Of 1702 validations classified by relatedness, the percent change in discrimination was −3.7% (IQR, −13.2% to +3.1%) for closely related validations (n=123), −9.0% (IQR, −27.6% to +3.9%) for related validations (n=862), and −17.2% (IQR, −42.3% to 0%) for distantly related validations (n=717; P<0.001).
Conclusions: Many published cardiovascular CPMs have never been externally validated, and for those that have, apparent performance during development is often overly optimistic. A single external validation appears insufficient to broadly understand the performance heterogeneity across different settings.
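Illustration: The abstract reports a "percent change in discrimination" without stating its formula. The sketch below assumes one common definition, in which the change in area under the receiver operating characteristic curve (AUC) is scaled by the derivation AUC above chance (0.5). The function name and the AUC pairs are hypothetical and are not data from the study.

```python
from statistics import median


def percent_change_in_discrimination(auc_derivation: float, auc_validation: float) -> float:
    """Percent change in discrimination relative to the derivation AUC above chance.

    Assumed definition (not stated in the abstract):
        100 * (AUC_validation - AUC_derivation) / (AUC_derivation - 0.5)
    """
    if auc_derivation <= 0.5:
        raise ValueError("derivation AUC must exceed 0.5 for the ratio to be meaningful")
    return 100.0 * (auc_validation - auc_derivation) / (auc_derivation - 0.5)


# Hypothetical (derivation AUC, validation AUC) pairs, for illustration only.
example_pairs = [(0.80, 0.74), (0.75, 0.70), (0.70, 0.71)]

# One percent change per validation, then the median across validations.
changes = [percent_change_in_discrimination(d, v) for d, v in example_pairs]
median_change = median(changes)
```

Under this assumed definition, a drop from 0.80 to 0.74 is a loss of about one fifth of the discrimination above chance, whereas the same absolute drop from a derivation AUC of 0.60 would be a loss of more than half.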