Published in JCO Clinical Cancer Informatics (American Society of Clinical Oncology), vol. 7, 2023

DOI: 10.1200/cci.23.00085

Development and Validation of a Tool to Identify Patients Diagnosed With Castration-Resistant Prostate Cancer

This paper is made freely available by the publisher.

Abstract

PURPOSE Several novel therapies for castration-resistant prostate cancer (CRPC) have been approved on the basis of randomized phase III studies, with further observational research either planned or ongoing. Accurately identifying patients with CRPC in electronic health care data is critical for high-quality observational research, resource allocation, and quality improvement. Previous work in this area has relied on either structured laboratory results and medication data or natural language processing (NLP) methods. However, a computable phenotype that uses both structured data and NLP identifies these patients more accurately.

METHODS The Corporate Data Warehouse (CDW) of the Veterans Health Administration (VHA) was used to collect prostate cancer (PCa) diagnoses, prostate-specific antigen test results, and information on patient characteristics and medication use. The final system used for validation and subsequent analysis combined an NLP system with an algorithm based on structured laboratory and medication data to identify patients diagnosed with CRPC. Patients with both a documented diagnosis of CRPC and a documented diagnosis of metastatic PCa were classified by this system as having metastatic CRPC (mCRPC).

RESULTS Among 1.2 million veterans with PCa, the International Classification of Diseases, 10th Revision (ICD-10) diagnosis code for CRPC (Z19.2) identified 3,791 patients from 2016, when the code was created, through 2022; over the same period, the combined algorithm identified 14,103 patients, 10,312 more than ICD-10 codes alone. The combined algorithm showed a sensitivity of 97.9% and a specificity of 99.2%.

CONCLUSION ICD-10 codes proved insufficient for capturing CRPC in the VHA CDW data. Using both structured and unstructured data identified more than double the number of patients identified by ICD-10 codes alone. This combined approach drastically improved identification of real-world patients and enables high-quality observational research in mCRPC.
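The classification logic in the abstract can be sketched as a small rule set plus the standard validation metrics. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not say exactly how the NLP output and the structured-data algorithm are combined, so the logical OR below is an assumption, and the names `PatientEvidence`, `is_crpc`, `is_mcrpc`, and `sensitivity_specificity` are invented for this sketch. The mCRPC rule (CRPC and metastatic PCa both documented) follows the abstract directly. Sensitivity and specificity use the usual definitions, TP/(TP+FN) and TN/(TN+FP), against reference labels whose source (for example, manual chart review) is likewise assumed.

```python
from dataclasses import dataclass


@dataclass
class PatientEvidence:
    # Hypothetical per-patient flags; the paper's actual schema is not given.
    nlp_crpc: bool         # NLP found CRPC documented in clinical notes
    structured_crpc: bool  # structured lab/medication algorithm flagged CRPC
    metastatic: bool       # documented diagnosis of metastatic PCa


def is_crpc(p: PatientEvidence) -> bool:
    # ASSUMPTION: a patient counts as CRPC if either component flags them.
    return p.nlp_crpc or p.structured_crpc


def is_mcrpc(p: PatientEvidence) -> bool:
    # Per the abstract: documented CRPC AND documented metastatic PCa.
    return is_crpc(p) and p.metastatic


def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for paired boolean labels.

    Assumes the reference contains at least one positive and one negative.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy, made-up patients; these are NOT data from the study.
    patients = [
        PatientEvidence(nlp_crpc=True, structured_crpc=False, metastatic=True),
        PatientEvidence(nlp_crpc=False, structured_crpc=True, metastatic=False),
        PatientEvidence(nlp_crpc=False, structured_crpc=False, metastatic=True),
        PatientEvidence(nlp_crpc=False, structured_crpc=False, metastatic=False),
    ]
    reference = [True, True, False, True]  # hypothetical reference labels

    predicted = [is_crpc(p) for p in patients]
    sens, spec = sensitivity_specificity(predicted, reference)
    print(predicted)                         # [True, True, False, False]
    print([is_mcrpc(p) for p in patients])   # [True, False, False, False]
    print(round(sens, 3), spec)              # 0.667 1.0
```

An OR combination is one plausible reading of why the combined algorithm finds more patients than either source alone, but it trades some specificity for sensitivity; a real phenotype would tune that rule against reviewed charts.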