Machine learning-based natural language processing to extract PD-L1 expression levels from clinical notes

Lin, Eric; Zwolinski, Robert; Wu, Julie Tsu-Yu; La, Jennifer; Goryachev, Sergey; Huhmann, Linden; Yildrim, Cenk; Tuck, David P.; Elbers, Danne C.; Brophy, Mary T.; Do, Nhan V.; Fillmore, Nathanael R.

Published in

SAGE Publications, Health Informatics Journal, 3(29), 2023

DOI: 10.1177/14604582231198021

Journal article published in 2023 by Eric Lin, Robert Zwolinski, Julie Tsu-Yu Wu, Jennifer La, Sergey Goryachev, Linden Huhmann, Cenk Yildrim, David P. Tuck, Danne C. Elbers, Mary T. Brophy, Nhan V. Do, Nathanael R. Fillmore

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

Introduction: PD-L1 expression is used to determine oncology patients’ response to and eligibility for immunologic treatments; however, PD-L1 expression status often exists only in unstructured clinical notes, limiting the ability to use it in population-level studies. Methods: We developed and evaluated a machine learning-based natural language processing (NLP) tool to extract PD-L1 expression values from the nationwide Veterans Affairs electronic health record system. Results: The model demonstrated strong evaluation performance across multiple levels of label granularity. For the overall PD-L1 positive label, mean precision was 0.859 (sd, 0.039), mean recall was 0.994 (sd, 0.013), and mean F1 was 0.921 (sd, 0.024). When a numeric PD-L1 value was identified, the mean absolute error of the value was 0.537 on a scale of 0 to 100. Conclusion: We presented an accurate NLP method for deriving PD-L1 status from clinical notes. By reducing the time and manual effort needed to review medical records, our work will enable future population-level studies in cancer immunotherapy.
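
For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how precision, recall, F1, and mean absolute error are conventionally defined. It is not the authors' evaluation code; the function names and the example inputs are hypothetical. The only numbers taken from the abstract are the mean precision (0.859) and mean recall (0.994).

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts.

    tp: notes correctly labeled PD-L1 positive
    fp: notes labeled positive that are not
    fn: positive notes the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def mean_absolute_error(true_values, predicted_values):
    """Average absolute difference between paired numeric values."""
    if len(true_values) != len(predicted_values):
        raise ValueError("inputs must have the same length")
    if not true_values:
        raise ValueError("inputs must not be empty")
    total = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return total / len(true_values)


# F1 implied by the abstract's mean precision and mean recall.
implied_f1 = f1_from(0.859, 0.994)

# Hypothetical PD-L1 percentages (0 to 100): chart review vs. extracted.
example_mae = mean_absolute_error([50, 0, 90], [50, 1, 90])
```

The harmonic mean of the reported mean precision and mean recall comes out near 0.92, close to the reported mean F1 of 0.921. A small gap is expected if the reported figure averages F1 over repeated evaluation runs, since the mean of per-run F1 scores generally differs from the F1 of the averaged precision and recall.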
