A natural language processing (NLP) pipeline was developed to identify lumbar spine imaging findings associated with low back pain (LBP) in X-ray, computed tomography (CT), and magnetic resonance imaging (MRI) reports. From a total of 18,640 reports, a random sample stratified by imaging modality was drawn to obtain a balanced set of 300 X-ray, 300 CT, and 300 MRI reports. A total of 23 radiologic findings potentially related to LBP were defined, and their presence was extracted from the radiology reports. In developing the NLP pipeline, section and sentence segmentation of the radiology reports was performed with a rule-based method that included regular expressions with negation detection. The reports were randomly split into 80% for development and 20% for testing to evaluate extraction performance. Performance was evaluated using recall, precision, accuracy, and the F1 score. All four metrics were greater than 0.9 for all 23 radiologic findings, and all four were 1.0 for 10 radiologic findings (listhesis, annular fissure, disc bulge, disc extrusion, disc protrusion, endplate edema or Type 1 Modic change, lateral recess stenosis, Schmorl’s node, osteophyte, and any stenosis). For the seven potentially clinically important radiologic findings, the F1 score ranged from 0.9882 to 1.0. In summary, a rule-based NLP system that identifies 23 findings related to LBP from X-ray, CT, and MRI reports was developed, and it performed well on all four metrics.
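The rule-based approach described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the three regular expressions, the negation cue list, the sentence splitter, and the function names `split_sentences`, `extract_findings`, and `score` are all assumptions made for illustration, and the actual rules for the 23 findings are not given here. The sketch splits a report into sentences, marks a finding as present when its pattern matches without a preceding negation cue in the same sentence, and computes the four reported metrics from gold and predicted labels.

```python
import re

# Hypothetical patterns for three of the 23 findings (illustrative only;
# the study's actual regular expressions are not reproduced here).
FINDING_PATTERNS = {
    "disc_bulge": re.compile(r"\bdis[ck]\s+bulg(?:e|es|ing)\b", re.IGNORECASE),
    "listhesis": re.compile(r"\b\w*listhesis\b", re.IGNORECASE),
    "schmorl_node": re.compile(r"\bschmorl['’]?s?\s+nodes?\b", re.IGNORECASE),
}

# Assumed negation cues; a real system would need a much richer list.
NEGATION_CUE = re.compile(r"\b(?:no|not|without|negative\s+for)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence segmentation: split after '.' or ';' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]


def extract_findings(report):
    """Return {finding: bool}, True if the finding is mentioned and not negated."""
    found = {name: False for name in FINDING_PATTERNS}
    for sentence in split_sentences(report):
        for name, pattern in FINDING_PATTERNS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # Treat the mention as negated if a cue appears earlier in the sentence.
            preceding = sentence[:match.start()]
            if NEGATION_CUE.search(preceding) is None:
                found[name] = True
    return found


def score(gold, predicted):
    """Compute recall, precision, accuracy, and F1 from parallel boolean lists."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    tn = sum((not g) and (not p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    report = ("No disc bulge at L3-L4. Grade 1 anterolisthesis of L4 on L5. "
              "Schmorl's nodes are present at L2.")
    print(extract_findings(report))
```

In this sketch a negation cue applies to every finding that follows it in the same sentence, which is a deliberate simplification; a production system would typically limit the scope of each cue and handle section headers separately.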