Springer Verlag, Pediatric Radiology, 7(52), p. 1272-1282, 2022
DOI: 10.1007/s00247-022-05312-y
Abstract

Background: Whole-body magnetic resonance imaging (MRI) is increasingly being used in children; however, to date no studies have addressed the reliability of the findings.

Objective: To examine intra- and interobserver reliability of a scoring system for assessment of high signal areas within the bone marrow, as visualized on T2-weighted, fat-saturated images.

Materials and methods: Ninety-six whole-body MRIs (1.5 T) in 78 healthy volunteers (mean age: 11.5 years) and 18 children with chronic nonbacterial osteomyelitis (mean age: 12.4 years) were included. Coronal water-only Dixon T2-weighted images of the left lower extremity and pelvis were scored for high signal intensity areas in a blinded fashion by two pairs of radiologists. Each area was scored for intensity (0–2 scale), extension (0–4 scale), shape and contour.

Results: For the pelvis, grading of bone marrow signal showed moderate to good intra- and interobserver agreement, with kappa values of 0.51–0.94 and 0.41–0.87, respectively. Corresponding figures for the femur were 0.61–0.68 within and 0.32–0.61 between observers, and for the tibia 0.60–0.72 and 0.51–0.73. Agreement for assessing extension was moderate to good both within and between observers for the pelvis (kappa = 0.52–0.85 and 0.35–0.80), the femur (kappa = 0.52–0.67 and 0.51–0.60) and the tibia (kappa = 0.59–0.69 and 0.47–0.63), except for the femur metaphysis/diaphysis, where interobserver kappa values were 0.29–0.30. Scoring of shape was moderate to good within observers but in general poorer between observers, with kappa values of 0.40–0.73 and 0.18–0.69, respectively. For contour, the corresponding figures were 0.35–0.62 and 0.09–0.54.

Conclusion: MRI grading of the intensity and extension of high signal intensity areas within the bone marrow of the pelvis and lower limb performs well and can thus be used interchangeably by different observers. Assessment of shape and contour is reliable for the same observer but less reliable between observers. This should be considered when performing clinical trials.
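
The agreement figures above are kappa values. The abstract does not say which kappa variant or software was used, so the following is only a minimal sketch of unweighted Cohen's kappa for two readers, to illustrate how such a figure corrects observed agreement for agreement expected by chance. The function name `cohen_kappa` and the example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: the study may have used a different (e.g. weighted) variant.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must score the same non-empty set of items")

    n = len(ratings_a)

    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical intensity scores (0-2 scale) from two readers for ten sites.
reader_1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
reader_2 = [0, 0, 1, 2, 2, 2, 0, 1, 1, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f}")
```

For ordinal scales such as the 0–4 extension score, a weighted kappa that penalizes near-misses less than distant disagreements is a common alternative; whether the authors used one is not stated here.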