Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring

Bakker, Jessie P.; Ross, Marco; Cerny, Andreas; Vasko, Ray; Shaw, Edmund; Kuna, Samuel; Magalang, Ulysses J.; Punjabi, Naresh M.; Anderer, Peter

Published in

Oxford University Press, SLEEP, 2(46), 2022

DOI: 10.1093/sleep/zsac154


Journal article published in 2022 by Jessie P. Bakker, Marco Ross, Andreas Cerny, Ray Vasko, Edmund Shaw, Samuel Kuna, Ulysses J. Magalang, Naresh M. Punjabi, Peter Anderer

This paper is made freely available by the publisher.


Abstract

Study Objectives: To quantify the amount of sleep stage ambiguity across expert scorers and to validate a new auto-scoring platform against sleep staging performed by multiple scorers.

Methods: We applied a new auto-scoring system to three datasets containing 95 polysomnograms (PSGs) scored by 6–12 scorers, to compare sleep stage probabilities (hypnodensity; i.e. the probability of each sleep stage being assigned to a given epoch) as the primary output, as well as a single sleep stage per epoch assigned by hierarchical majority rule.

Results: The percentage of epochs with 100% agreement across scorers was 46 ± 9%, 38 ± 10% and 32 ± 9% for the datasets with 6, 9, and 12 scorers, respectively. The mean intra-class correlation coefficient between sleep stage probabilities from auto- and manual-scoring was 0.91, representing excellent reliability. Within each dataset, agreement between auto-scoring and consensus manual-scoring was significantly higher than agreement between manual-scoring and consensus manual-scoring (0.78 vs. 0.69; 0.74 vs. 0.67; and 0.75 vs. 0.67; all p < 0.01).

Conclusions: Analysis of scoring performed by multiple scorers reveals that sleep stage ambiguity is the rule rather than the exception. Probabilities of the sleep stages determined by artificial intelligence auto-scoring provide an excellent estimate of this ambiguity. When compared against consensus manual-scoring, sleep staging derived from auto-scoring is noninferior to manual-scoring for each individual PSG, meaning that auto-scoring output is ready for interpretation without the need for manual adjustment.
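As an illustration of the quantities named in the abstract, the following minimal sketch derives a manual hypnodensity (the fraction of scorers assigning each stage to an epoch), a single consensus stage per epoch, and the share of epochs with 100% agreement. It is not the authors' implementation: the stage labels, the function names, and in particular the tie-break order are assumptions made for the example, since the abstract does not spell out the hierarchy used in its majority rule.

```python
from collections import Counter

# Assumed stage labels; the abstract does not list them.
STAGES = ["W", "N1", "N2", "N3", "R"]

# Hypothetical tie-break order, used only when stages receive equal votes.
# The hierarchy applied in the paper is not given in the abstract.
TIE_BREAK = ["W", "N1", "N2", "N3", "R"]


def hypnodensity(epoch_scores):
    """Return the fraction of scorers that assigned each stage to one epoch."""
    counts = Counter(epoch_scores)
    n_scorers = len(epoch_scores)
    return {stage: counts.get(stage, 0) / n_scorers for stage in STAGES}


def majority_stage(epoch_scores, tie_break=TIE_BREAK):
    """Return the most frequent stage, resolving ties by the assumed order."""
    probs = hypnodensity(epoch_scores)
    best = max(probs.values())
    tied = [stage for stage in STAGES if probs[stage] == best]
    return min(tied, key=tie_break.index)


def full_agreement_fraction(scores_by_epoch):
    """Return the fraction of epochs on which every scorer chose the same stage."""
    unanimous = sum(1 for epoch in scores_by_epoch if len(set(epoch)) == 1)
    return unanimous / len(scores_by_epoch)


# Made-up scores for three epochs from six scorers (not data from the study).
example = [
    ["N2", "N2", "N2", "N2", "N2", "N2"],
    ["N1", "N1", "N2", "N2", "N2", "W"],
    ["W", "W", "W", "N1", "N1", "N1"],
]
for epoch in example:
    print(hypnodensity(epoch), majority_stage(epoch))
print(full_agreement_fraction(example))
```

In this toy input only the first epoch is unanimous, the second resolves to N2 by simple majority, and the third is an even split that falls back on the assumed tie-break order.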
