Why rankings of biomedical image analysis competitions should be interpreted with care

Maier-Hein, Lena; Eisenmann, Matthias; Reinke, Annika; Onogur, Sinan; Stankovic, Marko; Scholz, Patrick; Arbel, Tal; Bogunovic, Hrvoje; Bradley, Andrew P.; Carass, Aaron; Feldmann, Carolin; Frangi, Alejandro F.; Full, Peter M.; van Ginneken, Bram; Hanbury, Allan; Honauer, Katrin; Kozubek, Michal; Landman, Bennett A.; März, Keno; Maier, Oskar; Maier-Hein, Klaus; Menze, Bjoern H.; Müller, Henning; Neher, Peter F.; Niessen, Wiro; Rajpoot, Nasir; Sharp, Gregory C.; Sirinukunwattana, Korsuk; Speidel, Stefanie; Stock, Christian; Stoyanov, Danail; Taha, Abdel Aziz; van der Sommen, Fons; Wang, Ching-Wei; Weber, Marc-André; Zheng, Guoyan; Jannin, Pierre; Kopp-Schneider, Annette

Published in

Nature Research, Nature Communications, 1(9), 2018

DOI: 10.1038/s41467-018-07619-7

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables such as the test data used for validation, the ranking scheme applied and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.
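The abstract's claim that an algorithm's rank depends on the ranking scheme can be illustrated with a small sketch. The example below is not taken from the paper: the algorithm names, the scores, and the two schemes (ordering by mean score versus ordering by mean per-case rank) are assumptions chosen only to show how the same results can produce opposite rankings.

```python
from statistics import mean

# Hypothetical per-case scores (higher is better); NOT data from the paper.
scores = {
    "A": [0.90, 0.90, 0.90, 0.10],
    "B": [0.80, 0.80, 0.80, 0.80],
    "C": [0.85, 0.85, 0.85, 0.50],
}


def aggregate_then_rank(scores):
    """Order algorithms by their mean score over all test cases, best first."""
    return sorted(scores, key=lambda name: mean(scores[name]), reverse=True)


def rank_then_aggregate(scores):
    """Rank algorithms on each test case, then order by mean rank (lower is better).

    Ties are not handled specially; the toy data above contains none.
    """
    names = list(scores)
    n_cases = len(next(iter(scores.values())))
    ranks = {name: [] for name in names}
    for i in range(n_cases):
        # Best score on this case gets rank 1.
        ordered = sorted(names, key=lambda name: scores[name][i], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return sorted(names, key=lambda name: mean(ranks[name]))


print(aggregate_then_rank(scores))
print(rank_then_aggregate(scores))
```

With these made-up numbers, ordering by mean score gives B, C, A, while ordering by mean per-case rank gives A, C, B: algorithm A wins three of four cases but fails badly on the fourth, so the two schemes disagree about which algorithm is best.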
