Evaluation metrics and statistical tests for machine learning

Rainio, Oona; Teuho, Jarmo; Klén, Riku

Published in

Nature Research, Scientific Reports, 1(14), 2024

DOI: 10.1038/s41598-024-56706-x

Journal article published in 2024 by Oona Rainio, Jarmo Teuho, Riku Klén

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

Research on machine learning (ML) has become incredibly popular during the past few decades. However, for researchers who are not familiar with statistics, it can be difficult to understand how to evaluate the performance of ML models and compare them with each other. Here, we introduce the most common evaluation metrics used for typical supervised ML tasks, including binary, multi-class, and multi-label classification, regression, image segmentation, object detection, and information retrieval. We explain how to choose a suitable statistical test for comparing models, how to obtain enough values of the metric for testing, and how to perform the test and interpret its results. We also present a few practical examples of comparing convolutional neural networks used to classify X-rays with different lung infections and to detect cancer tumors in positron emission tomography images.
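As a rough illustration of the workflow the abstract describes (compute a metric, collect several paired values of it, then test the difference), the following is a minimal Python sketch. It is not taken from the paper: the function names `binary_metrics` and `sign_test_p_value` are hypothetical, and the exact two-sided sign test is chosen here only because it is easy to write with the standard library, not because the paper recommends it over other tests.

```python
from math import comb


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels.

    Hypothetical helper for illustration; label 1 is treated as positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test on paired metric values.

    Assumption: scores_a[i] and scores_b[i] come from the same
    test split or fold. Tied pairs are dropped.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    wins = sum(1 for d in diffs if d > 0)
    k = min(wins, n - wins)
    # Probability of k or fewer successes under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, `binary_metrics` could be called once per cross-validation fold for each of two models, and the two resulting lists of F1 values passed to `sign_test_p_value`. With only a handful of folds the sign test has little power, so a small p-value requires one model to win on nearly every fold.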
