Estimating Dataset Size Requirements for Classifying DNA Microarray Data

Mukherjee, Sayan; Tamayo, Pablo; Rogers, Simon; Rifkin, Ryan; Engle, Anna; Campbell, Colin; Golub, Todd R.; Mesirov, Jill P.

Published in

Mary Ann Liebert, Journal of Computational Biology, 10(2), p. 119-142, 2003

DOI: 10.1089/106652703321825928


Journal article published in 2003 by Sayan Mukherjee, Pablo Tamayo, Simon Rogers, Ryan Rifkin, Anna Engle, Colin Campbell, Todd R. Golub, Jill P. Mesirov


Abstract

A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced. The goal is to use existing classification results to estimate dataset size requirements for future classification experiments and to evaluate the gain in accuracy and significance of classifiers built with additional data. The method is based on fitting inverse power-law models to construct empirical learning curves. It also includes a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. This procedure is applied to several molecular classification problems representing a broad spectrum of levels of complexity.
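As an illustration of the two ingredients named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the learning curve has the form e(n) = a * n^(-alpha) + b, where e(n) is the error rate at training-set size n and b is the asymptotic error; this parametrization, the grid-scan fitting strategy, and all function names (`fit_inverse_power_law`, `required_size`, `permutation_p_value`, `error_fn`) are assumptions made for the example. The permutation test shown is a generic label-shuffling test that may differ in detail from the procedure in the paper.

```python
import math
import random


def fit_inverse_power_law(sizes, errors, b_steps=200):
    """Fit e(n) = a * n**(-alpha) + b to observed (size, error) pairs.

    Assumes all errors are strictly positive. Scans candidate values of b
    in [0, min(errors)) and, for each, fits log(e - b) = log(a) - alpha*log(n)
    by ordinary least squares, keeping the b with the smallest squared error.
    """
    xs = [math.log(n) for n in sizes]
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    best = None
    for i in range(b_steps):
        b = min(errors) * i / b_steps  # candidate asymptotic error
        ys = [math.log(e - b) for e in errors]
        my = sum(ys) / len(ys)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        a, alpha = math.exp(my - slope * mx), -slope
        sse = sum((a * n ** (-alpha) + b - e) ** 2
                  for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    return best[1], best[2], best[3]


def required_size(a, alpha, b, target_error):
    """Invert the fitted curve: smallest n with predicted error <= target."""
    if target_error <= b:
        raise ValueError("target is at or below the asymptotic error b")
    return (a / (target_error - b)) ** (1.0 / alpha)


def permutation_p_value(error_fn, labels, n_perm=200, seed=0):
    """Generic label-permutation test.

    error_fn is a hypothetical callable that trains and evaluates a
    classifier for a given label assignment and returns its error rate.
    The p-value is the fraction of shuffled labelings whose error is at
    least as low as the error obtained with the true labels.
    """
    rng = random.Random(seed)
    observed = error_fn(labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        if error_fn(shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In use, one would measure error rates at several subsampled training-set sizes, pass them to `fit_inverse_power_law`, and call `required_size` with the desired error rate to extrapolate how many samples a future experiment might need. The grid scan over b is chosen here only to keep the sketch dependency-free; a nonlinear least-squares routine would be a natural replacement.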
