Sparse Partial Least Squares Classification for High Dimensional Data*

Chung, Dongjun; Keles, Sunduz

Published in

De Gruyter, Statistical Applications in Genetics and Molecular Biology, 1(9), 2010

DOI: 10.2202/1544-6115.1492

Journal article published in 2010 by Dongjun Chung, Sunduz Keles

Abstract

Partial least squares (PLS) is a well-known dimension reduction method that has recently been adapted for high-dimensional classification problems in genome biology. We develop sparse versions of two recently proposed PLS-based classification methods using sparse partial least squares (SPLS). These sparse versions aim to achieve variable selection and dimension reduction simultaneously. We consider both binary and multicategory classification. We provide analytical and simulation-based insights into the variable selection properties of these approaches and benchmark them on well-known, publicly available datasets that involve tumor classification with high-dimensional gene expression data. We show that incorporating SPLS into a generalized linear model (GLM) framework provides higher sensitivity in variable selection for multicategory classification with unbalanced sample sizes between classes. As the sample size increases, the two-stage approach provides comparable sensitivity with better specificity in variable selection. In binary classification and in multicategory classification with balanced sample sizes, the two-stage approach provides variable selection and prediction accuracy comparable to those of the GLM version and is computationally more efficient.
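
As a rough illustration of the two-stage idea mentioned in the abstract (sparse dimension reduction first, then a classifier on the reduced data), the following Python sketch may help. It is not the authors' implementation. The soft-thresholding rule used to obtain a sparse direction, the use of a single latent component, the nearest-centroid classifier, the parameter name `eta`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def sparse_direction(Xc, y, eta=0.5):
    """Illustrative sparse direction (assumed rule, not taken from the paper):
    soft-threshold the normalized covariance vector Xc^T (y - mean(y))."""
    z = Xc.T @ (y - y.mean())
    z = z / np.linalg.norm(z)
    thresh = eta * np.abs(z).max()          # eta in [0, 1) controls sparsity
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def fit_two_stage(X, y, eta=0.5):
    """Stage 1: sparse one-component reduction. Stage 2: nearest centroid
    on the latent score (a stand-in for any standard classifier)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    w = sparse_direction(Xc, y, eta)
    t = Xc @ w                               # latent component scores
    centroids = (t[y == 0].mean(), t[y == 1].mean())
    return {"mu": mu, "w": w, "centroids": centroids}

def predict(model, X):
    """Assign each sample to the class whose centroid is closer in score."""
    t = (X - model["mu"]) @ model["w"]
    c0, c1 = model["centroids"]
    return (np.abs(t - c1) < np.abs(t - c0)).astype(int)

# Simulated binary problem (hypothetical data): 200 features, 5 informative.
rng = np.random.default_rng(0)
n, p = 60, 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 2.0

model = fit_two_stage(X, y, eta=0.5)
selected = np.flatnonzero(model["w"])        # indices of selected variables
accuracy = (predict(model, X) == y).mean()   # training accuracy only
print("selected variables:", selected)
print("training accuracy:", accuracy)
```

Features with a zero weight in `w` are dropped from the latent component, which is how variable selection and dimension reduction happen in one step in this sketch. A larger `eta` gives a sparser direction. The accuracy printed here is computed on the training data, so it is not an estimate of prediction error.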
