Feature selection for high-dimensional integrated data

Zheng, Charles; Schwartz, Scott; Chapkin, Robert S.; Carroll, Raymond J.; Ivanov, Ivan

Published in

Proceedings of the 2012 SIAM International Conference on Data Mining, p. 1141-1150

DOI: 10.1137/1.9781611972825.98

Journal article published in 2011 by Charles Zheng, Scott Schwartz, Robert S. Chapkin, Raymond J. Carroll, Ivan Ivanov

Abstract

Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors, $X_t$, depends on the multidimensional variate $Y$, and the remainder of the predictors constitute a "noise set" $X_u$ independent of $Y$. Using Monte Carlo simulations, we investigated the relative performance of two methods, thresholding and singular-value decomposition (SVD), in combination with stochastic optimization to determine "empirical bounds" on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods on a recent infant intestinal gene expression and metagenomics dataset.
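The abstract does not give the paper's exact procedures, so the following is only a minimal illustrative sketch of the setting it describes, not the authors' implementation. It assumes that the signal predictors $X_t$ depend on $Y$ through a single shared linear combination, that thresholding ranks predictors by their largest absolute sample correlation with any coordinate of $Y$, and that the SVD method ranks predictors by their loading on the leading left singular vector of the cross-correlation matrix. All function names and parameters (`simulate`, `cross_correlation`, `select_by_threshold`, `select_by_svd`, `signal`) are hypothetical.

```python
import numpy as np


def simulate(n=500, p=50, q=4, k=5, signal=2.0, seed=0):
    """Draw (X, Y) where only the first k predictors depend on Y (assumed model)."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, q))
    X = rng.standard_normal((n, p))      # noise set X_u: independent of Y
    w = np.ones(q) / np.sqrt(q)          # assumed shared direction in Y-space
    z = Y @ w                            # latent score driving the signal set
    X[:, :k] += signal * z[:, None]      # signal set X_t: depends on Y
    return X, Y


def cross_correlation(X, Y):
    """Sample correlation between every predictor and every response (p x q)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xs.T @ Ys / X.shape[0]


def select_by_threshold(C, k):
    """Keep the k predictors with the largest absolute correlation to any response."""
    scores = np.abs(C).max(axis=1)
    return np.sort(np.argsort(scores)[-k:])


def select_by_svd(C, k):
    """Keep the k predictors with the largest loading on the leading left singular vector."""
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    scores = np.abs(U[:, 0])
    return np.sort(np.argsort(scores)[-k:])


if __name__ == "__main__":
    X, Y = simulate()
    C = cross_correlation(X, Y)
    print("thresholding:", select_by_threshold(C, 5))
    print("SVD:         ", select_by_svd(C, 5))
```

In this sketch both selectors are told the size `k` of the signal set; in practice that size is unknown and would need its own estimate, which the abstract does not address.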
