PCA learning for sparse high-dimensional data

Hoyle, D. C.; Rattray, M.

Published in

Europhysics Letters (EPL), 62(1), pp. 117-123

DOI: 10.1209/epl/i2003-00370-1

Journal article published in 2003 by D. C. Hoyle, M. Rattray

Abstract

We study the performance of principal component analysis (PCA). In particular, we consider the problem of how many training pattern vectors are required to accurately represent the low-dimensional structure of the data. This problem is of particular relevance now that PCA is commonly applied to extremely high-dimensional (N ≃ 5000–30000) real data sets produced from molecular-biology research projects. In these applications the number of patterns p is often orders of magnitude less than the data dimension (p ≪ N). […] N → ∞, with α = p/N fixed. For real data sets the strength of the symmetry breaking may increase with N, and therefore one must reconsider the accuracy of the mean-field theory. We show, using simulation results, that the mean-field theory is still accurate even when the strength of the symmetry breaking scales with N, and even for small values of α that are more appropriate to real biological data sets.
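
The regime described in the abstract (few patterns, very high dimension, small α = p/N) can be explored with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes a hypothetical single-component model in which each pattern is isotropic Gaussian noise plus a random amplitude along one planted direction B, and it reads "symmetry breaking that scales with N" as a signal variance proportional to N. The function name top_pc_overlap and the parameter signal_var are made up for this example. It measures how well the leading principal component recovers B.

```python
import numpy as np

def top_pc_overlap(N, p, signal_var, rng):
    """Squared overlap between the leading principal component and a planted direction.

    Assumed toy model (not from the paper): x = noise + a * B, with
    noise ~ N(0, I_N) and a ~ N(0, signal_var).
    """
    # Planted unit direction B in N dimensions.
    B = rng.standard_normal(N)
    B /= np.linalg.norm(B)
    # p patterns: unit-variance isotropic noise plus a random amplitude along B.
    amplitudes = np.sqrt(signal_var) * rng.standard_normal(p)
    X = rng.standard_normal((p, N)) + np.outer(amplitudes, B)
    X -= X.mean(axis=0)  # centre the data before PCA
    # With p << N, the SVD of the p x N data matrix is much cheaper than
    # diagonalising the N x N sample covariance matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return float(np.dot(Vt[0], B) ** 2)

rng = np.random.default_rng(0)
N, p = 2000, 40  # alpha = p / N = 0.02
weak = top_pc_overlap(N, p, signal_var=1.0, rng=rng)        # signal fixed as N grows
strong = top_pc_overlap(N, p, signal_var=float(N), rng=rng)  # signal scaled with N
print(f"alpha = {p / N:.3f}: weak overlap = {weak:.3f}, strong overlap = {strong:.3f}")
```

Under these assumptions, a fixed signal strength at small α leaves the leading component nearly uninformative about B, while a signal that grows with N is recovered well from only a few dozen patterns. Averaging the overlap over many random draws and comparing it with a theoretical prediction would be the natural next step.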
