Published in

Oxford University Press (OUP), Bioinformatics, 5(22), p. 589-596

DOI: 10.1093/bioinformatics/btk026

A multi-step approach to time series analysis and gene expression clustering

This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Motivation: The huge growth in gene expression data calls for automatic tools for data processing and interpretation.

Results: We present a new and comprehensive machine learning data mining framework consisting of a non-linear PCA neural network for feature extraction, and Probabilistic Principal Surfaces combined with an agglomerative approach based on negentropy, aimed at clustering gene microarray data. The method, which provides a user-friendly visualization interface, can work on noisy data with missing points and is an automatic procedure for determining, with no a priori assumptions, the number of clusters present in the data. Experiments on a cell-cycle data set and a detailed analysis confirm the biological nature of the most significant clusters.

Availability: The software described here is a subpackage of the ASTRONEURAL package and is available upon request from the corresponding author.
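
The role of negentropy in the agglomerative step can be illustrated with a small sketch. The abstract does not say how negentropy is estimated, so the code below swaps in a commonly used log-cosh approximation, J(y) ≈ (E[G(y)] − E[G(ν)])² with G(u) = log cosh(u) and ν a standard normal variable. The function names, sample sizes and cluster positions are hypothetical and are not taken from the ASTRONEURAL package. The intuition it is meant to convey is that a one-dimensional projection of two well-separated clusters is far from Gaussian (high negentropy), whereas a single homogeneous cluster is close to Gaussian (negentropy near zero).

```python
import math
import random


def _gaussian_logcosh_mean(n_steps=20001, limit=8.0):
    """E[log cosh(v)] for v ~ N(0, 1), by trapezoidal integration."""
    h = 2 * limit / (n_steps - 1)
    norm = 1.0 / math.sqrt(2 * math.pi)
    total = 0.0
    for i in range(n_steps):
        v = -limit + i * h
        weight = 0.5 if i in (0, n_steps - 1) else 1.0
        total += weight * math.log(math.cosh(v)) * norm * math.exp(-0.5 * v * v)
    return total * h


# Reference value for a standard normal variable (about 0.3746).
GAUSS_REF = _gaussian_logcosh_mean()


def negentropy_logcosh(samples):
    """Approximate negentropy of 1-D samples with the log-cosh contrast.

    Samples are standardized to zero mean and unit variance first.
    This is an illustrative estimator, not the one used in the paper.
    """
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    if std == 0:
        raise ValueError("samples have zero variance")
    g_mean = sum(math.log(math.cosh((x - mean) / std)) for x in samples) / n
    return (g_mean - GAUSS_REF) ** 2


# Hypothetical data: one Gaussian cluster vs. two well-separated clusters.
rng = random.Random(0)
gauss = [rng.gauss(0.0, 1.0) for _ in range(20000)]
bimodal = [rng.gauss(-3.0, 0.5) for _ in range(10000)] + [
    rng.gauss(3.0, 0.5) for _ in range(10000)
]

j_gauss = negentropy_logcosh(gauss)
j_bimodal = negentropy_logcosh(bimodal)
print("single cluster:", j_gauss)
print("two clusters:  ", j_bimodal)
```

Under this sketch, a merging rule could compare the negentropy of the joint projection of two candidate clusters against a threshold and merge them only when the value is small. The threshold and the choice of projection are left open here because the abstract does not specify them.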