Discovering significant OPSM subspace clusters in massive gene expression data

Gao, Byron J.; Griffith, Obi L.; Ester, Martin; Jones, Steven J. M.

Published in

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06

DOI: 10.1145/1150402.1150529

Proceedings article published in 2006 by Byron J. Gao, Obi L. Griffith, Martin Ester, Steven J. M. Jones

This paper is available in a repository.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Order-preserving submatrixes (OPSMs) have been accepted as a biologically meaningful subspace cluster model, capturing the general tendency of gene expressions across a subset of conditions. In an OPSM, the expression levels of all genes induce the same linear ordering of the conditions. OPSM mining is reducible to a special case of the sequential pattern mining problem, in which a pattern and its supporting sequences uniquely specify an OPSM cluster. Those small twig clusters, specified by long patterns with naturally low support, incur explosive computational costs and would be completely pruned off by most existing methods for massive datasets containing thousands of conditions and hundreds of thousands of genes, which are common in today's gene expression analysis. However, it is in particular interest of biologists to reveal such small groups of genes that are tightly coregulated under many conditions, and some pathways or processes might require only two genes to act in concert. In this paper, we introduce the KiWi mining framework for massive datasets, that exploits two parameters k and w to provide a biased testing on a bounded number of candidates, substantially reducing the search space and problem scale, targeting on highly promising seeds that lead to significant clusters and twig clusters. Extensive biological and computational evaluations on real datasets demonstrate that KiWi can effectively mine biologically meaningful OPSM subspace clusters with good efficiency and scalability.
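
The OPSM definition and its link to sequential patterns can be illustrated with a small sketch. This is not the KiWi framework and not code from the paper; it only shows, on a made-up toy matrix, how each gene row maps to a sequence of conditions and how a pattern's supporting rows specify a cluster. The function names, the toy data, and the choice of strictly increasing order (ties are not handled) are all assumptions made for illustration.

```python
from typing import List, Sequence


def row_to_sequence(row: Sequence[float]) -> List[int]:
    """Map a gene's expression row to its condition indices,
    ordered by increasing expression level (assumes no ties)."""
    return sorted(range(len(row)), key=lambda j: row[j])


def supports(row: Sequence[float], pattern: Sequence[int]) -> bool:
    """True if expression strictly increases along the conditions in
    `pattern`, i.e. `pattern` is a subsequence of row_to_sequence(row)."""
    return all(row[a] < row[b] for a, b in zip(pattern, pattern[1:]))


def supporting_rows(matrix: Sequence[Sequence[float]],
                    pattern: Sequence[int]) -> List[int]:
    """Indices of the genes (rows) that support the pattern. Together
    with the pattern's conditions they specify one OPSM cluster."""
    return [i for i, row in enumerate(matrix) if supports(row, pattern)]


# Hypothetical toy data: 3 genes (rows) x 4 conditions (columns).
expr = [
    [0.1, 0.9, 0.5, 0.3],
    [1.0, 4.0, 3.0, 2.0],
    [0.7, 0.2, 0.9, 0.4],
]
pattern = [0, 3, 2, 1]  # a linear ordering of all four conditions
genes = supporting_rows(expr, pattern)
print(genes)
```

Here the first two genes have very different absolute values but induce the same ordering of the conditions, so they form an OPSM on that pattern, while the third gene does not. Longer patterns tend to be supported by fewer rows, which is the low-support situation the abstract describes for twig clusters.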
