STEME: efficient EM to find motifs in large data sets

Reid, John E.; Wernisch, Lorenz

Published in

Oxford University Press, Nucleic Acids Research, 39(18), p. e126, 2011

DOI: 10.1093/nar/gkr574

Journal article published in 2011 by John E. Reid, Lorenz Wernisch

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

MEME and many other popular motif finders use the expectation–maximization (EM) algorithm to optimize their parameters. Unfortunately, the running time of EM is linear in the length of the input sequences. This can prohibit its application to data sets of the size commonly generated by high-throughput biological techniques. A suffix tree is a data structure that can efficiently index a set of sequences. We describe an algorithm, Suffix Tree EM for Motif Elicitation (STEME), that approximates EM using suffix trees. To the best of our knowledge, this is the first application of suffix trees to EM. We provide an analysis of the expected running time of the algorithm and demonstrate that STEME runs an order of magnitude more quickly than the implementation of EM used by MEME. We give theoretical bounds for the quality of the approximation and show that, in practice, the approximation has a negligible effect on the outcome. We provide an open source implementation of the algorithm that we hope will be used to speed up existing and future motif search algorithms.
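The abstract's two cost claims, that plain EM is linear in the input length and that an index over shared subsequences lets EM do less work, can be illustrated with a small Python sketch. It is not the STEME implementation. It assumes a simplified MEME-style model in which every W-mer is independently a motif site with a fixed prior `lam` (the prior is not re-estimated), input over the DNA alphabet ACGT only, and a given background distribution `bg`. `e_step_naive` scores every W-mer occurrence, so each iteration costs time proportional to total sequence length times W. `e_step_trie` substitutes a plain W-mer trie for the suffix tree (a simplification: a real suffix tree indexes all suffixes and can be built in linear time), so identical W-mers and shared prefixes are scored once. It also applies a hypothetical pruning rule: a subtree is skipped when even its best possible completion would have a posterior below `min_z`. All function names, the pruning rule and the threshold are illustrative assumptions, not STEME's API or its published bounds.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: DNA input containing only A, C, G, T


def posterior(lo, lam):
    """P(site | W-mer) from log-odds lo = log P(x|motif)/P(x|bg) and prior lam."""
    a = lo + math.log(lam / (1.0 - lam))
    if a >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def column_log_odds(theta, bg, j, c):
    """Log-odds contribution of base c at motif column j."""
    return math.log(theta[j][c] / bg[c])


def e_step_naive(seqs, theta, bg, lam, w):
    """Classic E-step: score every W-mer occurrence, O(total length * W)."""
    counts = [defaultdict(float) for _ in range(w)]
    for s in seqs:
        for i in range(len(s) - w + 1):
            x = s[i:i + w]
            lo = sum(column_log_odds(theta, bg, j, c) for j, c in enumerate(x))
            z = posterior(lo, lam)
            for j, c in enumerate(x):
                counts[j][c] += z
    return counts


def build_trie(seqs, w):
    """Trie of all W-mers (a stand-in for a suffix tree); leaves count occurrences under '#'."""
    root = {}
    for s in seqs:
        for i in range(len(s) - w + 1):
            node = root
            for c in s[i:i + w]:
                node = node.setdefault(c, {})
            node["#"] = node.get("#", 0) + 1
    return root


def e_step_trie(root, theta, bg, lam, w, min_z=1e-3):
    """E-step over the trie: each shared prefix is scored once, and subtrees whose
    best possible posterior is below min_z are skipped (hypothetical pruning rule)."""
    counts = [defaultdict(float) for _ in range(w)]
    best_rest = [0.0] * (w + 1)  # max log-odds obtainable from columns j..w-1
    for j in range(w - 1, -1, -1):
        best_rest[j] = best_rest[j + 1] + max(
            column_log_odds(theta, bg, j, c) for c in ALPHABET)

    def visit(node, depth, lo, prefix):
        if posterior(lo + best_rest[depth], lam) < min_z:
            return  # no completion of this prefix can reach min_z
        if depth == w:
            z = posterior(lo, lam) * node["#"]  # weight by occurrence count
            for j, c in enumerate(prefix):
                counts[j][c] += z
            return
        for c, child in node.items():
            step = column_log_odds(theta, bg, depth, c)
            visit(child, depth + 1, lo + step, prefix + c)

    visit(root, 0, 0.0, "")
    return counts


def m_step(counts, pseudo=0.1):
    """Re-estimate the position weight matrix from expected counts."""
    theta = []
    for col in counts:
        total = sum(col[c] for c in ALPHABET) + len(ALPHABET) * pseudo
        theta.append({c: (col[c] + pseudo) / total for c in ALPHABET})
    return theta


if __name__ == "__main__":
    seqs = ["ACGTACGTTTGACGTA", "TTACGTAGGACGTCCA"]
    w, lam = 4, 0.05
    bg = {c: 0.25 for c in ALPHABET}
    theta = [{c: (0.7 if c == m else 0.1) for c in ALPHABET} for m in "ACGT"]
    trie = build_trie(seqs, w)
    for _ in range(5):  # a few EM iterations using the trie-based E-step
        theta = m_step(e_step_trie(trie, theta, bg, lam, w))
    print([max(col, key=col.get) for col in theta])
```

With `min_z=0.0` the trie E-step reproduces the naive one up to floating-point rounding. Raising `min_z` trades a bounded loss of expected counts (less than `min_z` per skipped W-mer occurrence, since the posterior increases with the log-odds) for skipping whole subtrees.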
