Springer, Lecture Notes in Computer Science, p. 306-317, 2008
DOI: 10.1007/978-3-540-87361-7_26
Full text: Download
Variable order Markov chains (VOMCs) are a flexible class of models that extend the well-known Markov chains. They have been applied to a variety of problems in computational biology, e.g. protein family classification. A linear time and space construction algorithm has been published in 2000 by Apostolico and Bejerano. However, neither a report of the actual running time nor an implementation of it have been published since. In this paper we use the lazy suffix tree and the enhanced suffix array to improve upon the algorithm of Apostolico and Bejerano. We introduce a new software which is orders of magnitude faster than current tools for building VOMCs, and is suitable for large scale sequence analysis.