Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

Soares, Inês; Goios, Ana; Amorim, António

Published in

Hindawi, Scientific World Journal (2012), p. 1-4

DOI: 10.1100/2012/450124

Journal article published in 2012 by Inês Soares, Ana Goios, António Amorim

This paper is made freely available by the publisher.

Abstract

The vast majority of methods available for sequence comparison rely on an initial sequence alignment step, which requires a number of assumptions about evolutionary history and is sometimes very difficult or impossible to perform because of the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts with the computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words of a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a standard pairwise Euclidean distance is then computed, producing a symmetric genetic distance matrix that can be used to generate a neighbor-joining dendrogram or a multidimensional scaling graph. We present an improvement to word-counting alignment-free approaches for sequence comparison by determining a single optimal word length and combining suffix tree structures with the word-counting tasks. Our approach is thus a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python and is freely available on the web.
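The pipeline described in the abstract (L-word frequency profiles followed by pairwise Euclidean distances) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it counts words with a plain sliding window rather than a generalized suffix tree, it assumes a DNA alphabet of `ACGT`, and it assumes that frequencies are normalized to sum to 1. The function names `lword_profile`, `euclidean`, and `distance_matrix` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import product
from math import sqrt


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible L-word over `alphabet` in `seq`.

    Assumption: a sliding window stands in for the suffix tree used in the
    paper; windows containing symbols outside `alphabet` are ignored.
    """
    # Fixed ordering of all |alphabet|**L words, so profiles are comparable.
    words = ["".join(p) for p in product(alphabet, repeat=L)]
    counts = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    total = sum(counts[w] for w in words)
    # Assumption: normalize counts to relative frequencies.
    return [counts[w] / total if total else 0.0 for w in words]


def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise distances between L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    n = len(profiles)
    return [[euclidean(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = ["ACGTACGT", "ACGTACGA", "AAAAAAAA"]
    for row in distance_matrix(toy, 2):
        print(["%.3f" % d for d in row])
```

The resulting matrix has zeros on the diagonal and is symmetric, so it could be passed to any neighbor-joining or multidimensional scaling routine. For long sequences or many values of L, the suffix tree described in the abstract would avoid rescanning each sequence, which is the efficiency gain this sketch does not capture.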
