K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics

Lin, Jie; Adjeroh, Donald A.; Jiang, Bing-Hua; Jiang, Yue

Published in

Oxford University Press, Bioinformatics, 10(34), p. 1682-1689, 2017

DOI: 10.1093/bioinformatics/btx809



Abstract

Motivation: Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods.

Results: We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset size.

Availability and implementation: The K2 and K2* approaches are implemented in the R language as a package and are freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz).

Supplementary information: Supplementary data are available at Bioinformatics online.
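The abstract does not give the K2 formula, so the following is only a minimal sketch of the general idea it names: comparing two sequences without alignment by applying a Kendall rank statistic to their k-mer count profiles. It is not the authors' K2 or K2* statistic and is unrelated to the API of their R package. The function names (`kmer_counts`, `kendall_tau_b`, `kendall_kmer_similarity`), the DNA alphabet, and the choice of Kendall's tau-b on raw counts are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, product
from math import sqrt


def kmer_counts(seq, k, alphabet="ACGT"):
    """Count every k-mer over `alphabet` in `seq`, in a fixed lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length vectors (naive O(n^2) version)."""
    concordant = discordant = tied_x = tied_y = 0
    n_pairs = 0
    for i, j in combinations(range(len(x)), 2):
        n_pairs += 1
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    # If one vector is constant the statistic is undefined; return 0.0 here.
    return (concordant - discordant) / denom if denom else 0.0


def kendall_kmer_similarity(seq_a, seq_b, k):
    """Hypothetical helper: tau-b between the k-mer count profiles of two sequences."""
    return kendall_tau_b(kmer_counts(seq_a, k), kmer_counts(seq_b, k))


if __name__ == "__main__":
    a = "ACGTACGGTACCGTTA"
    b = "ACGTACGGTACCGTAA"
    print(kendall_kmer_similarity(a, b, k=2))
```

In this sketch the k-mer length `k` must be chosen by hand, which is the kind of parameter the abstract says K2* selects automatically. The quadratic pair loop is kept for readability; profiles with many k-mers would need an O(n log n) tau computation.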
