Accurate Detection of Very Sparse Sequence Motifs

Heger, Andreas; Lappe, Michael; Holm, Liisa

Published in

Proceedings of the seventh annual international conference on Computational molecular biology - RECOMB '03

DOI: 10.1145/640075.640094

Mary Ann Liebert, Journal of Computational Biology, 11(5), p. 843-857

DOI: 10.1089/1066527042432242

DOI: 10.1089/cmb.2004.11.843

Journal article published in 2003 by Andreas Heger, Michael Lappe, Liisa Holm

Abstract

Protein sequence alignments are more reliable the shorter the evolutionary distance. Here, we align distantly related proteins using many closely spaced intermediate sequences as stepping stones. Such transitive alignments can be generated between any two proteins in a connected set, whether they are direct or indirect sequence neighbors in the underlying library of pairwise alignments. We have implemented a greedy algorithm, MaxFlow, using a novel consistency score to estimate the relative likelihood of alternative paths of transitive alignment. In contrast to traditional profile models of amino acid preferences, MaxFlow models the probability that two positions are structurally equivalent and retains high information content across large distances in sequence space. Thus, MaxFlow is able to identify sparse and narrow active-site sequence signatures that are embedded in high-entropy sequence segments in the structure-based multiple alignment of large, diverse enzyme superfamilies. In a challenging benchmark based on the urease superfamily, MaxFlow yields better reliability and twice the coverage of available sequence alignment software. This promises to increase information returns from functional and structural genomics, where reliable sequence alignment is a bottleneck to transferring the functional or structural characterization of model proteins to entire protein superfamilies.
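The idea of a transitive alignment can be illustrated with a small sketch. This is not the MaxFlow algorithm or its consistency score, which the abstract does not specify; it only shows how pairwise residue mappings might be chained through intermediate sequences, and how agreement between alternative paths could be counted. The function names (`compose`, `transitive_alignment`, `path_support`), the dictionary representation of an alignment, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def compose(ab, bc):
    """Chain two residue mappings: A -> B and B -> C give A -> C.
    Positions of A whose partner in B is unaligned in bc are dropped."""
    return {i: bc[j] for i, j in ab.items() if j in bc}

def transitive_alignment(path, pairwise):
    """Chain pairwise alignments along a path of sequence names.
    `pairwise` maps (name1, name2) to a dict of aligned positions."""
    mapping = pairwise[(path[0], path[1])]
    for left, right in zip(path[1:], path[2:]):
        mapping = compose(mapping, pairwise[(left, right)])
    return mapping

def path_support(paths, pairwise):
    """For each position of the first sequence, count how many paths
    map it to each position of the last sequence (a simple vote,
    standing in for a real consistency score)."""
    votes = {}
    for path in paths:
        for i, k in transitive_alignment(path, pairwise).items():
            votes.setdefault(i, Counter())[k] += 1
    return votes

# Hypothetical toy library: A and D are linked only through B and C.
pairwise = {
    ("A", "B"): {0: 0, 1: 1, 2: 2},
    ("B", "D"): {0: 1, 1: 2, 2: 3},
    ("A", "C"): {0: 2, 1: 3, 2: 4},
    ("C", "D"): {2: 1, 3: 2, 4: 5},
}
paths = [["A", "B", "D"], ["A", "C", "D"]]
votes = path_support(paths, pairwise)
```

In this toy example both paths agree on where positions 0 and 1 of A land in D, while position 2 receives conflicting votes; a consistency-based method would treat the first two residue pairs as better supported.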
