Filtering Redundancies For Sequence Similarity Search Programs

Cantalloube, Hubert; Chomilier, Jacques; Chiusa, Sylvain; Lonquety, Mathieu; Spadoni, Jean-Louis; Zagury, Jean-François

Published in

Taylor and Francis Group, Journal of Biomolecular Structure and Dynamics, 22(4), p. 487-492

DOI: 10.1080/07391102.2005.10507020

Journal article published in 2005 by Hubert Cantalloube, Jacques Chomilier, Sylvain Chiusa, Mathieu Lonquety, Jean-Louis Spadoni, Jean-François Zagury

Abstract

Database scanning programs such as BLAST and FASTA are now used by most biologists for the post-genomic processing of DNA or protein sequence information, in particular to infer the structure or function of uncharacterized proteins. Unfortunately, their results can be polluted by identical alignments (called redundancies) that arise when the same protein or DNA sequence is present in different entries of the database. This makes the listed alignments difficult to use efficiently. Pretreatment of databases has been proposed to suppress strictly identical entries. However, many identical alignments still remain, since redundancies may occur locally, either between entries corresponding to different fragments of the same sequence or between entries corresponding to highly similar sequences that differ by only a few residues, such as orthologous proteins. In the present work, we show that redundant alignments can indeed be numerous even when working with a pretreated non-redundant data bank, reaching as much as 60% of the output results depending on the query and the bank. The accuracy and efficiency of post-genomic work will therefore be greatly increased if these redundancies are removed. To solve this previously unaddressed problem, we have developed an algorithm that allows the efficient and safe suppression of all redundancies with no loss of information. This algorithm is based on several filtering steps, which we describe here in the context of the Automat similarity search program; such an algorithm should also be added to other similarity search programs (BLAST, FASTA, etc.).
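The abstract does not spell out the filtering steps, so the following is only a minimal sketch of the general idea, not the Automat algorithm itself. It assumes each hit can be reduced to the query region it covers plus the aligned subject segment, and it treats two hits as redundant when those are strictly identical. The `Hit` and `MergedHit` types, the field names, and the entry identifiers are all hypothetical. To stay in the spirit of "no loss of information", the identifiers of the collapsed entries are kept alongside the representative alignment rather than discarded.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Hit:
    """One alignment reported by a similarity search (hypothetical layout)."""
    entry_id: str         # database entry the alignment came from
    query_start: int      # start of the aligned region on the query
    query_end: int        # end of the aligned region on the query
    subject_segment: str  # aligned subject residues, gaps included
    score: float


@dataclass
class MergedHit:
    """A representative alignment plus the entries that duplicated it."""
    representative: Hit
    other_entries: List[str] = field(default_factory=list)


def collapse_identical_alignments(hits: Iterable[Hit]) -> List[MergedHit]:
    """Group hits whose query region and aligned subject segment are identical.

    Assumption: strict identity of the aligned segment is the redundancy
    criterion. The first hit seen in each group is kept as representative;
    the entry ids of later duplicates are recorded, not dropped.
    """
    groups = OrderedDict()  # preserves the order of the search output
    for hit in hits:
        key = (hit.query_start, hit.query_end, hit.subject_segment)
        if key in groups:
            groups[key].other_entries.append(hit.entry_id)
        else:
            groups[key] = MergedHit(representative=hit)
    return list(groups.values())


def redundancy_fraction(hits: List[Hit]) -> float:
    """Fraction of output lines that duplicate an earlier alignment."""
    if not hits:
        return 0.0
    return 1.0 - len(collapse_identical_alignments(hits)) / len(hits)
```

Under these assumptions, a full-length entry and a fragment entry that yield the same local alignment collapse into one output line listing both identifiers, while a near-identical ortholog that differs at an aligned residue stays separate. Handling that second case would need a looser criterion than strict identity, which this sketch does not attempt.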
