Modeling the percolation of annotation errors in a database of protein sequences

Gilks, Walter R.; Audit, Benjamin; De Angelis, Daniela; Tsoka, Sophia; Ouzounis, Christos A.

Published in

Oxford University Press (OUP), Bioinformatics, 18(12), pp. 1641-1649

DOI: 10.1093/bioinformatics/18.12.1641

Journal article published in 2002 by Walter R. Gilks, Benjamin Audit, Daniela De Angelis, Sophia Tsoka and Christos A. Ouzounis

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified information on protein function lags far behind. To address this deficit, functional annotation in protein databases is often inferred from sequence similarity to homologous, annotated proteins, with the attendant possibility of error. The functional annotation of those homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein was acquired. Chains of misannotation can therefore arise, a process we term 'error percolation'. Under some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. Exploring the model's consequences for annotation quality shows that this iterative approach leads to a systematic deterioration of database quality.
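The abstract does not state the model's equations, so the following is only a hypothetical mean-field sketch of how error percolation can degrade a growing database, not the model developed in the paper. It assumes that each new entry is either annotated experimentally with probability `a` (assumed always correct) or copied from a uniformly chosen existing entry; a copy from a correct source is wrong with probability `e`, and a copy from a misannotated source inherits the error. The parameters `a` and `e` and the functions `new_entry_error_prob`, `steady_state_error` and `simulate` are introduced here purely for illustration.

```python
"""Hypothetical mean-field sketch of error percolation (not the authors' model)."""

from typing import List


def new_entry_error_prob(x: float, a: float, e: float) -> float:
    """Probability that a newly added entry is misannotated.

    Illustrative assumptions only:
      x: current fraction of misannotated entries in the database
      a: fraction of new entries annotated experimentally (assumed correct)
      e: chance that a transfer from a correct source still yields an error
    A transfer from a misannotated source is assumed to copy the error.
    """
    return (1.0 - a) * (x + e * (1.0 - x))


def steady_state_error(a: float, e: float) -> float:
    """Fixed point x* of x = new_entry_error_prob(x, a, e); assumes a + e > 0."""
    return (1.0 - a) * e / (a + (1.0 - a) * e)


def simulate(n0: float, m: float, rounds: int, a: float, e: float,
             x0: float = 0.0) -> List[float]:
    """Expected error fraction after each round, adding m entries per round."""
    n, errors = n0, x0 * n0
    history = [x0]
    for _ in range(rounds):
        x = errors / n
        # Expected number of misannotated entries among the m new ones.
        errors += m * new_entry_error_prob(x, a, e)
        n += m
        history.append(errors / n)
    return history


if __name__ == "__main__":
    a, e = 0.1, 0.05  # hypothetical values chosen for illustration
    traj = simulate(n0=1000, m=100, rounds=200, a=a, e=e)
    print(f"after 200 rounds: {traj[-1]:.3f}, "
          f"limit: {steady_state_error(a, e):.3f}")
```

With the assumed values a = 0.1 and e = 0.05, the fixed point is 0.045 / 0.145 ≈ 0.31: in this toy setting the error fraction rises steadily from zero toward a level several times the per-transfer error rate, and it tends to 1 as `a` approaches 0. Tracking expected values rather than sampling random draws keeps the sketch deterministic; a Monte Carlo version would fluctuate around the same trajectory.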
