Published in

Oxford University Press (OUP), Bioinformatics, 5(12), p. 415-422

DOI: 10.1093/bioinformatics/12.5.415

Syntactic recognition of regulatory regions in Escherichia coli

This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Motivation: One of the most common methodologies for identifying cis-regulatory sites in regulatory regions of DNA is the use of weight matrices, as attested by several articles in this issue. An alternative that can strengthen computational predictions in regulatory regions is to develop methods that incorporate more of the biological properties present in such DNA regions. The grammatical implementation presented in this paper provides a concrete example in this direction.

Results: On the basis of the analysis of an exhaustive collection of regulatory regions in Escherichia coli, a grammatical model for the regulatory regions of σ70 promoters has been developed. The terminal symbols of the grammar represent individual sites for the binding of activator and repressor proteins, and include the precise position of each site in relation to transcription initiation. By combining these symbols, the grammar generates a large number of different sentences, each of which can be matched against a collection of regulatory regions by means of weight matrices specific to the set of sites for each individual protein. On the basis of this grammatical model, a Prolog syntactic recognizer is presented here. Specific sub-grammars for ArgR, LexA and TyrR were implemented. When parsing a collection of 128 σ70 promoter regions, the syntactic recognizer produces a much lower number of false-positive sites than the standard search using weight matrices.

Availability: A WWW interface is under development and will be freely accessible at the URL: http://www.cifn.unam.mx/Computational_Biology/index.html

Contact: E-mail: collado@cifn.unam.mx
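
The idea of combining a weight-matrix score with a positional constraint relative to transcription initiation can be illustrated with a minimal sketch. This is not the authors' Prolog recognizer: it is written in Python, and the matrix values, the threshold, the function names (matrix_hits, positional_filter) and the allowed position range are all hypothetical, chosen only to show why a positional rule can discard matches that a plain matrix search would report.

```python
# Hypothetical toy weight matrix for a 4-bp site (one dict of scores per column).
# The values are illustrative only and are not taken from the paper.
TOY_MATRIX = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def matrix_hits(seq, matrix, threshold):
    """Plain weight-matrix search: return (start_index, score) for every
    window of seq whose summed column scores reach the threshold.
    Assumes seq contains only the letters A, C, G and T."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((i, score))
    return hits


def positional_filter(hits, tss_index, allowed):
    """Keep only hits whose start, measured relative to the transcription
    start index, falls inside the allowed (low, high) range (inclusive).
    Returns (relative_position, score) pairs."""
    low, high = allowed
    return [
        (i - tss_index, score)
        for i, score in hits
        if low <= i - tss_index <= high
    ]


if __name__ == "__main__":
    region = "ACGTTTTTTTACGTTT"   # hypothetical promoter region
    tss = 12                      # hypothetical transcription start index
    all_hits = matrix_hits(region, TOY_MATRIX, threshold=4.0)
    kept = positional_filter(all_hits, tss, allowed=(-5, 5))
    print("matrix-only hits:", all_hits)
    print("hits after positional rule:", kept)
```

In this toy region the matrix alone reports two matches, while the positional rule keeps only the one near the assumed transcription start. The grammar described above is considerably richer, since it combines several such positioned sites into sentences rather than filtering a single site.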