Lipoprotein computational prediction in spirochaetal genomes

Reis, Marcelo; Setubal, João C.; Matsunaga, James; Haake, David A.

Published in

Microbiology Society, Microbiology, 1(152), p. 113-121, 2006

DOI: 10.1099/mic.0.28317-0


Journal article published in 2006 by Marcelo Reis, João C. Setubal, James Matsunaga, David A. Haake


Abstract

Lipoproteins are of great interest in understanding the molecular pathogenesis of spirochaetes. Because spirochaete lipobox sequences exhibit more plasticity than those of other bacteria, application of existing prediction algorithms to emerging sequence data has been problematic. This paper describes a novel lipoprotein prediction algorithm, designated SpLip, that combines a lipobox weight matrix with a set of lipoprotein signal peptide rules allowing for conservative amino acid substitutions. Both the weight matrix and the rules are based on a training set of 28 experimentally verified spirochaetal lipoproteins. The performance of the SpLip algorithm was compared to that of the hidden Markov model-based LipoP program and the rules-based algorithm Psort for all predicted protein-coding genes of Leptospira interrogans sv. Copenhageni, L. interrogans sv. Lai, Borrelia burgdorferi, Borrelia garinii, Treponema pallidum and Treponema denticola. Psort sensitivity (13–35 %) was considerably lower than that of SpLip (93–100 %) or LipoP (50–84 %), due in part to the requirement of Psort for Ala or Gly at the −1 position, a rule based on E. coli lipoproteins. The percentage of false-positive lipoprotein predictions by the LipoP algorithm (8–30 %) was greater than that of SpLip (0–1 %) or Psort (4–27 %), in part because LipoP lacks rules excluding amino acids that are unprecedented at the −1 position, such as Lys and Arg. This analysis revealed a higher number of predicted spirochaetal lipoproteins than was previously known. The improved performance of the SpLip algorithm provides a more accurate prediction of the complete lipoprotein repertoire of spirochaetes. The hybrid approach of supplementing weight matrix scoring with rules based on knowledge of protein secretion biochemistry may be a general strategy for development of improved prediction algorithms.
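
The hybrid idea in the abstract (a lipobox weight matrix whose hits can be vetoed by signal peptide rules) can be illustrated with a minimal Python sketch. This is not the SpLip implementation: the training windows, the uniform background frequency, the score threshold, the allowed Cys positions and the n-region rule below are all assumptions chosen for illustration. Only the exclusion of Lys and Arg at the −1 position is taken from the abstract.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # uniform background frequency (assumption)

# Hypothetical toy lipobox residues at positions -3..-1 (just before the +1 Cys).
# These are NOT the 28 verified lipoproteins used to train SpLip.
TOY_TRAINING = ["LIA", "LSG", "FVS", "LTA", "ISN", "LAG"]

FORBIDDEN_MINUS1 = {"K", "R"}  # Lys/Arg at -1 are described as unprecedented
N_REGION_LENGTH = 5            # hypothetical length of the charged n-region


def build_weight_matrix(windows, pseudocount=1.0):
    """Build a log-odds weight matrix, one dict of scores per window position."""
    width = len(windows[0])
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        matrix.append({
            aa: math.log((counts[aa] + pseudocount) / total / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return matrix


def score_window(matrix, window):
    """Sum per-position log-odds; unknown residues (e.g. 'X') score 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(matrix, window))


def passes_rules(seq, cys_index):
    """Rule layer that can veto a candidate regardless of its matrix score."""
    if seq[cys_index - 1] in FORBIDDEN_MINUS1:
        return False
    # Assumed rule: at least one Lys/Arg near the N-terminus.
    return any(aa in "KR" for aa in seq[:N_REGION_LENGTH])


def predict_lipoprotein(seq, matrix, threshold=0.0, min_cys=10, max_cys=35):
    """Return (cys_index, score) for the best accepted lipobox, or None.

    threshold, min_cys and max_cys are illustrative defaults, not SpLip values.
    """
    width = len(matrix)
    best = None
    for i, aa in enumerate(seq):
        if aa != "C" or not (min_cys <= i <= max_cys):
            continue
        if not passes_rules(seq, i):
            continue
        score = score_window(matrix, seq[i - width:i])
        if score >= threshold and (best is None or score > best[1]):
            best = (i, score)
    return best


matrix = build_weight_matrix(TOY_TRAINING)
# Illustrative sequences: the second differs only by Lys at the -1 position.
print(predict_lipoprotein("MKKYLLGIGLILALIACKQNVSS", matrix))
print(predict_lipoprotein("MKKYLLGIGLILALIKCKQNVSS", matrix))
```

In this toy setup the second sequence still receives a positive matrix score for its "LIK" window, yet the rule layer rejects it. That is the kind of false positive the abstract attributes to approaches lacking a rule against Lys or Arg at the −1 position.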
