Will my protein crystallize? A sequence-based predictor

Smialowski, Pawel; Schmidt, Thorsten; Cox, Jürgen; Kirschner, Andreas; Frishman, Dmitrij

Published in

Wiley, Proteins: Structure, Function, and Bioinformatics, 2(62), p. 343-355, 2005

DOI: 10.1002/prot.20789

Journal article published in 2005 by Pawel Smialowski, Thorsten Schmidt, Jürgen Cox, Andreas Kirschner, Dmitrij Frishman

Abstract

We propose a machine-learning approach to sequence-based prediction of protein crystallizability in which we exploit subtle differences between proteins whose structures were solved by X-ray analysis [or by both X-ray and nuclear magnetic resonance (NMR) spectroscopy] and those proteins whose structures were solved by NMR spectroscopy alone. Because the NMR technique is usually applied to relatively small proteins, sequence length distributions of the X-ray and NMR datasets were adjusted to avoid predictions biased by protein size. As the feature space for classification, we used frequencies of mono-, di-, and tripeptides represented by the original 20-letter amino acid alphabet as well as by several reduced alphabets in which amino acids were grouped by their physicochemical and structural properties. The classification algorithm was constructed as a two-layered structure in which the outputs of primary support vector machine classifiers operating on peptide frequencies were combined by a second-level Naive Bayes classifier. Owing to the application of metamethods for cost sensitivity, our method is able to handle real datasets with unbalanced class representation. An overall prediction accuracy of 67% [65% on the positive (crystallizable) and 69% on the negative (noncrystallizable) class] was achieved in a 10-fold cross-validation experiment, indicating that the proposed algorithm may be a valuable tool for more efficient target selection in structural genomics. A Web server for protein crystallizability prediction called SECRET is available at http://webclu.bio.wzw.tum.de:8080/secret.
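The peptide-frequency feature space described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the six-group reduced alphabet below is a hypothetical grouping chosen for illustration (the abstract does not list the actual groupings), and the function names `reduce_sequence` and `peptide_frequencies` are assumptions. The sketch covers feature extraction only, not the support vector machine or Naive Bayes layers.

```python
from collections import Counter
from itertools import product

# The original 20-letter amino acid alphabet.
STANDARD_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical reduced alphabet (an assumption, not the paper's grouping):
# each key is a one-letter class label, each value lists its member residues.
REDUCED_GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "b": "KR",      # basic
    "n": "DE",      # acidic
    "g": "GP",      # glycine / proline
}
TO_REDUCED = {aa: label for label, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Translate a protein sequence into the reduced alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(TO_REDUCED[aa] for aa in seq.upper())


def peptide_frequencies(seq, k, alphabet):
    """Return the relative frequency of every possible k-peptide over `alphabet`.

    Overlapping windows of length k are counted; peptides that never occur
    get a frequency of 0.0, so the feature vector has a fixed length of
    len(alphabet) ** k regardless of the sequence.
    """
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    features = {}
    for letters in product(alphabet, repeat=k):
        peptide = "".join(letters)
        features[peptide] = counts[peptide] / total if total else 0.0
    return features


if __name__ == "__main__":
    sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    mono = peptide_frequencies(sequence, 1, STANDARD_ALPHABET)
    di_reduced = peptide_frequencies(
        reduce_sequence(sequence), 2, sorted(REDUCED_GROUPS)
    )
    print(len(mono), len(di_reduced))
```

A reduced alphabet keeps the di- and tripeptide feature vectors small: with six groups there are 36 dipeptides and 216 tripeptides, compared with 400 and 8000 for the full 20-letter alphabet.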
