Prediction of subplastidial localization of chloroplast proteins from spectral count data - Comparison of machine learning algorithms

This paper is available in a repository.

Abstract

Determining the subplastidial localization of a protein is a prerequisite for characterizing its function, and hence for studying chloroplast metabolism and functions. Because the localization of many chloroplast proteins remains hypothetical, we set up a proteomics strategy to assign subplastidial localization accurately. Our group carried out a comprehensive study of the Arabidopsis thaliana chloroplast proteome [1], involving high-performance mass spectrometry analyses of highly fractionated chloroplasts. In particular, spectral count data were acquired for the three major chloroplast sub-fractions (stroma, thylakoids and envelope) obtained by sucrose gradient purification. Because the distribution of spectral counts across compartments is a fair predictor of the relative abundance of proteins [2], it was justified to propose a first statistical model [1] relating spectral counts to subplastidial localization. This predictive model was based on logistic regression and achieved an accuracy of 84% for chloroplast proteins. In the present work, we conducted a comparative study of various machine learning techniques to generate a predictive model of the subplastidial localization of chloroplast proteins from spectral count data.
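
To make the setup concrete, the following is a minimal sketch of how classifiers might be compared on spectral count data. It rests on several assumptions that are not taken from the abstract: the counts are invented toy values, the features are the per-protein proportions of counts across the three sub-fractions, and the two classifiers (nearest centroid and 1-nearest-neighbour) are simple stand-ins chosen for brevity. They are neither the logistic regression model of [1] nor necessarily among the algorithms compared in the paper. All function names are hypothetical.

```python
from collections import defaultdict

# Order of the three sub-fractions named in the abstract.
FRACTIONS = ("stroma", "thylakoids", "envelope")


def to_proportions(counts):
    """Turn raw spectral counts per fraction into proportions summing to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("protein has no spectral counts")
    return [c / total for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_fit(X, y):
    """Compute the mean feature vector of each class."""
    sums = defaultdict(lambda: [0.0] * len(FRACTIONS))
    sizes = defaultdict(int)
    for row, label in zip(X, y):
        sizes[label] += 1
        for i, value in enumerate(row):
            sums[label][i] += value
    return {label: [s / sizes[label] for s in sums[label]] for label in sums}


def nearest_centroid_predict(centroids, row):
    """Assign the class whose centroid is closest to the row."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))


def one_nn_predict(X, y, row):
    """Assign the class of the single closest training example."""
    best = min(range(len(X)), key=lambda i: sq_dist(X[i], row))
    return y[best]


def loo_accuracy(X, y, predict):
    """Leave-one-out accuracy; predict(X_train, y_train, row) returns a label."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        if predict(X_train, y_train, X[i]) == y[i]:
            hits += 1
    return hits / len(X)


# ASSUMPTION: invented toy counts (stroma, thylakoids, envelope), not real data.
TOY = [
    ((40, 3, 2), "stroma"),
    ((25, 1, 4), "stroma"),
    ((12, 2, 0), "stroma"),
    ((2, 30, 3), "thylakoids"),
    ((0, 18, 1), "thylakoids"),
    ((5, 50, 6), "thylakoids"),
    ((1, 2, 20), "envelope"),
    ((3, 0, 15), "envelope"),
    ((2, 4, 33), "envelope"),
]

X = [to_proportions(counts) for counts, _ in TOY]
y = [label for _, label in TOY]

acc_centroid = loo_accuracy(
    X, y, lambda Xt, yt, row: nearest_centroid_predict(nearest_centroid_fit(Xt, yt), row)
)
acc_1nn = loo_accuracy(X, y, one_nn_predict)

print("nearest centroid, leave-one-out accuracy:", acc_centroid)
print("1-nearest-neighbour, leave-one-out accuracy:", acc_1nn)
```

Using proportions rather than raw counts is one way to express "the distribution of spectral counts across compartments" independently of a protein's overall abundance; whether the original study normalized the counts this way is not stated in the abstract.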