Published in

2009 International Conference on Knowledge and Systems Engineering

DOI: 10.1109/kse.2009.11

Conditional random fields feature subset selection based on genetic algorithms for phosphorylation site prediction

This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Conditional Random Fields (CRFs) are undirected probabilistic graphical models that were introduced for solving sequence labeling and segmentation problems. CRFs have several advantages over other well-understood and widely used techniques such as Hidden Markov Models (HMMs) and Maximum Entropy Markov Models (MEMMs). Being conditional models, CRFs do not explicitly model the input data sequences but use feature functions (features) to incorporate the arbitrary interactions and interdependencies that exist in the observation sequences. The number of possible features is extremely large, up to millions, and the features are usually specified in advance or generated according to a scheme based on domain knowledge. This paper introduces a feature subset selection method for CRFs based on genetic algorithms, in which a population of candidate feature function subsets is evolved to maximize CRF performance. The method was experimentally validated on the well-known bioinformatics problem of protein phosphorylation site prediction, phosphorylation being one of the most important protein modification mechanisms.
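
The idea of evolving a population of candidate feature subsets can be illustrated with a minimal sketch. The abstract does not state the encoding, the genetic operators, or the fitness measure the paper uses, so everything below is an assumption made for illustration: subsets are encoded as bit masks, selection is a binary tournament, crossover is one-point, and the best mask is carried over unchanged (elitism). The names `evolve_feature_subsets` and `toy_fitness` are hypothetical. In a real setting the fitness function would train a CRF restricted to the selected features and return its validation score; here a toy scoring function stands in for that step so the example runs on its own.

```python
import random


def evolve_feature_subsets(n_features, fitness, pop_size=20, generations=30,
                           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve bit masks over n_features and return (best_mask, best_score).

    Assumed design (not taken from the paper): bit-mask encoding,
    binary tournament selection, one-point crossover, bit-flip mutation,
    and elitism of size one.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")

    for _ in range(generations):
        # Score every candidate subset and remember the best one seen so far.
        scores = [fitness(mask) for mask in population]
        for mask, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = mask[:], score

        def tournament():
            # Pick two candidates at random and keep the fitter one.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_population = [best_mask[:]]  # elitism: keep the best mask
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population

    return best_mask, best_score


# Hypothetical stand-in for "train a CRF on the selected features and
# return its validation score": reward three arbitrarily chosen useful
# features and charge a small cost per selected feature.
USEFUL = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in USEFUL)
    return hits - 0.1 * sum(mask)


best_mask, best_score = evolve_feature_subsets(10, toy_fitness)
print(best_mask, best_score)
```

Because each fitness evaluation would involve training a CRF, the population size and the number of generations dominate the running time in practice; the small defaults here are chosen only to keep the sketch fast.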