Conditional random fields feature subset selection based on genetic algorithms for phosphorylation site prediction

Thanh, Hai; Thanh Hai, Dang; Engelen, Kristof; Meysman, Pieter; Marchal, Kathleen; Verschoren, Alain; Laukens, Kris; Ieee

Published in

2009 International Conference on Knowledge and Systems Engineering

DOI: 10.1109/kse.2009.11

Journal article published in 2009 by Hai Thanh, Dang Thanh Hai, Kristof Engelen, Pieter Meysman, Kathleen Marchal, Alain Verschoren, Kris Laukens, Ieee

Abstract

Conditional Random Fields (CRFs) are undirected probabilistic graphical models that were introduced for solving sequence labeling and segmentation problems. CRFs have several advantages over other well-understood and widely used techniques such as Hidden Markov Models (HMMs) and Maximum Entropy Markov Models (MEMMs). Being conditional models, CRFs do not explicitly model the input data sequences but use feature functions (features) to incorporate the arbitrary interactions and interdependencies that exist in the observation sequences. The number of possible features is extremely large, up to millions, and the features are usually specified in advance or generated according to a feature-generating scheme based on domain knowledge. This paper introduces a feature subset selection method for CRFs based on genetic algorithms, in which a population of candidate feature function subsets is evolved to maximize CRF performance. The method was experimentally validated on the well-known bioinformatics problem of protein phosphorylation site prediction, phosphorylation being one of the most important protein modification mechanisms.
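
The idea of evolving a population of candidate feature subsets can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the encoding, the genetic operators, or the parameter values, so the bit-mask encoding, tournament selection, one-point crossover, bit-flip mutation, elitism, and all default parameters below are assumptions. The function `toy_fitness` is a hypothetical stand-in; in a setting like the one described, the fitness would presumably be the performance of a CRF trained with only the selected feature functions and evaluated on held-out data.

```python
import random


def evolve_feature_subsets(n_features, fitness, pop_size=20, generations=30,
                           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve 0/1 masks over n_features (n_features >= 2); fitness(mask) is maximized.

    All operators and defaults are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    # Each individual is a tuple of flags: 1 keeps a feature function, 0 drops it.
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def tournament():
            # Binary tournament: the fitter of two distinct random individuals wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        # Elitism: the best subset found so far survives unchanged.
        next_population = [best]
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # One-point crossover between the two parents.
                cut = rng.randint(1, n_features - 1)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1
            # Bit-flip mutation: each flag is toggled with a small probability.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            next_population.append(child)

        population = next_population
        best = max(population, key=fitness)

    return best


def toy_fitness(mask):
    """Hypothetical stand-in for CRF validation performance.

    Rewards three 'informative' features (indices chosen arbitrarily) and
    charges a small penalty per selected feature.
    """
    informative = {0, 3, 5}
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)


selected = evolve_feature_subsets(8, toy_fitness)
print("selected mask:", selected, "fitness:", toy_fitness(selected))
```

Because every fitness evaluation in a real setting would involve training and scoring a CRF, the population size and number of generations largely determine the cost of the search; the small defaults here are chosen only to keep the sketch fast.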
