Published in

2011 11th International Conference on Intelligent Systems Design and Applications

DOI: 10.1109/isda.2011.6121678

Hierarchical Multi-Label Classification for Protein Function Prediction: A Local Approach based on Neural Networks

This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

In hierarchical multi-label classification problems, each instance can be assigned to two or more classes simultaneously, unlike in conventional single-label classification. Additionally, the classes are organized in a hierarchy, in the form of either a tree or a directed acyclic graph. Hence, an instance can be assigned to two or more paths of the hierarchy, resulting in a complex classification problem with possibly hundreds of classes. Many methods have been proposed to deal with such problems. Some employ a single classifier that handles all classes simultaneously (global methods), while others employ many classifiers to decompose the original problem into a set of subproblems (local methods). In this work, we propose a novel local method named HMC-LMLP, which uses one Multi-Layer Perceptron per hierarchical level. The predictions at one level are used as inputs to the network responsible for the predictions at the next level. We train the Multi-Layer Perceptrons with two distinct algorithms: Back-propagation and Resilient Back-propagation. In addition, we train the networks with an error measure specially tailored to multi-label problems. Our method is compared to state-of-the-art hierarchical multi-label classification algorithms on protein function prediction datasets. The experimental results show that our approach achieves competitive predictive accuracy, suggesting that artificial neural networks constitute a promising alternative for hierarchical multi-label classification of biological data.
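
The level-wise chaining described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions the abstract does not settle. Each level's network here receives only the previous level's prediction vector. Training uses plain back-propagation on a squared error rather than the multi-label error measure or Resilient Back-propagation. The names `TinyMLP`, `train_levels`, and `predict_levels`, the hidden-layer size, and the two-level toy hierarchy (classes A, B with children A.1, B.1) are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One-hidden-layer perceptron with sigmoid units (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        # Each weight row has one extra entry for the bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train(self, X, Y, epochs=2000, lr=0.5):
        # Plain online back-propagation on a squared error (an assumption;
        # the paper uses an error measure tailored to multi-label problems).
        for _ in range(epochs):
            for x, t in zip(X, Y):
                h, y = self.forward(x)
                d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
                d_hid = [hj * (1.0 - hj)
                         * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                         for j, hj in enumerate(h)]
                for k, row in enumerate(self.w2):
                    for j, v in enumerate(h + [1.0]):
                        row[j] -= lr * d_out[k] * v
                for j, row in enumerate(self.w1):
                    for i, v in enumerate(list(x) + [1.0]):
                        row[i] -= lr * d_hid[j] * v


def train_levels(X, Y_levels, n_hidden=4, seed=0):
    """Train one network per hierarchy level, top level first."""
    rng = random.Random(seed)
    nets, inputs = [], [list(x) for x in X]
    for Y in Y_levels:
        net = TinyMLP(len(inputs[0]), n_hidden, len(Y[0]), rng)
        net.train(inputs, Y)
        nets.append(net)
        # Assumption: the next level sees only this level's predictions.
        inputs = [net.predict(x) for x in inputs]
    return nets


def predict_levels(nets, x):
    """Return one real-valued prediction vector per hierarchy level."""
    outputs, current = [], list(x)
    for net in nets:
        current = net.predict(current)
        outputs.append(current)
    return outputs


# Toy data: level 1 has classes [A, B]; level 2 has classes [A.1, B.1].
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y_levels = [
    [[1, 0], [1, 0], [0, 1], [1, 1]],  # level 1: the last instance is in A and B
    [[1, 0], [1, 0], [0, 1], [1, 1]],  # level 2: A.1 follows A, B.1 follows B
]

nets = train_levels(X, Y_levels)
print(predict_levels(nets, [1.0, 1.0]))
```

The outputs are real-valued scores in the range 0 to 1, one vector per level. Turning them into class assignments, for example by thresholding at 0.5, is a further design choice that this sketch leaves open. The toy labels are deliberately chosen so that level 2 is fully determined by level 1, because a network fed only the previous level's predictions cannot recover information that those predictions discard.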