Published in

Proceedings of the 27th Annual ACM Symposium on Applied Computing - SAC '12

DOI: 10.1145/2245276.2245325

A Genetic Algorithm for Hierarchical Multi-Label Classification

Proceedings article published in 2012 by Ricardo Cerri, Rodrigo C. Barros, André C. P. L. F. de Carvalho
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

In Hierarchical Multi-Label Classification (HMC) problems, each example can be assigned to two or more classes simultaneously, unlike in standard classification. Moreover, the classes are structured in a hierarchy, in the form of either a tree or a directed acyclic graph. An example can therefore be assigned to two or more paths of the hierarchical structure, resulting in a complex classification problem with possibly hundreds or thousands of classes. Several methods have been proposed to deal with such problems: some employ a single classifier to handle all classes simultaneously (global methods), while others employ many classifiers to decompose the original problem into a set of subproblems (local methods). In this work, we propose a novel global method called HMC-GA, which employs a genetic algorithm to solve the HMC problem. In our approach, the genetic algorithm evolves the antecedents of classification rules in order to optimize the level of coverage of each antecedent. Then, the set of optimized antecedents is selected to build the corresponding consequents of the rules (the sets of classes to be predicted). Our method is compared to state-of-the-art HMC algorithms on protein function prediction datasets. The experimental results show that our approach achieves competitive predictive accuracy, suggesting that genetic algorithms are a promising alternative for hierarchical multi-label classification of biological data.
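
To make the rule structure described in the abstract more concrete, the following is a minimal sketch of one way a rule antecedent, its coverage, and its consequent could be represented. It is not the HMC-GA implementation. The toy hierarchy, the interval-test encoding of antecedents, the names `PARENT`, `with_ancestors`, `covers`, `coverage`, and `build_consequent`, and the choice to build the consequent as the per-class fraction of covered examples are all assumptions made for illustration. The evolutionary loop itself (selection, crossover, mutation) is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy hierarchy (child -> parent); a tree is assumed for simplicity.
PARENT: Dict[str, Optional[str]] = {
    "1": None, "2": None, "1.1": "1", "1.2": "1", "2.1": "2",
}
CLASSES: List[str] = sorted(PARENT)

# Assumed encoding: an antecedent is a conjunction of interval tests,
# mapping a feature index to an inclusive (low, high) range.
Antecedent = Dict[int, Tuple[float, float]]


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Close a label set under the hierarchy: a class implies its ancestors."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        while node is not None:
            closed.add(node)
            node = PARENT[node]
    return closed


def covers(antecedent: Antecedent, x: List[float]) -> bool:
    """True if the example satisfies every interval test of the antecedent."""
    return all(low <= x[i] <= high for i, (low, high) in antecedent.items())


def coverage(antecedent: Antecedent, X: List[List[float]]) -> float:
    """Fraction of examples covered; a candidate fitness signal for a GA."""
    return sum(covers(antecedent, x) for x in X) / len(X)


def build_consequent(
    antecedent: Antecedent, X: List[List[float]], Y: List[Set[str]]
) -> Dict[str, float]:
    """Per-class fraction of covered examples that carry the class (assumed scheme)."""
    covered = [with_ancestors(y) for x, y in zip(X, Y) if covers(antecedent, x)]
    if not covered:
        return {c: 0.0 for c in CLASSES}
    return {c: sum(c in y for y in covered) / len(covered) for c in CLASSES}


# Hypothetical data: four examples with two features and multi-label targets.
X = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.9, 0.1]]
Y = [{"1.1"}, {"1.1", "2.1"}, {"1.2"}, {"2.1"}]
rule: Antecedent = {0: (0.0, 0.5)}  # "feature 0 lies in [0.0, 0.5]"

print(coverage(rule, X))             # 0.5
print(build_consequent(rule, X, Y))  # {'1': 1.0, '1.1': 1.0, '1.2': 0.0, '2': 0.5, '2.1': 0.5}
```

In this sketch the second example belongs to two paths of the hierarchy (1 → 1.1 and 2 → 2.1), so the consequent assigns a nonzero score to classes on both paths. A genetic algorithm would search over antecedents such as `rule`, using a coverage-based measure as part of its fitness.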