Investigating fitness functions for a hyper-heuristic evolutionary algorithm in the context of balanced and imbalanced data classification

Barros, Rodrigo C.; Basgalupp, Márcio P.; de Carvalho, André C. P. L. F.; Carvalho, André Carlos Ponce de Leon Ferreira de

Published in

Springer (part of Springer Nature), Genetic Programming and Evolvable Machines, 3(16), p. 241-281

DOI: 10.1007/s10710-014-9235-z

Journal article published in 2014 by Rodrigo C. Barros, Márcio P. Basgalupp, André C. P. L. F. de Carvalho, André Carlos Ponce de Leon Ferreira de Carvalho

Abstract

In this paper, we analyse in detail the impact of different strategies for defining the fitness function used during the evolutionary cycle of HEAD-DT, a hyper-heuristic evolutionary algorithm that automatically designs decision-tree induction algorithms. We divide the experimental scheme into two distinct scenarios: (1) evolving a decision-tree induction algorithm from multiple balanced data sets; and (2) evolving a decision-tree induction algorithm from multiple imbalanced data sets. In each of these scenarios, we analyse the difference in performance of well-known classification performance measures such as accuracy, F-Measure, AUC, and recall, as well as a lesser-known criterion, namely the relative accuracy improvement. In addition, we analyse different schemes for aggregating these measures, such as simple average, median, and harmonic mean. Finally, we verify whether the best-performing fitness functions are capable of providing HEAD-DT with algorithms more effective than traditional decision-tree induction algorithms such as C4.5, CART, and REPTree. Experimental results indicate that HEAD-DT is a good option for generating algorithms tailored to (im)balanced data, since it outperforms state-of-the-art decision-tree induction algorithms with statistical significance.

Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (Project 2009/14325-3)
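The aggregation schemes named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the example scores, and the formula used for relative accuracy improvement (gain over a majority-class baseline, normalised by the room left for improvement or loss) are assumptions made here for illustration only.

```python
from statistics import harmonic_mean, mean, median


def relative_accuracy_improvement(accuracy, default_accuracy):
    """Assumed definition (not stated in the abstract): accuracy gain over
    the majority-class baseline, normalised to lie in [-1, 1]."""
    if accuracy > default_accuracy:
        return (accuracy - default_accuracy) / (1.0 - default_accuracy)
    return (accuracy - default_accuracy) / default_accuracy


# Hypothetical lookup of the three aggregation schemes the abstract lists.
AGGREGATORS = {
    "average": mean,
    "median": median,
    "harmonic_mean": harmonic_mean,  # raises StatisticsError on negative scores
}


def aggregate_fitness(per_dataset_scores, scheme="average"):
    """Collapse one score per training data set into a single fitness value."""
    return AGGREGATORS[scheme](per_dataset_scores)


# Made-up per-data-set scores (e.g. F-Measure on three training data sets).
scores = [0.9, 0.6, 0.6]
for scheme in AGGREGATORS:
    print(scheme, round(aggregate_fitness(scores, scheme), 3))
```

On the same scores the harmonic mean gives the lowest aggregate of the three, since it penalises poor results on individual data sets more strongly than the simple average does.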
