Published in

Springer (part of Springer Nature), Genetic Programming and Evolvable Machines, 3(16), p. 241-281

DOI: 10.1007/s10710-014-9235-z


Investigating fitness functions for a hyper-heuristic evolutionary algorithm in the context of balanced and imbalanced data classification

This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

In this paper, we analyse in detail the impact of different strategies used as the fitness function during the evolutionary cycle of HEAD-DT, a hyper-heuristic evolutionary algorithm that automatically designs decision-tree induction algorithms. We divide the experimental scheme into two distinct scenarios: (1) evolving a decision-tree induction algorithm from multiple balanced data sets; and (2) evolving a decision-tree induction algorithm from multiple imbalanced data sets. In each scenario, we compare the performance obtained with well-known classification performance measures (accuracy, F-Measure, AUC, and recall) and with a lesser-known criterion, the relative accuracy improvement. In addition, we analyse different schemes for aggregating these measures, namely the simple average, the median, and the harmonic mean. Finally, we verify whether the best-performing fitness functions allow HEAD-DT to produce algorithms that are more effective than traditional decision-tree induction algorithms such as C4.5, CART, and REPTree. Experimental results indicate that HEAD-DT is a good option for generating algorithms tailored to (im)balanced data, since it outperforms state-of-the-art decision-tree induction algorithms with statistical significance.

Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (Project 2009/14325-3)
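
The three aggregation schemes named in the abstract (simple average, median, harmonic mean) can be illustrated with a minimal Python sketch. It assumes that a candidate algorithm has already been scored once per data set with some performance measure; the function name `aggregate_fitness`, the scheme labels, and the sample scores are hypothetical and are not taken from the paper or from HEAD-DT's implementation.

```python
from statistics import harmonic_mean, mean, median

# Hypothetical mapping from a scheme label to an aggregation function.
AGGREGATORS = {
    "average": mean,
    "median": median,
    "harmonic_mean": harmonic_mean,
}


def aggregate_fitness(scores, scheme="average"):
    """Collapse per-data-set scores (each in [0, 1]) into one fitness value."""
    if not scores:
        raise ValueError("at least one per-data-set score is required")
    if scheme not in AGGREGATORS:
        raise ValueError(f"unknown aggregation scheme: {scheme!r}")
    return AGGREGATORS[scheme](scores)


if __name__ == "__main__":
    # Illustrative scores of one candidate on three data sets (made-up values).
    per_dataset_scores = [0.9, 0.6, 0.3]
    for scheme in AGGREGATORS:
        value = aggregate_fitness(per_dataset_scores, scheme)
        print(f"{scheme}: {value:.4f}")
```

The harmonic mean is pulled towards the lowest score, so a candidate that does poorly on even one data set is penalised more than under the simple average, while the median ignores extreme values altogether.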