Exploiting the Bin-Class Histograms for Feature Selection on Discrete Data

Ferreira, Artur J.; Figueiredo, Mário A. T.

Published in

Springer, Lecture Notes in Computer Science, pp. 345-353, 2015

DOI: 10.1007/978-3-319-19390-8_39

Book chapter published in 2015 by Artur J. Ferreira, Mário A. T. Figueiredo

Abstract

In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, particularly with medium- and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques to the discrete data, keeping the most relevant features while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results on public benchmark data show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
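The abstract names mutual information (MI) as one of the widely used baseline criteria. As a point of reference only, the sketch below shows how a bin-class histogram (the joint count table of a discrete feature's bins against the class labels) gives the MI relevance of a feature, and how a feature-feature joint histogram gives the MI redundancy between two features. This is not the criteria proposed in the paper, which the abstract does not specify. The greedy "relevance minus mean redundancy" loop is an mRMR-style selection swapped in for illustration. It assumes features are already discretized to integer codes 0..n_bins-1 and classes are coded 0..n_classes-1. The function names `joint_histogram`, `mutual_information` and `greedy_select` are hypothetical.

```python
import numpy as np


def joint_histogram(a, b, n_a, n_b):
    # Count table H[i, j] = number of samples with a == i and b == j.
    # With b = class labels this is the bin-class histogram of feature a.
    h = np.zeros((n_a, n_b), dtype=np.int64)
    np.add.at(h, (np.asarray(a, dtype=np.intp), np.asarray(b, dtype=np.intp)), 1)
    return h


def mutual_information(h):
    # MI in bits from a joint count table (baseline criterion, not the paper's).
    p = h / h.sum()
    pa = p.sum(axis=1, keepdims=True)  # marginal of the row variable
    pb = p.sum(axis=0, keepdims=True)  # marginal of the column variable
    nz = p > 0  # skip empty cells: 0 * log 0 is taken as 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (pa * pb)[nz])))


def greedy_select(X, y, k):
    # mRMR-style greedy selection (illustrative assumption, not the paper's method):
    # score(j) = MI(x_j; y) - mean over selected s of MI(x_j; x_s).
    n_bins = int(X.max()) + 1  # assumes integer codes 0..n_bins-1
    n_classes = int(y.max()) + 1
    d = X.shape[1]
    relevance = np.array([
        mutual_information(joint_histogram(X[:, j], y, n_bins, n_classes))
        for j in range(d)
    ])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(d):
            if j in selected:
                continue
            redundancy = np.mean([
                mutual_information(joint_histogram(X[:, j], X[:, s], n_bins, n_bins))
                for s in selected
            ])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data: 4 classes; x0 and x1 are identical copies of y // 2, x2 is y % 2.
y = np.array([0, 1, 2, 3, 0, 1, 2, 3])
x0 = y // 2
X = np.column_stack([x0, x0, y % 2])
print(greedy_select(X, y, k=2))  # [0, 2]
```

In this toy example all three features carry 1 bit of information about the class, but once x0 is chosen its duplicate x1 adds nothing new (redundancy of 1 bit), whereas x2 is independent of x0, so the redundancy term steers the second pick to x2.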
