Published in

Springer, Lecture Notes in Computer Science, pp. 296-306, 2004

DOI: 10.1007/978-3-540-28645-5_30


Learning with Class Skews and Small Disjuncts

This paper is available in a repository.


Preprint: archiving forbidden
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

One of the main objectives of a Machine Learning (ML) system is to induce a classifier that minimizes classification errors. Two relevant topics in ML are the understanding of which domain characteristics and inducer limitations might cause an increase in misclassification. In this sense, this work analyzes two important issues that might influence the performance of ML systems: class imbalance and error-prone small disjuncts. Our main objective is to investigate how these two important aspects are related to each other. Aiming at overcoming both problems, we analyzed the behavior of two over-sampling methods we have proposed, namely Smote + Tomek links and Smote + ENN. Our results suggest that these methods are effective for dealing with class imbalance and, in some cases, might help in ruling out some undesirable disjuncts. However, in some cases a simpler method, Random over-sampling, provides comparable results while requiring fewer computational resources.
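
The abstract names three over-sampling strategies: Random over-sampling, Smote + Tomek links, and Smote + ENN. The sketch below is not the authors' original implementation; it assumes the third-party imbalanced-learn library, which ships resamplers named RandomOverSampler, SMOTETomek, and SMOTEENN, and a synthetic 9:1 class skew built with scikit-learn, to illustrate how the three methods rebalance a skewed training set.

```python
# Hedged sketch: uses imbalanced-learn's resamplers, not the code from the paper.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import RandomOverSampler
from imblearn.combine import SMOTETomek, SMOTEENN

# Synthetic two-class problem with roughly a 9:1 class skew.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
print("original distribution:", Counter(y))

# Compare the three over-sampling strategies discussed in the abstract.
samplers = [
    ("Random over-sampling", RandomOverSampler(random_state=42)),
    ("Smote + Tomek links", SMOTETomek(random_state=42)),
    ("Smote + ENN", SMOTEENN(random_state=42)),
]
for name, sampler in samplers:
    X_res, y_res = sampler.fit_resample(X, y)
    # Train any inducer on the rebalanced data; a decision tree is used here
    # only because symbolic learners make the resulting disjuncts inspectable.
    clf = DecisionTreeClassifier(random_state=42).fit(X_res, y_res)
    print(f"{name}: resampled distribution = {Counter(y_res)}")
```

The combined methods (SMOTETomek, SMOTEENN) first generate synthetic minority examples and then apply a data-cleaning step (Tomek links or Edited Nearest Neighbours), which is what the abstract refers to as potentially ruling out undesirable disjuncts; Random over-sampling simply duplicates minority examples and is therefore cheaper.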