Springer, Lecture Notes in Computer Science, p. 296-306, 2004
DOI: 10.1007/978-3-540-28645-5_30
One of the main objectives of a Machine Learning (ML) system is to induce a classifier that minimizes classification errors. A relevant topic in ML is understanding which domain characteristics and inducer limitations might cause an increase in misclassification. In this sense, this work analyzes two important issues that might influence the performance of ML systems: class imbalance and error-prone small disjuncts. Our main objective is to investigate how these two aspects are related to each other. Aiming to overcome both problems, we analyzed the behavior of two over-sampling methods we have proposed, namely Smote + Tomek links and Smote + ENN. Our results suggest that these methods are effective for dealing with class imbalance and, in some cases, might help in ruling out some undesirable disjuncts. However, in some cases a simpler method, random over-sampling, provides comparable results while requiring fewer computational resources.
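As a point of reference for the baseline mentioned in the abstract, the following is a minimal sketch of random over-sampling: minority-class examples are duplicated at random until all classes reach the majority-class count. The function name and the toy dataset are illustrative, not taken from the paper.

```python
import random
from collections import Counter

def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen minority-class examples until
    every class has as many examples as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # indices of all examples belonging to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(majority - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out

# Toy imbalanced dataset: five examples of class 0, one of class 1.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 0, 1]
Xb, yb = random_over_sample(X, y)
print(Counter(yb))  # both classes now have 5 examples
```

The combined methods studied in the paper (Smote + Tomek links, Smote + ENN) additionally generate synthetic minority examples and then clean the resulting data; implementations of both are available in the imbalanced-learn library as `SMOTETomek` and `SMOTEENN`.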