Springer, Lecture Notes in Computer Science, p. 24-35, 2005
DOI: 10.1007/11552253_3
Several studies have pointed out that class imbalance is a bottleneck in the performance achieved by standard supervised learning systems. However, a complete understanding of how this problem affects the performance of learning is still lacking. In previous work we identified that performance degradation is not solely caused by class imbalance, but is also related to the degree of class overlap. In this work, we take our research a step further by investigating sampling strategies which aim to balance the training set. Our results show that these sampling strategies usually lead to a performance improvement for highly imbalanced data sets having highly overlapped classes. In addition, over-sampling methods seem to outperform under-sampling methods.
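To make the idea of balancing a training set concrete, the following is a minimal Python sketch of the two simplest sampling strategies: random over-sampling (duplicating minority-class examples) and random under-sampling (discarding majority-class examples). The abstract does not say which specific methods were evaluated, so these random variants are an illustrative assumption, and the function names `random_oversample` and `random_undersample` are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the feature rows belonging to each class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw without replacement so no example is repeated.
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


# Toy data: 8 examples of class 0 and 2 examples of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Over-sampling keeps every original example at the cost of duplicates, while under-sampling discards majority-class examples, which may remove useful information. In either case, only the training set should be resampled; the evaluation data should keep its original class distribution.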