Springer Verlag, Pattern Analysis and Applications, 1(19), p. 93-109
DOI: 10.1007/s10044-014-0393-7
Although the k-NN classifier is a popular classification method, it suffers from the high computational cost and storage requirements it involves. This paper proposes two effective cluster-based data reduction algorithms for efficient k-NN classification. Both have low preprocessing cost and can achieve high data reduction rates while maintaining k-NN classification accuracy at high levels. The first proposed algorithm is called Reduction through Homogeneous Clusters (RHC) and is based on a fast preprocessing clustering procedure that creates homogeneous clusters. The centroids of these clusters constitute the reduced training set. The second proposed algorithm is a dynamic version of RHC that retains all its properties and, in addition, it can manage datasets that cannot fit in main memory and is appropriate for dynamic environments where new training data are gradually available. Experimental results, based on fourteen datasets, illustrate that both algorithms are faster and achieve higher reduction rates than four known methods, while maintaining high classification accuracy.
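The idea behind RHC can be illustrated with a minimal sketch. The abstract states only that clustering produces homogeneous clusters whose centroids form the reduced training set; the details below are assumptions made for illustration and are not taken from the abstract. These include seeding k-means with one mean per class, splitting non-homogeneous clusters repeatedly, the fallback for inseparable points, and the hypothetical names `reduce_homogeneous`, `mean`, and `sq_dist`.

```python
from collections import deque


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reduce_homogeneous(X, y, max_iter=20):
    """Return a list of (centroid, label) pairs, one per homogeneous cluster.

    Assumed procedure (illustrative only): a cluster holding a single class
    is replaced by its centroid; a mixed cluster is split by k-means seeded
    with the mean of each class it contains, and the parts are re-examined.
    """
    queue = deque([list(range(len(X)))])
    prototypes = []
    while queue:
        idx = queue.popleft()
        classes = sorted({y[i] for i in idx})
        if len(classes) == 1:
            # Homogeneous cluster: keep only its centroid.
            prototypes.append((mean([X[i] for i in idx]), classes[0]))
            continue
        # Mixed cluster: one initial seed per class present in it.
        seeds = [mean([X[i] for i in idx if y[i] == c]) for c in classes]
        for _ in range(max_iter):
            groups = [[] for _ in seeds]
            for i in idx:
                nearest = min(range(len(seeds)),
                              key=lambda s: sq_dist(X[i], seeds[s]))
                groups[nearest].append(i)
            new_seeds = [mean([X[i] for i in g]) if g else s
                         for g, s in zip(groups, seeds)]
            if new_seeds == seeds:
                break
            seeds = new_seeds
        groups = [g for g in groups if g]
        if len(groups) == 1:
            # Assumed fallback: k-means could not separate the points
            # (e.g. identical points with different labels), so split by class.
            groups = [[i for i in idx if y[i] == c] for c in classes]
        queue.extend(groups)
    return prototypes


# Two well-separated classes collapse to one prototype each.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y = ["a", "a", "a", "b", "b", "b"]
reduced = reduce_homogeneous(X, y)
```

A k-NN classifier would then search `reduced` instead of the full training set, which is where the savings in storage and distance computations come from. How much the set shrinks depends on how cleanly the classes separate: heavily overlapping classes produce many small clusters and therefore less reduction.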