Published in

Springer Verlag, Pattern Analysis and Applications, 1(19), p. 93-109

DOI: 10.1007/s10044-014-0393-7

RHC: Non-parametric Cluster-based Data Reduction for efficient k-NN Classification

Journal article published in 2015 by Stefanos Ougiaroglou, Georgios Evangelidis
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Although the k-NN classifier is a popular classification method, it suffers from high computational cost and storage requirements. This paper proposes two effective cluster-based data reduction algorithms for efficient k-NN classification. Both have low preprocessing cost and can achieve high data reduction rates while maintaining k-NN classification accuracy at high levels. The first proposed algorithm, Reduction through Homogeneous Clusters (RHC), is based on a fast preprocessing clustering procedure that creates homogeneous clusters. The centroids of these clusters constitute the reduced training set. The second proposed algorithm is a dynamic version of RHC that retains all its properties and, in addition, can manage datasets that cannot fit in main memory and is appropriate for dynamic environments where new training data become available gradually. Experimental results, based on fourteen datasets, illustrate that both algorithms are faster and achieve higher reduction rates than four known methods, while maintaining high classification accuracy.
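
The idea of replacing each homogeneous cluster with its centroid can be illustrated with a short Python sketch. The abstract does not describe how the clusters are formed, so the splitting rule used here is an assumption: a cluster containing more than one class is re-clustered with plain k-means, seeded with the mean of each class present, until every cluster holds a single class. The function names (`rhc_reduce`, `kmeans`, `predict_1nn`) and the fallback for clusters that cannot be split are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd's k-means; returns index lists, one per non-empty cluster."""
    centers = list(seeds)
    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centers are too
        assign = new_assign
        groups = defaultdict(list)
        for i, c in enumerate(assign):
            groups[c].append(i)
        # Recompute each center; an empty cluster keeps its old center.
        centers = [
            mean([points[i] for i in groups[c]]) if groups[c] else centers[c]
            for c in range(len(centers))
        ]
    groups = defaultdict(list)
    for i, c in enumerate(assign):
        groups[c].append(i)
    return [members for members in groups.values() if members]

def rhc_reduce(X, y):
    """Return a reduced training set as a list of (centroid, label) pairs."""
    queue = deque([list(range(len(X)))])
    reduced = []
    while queue:
        idx = queue.popleft()
        by_class = defaultdict(list)
        for i in idx:
            by_class[y[i]].append(X[i])
        if len(by_class) == 1:
            # Homogeneous cluster: keep only its centroid.
            reduced.append((mean([X[i] for i in idx]), next(iter(by_class))))
            continue
        # Assumed splitting rule: k-means seeded with one mean per class.
        seeds = [mean(pts) for pts in by_class.values()]
        parts = kmeans([X[i] for i in idx], seeds)
        if len(parts) == 1:
            # Hypothetical guard: the cluster cannot be split (e.g. coinciding
            # class means), so emit one mean per class to ensure termination.
            for label, members in by_class.items():
                reduced.append((mean(members), label))
            continue
        for part in parts:
            queue.append([idx[j] for j in part])
    return reduced

def predict_1nn(reduced, query):
    """Classify a query point by its nearest centroid in the reduced set."""
    return min(reduced, key=lambda item: sq_dist(item[0], query))[1]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
reduced = rhc_reduce(X, y)
```

On this toy input the six training points collapse to two labelled centroids, and `predict_1nn(reduced, (0.2, 0.2))` searches two prototypes instead of six items. No user-chosen number of clusters is needed in this sketch, since the class labels drive the splitting.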