Published in

Association for Computing Machinery, ACM Computing Surveys, 3(55), p. 1-33, 2023

DOI: 10.1145/3490384

Links

Tools

Export citation

Search in Google Scholar

Experimental Comparisons of Clustering Approaches for Data Representation

Journal article published in 2023 by Sanjay Kumar Anand, Suresh Kumar
This paper was not found in any repository, but could be made available legally by the author.
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Green circle
Preprint: archiving allowed
Green circle
Postprint: archiving allowed
Red circle
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Clustering approaches are extensively used by many areas such as IR, Data Integration, Document Classification, Web Mining, Query Processing, and many other domains and disciplines. Nowadays, much literature describes clustering algorithms on multivariate data sets. However, there is limited literature that presented them with exhaustive and extensive theoretical analysis as well as experimental comparisons. This experimental survey paper deals with the basic principle, and techniques used, including important characteristics, application areas, run-time performance, internal, external, and stability validity of cluster quality, etc., on five different data sets of eleven clustering algorithms. This paper analyses how these algorithms behave with five different multivariate data sets in data representation. To answer this question, we compared the efficiency of eleven clustering approaches on five different data sets using three validity metrics-internal, external, and stability and found the optimal score to know the feasible solution of each algorithm. In addition, we have also included four popular and modern clustering algorithms with only their theoretical discussion. Our experimental results for only traditional clustering algorithms showed that different algorithms performed different behavior on different data sets in terms of running time (speed), accuracy and, the size of data set. This study emphasized the need for more adaptive algorithms and a deliberate balance between the running time and accuracy with their theoretical as well as implementation aspects.