Published in

Elsevier, Computational Statistics & Data Analysis, (93), p. 46-75

DOI: 10.1016/j.csda.2014.11.004

Mixture-based Clustering for the Ordered Stereotype Model

Journal article published in 2015 by Richard Arnold, Shirley Pledger, Daniel Fernández Martínez
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving forbidden
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Many methods for reducing the dimensionality of data matrices are based on mathematical techniques. Because these techniques have no underlying probability model, it is generally not possible to draw statistical inferences from them or to assess the appropriateness of a model via information criteria. Furthermore, ordinal data are very common (e.g. Likert or Braun-Blanquet scales), yet the clustering methods in common use treat ordered categorical variables as nominal or continuous rather than as truly ordinal. Recently, a group of likelihood-based finite mixture models for binary or count data has been developed (Pledger and Arnold, 2014). This thesis extends that idea and establishes novel likelihood-based multivariate methods for data reduction of a matrix containing ordinal data. The new approach applies fuzzy clustering via finite mixtures to the ordered stereotype model (Fernández et al., 2014a). Fuzzy allocation of rows and columns to their corresponding clusters is achieved with the EM algorithm, and Bayesian model fitting is carried out with a reversible jump MCMC sampler. The performance of the two approaches is compared for one-dimensional clustering. Simulation studies and three real data sets illustrate the application of these approaches and introduce novel data visualisation tools for depicting the fuzziness of the clustering results for ordinal data. Additionally, a simulation study is set up to empirically establish a relationship between our likelihood-based methodology and the performance of eleven information criteria in common use. Finally, the same data set is clustered both as count data and after categorising it as ordinal, and the results are compared and presented.
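
To make the idea of fuzzy row allocation concrete, the following is a minimal illustrative sketch and is not taken from the paper. It assumes the usual parameterisation of the ordered stereotype model, log(P(y = k) / P(y = 1)) = mu_k + phi_k * eta with mu_1 = 0 and 0 = phi_1 <= ... <= phi_q = 1, and it assumes a row-clustering linear predictor eta = alpha_r + beta_j. The function names, parameter values, and toy data are all hypothetical. Only the E-step (posterior cluster memberships given fixed parameters) is shown; a full EM fit would alternate this with an M-step that updates the parameters.

```python
import numpy as np

def stereotype_probs(mu, phi, eta):
    """Category probabilities under log(P(k)/P(1)) = mu[k] + phi[k] * eta."""
    logits = mu + phi * eta
    logits = logits - logits.max()      # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def row_memberships(Y, mu, phi, alpha, beta, pi):
    """E-step: posterior probability that each row belongs to each row cluster.

    Y     : (n, m) integer matrix, categories coded 0..q-1
    alpha : (R,) row-cluster effects (assumed form of the linear predictor)
    beta  : (m,) column effects
    pi    : (R,) mixing proportions
    """
    n, m = Y.shape
    log_post = np.tile(np.log(pi), (n, 1))          # start from log prior, shape (n, R)
    for r in range(len(alpha)):
        for j in range(m):
            p = stereotype_probs(mu, phi, alpha[r] + beta[j])
            log_post[:, r] += np.log(p[Y[:, j]])    # add log-likelihood of column j
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)   # rows sum to 1 (fuzzy allocation)

# Hypothetical toy example: 3 ordinal categories, 2 row clusters, 4 columns.
mu = np.array([0.0, 0.5, 0.2])
phi = np.array([0.0, 0.6, 1.0])      # non-decreasing scores encode the ordering
alpha = np.array([-1.5, 1.5])
beta = np.zeros(4)
pi = np.array([0.5, 0.5])
Y = np.array([[0, 0, 0, 0],
              [2, 2, 2, 2]])

Z = row_memberships(Y, mu, phi, alpha, beta, pi)
print(Z)
```

Each row of `Z` is a probability vector over the row clusters, which is the sense in which the allocation is fuzzy rather than hard: in this toy setting the row of all-lowest responses leans towards the cluster with the negative effect, and the row of all-highest responses leans towards the other.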