Published in

Springer (part of Springer Nature), Statistics and Computing, 25(5), p. 1009-1022

DOI: 10.1007/s11222-014-9470-4


Low-dimensional tracking of association structures in categorical data

Journal article published in 2014 by Alfonso Iodice D'Enza and Angelos Markos
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

In modern applications such as text mining and signal processing, large amounts of categorical data are produced at a high rate, and their association structures change over time. Multiple correspondence analysis (MCA) is a well-established dimension reduction method for exploring the associations within a set of categorical variables. A critical step of the MCA algorithm is a singular value decomposition (SVD) or an eigenvalue decomposition (EVD) of a suitably transformed matrix. The high computational and memory requirements of ordinary SVD and EVD make them impractical for massive or sequential data sets. Several enhanced SVD/EVD approaches have recently been introduced to overcome these issues. The aim of the present contribution is twofold: (1) to extend MCA to a split-apply-combine framework, which leads to an exact and parallel MCA implementation; (2) to allow for incremental updates (downdates) of existing MCA solutions, which lead to an approximate yet highly accurate solution. For this purpose, two incremental EVD and SVD approaches with desirable properties are revised and embedded in the context of MCA.
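
As an illustration of why a split-apply-combine scheme can give an exact MCA solution, the sketch below computes the principal inertias twice: once from an SVD of the full standardized indicator matrix, and once from an EVD of a Burt matrix accumulated chunk by chunk. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names (`indicator`, `inertias_from_indicator`, `inertias_from_chunks`) are hypothetical, every category is assumed to occur at least once, and the incremental update (downdate) methods mentioned in the abstract are not covered.

```python
import numpy as np

def indicator(data, n_levels):
    """One-hot encode an (n, Q) integer array.
    n_levels[q] is the number of categories of variable q (codes 0..k-1)."""
    return np.hstack([np.eye(k)[data[:, q]] for q, k in enumerate(n_levels)])

def inertias_from_indicator(Z, Q):
    """Principal inertias from an SVD of the standardized residuals of Z."""
    n = Z.shape[0]
    P = Z / (n * Q)                      # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False) ** 2

def inertias_from_chunks(chunks, Q):
    """Same inertias from row chunks of Z, never holding all rows at once.
    The Burt matrix Z'Z is the sum of the chunk-wise cross-products."""
    n, B = 0, None
    for Zk in chunks:                    # "apply": one small product per chunk
        Bk = Zk.T @ Zk
        B = Bk if B is None else B + Bk  # "combine": plain summation
        n += Zk.shape[0]
    c = np.diag(B) / (n * Q)             # column masses, as in the full analysis
    S_B = (B / (n * Q**2) - np.outer(c, c)) / np.sqrt(np.outer(c, c))
    return np.linalg.eigvalsh(S_B)[::-1] # eigenvalues, largest first

# Small synthetic example: 300 observations, 3 variables with 3, 4 and 2 levels.
rng = np.random.default_rng(0)
levels = [3, 4, 2]
data = np.column_stack([rng.integers(0, k, size=300) for k in levels])
Z = indicator(data, levels)
full = inertias_from_indicator(Z, len(levels))
chunked = inertias_from_chunks(np.array_split(Z, 5), len(levels))
print(np.allclose(full, chunked, atol=1e-10))
```

Because the chunk-wise products are independent of one another, they could be computed in parallel before the summation step; the decomposition itself is then performed on a matrix whose size depends only on the number of categories, not on the number of observations.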