Published in

Springer-Verlag, Lecture Notes in Computer Science, pp. 117-132

DOI: 10.1007/978-3-319-24261-3_10

Robust Initialization for Learning Latent Dirichlet Allocation

Proceedings article published in 2015 by Pietro Lovato, Manuele Bicego, Vittorio Murino, and Alessandro Perina
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Latent Dirichlet Allocation (LDA) is perhaps the most famous topic model, employed in many different contexts in Computer Science. The wide success of LDA is due to its effectiveness in dealing with large datasets, the competitive performance it obtains on several tasks (e.g. classification, clustering), and the interpretability of the solutions it provides. Learning an LDA model from training data usually requires iterative optimization techniques such as Expectation-Maximization, for which the choice of a good initialization is of crucial importance for reaching a good solution. However, even though some clever solutions have been proposed, in practical applications this issue is typically disregarded, and the usual approach is to resort to random initialization. In this paper we address the problem of initializing the LDA model with two novel strategies: the key idea is to perform repeated learning with a topic splitting/pruning strategy, so that each learning phase is initialized with an informative configuration derived from the previous phase. The performance of the proposed splitting and pruning strategies has been assessed from a twofold perspective: i) the log-likelihood of the learned model (both on the training set and on a held-out set); ii) the coherence of the learned topics. The evaluation has been carried out on five different datasets, taken from heterogeneous contexts in the literature, showing promising results.
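The abstract only gives the high-level idea of the splitting schedule. The sketch below is a minimal illustration of what such a phased, warm-started learning loop could look like, not the authors' method: it substitutes a simplified pLSA-style EM for full variational LDA inference, and the "split the heaviest topic" rule (split_heaviest_topic) and its noise parameter are hypothetical choices made here for illustration.

```python
import numpy as np


def plsa_em(counts, topic_word, n_iter=50, eps=1e-12):
    """One learning phase: EM for a simplified pLSA-style topic model,
    warm-started from the given topic-word distributions."""
    n_docs, n_topics = counts.shape[0], topic_word.shape[0]
    doc_topic = np.full((n_docs, n_topics), 1.0 / n_topics)
    for _ in range(n_iter):
        # E-step: responsibilities p(z | d, w), shape (docs, topics, words).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * resp
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
    return doc_topic, topic_word


def split_heaviest_topic(doc_topic, topic_word, rng, noise=0.01):
    # Hypothetical splitting rule: duplicate the topic carrying the most
    # probability mass, perturb the copy, and renormalize, so the next
    # phase starts from an informative (K+1)-topic configuration.
    k = int(np.argmax(doc_topic.sum(axis=0)))
    child = topic_word[k] + rng.uniform(0.0, noise, size=topic_word.shape[1])
    return np.vstack([topic_word, child / child.sum()])


def learn_by_splitting(counts, k_target, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a single topic and grow one topic per learning phase.
    topic_word = rng.dirichlet(np.ones(counts.shape[1]), size=1)
    while True:
        doc_topic, topic_word = plsa_em(counts, topic_word, n_iter)
        if topic_word.shape[0] == k_target:
            return doc_topic, topic_word
        topic_word = split_heaviest_topic(doc_topic, topic_word, rng)


# Tiny synthetic example: 20 documents over a 30-word vocabulary.
rng = np.random.default_rng(1)
counts = rng.poisson(1.0, size=(20, 30)).astype(float)
doc_topic, topic_word = learn_by_splitting(counts, k_target=4)
print(doc_topic.shape, topic_word.shape)  # (20, 4) (4, 30)
```

The pruning strategy described in the abstract would run in the opposite direction: start with more topics than needed and, between phases, remove the weakest topic before re-running EM from the surviving topic-word distributions.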