Latent Bayesian clustering for topic modelling

in: CLADAG 2023 Book of Abstracts and Short Papers, 2023.

Citation: Schiavon, L. (2023) Latent Bayesian clustering for topic modelling, in CLADAG 2023 Book of Abstracts and Short Papers (Editors: Pernal, C., Salvati, N. and Schirippa Spagnolo, F.), ISBN: 9788891935632.

Abstract: The main objective of topic modelling is to uncover the underlying themes present in a corpus of text data. This process generally consists of two phases: (i) identifying the main words associated with each topic; (ii) grouping together documents that contain similar sets of words. In this work, we exploit recent advances in Bayesian factor models to represent the high-dimensional space of the observed words through a set of low-dimensional latent variables, and to jointly cluster the documents according to their distribution over such latent constructs. Groups and latent constructs are interpreted as document topics and language concepts, respectively, and neither the number of groups nor the number of latent constructs needs to be specified in advance. We apply the proposed approach to a data set of newspaper headlines.
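
The idea of clustering documents through their latent factor scores, rather than through the raw words, can be illustrated with a toy generative model. The sketch below is a minimal illustration under stated assumptions and is not the author's specification. It assumes continuous word scores (not raw counts), a fixed number of clusters H and factors K, known loadings, and isotropic Gaussian clusters in the latent space. All function and variable names (`simulate_corpus`, `cluster_probabilities`, `centers`, `loadings`) are hypothetical. In the actual approach the latent scores are unobserved and must be inferred jointly with the clustering, and H and K are learned from the data rather than fixed.

```python
import math
import random


def simulate_corpus(n_docs, centers, loadings, noise_sd, seed=0):
    """Draw documents from a toy factor model whose latent scores are clustered.

    Hypothetical generative model (illustration only, not the paper's model):
      z_i   ~ Uniform{0, ..., H-1}                  cluster ("topic") label
      eta_i ~ N(centers[z_i], I_K)                  K latent "concepts"
      y_i   = loadings @ eta_i + N(0, noise_sd^2 I_P)   P observed word scores
    """
    rng = random.Random(seed)
    k = len(centers[0])
    labels, scores, data = [], [], []
    for _ in range(n_docs):
        z = rng.randrange(len(centers))
        eta = [rng.gauss(centers[z][j], 1.0) for j in range(k)]
        y = [sum(row[j] * eta[j] for j in range(k)) + rng.gauss(0.0, noise_sd)
             for row in loadings]
        labels.append(z)
        scores.append(eta)
        data.append(y)
    return labels, scores, data


def cluster_probabilities(eta, centers, weights):
    """P(z = h | eta) for a Gaussian mixture with identity covariance.

    The Gaussian normalising constant is shared by all clusters, so it cancels;
    the log-sum-exp shift keeps the computation numerically stable.
    """
    log_terms = [math.log(w) - 0.5 * sum((e - c) ** 2 for e, c in zip(eta, center))
                 for w, center in zip(weights, centers)]
    m = max(log_terms)
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical setting: 2 topics, 2 latent concepts, 6 observed word scores.
centers = [[-3.0, 0.0], [3.0, 0.0]]
loadings = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5],
            [0.2, 0.8], [0.0, 1.0], [0.3, 0.3]]
weights = [0.5, 0.5]
labels, scores, data = simulate_corpus(200, centers, loadings, noise_sd=0.5)

# Assign each document to its most probable cluster using its latent scores.
predicted = []
for eta in scores:
    probs = cluster_probabilities(eta, centers, weights)
    predicted.append(probs.index(max(probs)))
accuracy = sum(p == z for p, z in zip(predicted, labels)) / len(labels)
print(f"accuracy using the true latent scores: {accuracy:.3f}")
```

Here the clustering uses the true latent scores for simplicity; in a real analysis those scores would be estimated from the observed words (for example by posterior sampling), which is where the Bayesian factor model does its work.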
