Penalized Model-Based Clustering with Group-Dependent Shrinkage Estimation / A. Casa, A. Cappozzo, M. Fop (ADVANCES IN INTELLIGENT SYSTEMS AND COMPUTING). - In: Building Bridges between Soft and Statistical Methodologies for Data Science / [edited by] L.A. García-Escudero, A. Gordaliza, A. Mayo, M. Asunción, L. Gomez, M.A. Gil, P. Grzegorzewski, O. Hryniewicz. - [s.l] : Springer, 2023. - ISBN 978-3-031-15508-6. - pp. 73-78 (Paper presented at the 10th International Conference on Soft Methods in Probability and Statistics (SMPS), held in Valladolid in 2022) [10.1007/978-3-031-15509-3_10].

Penalized Model-Based Clustering with Group-Dependent Shrinkage Estimation

A. Cappozzo
2023

Abstract

Gaussian mixture models (GMMs) are the most widely used approach to model-based clustering of continuous features. Unfortunately, with the increasing availability of high-dimensional datasets, their direct applicability is at risk: GMMs suffer from the curse of dimensionality, as the number of parameters grows quadratically with the number of variables. To address this, a methodological link between Gaussian mixtures and Gaussian graphical models has recently been established, providing a framework for penalized model-based clustering in the presence of large precision matrices. However, current methodologies do not account for the fact that groups may be under- or over-connected, and thus implicitly assume similar levels of sparsity across clusters. We overcome this limitation by defining data-driven, component-specific penalty factors that automatically account for different degrees of connection within groups. A real-data experiment on handwritten digit recognition demonstrates the validity of our proposal.
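To make the quadratic growth mentioned in the abstract concrete, the following is a minimal sketch that counts the free parameters of an unconstrained Gaussian mixture. The function name `gmm_param_count` and the choice of three components are illustrative assumptions and do not come from the paper.

```python
def gmm_param_count(n_components: int, n_features: int) -> int:
    """Count free parameters of an unconstrained Gaussian mixture.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Mixing proportions sum to one, so one of them is determined by the rest.
    weights = n_components - 1
    # One mean vector of length n_features per component.
    means = n_components * n_features
    # One symmetric covariance (or precision) matrix per component:
    # n_features * (n_features + 1) / 2 distinct entries each.
    covariances = n_components * n_features * (n_features + 1) // 2
    return weights + means + covariances


if __name__ == "__main__":
    # The covariance term dominates as the number of variables grows.
    for p in (10, 100, 1000):
        print(p, gmm_param_count(3, p))
```

The covariance term contributes `p * (p + 1) / 2` parameters per component, which is why sparsity-inducing penalties on the precision matrices are attractive when the number of variables is large.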
Academic discipline: SECS-S/01 - Statistics
Book Part (author)
Files in this item:
File: cappozzo_casa_fop_SMPS2022.pdf
Access: open access
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
Size: 225.55 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1039294