Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering

Casa, A.; Cappozzo, A.; Fop, M.

doi:10.1007/s00357-022-09421-z

Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clustering multivariate continuous data. However, the practical usefulness of these models is jeopardized in high-dimensional spaces, where they tend to be over-parameterized. As a consequence, different solutions have been proposed, often relying on matrix decompositions or variable selection strategies. Recently, a methodological link between Gaussian graphical models and finite mixtures has been established, paving the way for penalized model-based clustering in the presence of large precision matrices. Notwithstanding, current methodologies implicitly assume similar levels of sparsity across the classes, not accounting for different degrees of association between the variables across groups. We overcome this limitation by deriving group-wise penalty factors, which automatically enforce under- or over-connectivity in the estimated graphs. The approach is entirely data-driven and does not require additional hyper-parameter specification. Analyses on synthetic and real data showcase the validity of our proposal.
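As a minimal sketch of where group-wise penalty factors enter penalized model-based clustering, the Python/NumPy code below computes the quantities used in one EM iteration. It assumes a penalty of the form lam * w_k * sum_{j != l} |omega_k,jl| on the off-diagonal entries of each group precision matrix Omega_k. This is a common graphical-lasso-type choice, not necessarily the exact one used in the paper. The factors w_k are hypothetical, hand-supplied inputs: the paper derives its factors from the data, and this sketch does not reproduce that derivation. The function names are illustrative only. Given E-step responsibilities z_ik, the precision update for group k maximizes (n_k / 2)[log det Omega_k - tr(S_k Omega_k)] - lam * w_k * ||Omega_k||_1. This is a standard graphical lasso on the weighted covariance S_k with effective penalty rho_k = 2 * lam * w_k / n_k, so a larger w_k tends to give a sparser graph for group k, and a smaller one a denser graph.

```python
import numpy as np


def m_step_statistics(X, Z):
    """Weighted sufficient statistics given E-step responsibilities Z (n x K)."""
    n, p = X.shape
    K = Z.shape[1]
    n_k = Z.sum(axis=0)                    # effective group sizes
    tau = n_k / n                          # mixing proportions
    mu = (Z.T @ X) / n_k[:, None]          # K x p weighted group means
    S = np.empty((K, p, p))
    for k in range(K):
        R = X - mu[k]                      # data centred on group k's mean
        S[k] = (Z[:, k, None] * R).T @ R / n_k[k]  # weighted covariance S_k
    return n_k, tau, mu, S


def effective_glasso_penalties(lam, weights, n_k):
    """Per-group graphical-lasso penalty rho_k = 2 * lam * w_k / n_k.

    Maximizing (n_k / 2) * [log det(Omega) - tr(S_k Omega)] - lam * w_k * ||Omega||_1
    has the same maximizer as a standard graphical lasso on S_k with penalty rho_k.
    """
    return 2.0 * lam * np.asarray(weights, dtype=float) / np.asarray(n_k, dtype=float)


def penalized_loglik(X, tau, mu, Omega, lam, weights):
    """Mixture log-likelihood minus the assumed penalty
    lam * sum_k w_k * sum_{j != l} |Omega_k[j, l]|."""
    n, p = X.shape
    K = len(tau)
    log_terms = np.empty((n, K))
    for k in range(K):
        _, logdet = np.linalg.slogdet(Omega[k])
        R = X - mu[k]
        quad = np.einsum("ij,jk,ik->i", R, Omega[k], R)  # (x_i - mu_k)' Omega_k (x_i - mu_k)
        log_terms[:, k] = (np.log(tau[k]) + 0.5 * logdet - 0.5 * quad
                           - 0.5 * p * np.log(2.0 * np.pi))
    # log-sum-exp over groups for numerical stability
    m = log_terms.max(axis=1, keepdims=True)
    loglik = np.sum(m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1)))
    off_diag = ~np.eye(p, dtype=bool)
    penalty = lam * sum(w * np.abs(Om[off_diag]).sum() for w, Om in zip(weights, Omega))
    return loglik - penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    Z = rng.dirichlet(np.ones(2), size=200)      # stand-in responsibilities
    n_k, tau, mu, S = m_step_statistics(X, Z)
    weights = [0.5, 2.0]                          # hypothetical group-wise factors
    print("rho_k:", effective_glasso_penalties(5.0, weights, n_k))
    Omega = np.linalg.inv(S)                      # unpenalized stand-in precisions
    print("penalized loglik:", penalized_loglik(X, tau, mu, Omega, 5.0, weights))
```

The graphical-lasso solve itself is left out. A standard solver, such as scikit-learn's sklearn.covariance.graphical_lasso, could be called per group with rho_k, after checking that its penalty convention (diagonal penalized or not) matches the assumed one. The penalized log-likelihood can then be tracked across iterations to monitor convergence.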

Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering / A. Casa, A. Cappozzo, M. Fop. - In: JOURNAL OF CLASSIFICATION. - ISSN 0176-4268. - 39:3(2022), pp. 648-674. [10.1007/s00357-022-09421-z]

Casa, Alessandro; Cappozzo, A.; Fop, Michael (2022)

Keywords: Model-based clustering; Penalized likelihood; Sparse precision matrices; Gaussian graphical models; Graphical lasso; EM algorithm
Scientific-disciplinary sector: SECS-S/01 - Statistics
Publication date: 2022
Journal in ANCE: JOURNAL OF CLASSIFICATION
DOI: https://dx.doi.org/10.1007/s00357-022-09421-z
URL: https://link.springer.com/article/10.1007/s00357-022-09421-z
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format	Access	Description	Type
2105.07935.pdf	884.25 kB	Adobe PDF	Open access	Article	Pre-print (manuscript submitted to the publisher)
s00357-022-09421-z.pdf	2.05 MB	Adobe PDF	Open access	Article	Publisher's version/PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1030199