A statistical significance-based approach for clustering grouped data via generalized linear model with discrete random effects

Ragni, A.; Masci, C.; Ieva, F.; Paganoni, A.M.

doi:10.1093/jrsssa/qnaf007

Identifying distinct subgroups within correlated data is essential for tailoring policies to specific needs, ensuring optimal outcomes for each group. In the context of model-based clustering, we introduce an innovative approach for clustering grouped data using generalized linear mixed models with discrete random effects and exponential family responses (e.g. Poisson or Bernoulli). Our method uncovers the latent clustering structure, net of fixed effects, by assuming that random effects follow a discrete distribution with an a priori unknown number of support points. We refine this process within a modified Expectation–Maximization algorithm, collapsing support points of the discrete distribution whose estimated confidence intervals or regions overlap; these intervals are derived from the asymptotic properties of maximum likelihood estimators. This approach offers a transparent interpretation of the latent structure, unlike existing tools for discrete random effects, which often rely on discretionary tuning parameters or predetermined cluster counts. Through simulation studies, we compare our approach with traditional parametric methods and state-of-the-art techniques, demonstrating its effectiveness. We apply our model to real-world data from the Programme for International Student Assessment, aiming to classify countries based on their impact on low-achieving student rates in schools. Our methodology provides valuable insights for effective policy formulation.
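The collapsing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scalar support points with already-estimated standard errors, uses Wald-type intervals, and merges overlapping points by a weight-averaged value. The function names `wald_interval` and `collapse_overlapping`, the chaining of overlaps, and the merge rule are all hypothetical choices made for illustration; the full method would also re-estimate the model inside the EM iterations.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Wald-type confidence interval from asymptotic normality of the MLE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def collapse_overlapping(points, std_errors, weights, level=0.95):
    """Merge support points whose intervals overlap (illustrative rule only).

    points, std_errors, weights: equal-length lists describing the estimated
    discrete distribution of the random effects. Returns (point, weight) pairs.
    """
    # Visit support points in increasing order so overlaps are adjacent.
    order = sorted(range(len(points)), key=lambda i: points[i])
    clusters = []
    for i in order:
        lo, hi = wald_interval(points[i], std_errors[i], level)
        if clusters and lo <= clusters[-1]["hi"]:
            # Assumption: the merged point is the weight-averaged value and
            # the merged interval is extended to cover both intervals.
            c = clusters[-1]
            total = c["weight"] + weights[i]
            c["point"] = (c["point"] * c["weight"] + points[i] * weights[i]) / total
            c["weight"] = total
            c["hi"] = max(c["hi"], hi)
        else:
            clusters.append({"point": points[i], "weight": weights[i], "hi": hi})
    return [(c["point"], c["weight"]) for c in clusters]


if __name__ == "__main__":
    # Two nearby support points and one well-separated point (made-up values).
    print(collapse_overlapping([-1.0, -0.9, 2.0], [0.1, 0.1, 0.1], [0.3, 0.3, 0.4]))
```

In this toy input the first two support points have overlapping 95% intervals and are merged into one, while the third stays separate, so the number of clusters is decided by the interval overlap rather than fixed in advance.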

A statistical significance-based approach for clustering grouped data via generalized linear model with discrete random effects / A. Ragni, C. Masci, F. Ieva, A.M. Paganoni. - In: JOURNAL OF THE ROYAL STATISTICAL SOCIETY. SERIES A. STATISTICS IN SOCIETY. - ISSN 0964-1998. - (2025 Mar 07). [Epub ahead of print] [10.1093/jrsssa/qnaf007]



Keywords: discrete random effects; education; generalized linear mixed-effects models; innumeracy levels; modified EM algorithm
Scientific-disciplinary sectors of the article (valid from 09/05/2024): Sector STAT-01/A - Statistics
Publication date: 7 Mar 2025
Ahead-of-print or print date: 7 Mar 2025
Journal in ANCE: JOURNAL OF THE ROYAL STATISTICAL SOCIETY. SERIES A. STATISTICS IN SOCIETY
DOI: https://dx.doi.org/10.1093/jrsssa/qnaf007
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Access	Type	Size	Format
qnaf007.pdf	Restricted access	Publisher's version/PDF	3.32 MB	Adobe PDF
2302.12103v1.pdf	Open access	Pre-print (manuscript submitted to the publisher)	3.94 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1148349


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca