Identifying distinct subgroups within correlated data is essential for tailoring policies to specific needs, ensuring optimal outcomes for each group. In the context of model-based clustering, we introduce an innovative approach for clustering grouped data using linear mixed models with discrete random effects and exponential family responses (e.g. Poisson or Bernoulli). Our method uncovers the latent clustering structure, net of fixed effects, by assuming that random effects follow a discrete distribution with an a priori unknown number of support points. We refine this process within a modified Expectation–Maximization algorithm, collapsing support points of the discrete distribution with overlapping estimated confidence intervals or regions, derived from the asymptotic properties of maximum likelihood estimators. This approach offers a transparent interpretation of the latent structure, distinct from existing tools for discrete random effects, which often rely on discretionary tuning parameters or predetermined cluster counts. Through simulation studies, we compare our approach with traditional parametric methods and state-of-the-art techniques, demonstrating its effectiveness. We apply our model on real-world data from the Programme for International Student Assessment, aiming to classify countries based on their impact on low-achieving student rates in schools. Our methodology provides valuable insights for effective policy formulation.
A statistical significance-based approach for clustering grouped data via generalized linear model with discrete random effects / A. Ragni, C. Masci, F. Ieva, A.M. Paganoni. - In: JOURNAL OF THE ROYAL STATISTICAL SOCIETY. SERIES A. STATISTICS IN SOCIETY. - ISSN 0964-1998. - (2025 Mar 07). [Epub ahead of print] [10.1093/jrsssa/qnaf007]
A statistical significance-based approach for clustering grouped data via generalized linear model with discrete random effects
C. MasciSecondo
;
2025
Abstract
Identifying distinct subgroups within correlated data is essential for tailoring policies to specific needs, ensuring optimal outcomes for each group. In the context of model-based clustering, we introduce an innovative approach for clustering grouped data using linear mixed models with discrete random effects and exponential family responses (e.g. Poisson or Bernoulli). Our method uncovers the latent clustering structure, net of fixed effects, by assuming that random effects follow a discrete distribution with an a priori unknown number of support points. We refine this process within a modified Expectation–Maximization algorithm, collapsing support points of the discrete distribution with overlapping estimated confidence intervals or regions, derived from the asymptotic properties of maximum likelihood estimators. This approach offers a transparent interpretation of the latent structure, distinct from existing tools for discrete random effects, which often rely on discretionary tuning parameters or predetermined cluster counts. Through simulation studies, we compare our approach with traditional parametric methods and state-of-the-art techniques, demonstrating its effectiveness. We apply our model on real-world data from the Programme for International Student Assessment, aiming to classify countries based on their impact on low-achieving student rates in schools. Our methodology provides valuable insights for effective policy formulation.| File | Dimensione | Formato | |
|---|---|---|---|
|
qnaf007.pdf
accesso riservato
Tipologia:
Publisher's version/PDF
Dimensione
3.32 MB
Formato
Adobe PDF
|
3.32 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
|
2302.12103v1.pdf
accesso aperto
Tipologia:
Pre-print (manoscritto inviato all'editore)
Dimensione
3.94 MB
Formato
Adobe PDF
|
3.94 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.




