A statistical significance-based approach for clustering grouped data via generalized linear model with discrete random effects / A. Ragni, C. Masci, F. Ieva, A.M. Paganoni. - In: JOURNAL OF THE ROYAL STATISTICAL SOCIETY. SERIES A. STATISTICS IN SOCIETY. - ISSN 0964-1998. - (2025 Mar 07). [Epub ahead of print] [10.1093/jrsssa/qnaf007]

A statistical significance-based approach for clustering grouped data via generalized linear model with discrete random effects

C. Masci (second author); 2025

Abstract

Identifying distinct subgroups within correlated data is essential for tailoring policies to specific needs, ensuring optimal outcomes for each group. In the context of model-based clustering, we introduce an innovative approach for clustering grouped data using generalized linear mixed models with discrete random effects and exponential family responses (e.g., Poisson or Bernoulli). Our method uncovers the latent clustering structure, net of fixed effects, by assuming that the random effects follow a discrete distribution with an a priori unknown number of support points. We refine this process within a modified Expectation–Maximization algorithm that collapses support points of the discrete distribution whose estimated confidence intervals or regions overlap; these intervals and regions are derived from the asymptotic properties of maximum likelihood estimators. This approach offers a transparent interpretation of the latent structure, in contrast to existing tools for discrete random effects, which often rely on discretionary tuning parameters or predetermined cluster counts. Through simulation studies, we compare our approach with traditional parametric methods and state-of-the-art techniques, demonstrating its effectiveness. We apply our model to real-world data from the Programme for International Student Assessment, aiming to classify countries based on their impact on the rates of low-achieving students in schools. Our methodology provides valuable insights for effective policy formulation.
Keywords: discrete random effects; education; generalized linear mixed-effects models; innumeracy levels; modified EM algorithm
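The collapsing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional random effect, takes the support-point estimates, their standard errors, and their weights as given inputs, builds Wald-type confidence intervals from a normal approximation, and merges neighbouring points in a single pass. The function names `wald_interval` and `collapse_overlapping`, the weighted-average merge rule, and the 95% level are all assumptions made for illustration. In the paper this step sits inside a modified EM algorithm that re-estimates the model after merging, which is omitted here.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Wald-type confidence interval based on a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def collapse_overlapping(points, std_errors, weights, level=0.95):
    """Merge support points whose confidence intervals overlap.

    Hypothetical helper: points are visited in increasing order and a point
    is merged into the current cluster when its interval overlaps the
    interval covered by that cluster so far. Merged points are replaced by
    their weight-averaged value, and their weights are summed.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    clusters = []
    for i in order:
        lo, hi = wald_interval(points[i], std_errors[i], level)
        if clusters and lo <= clusters[-1]["hi"]:
            last = clusters[-1]
            total = last["weight"] + weights[i]
            last["point"] = (
                last["point"] * last["weight"] + points[i] * weights[i]
            ) / total
            last["weight"] = total
            last["hi"] = max(last["hi"], hi)
        else:
            clusters.append(
                {"point": points[i], "weight": weights[i], "lo": lo, "hi": hi}
            )
    return [(c["point"], c["weight"]) for c in clusters]


# Illustrative inputs (made up): two close support points and one distant one.
merged = collapse_overlapping(
    points=[-1.0, -0.9, 2.0],
    std_errors=[0.1, 0.1, 0.1],
    weights=[0.3, 0.3, 0.4],
)
print(merged)
```

With these made-up inputs, the first two support points have overlapping intervals and are merged into one, while the third remains a separate cluster, so the number of clusters is determined by the interval overlap rather than fixed in advance.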
Sector STAT-01/A - Statistics
7 Mar 2025
Article (author)
Files in this item:

qnaf007.pdf
Access: restricted
Type: Publisher's version/PDF
Size: 3.32 MB
Format: Adobe PDF

2302.12103v1.pdf
Access: open
Type: Pre-print (manuscript submitted to the publisher)
Size: 3.94 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1148349