Generalized mixed-effects random forest: A flexible approach to predict university student dropout / M. Pellagatti, C. Masci, F. Ieva, A.M. Paganoni. - In: STATISTICAL ANALYSIS AND DATA MINING. - ISSN 1932-1864. - 14:3(2021 Jun), pp. 241-257. [10.1002/sam.11505]

Generalized mixed-effects random forest: A flexible approach to predict university student dropout

C. Masci (second author); 2021

Abstract

We propose a new statistical method, called generalized mixed-effects random forest (GMERF), that extends random forests to the analysis of hierarchical data for any type of response variable in the exponential family. The method retains the flexibility and the ability to model complex patterns in the data that are typical of tree-based ensemble methods, and it can handle both continuous and discrete covariates. At the same time, GMERF takes into account the nested structure of hierarchical data, modeling the dependence structure that exists at the highest level of the hierarchy and allowing statistical inference on this structure. In the case study, we apply GMERF to higher education data to analyze university student dropout. We predict the dropout probability of engineering students from student-level information, using the degree program in which students are enrolled as the grouping factor.
Keywords: generalized models; hierarchical data; random forest; university students dropout
Sector STAT-01/A - Statistics
Jun-2021
Article (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1148347
Citations
  • PMC: not available
  • Scopus: 35
  • ISI: 29