Parameter Choice, Stability and Validity for Robust Cluster Weighted Modeling / A. Cappozzo, L. Angel García Escudero, F. Greselin, A. Mayo-Iscar. - In: STATS. - ISSN 2571-905X. - 4:3(2021), pp. 602-615. [10.3390/stats4030036]

Parameter Choice, Stability and Validity for Robust Cluster Weighted Modeling

A. Cappozzo (co-first author); 2021

Abstract

Statistical inference based on the cluster weighted model often requires some subjective judgment from the modeler. Many features influence the final solution, such as the number of mixture components, the shape of the clusters in the explanatory variables, and the degree of heteroscedasticity of the errors around the regression lines. Moreover, to deal with outliers and contamination that may appear in the data, hyper-parameter values ensuring robust estimation are also needed. In principle, this freedom gives rise to a variety of “legitimate” solutions, each derived by a specific set of choices and their implications in modeling. Here we introduce a method for identifying a “set of good models” to cluster a dataset, considering the whole panorama of choices. In this way, we enable the practitioner, or the scientist who needs to cluster the data, to make an educated choice. They will be able to identify the most appropriate solutions for the purposes of their own analysis, in light of their stability and validity.
Keywords: cluster-weighted modeling; outliers; trimmed BIC; eigenvalue constraint; monitoring; constrained estimation; model-based clustering; robust estimation
Sector SECS-S/01 - Statistics
2021
Article (author)
Files in this record:
stats-04-00036-v2_compressed.pdf: Article; publisher's version/PDF; open access; Adobe PDF; 10.42 MB
stats-04-00036-v2-1-compresso.pdf: Compressed file; publisher's version/PDF; open access; Adobe PDF; 9.29 MB
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1030208