Robust multivariate analysis for mixed-type data: Novel algorithm and its practical application in socio-economic research / A. Grané, S. Salini, E. Verdolini. - In: SOCIO-ECONOMIC PLANNING SCIENCES. - ISSN 0038-0121. - (2020 Jun 23). [Epub ahead of print] [10.1016/j.seps.2020.100907]

Robust multivariate analysis for mixed-type data: Novel algorithm and its practical application in socio-economic research

S. Salini; 2020

Abstract

We propose a novel method and algorithm for the analysis and clustering of mixed-type data using a hierarchical approach based on the Forward Search. In our procedure, groups are identified from similar trajectories, which are then linked to very intuitive two-dimensional maps. The proposed algorithm can use different distance measures for mixed-type data, such as Gower's metric and related metric scaling. A key feature of our algorithm is its ability to discard redundant information from a given set of variables. The practical usefulness of the algorithm is illustrated through two applications of high relevance for empirical economic research. The first compares different indicators of environmental policy stringency across countries. The second applies our procedure to identify clusters of countries based on their institutional characteristics.
Keywords: Forward Search; Mixed-type data; Outliers; Robustness; Redundant information; Clustering
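The abstract lists Gower's metric as one of the distances the algorithm can use for mixed-type data. As a hedged illustration (not the authors' implementation), the sketch below computes Gower's coefficient: for a numeric variable k, s_ijk = 1 - |x_ik - x_jk| / R_k, where R_k is that variable's range; for a categorical variable, s_ijk = 1 if the categories match and 0 otherwise. The similarity s_ij is the weighted mean of s_ijk over the variables observed for both units, and the dissimilarity is 1 - s_ij. The function names, the equal default weights, and the optional square-root transform are assumptions made for illustration only.

```python
import math
from typing import List, Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def column_ranges(data: Sequence[Sequence[Value]], is_numeric: Sequence[bool]) -> List[float]:
    """Range R_k = max - min of each numeric column (0.0 placeholder for categorical ones)."""
    ranges = []
    for k, numeric in enumerate(is_numeric):
        if numeric:
            col = [row[k] for row in data if row[k] is not None]
            ranges.append(max(col) - min(col) if col else 0.0)
        else:
            ranges.append(0.0)
    return ranges


def gower_similarity(x, y, is_numeric, ranges, weights=None) -> float:
    """Gower similarity s_ij: weighted mean of per-variable similarities s_ijk."""
    if weights is None:
        weights = [1.0] * len(x)  # assumption: equal weights for all variables
    num = den = 0.0
    for xv, yv, numeric, r, w in zip(x, y, is_numeric, ranges, weights):
        if xv is None or yv is None:
            continue  # variable not comparable for this pair, so it is skipped
        if numeric:
            # Range-scaled absolute difference; a zero range means all values are equal.
            s = 1.0 - abs(xv - yv) / r if r > 0 else 1.0
        else:
            s = 1.0 if xv == yv else 0.0  # simple matching for categories
        num += w * s
        den += w
    if den == 0:
        raise ValueError("records share no non-missing variables")
    return num / den


def gower_distance_matrix(data, is_numeric, sqrt_transform=False) -> List[List[float]]:
    """Symmetric matrix of dissimilarities 1 - s_ij (optionally sqrt(1 - s_ij))."""
    ranges = column_ranges(data, is_numeric)
    n = len(data)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - gower_similarity(data[i], data[j], is_numeric, ranges)
            if sqrt_transform:
                d = math.sqrt(max(d, 0.0))  # guard against tiny negative rounding
            dist[i][j] = dist[j][i] = d
    return dist
```

For example, with `data = [(1.0, "red", 10.0), (3.0, "blue", 10.0), (2.0, "red", 20.0)]` and `is_numeric = [True, False, True]`, the first two units differ on the numeric variable by its full range and on the category, but agree on the third variable, so their dissimilarity is 1 - 1/3 ≈ 0.667. Whether to use 1 - s or sqrt(1 - s) as the input to a downstream procedure is a modelling choice left open here.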
Scientific-disciplinary sector: SECS-S/01 - Statistics
23-Jun-2020
Article (author)
Files in this item:
• socio-eco-planning_science2020.pdf: publisher's version (PDF), 2.57 MB, restricted access
• Grane_Salini_Verdolini_REVISION-5.pdf: post-print, accepted manuscript (version accepted by the publisher), 1.03 MB, open access from 24/06/2022
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/745954
Citations
  • PMC: n/a
  • Scopus: 6
  • Web of Science (ISI): 8