One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and visualization techniques for dynamic mixed data. In particular, given a time t ∈ T = {1, 2, . . . , N}, we start by measuring the proximity of n individuals in heterogeneous data by means of a robustified version of Gower’s metric (proposed by the authors in a previous work) yielding to a collection of distance matrices {D(t), ∀t ∈ T}. To monitor the evolution of distances and outlier detection over time, we propose several graphical tools: First, we track the evolution of pairwise distances via line graphs; second, a dynamic box plot is obtained to identify individuals which showed minimum or maximum disparities; third, to visualize individuals that are systematically far from the others and detect potential outliers, we use the proximity plots, which are line graphs based on a proximity function computed on {D(t), ∀t ∈ T}; fourth, the evolution of the inter-distances between individuals is analyzed via dynamic multiple multidimensional scaling maps. These visualization tools were implemented in the Shinny application in R, and the methodology is illustrated on a real data set related to COVID-19 healthcare, policy and restriction measures about the 2020–2021 COVID-19 pandemic across EU Member States.

Dynamic Mixed Data Analysis and Visualization / A. Grané, G. Manzi, S. Salini. - In: ENTROPY. - ISSN 1099-4300. - 24:10(2022 Oct 01), pp. 1399.1-1399.12. [10.3390/e24101399]

Dynamic Mixed Data Analysis and Visualization

G. Manzi
Penultimo
;
S. Salini
Ultimo
2022

Abstract

One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and visualization techniques for dynamic mixed data. In particular, given a time t ∈ T = {1, 2, . . . , N}, we start by measuring the proximity of n individuals in heterogeneous data by means of a robustified version of Gower’s metric (proposed by the authors in a previous work) yielding to a collection of distance matrices {D(t), ∀t ∈ T}. To monitor the evolution of distances and outlier detection over time, we propose several graphical tools: First, we track the evolution of pairwise distances via line graphs; second, a dynamic box plot is obtained to identify individuals which showed minimum or maximum disparities; third, to visualize individuals that are systematically far from the others and detect potential outliers, we use the proximity plots, which are line graphs based on a proximity function computed on {D(t), ∀t ∈ T}; fourth, the evolution of the inter-distances between individuals is analyzed via dynamic multiple multidimensional scaling maps. These visualization tools were implemented in the Shinny application in R, and the methodology is illustrated on a real data set related to COVID-19 healthcare, policy and restriction measures about the 2020–2021 COVID-19 pandemic across EU Member States.
mixed data; robustness; outliers; time series; data visualization
Settore SECS-S/01 - Statistica
2022
1-ott-2022
https://www.mdpi.com/1099-4300/24/10/1399
Article (author)
File in questo prodotto:
File Dimensione Formato  
entropy-24-01399.pdf

accesso aperto

Tipologia: Publisher's version/PDF
Dimensione 6.44 MB
Formato Adobe PDF
6.44 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/940730
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact