IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

For more than a century, methods for data representation and for exploring the intrinsic structures of data have developed remarkably; they comprise both supervised and unsupervised methods. However, recent years have witnessed the flourishing of big data, where typical datasets are high-dimensional and the data can come in messy, incomplete, unlabeled, or corrupted forms. Consequently, discovering the hidden structure buried inside such data becomes highly challenging. From this perspective, exploratory data analysis (EDA) plays a substantial role in learning the hidden structures that encompass the significant features of the data in an ordered manner by extracting patterns and testing hypotheses to identify anomalies. Unsupervised generative learning (UGL) models are a class of machine learning (ML) models characterized by their potential to reduce dimensionality, discover the exploratory factors, and learn representations without any predefined labels; moreover, such models can generate data from the domain of the reduced factors. Beginner researchers can find in this survey recent UGL models for data exploration and representation learning; specifically, this paper covers three families of methods based on their usage in the era of big data: blind source separation (BSS), manifold learning (MfL), and neural networks (NNs), from shallow to deep architectures.

A survey of unsupervised generative models for exploratory data analysis and representation learning / M. Abukmeil, S. Ferrari, A. Genovese, V. Piuri, F. Scotti. - In: ACM COMPUTING SURVEYS. - ISSN 0360-0300. - 54:5(2021), pp. 99.1-99.40. [10.1145/3450963]

A survey of unsupervised generative models for exploratory data analysis and representation learning

M. Abukmeil (first author); S. Ferrari (second author); A. Genovese; V. Piuri (penultimate author); F. Scotti (last author)

2021



Keywords: Blind Source Separation; Manifold Learning; Neural Networks; Exploratory Data Analysis; Representation Learning; Explainable Machine Learning; Unsupervised Deep Learning

Scientific-disciplinary sectors of the article (display only):
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems

Project title: Multi-Owner data Sharing for Analytics and Integration respecting Confidentiality and Owner control (MOSAICrOWN)
Acronym: MOSAICrOWN
Funder: EUROPEAN COMMISSION
Funding: H2020
Contract no.: 825333

Project title: High quality Open data Publishing and Enrichment (HOPE)
Acronym: HOPE
Funder: MINISTERO DELL'ISTRUZIONE E DEL MERITO
Contract no.: 2017MMJJRE_003

Project title: Machine Learning-based, Networking and Computing Infrastructure Resource Management of 5G and beyond Intelligent Networks (MARSAL)
Acronym: MARSAL
Funder: EUROPEAN COMMISSION
Funding: H2020
Contract no.: 101017171

Publication date: 2021
Journal in ANCE: ACM COMPUTING SURVEYS
DOI: https://dx.doi.org/10.1145/3450963
Type: Article (author)
Appears in types: 01 - Journal article

Files in this item:

File	Access	Type	Size	Format
csur21main.pdf	open access	Post-print, accepted manuscript, etc. (version accepted by the publisher)	769.91 kB	Adobe PDF
3450963.pdf	restricted access	Publisher's version/PDF	553.74 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/815200
