IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

A K-means clustering algorithm for multivariate big data with correlated components / G. Aletti, A. Micheletti. - (2017 Jul 05).

A K-means clustering algorithm for multivariate big data with correlated components

G. Aletti; A. Micheletti

2017

Abstract

Common clustering algorithms require multiple scans of all the data to achieve convergence, which is prohibitive when large databases with millions of data points must be processed. Several algorithms extending the popular K-means method to the analysis of big data have appeared in the literature since the publication of (Bradley et al., Scaling clustering algorithms to large databases, 1998), but they assume that the random vectors being processed and grouped have uncorrelated components. Unfortunately, this is not the case in many practical situations. Here we propose an extension of the algorithm of Bradley, Fayyad and Reina to the processing of massive multivariate data with correlated components.
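To illustrate why correlated components matter, here is a minimal sketch in Python (standard library only). It is not the algorithm of Aletti and Micheletti; it only assumes the general idea behind the Bradley, Fayyad and Reina approach, namely that each cluster is compressed into summary statistics so that its points can be discarded after a single pass. As that 1998 method is commonly described, the statistics are the count, the sum and the per-component sum of squares, which only recover a diagonal covariance. The sketch instead keeps the full matrix of cross-products, so the off-diagonal covariance, and with it a full Mahalanobis distance, can be recovered without rescanning the data. The names `ClusterSummary`, `solve`, `mahalanobis_sq` and `diagonal_only` are hypothetical.

```python
"""Sketch: one-pass cluster summaries that keep the full covariance.

Assumption (illustrative, not taken from the paper): each cluster stores its
count N, the vector sum S and the cross-product matrix Q = sum of x x^T, so
correlations between components survive after the raw points are discarded.
"""
from dataclasses import dataclass, field
from typing import List

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class ClusterSummary:
    dim: int
    n: int = 0
    s: Vector = field(default_factory=list)
    q: Matrix = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.s:
            self.s = [0.0] * self.dim
        if not self.q:
            self.q = [[0.0] * self.dim for _ in range(self.dim)]

    def add(self, x: Vector) -> None:
        """Absorb one point into the summary; the point can then be dropped."""
        self.n += 1
        for i in range(self.dim):
            self.s[i] += x[i]
            for j in range(self.dim):
                self.q[i][j] += x[i] * x[j]

    def mean(self) -> Vector:
        return [si / self.n for si in self.s]

    def covariance(self) -> Matrix:
        """Population covariance Q/N - mu mu^T, off-diagonal terms included."""
        mu = self.mean()
        return [[self.q[i][j] / self.n - mu[i] * mu[j] for j in range(self.dim)]
                for i in range(self.dim)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def mahalanobis_sq(x: Vector, mu: Vector, cov: Matrix) -> float:
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))


def diagonal_only(cov: Matrix) -> Matrix:
    """Zero the off-diagonal terms, i.e. assume uncorrelated components."""
    return [[cov[i][j] if i == j else 0.0 for j in range(len(cov))]
            for i in range(len(cov))]


if __name__ == "__main__":
    summary = ClusterSummary(dim=2)
    # A small cluster stretched along the line y = x (positively correlated).
    for p in [(0, 0), (1, 1), (2, 2), (3, 3), (1, 2), (2, 1)]:
        summary.add([float(v) for v in p])
    mu, cov = summary.mean(), summary.covariance()
    for point in ([2.5, 2.5], [2.5, 0.5]):  # along vs. across the correlation
        print(point,
              round(mahalanobis_sq(point, mu, cov), 3),
              round(mahalanobis_sq(point, mu, diagonal_only(cov)), 3))
```

Both test points lie at the same Euclidean distance from the cluster mean (1.5, 1.5). With the full covariance their squared Mahalanobis distances are 1.2 (along the correlation axis) and 12.0 (across it), whereas the diagonal approximation assigns both the same value, 24/11 (about 2.18). This is the kind of information lost when components are assumed uncorrelated. Storing the d-by-d cross-product matrix costs O(d^2) memory per cluster instead of O(d), which is the design trade-off of this sketch.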


Keywords

Big data; Clustering; K-means; Mahalanobis distance

Scientific-disciplinary sectors of the pre-print

Sector SECS-S/01 - Statistics
Sector MAT/06 - Probability and Mathematical Statistics

Pre-print deposit date

5 Jul 2017

Pre-print URL

http://arxiv.org/abs/1707.01199v1

Item type

24 - Pre-print

Files in this item:

File	Size	Format
aletti-micheletti-clustering_ARXIV.pdf (open access; type: Pre-print, manuscript submitted to the publisher)	290.76 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/514034
