A K-means clustering algorithm for multivariate big data with correlated components / G. Aletti, A. Micheletti. - (2017 Jul 05).

A K-means clustering algorithm for multivariate big data with correlated components

G. Aletti (first author); A. Micheletti
2017

Abstract

Common clustering algorithms require multiple scans of all the data to achieve convergence, which is prohibitive when large databases with millions of records must be processed. Algorithms that extend the popular K-means method to the analysis of big data have been available in the literature since the publication of (Bradley et al., Scaling clustering algorithms to large databases, 1998), but they assume that the random vectors being processed and grouped have uncorrelated components. Unfortunately, this is not the case in many practical situations. Here we propose an extension of the algorithm of Bradley, Fayyad and Reina to the processing of massive multivariate data with correlated components.
Big data; Clustering; K-means; Mahalanobis distance
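
To illustrate why correlated components matter for the distance used in clustering, the following is a minimal sketch, not the authors' algorithm. It compares the squared Mahalanobis distance (listed among the keywords) computed with a full covariance matrix against the same distance with the off-diagonal terms dropped, which is what an uncorrelated-components assumption amounts to. The helper name `mahalanobis_sq`, the covariance values, and the test points are all hypothetical choices made for illustration.

```python
import numpy as np


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of x from mean under covariance cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov, diff))


# Hypothetical cluster summary: zero mean, unit variances, correlation 0.9.
mean = np.zeros(2)
cov_full = np.array([[1.0, 0.9],
                     [0.9, 1.0]])
# Same variances with the correlation ignored (uncorrelated assumption).
cov_diag = np.diag(np.diag(cov_full))

along = np.array([1.0, 1.0])    # lies along the correlation direction
across = np.array([1.0, -1.0])  # lies across the correlation direction

# With the diagonal covariance both points look equally far from the mean.
d_diag_along = mahalanobis_sq(along, mean, cov_diag)    # 2.0
d_diag_across = mahalanobis_sq(across, mean, cov_diag)  # 2.0

# With the full covariance the point across the correlation is much farther.
d_full_along = mahalanobis_sq(along, mean, cov_full)    # about 1.05
d_full_across = mahalanobis_sq(across, mean, cov_full)  # about 20
```

In this toy setting, ignoring the correlation makes the two points indistinguishable, while the full covariance separates them by roughly a factor of 19. This is the kind of effect that motivates dropping the uncorrelated-components assumption.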
Sector SECS-S/01 - Statistics
Sector MAT/06 - Probability and Mathematical Statistics
5 Jul 2017
http://arxiv.org/abs/1707.01199v1
Files in this record:
File: aletti-micheletti-clustering_ARXIV.pdf
Access: open access
Type: Pre-print (manuscript submitted to the publisher)
Size: 290.76 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/514034