Common clustering algorithms require multiple scans of the entire data set to converge, which is prohibitive when large databases with millions of records must be processed. Several algorithms that extend the popular K-means method to big data have appeared in the literature since the publication of (Bradley et al., Scaling clustering algorithms to large databases, 1998), but they assume that the random vectors being processed and grouped have uncorrelated components. Unfortunately, this is not the case in many practical situations. We propose an extension of the algorithm of Bradley, Fayyad and Reina to the processing of massive multivariate data with correlated components.
A K-means clustering algorithm for multivariate big data with correlated components / G. Aletti, A. Micheletti. - (2017 Jul 05).
A K-means clustering algorithm for multivariate big data with correlated components
G. Aletti; A. Micheletti
2017
File | Access | Type | Size | Format
---|---|---|---|---
aletti-micheletti-clustering_ARXIV.pdf | Open access | Pre-print (manuscript submitted to the publisher) | 290.76 kB | Adobe PDF