A generalized K-means algorithm for multivariate Big Data with correlated components / G. Aletti, A. Micheletti - In: CLADAG2017 : Book of short papers / [edited by] F. Greselin, F. Mola, M. Zenga. - First edition. - Mantova : Universitas Studiorum S.r.l. Casa Editrice, 2017 Sep. - ISBN 9788899459710. (Paper presented at the 11th CLADAG conference, held in Milan in 2017.)

A generalized K-means algorithm for multivariate Big Data with correlated components

G. Aletti; A. Micheletti
2017

Abstract

Common clustering algorithms require multiple scans of the entire data set to converge, which is prohibitive when large databases with millions of records must be processed. Algorithms that extend the popular K-means method to the analysis of big data have been available in the literature since 1998, but they assume that the random vectors being processed and grouped have uncorrelated components. Unfortunately, this is not the case in many practical situations. Here we propose an extension of the algorithm of Bradley, Fayyad and Reina to the processing of massive multivariate data with correlated components.
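
To illustrate why correlation matters in a single-scan setting, the following is a minimal sketch and not the authors' algorithm. Methods in the style of Bradley, Fayyad and Reina summarize each cluster by the number of points, the per-component sums and the per-component sums of squares, which is enough to recover means and variances but not covariances. The sketch additionally keeps the cross-products, so that the full covariance matrix can be recovered after one pass. The class name `ClusterSummary`, its methods, and the use of the population covariance (division by n) are assumptions made for illustration.

```python
from typing import List


class ClusterSummary:
    """Hypothetical one-pass summary of a cluster (illustrative only)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.n = 0                                   # number of points seen
        self.total = [0.0] * dim                     # sum of x_i
        # sum of x_i * x_j; the diagonal holds the per-component sums of squares
        self.cross = [[0.0] * dim for _ in range(dim)]

    def add(self, x: List[float]) -> None:
        """Update the summary with one point; the point is not stored."""
        self.n += 1
        for i in range(self.dim):
            self.total[i] += x[i]
            for j in range(self.dim):
                self.cross[i][j] += x[i] * x[j]

    def mean(self) -> List[float]:
        return [s / self.n for s in self.total]

    def covariance(self) -> List[List[float]]:
        """Population covariance: E[x_i x_j] - E[x_i] E[x_j]."""
        m = self.mean()
        return [
            [self.cross[i][j] / self.n - m[i] * m[j] for j in range(self.dim)]
            for i in range(self.dim)
        ]

    def diagonal_variance(self) -> List[float]:
        """What a summary restricted to uncorrelated components would keep."""
        c = self.covariance()
        return [c[i][i] for i in range(self.dim)]


# Usage: three points lying on the line y = 2x, i.e. perfectly correlated.
summary = ClusterSummary(dim=2)
for point in [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]:
    summary.add(point)

cov = summary.covariance()
# The off-diagonal entry cov[0][1] is nonzero here; a diagonal-only
# summary would discard exactly this information.
```

The design choice is that every update touches only the running totals, so the data can be read once and discarded, at the cost of storing a full matrix of cross-products per cluster instead of a single vector of squared sums.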
K-means; Big Data; Shrinkage Estimators
Sector MAT/06 - Probability and Mathematical Statistics
Sector SECS-S/01 - Statistics
Sector INF/01 - Computer Science
Sep-2017
Società Italiana di Statistica
Centro di Ricerca Interdisciplinare su Modellistica Matematica, Analisi Statistica e Simulazione Computazionale per la Innovazione Scientifica e Tecnologica ADAMSS
Book Part (author)
Files in this item:
File: CLADAG_2017_paper_27.pdf
Access: open access
Description: main article
Type: Pre-print (manuscript submitted to the publisher)
Size: 99.75 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/523562