In unsupervised pattern analysis of gene expression, the high dimensionality of the data, the accuracy of clustering algorithms, and the reliability of the discovered clusters are critical problems. We propose and analyze an algorithmic scheme for unsupervised cluster ensembles, in which dimensionality reduction is obtained by means of randomized embeddings with low distortion. Multiple "base" clusterings are performed on random subspaces that approximately preserve the distances between the projected examples. In this way the accuracy of each "base" clustering is maintained, and the diversity between them is improved. By combining the multiple clusterings, we can enhance the overall accuracy and the reliability of the discovered clusters, as shown by our experimental results with high-dimensional gene expression data.
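The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian random projection, the small k-means routine, the co-association (pairwise co-clustering frequency) matrix, and the thresholded connected-components consensus step are all assumptions chosen for brevity, and every function name and parameter below is hypothetical. The data are synthetic, not gene expression measurements.

```python
import math
import random

def random_projection(data, k, rng):
    """Map n x d rows to k dimensions with a Gaussian matrix scaled by 1/sqrt(k)
    (one common low-distortion randomized embedding; an assumption here)."""
    d = len(data[0])
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(row[j] * proj[j][c] for j in range(d)) for c in range(k)] for row in data]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain Lloyd iterations; stands in for any "base" clustering algorithm."""
    centers = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(n_clusters), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def co_association(data, n_clusters, k, n_base, rng):
    """Fraction of base clusterings in which each pair of examples shares a cluster.
    Each base clustering uses a fresh random projection."""
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_base):
        labels = kmeans(random_projection(data, k, rng), n_clusters, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_base for c in row] for row in counts]

def consensus_labels(similarity, threshold=0.5):
    """Connected components of the graph linking pairs that co-cluster
    in more than `threshold` of the base runs (an assumed combination rule)."""
    n = len(similarity)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and similarity[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Synthetic example: two groups of 10 samples in 200 dimensions.
rng = random.Random(0)
group_a = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(10)]
group_b = [[rng.gauss(5.0, 1.0) for _ in range(200)] for _ in range(10)]
data = group_a + group_b

similarity = co_association(data, n_clusters=2, k=20, n_base=15, rng=rng)
labels = consensus_labels(similarity)
print(labels)
```

Each base clustering sees a different 20-dimensional projection of the same 200-dimensional data, which is where the diversity among ensemble members comes from, while the scaling by 1/sqrt(k) keeps pairwise distances approximately unchanged in expectation.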

Randomized Embedding Cluster Ensembles for gene expression data analysis / A. Bertoni, G. Valentini. (Paper presented at SETIT 2007 - IEEE International Conf. on Sciences of Electronic, Technologies of Information and Telecommunications, held in Hammamet, Tunisia, in 2007.)

Randomized Embedding Cluster Ensembles for gene expression data analysis

A. Bertoni (first author)
G. Valentini (last author)
2007

Sector INF/01 - Computer Science
Conference Object
Files in this item:
File: bertoni-vale-SETIT07.pdf
Access: open access
Type: Pre-print (manuscript submitted to the publisher)
Size: 570.17 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/2434/44213