Random projections for assessing gene expression cluster stability / A. Bertoni, G. Valentini - In: Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005 : July 31 - August 4, 2005, Montreal, Quebec, Canada. Piscataway : Institute of Electrical and Electronics Engineers, 2005. - ISBN 0780390482. - pp. 149-154 (IEEE International Joint Conference on Neural Networks (IJCNN), held in Montreal in 2005) [10.1109/IJCNN.2005.1555821].

Random projections for assessing gene expression cluster stability

A. Bertoni (first author); G. Valentini (last author)
2005

Abstract

Cluster analysis of gene expression data is characterized by very high dimensionality and low cardinality, and two important related topics are the validation of the obtained clusters and the estimation of their number. In this paper we focus on estimating the stability of the clusters. Our approach to this problem is based on random projections obeying the Johnson-Lindenstrauss lemma, by which gene expression data may be projected into randomly selected low-dimensional subspaces while approximately preserving pairwise distances between examples. We experiment with different types of random projections, comparing empirical and theoretical distortions induced by randomized embeddings between Euclidean metric spaces, and we present cluster-stability measures that may be used to validate and to quantitatively assess the reliability of the clusters obtained by a large class of clustering algorithms. Experimental results with high-dimensional synthetic and DNA microarray data show the effectiveness of the proposed approach.
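
The distance-preservation property that the abstract relies on can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one common type of random projection (a matrix with i.i.d. Gaussian entries), synthetic Gaussian data in place of microarray data, and hypothetical function names (`gaussian_random_projection`, `pairwise_sq_dists`, `distortion_range`) and dimensions chosen only for illustration. It projects a small set of high-dimensional examples into a lower-dimensional space and reports the smallest and largest ratio between projected and original squared pairwise distances.

```python
import numpy as np


def gaussian_random_projection(X, d_out, rng):
    """Project the rows of X (n x d_in) into d_out dimensions.

    Assumption: entries of the projection matrix are i.i.d. N(0, 1/d_out),
    so squared Euclidean distances are preserved in expectation.
    """
    d_in = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(d_out), size=(d_in, d_out))
    return X @ R


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all distinct pairs of rows."""
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    upper = np.triu_indices(X.shape[0], k=1)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum(D[upper], 0.0)


def distortion_range(X, Y):
    """Min and max ratio of projected to original squared distances."""
    ratios = pairwise_sq_dists(Y) / pairwise_sq_dists(X)
    return ratios.min(), ratios.max()


rng = np.random.default_rng(0)
# Synthetic stand-in for expression data: few examples, many dimensions.
X = rng.normal(size=(30, 2000))
Y = gaussian_random_projection(X, 1000, rng)

lo, hi = distortion_range(X, Y)
print(f"projected shape: {Y.shape}")
print(f"squared-distance ratio range: [{lo:.3f}, {hi:.3f}]")
```

Ratios close to 1 indicate that the projection distorted pairwise distances only slightly. Repeating the projection with different seeds and clustering each projected data set is one way such embeddings could feed a stability analysis, although the specific stability measures are defined in the paper itself and are not reproduced here.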
Scientific sector: INF/01 - Informatica (Computer Science)
Publication type: Book Part (author)
File in this record: bertoni-vale-ijcnn05.pdf (open access; post-print, accepted manuscript, i.e. the version accepted by the publisher; Adobe PDF, 126.48 kB)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/9333