An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis / R. Avogadri, G. Valentini. (Paper presented at the NETTAB 2007 workshop on a Semantic Web for Bioinformatics, held in Pisa, Italy, in 2007.)
An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis
R. Avogadri (first author); G. Valentini (last author)
2007
Abstract
Background: In recent years unsupervised ensemble clustering methods have been successfully applied to DNA microarray data analysis to improve the accuracy and the reliability of clustering results. Nevertheless, a major problem is that classes of functionally correlated examples (e.g. subclasses of diseases characterized at the bio-molecular level) are in general not clearly separable, and in many cases the same gene may belong to different functional classes (e.g. it may participate in different biological processes).

Results: We propose an ensemble clustering algorithmic scheme, based on a fuzzy approach, that directly handles overlapping classes, as well as genes or samples that may belong to more than one cluster at the same time. Several fuzzy ensemble clustering algorithms may be derived from our algorithmic scheme, according to the way the multiple clusterings are combined and the consensus clustering is generated. We test some of the proposed ensemble algorithms on two DNA microarray data sets available on the web, comparing the results with those of other single and ensemble clustering methods.

Conclusions: Our proposed fuzzy ensemble approach may be applied to discover classes of co-expressed genes or subclasses of functionally related examples, and in principle it may be applied to the unsupervised analysis of different types of complex bio-molecular data. Fuzzy ensemble algorithms can assign each gene/sample to multiple classes and can estimate and improve the accuracy and the reliability of the discovered clusterings, as shown by our experimental results.

File | Access | Type | Size | Format |
---|---|---|---|---|
avo-vale-nettab07-final.pdf | Open access | Pre-print (manuscript submitted to the publisher) | 158.3 kB | Adobe PDF |