An application of low bias bagged SVMs to the classification of heterogeneous malignant tissues / G. Valentini - In: Neural Nets / [edited by] B. Apolloni, M. Marinaro, R. Tagliaferri. - [s.l.] : Springer, 2003. - ISBN 9783540202271. - pp. 316-321 (Paper presented at the 14th Italian Workshop on Neural Nets, held in Vietri sul Mare in 2003.)

An application of low bias bagged SVMs to the classification of heterogeneous malignant tissues

G. Valentini
First author
2003

Abstract

DNA microarray data are characterized by high dimensionality and small sample size: usually only a few tens of DNA microarray experiments, each involving thousands of genes, are available for data processing. Given also the large biological variability of gene expression and the noise introduced by the bio-technological machinery, robust and variance-reducing data analysis methods are needed. To this end, we propose an application of a new ensemble method based on the bias-variance decomposition of the error, using Support Vector Machines (SVMs) as base learners. This approach, which we named Low bias bagging (Lobag), tries to reduce both the bias and the variance components of the error by selecting the base learners with the lowest bias and combining them through bootstrap aggregating techniques. We applied Lobag to the classification of normal and heterogeneous malignant tissues, using DNA microarray gene expression data. Preliminary results on this challenging two-class classification problem show that Lobag, in association with simple feature selection methods, outperforms both single SVMs and bagged ensembles of SVMs.
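
The two steps named in the abstract (pick the base-learner setting with the lowest bias, then bag learners with that setting) can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions. A k-nearest-neighbour classifier stands in for the SVM base learner so that the example needs only the Python standard library. Bias is estimated from out-of-bag predictions under 0/1 loss, which the abstract does not specify. The data are synthetic, and all function names (`make_data`, `estimate_bias`, `lobag_fit`, `lobag_predict`) are hypothetical.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_predict(train, x, k):
    """Stand-in base learner (NOT an SVM): majority vote of the k nearest points."""
    nearest = sorted(train, key=lambda p: sq_dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def bootstrap(n, rng):
    """Sample n indices with replacement; also return the out-of-bag indices."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    return in_bag, [i for i in range(n) if i not in chosen]

def estimate_bias(X, y, k, n_boot, rng):
    """Assumed estimator: fraction of points whose out-of-bag majority
    ("main") prediction differs from the true label (0/1 loss)."""
    votes = [Counter() for _ in X]
    for _ in range(n_boot):
        in_bag, out_bag = bootstrap(len(X), rng)
        train = [(X[i], y[i]) for i in in_bag]
        for i in out_bag:
            votes[i][knn_predict(train, X[i], k)] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    return sum(main != label for main, label in scored) / max(len(scored), 1)

def lobag_fit(X, y, candidate_ks, n_boot=30, seed=0):
    """Select the candidate with the lowest estimated bias, then build the bags."""
    rng = random.Random(seed)
    biases = {k: estimate_bias(X, y, k, n_boot, rng) for k in candidate_ks}
    best_k = min(candidate_ks, key=lambda k: biases[k])
    bags = []
    for _ in range(n_boot):
        in_bag, _oob = bootstrap(len(X), rng)
        bags.append([(X[i], y[i]) for i in in_bag])
    return best_k, bags, biases

def lobag_predict(model, x):
    """Bootstrap aggregating: majority vote over the learners, one per bag."""
    best_k, bags, _biases = model
    return Counter(knn_predict(bag, x, best_k) for bag in bags).most_common(1)[0][0]

def make_data(n, n_features, seed):
    """Synthetic two-class data (illustrative only, not microarray data)."""
    rng = random.Random(seed)
    X = [[rng.gauss(2.0 * (i % 2), 1.0) for _ in range(n_features)] for i in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y

if __name__ == "__main__":
    X, y = make_data(40, 5, seed=1)
    model = lobag_fit(X, y, candidate_ks=[1, 3, 5, 7])
    print("selected k:", model[0])
    print("prediction for first sample:", lobag_predict(model, X[0]))
```

In a setting closer to the one described, the candidate values of `k` would be replaced by candidate SVM settings (for example kernel and regularization parameters), and feature selection would be applied before training.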
gene-expression data; cancer; prediction
Academic discipline INF/01 - Informatica (Computer Science)
Academic discipline ING-INF/05 - Sistemi di Elaborazione delle Informazioni (Information Processing Systems)
2003
Book Part (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/434684