Predicting gene expression from heterogeneous data / M. Re, G. Valentini - In: CIBB 2009, the sixth International conference on bioinformatics and biostatistics : 15-17 Oct. 2009, Genova, Italy : proceedings / edited by F. Masulli, L. Peterson, R. Tagliaferri. - [s.l] : Università degli Studi di Salerno, DMI, 2009. - ISBN 9788890353727. (Paper presented at the 6th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB), held in Genova in 2009.)

Predicting gene expression from heterogeneous data

M. Re (first author); G. Valentini (last author)
2009

Abstract

Gene expression is complex, and elucidating the mechanisms that regulate it remains an extremely difficult challenge in modern bioinformatics, despite the amount of information recently made available by high-throughput biotechnologies and genome-wide investigations. In this contribution we investigate the effectiveness of ensemble systems for gene expression prediction. Because ensemble systems can integrate heterogeneous datasets, they make it possible to exploit not only promoter sequence-based datasets but also other sources of information, such as phylogenetic patterns of regulatory motifs and covalent histone modifications. To this end we collected data from the literature and predicted the expression class of 2490 S. cerevisiae genes using an ensemble of Support Vector Machines trained on 4 different sources of data. The experimental results show that ensemble systems can improve gene expression prediction performance. Nevertheless, further investigation is required to find the best combination of datasets and data fusion methods for gene expression class prediction.
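As a rough illustration of how an ensemble can combine base classifiers trained on separate data sources, the following minimal sketch fuses per-source decision scores with two common rules, majority voting and score averaging. It is not the authors' method: the abstract does not state which fusion rule was used, and the binary class encoding (+1/-1), the gene names, the scores, and the function names are all hypothetical. In practice each score would come from a Support Vector Machine trained on one data source.

```python
from statistics import mean


def majority_vote(labels):
    """Majority over per-source labels in {+1, -1}; ties go to +1 (an assumption)."""
    return 1 if sum(labels) >= 0 else -1


def mean_score_fusion(scores):
    """Average the signed decision scores of the base classifiers, threshold at 0."""
    return 1 if mean(scores) >= 0 else -1


def fuse(scores_per_gene):
    """Return {gene: (majority-vote label, mean-score label)}."""
    fused = {}
    for gene, scores in scores_per_gene.items():
        # Each base classifier votes with the sign of its own decision score.
        votes = [1 if s >= 0 else -1 for s in scores]
        fused[gene] = (majority_vote(votes), mean_score_fusion(scores))
    return fused


# Hypothetical decision scores from four base classifiers, one per data source.
scores_per_gene = {
    "gene_a": [0.9, 0.4, -0.1, 0.7],
    "gene_b": [-0.2, -0.3, -0.1, 2.5],
    "gene_c": [-0.8, -0.5, 0.2, -0.6],
}

fused = fuse(scores_per_gene)
for gene, (vote_label, mean_label) in fused.items():
    print(gene, vote_label, mean_label)
```

The two rules can disagree: for the hypothetical gene_b, three sources weakly vote for the negative class while one source is strongly positive, so majority voting and score averaging assign different classes. This is one reason the choice of data fusion method matters.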
Sector INF/01 - Informatica (Computer Science)
Book Part (author)
Files in this item:
File: re-vale-cibb09.5-2.pdf (open access)
Type: Pre-print (manuscript submitted to the publisher)
Size: 185.1 kB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/178278