The complexity of gene expression and the elucidation of the mechanisms involved in its regulation constitute an extremely difficult challenge in modern bioinformatics, despite the amount of information recently made available by high-throughput biotechnologies and genome-wide investigations. In this contribution we investigated the effectiveness of ensemble systems for gene expression prediction. The ability of ensemble systems to integrate heterogeneous datasets makes it possible to exploit not only promoter sequence-based datasets, but also other sources of information, such as phylogenetic patterns of regulatory motifs and covalent histone modifications. To this end we collected data from the literature and predicted the expression class of 2490 S. cerevisiae genes using an ensemble of Support Vector Machines trained on four different sources of data. The experimental results showed that ensemble systems can improve gene expression prediction performance. Nevertheless, further investigation is required to find the best combination of datasets and data fusion methods for gene expression class prediction.
Predicting gene expression from heterogeneous data / M. Re, G. Valentini - In: CIBB 2009, the sixth International conference on bioinformatics and biostatistics : 15-17 Oct. 2009, Genova, Italy : proceedings / [edited by] F. Masulli, L. Peterson, R. Tagliaferri. - [s.l.] : Università degli Studi di Salerno, DMI, 2009. - ISBN 9788890353727. (Paper presented at the 6th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB), held in Genova in 2009.)
Predicting gene expression from heterogeneous data
M. Re (first author); G. Valentini (last author)
2009
File | Size | Format |
---|---|---|
re-vale-cibb09.5-2.pdf (open access; type: pre-print, manuscript submitted to the publisher) | 185.1 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.