Hierarchical cost-sensitive algorithms for genome-wide gene function prediction / N. Cesa Bianchi, G. Valentini. - In: Machine learning in systems biology: proceedings of the third international workshop, September 5-6, 2009, Ljubljana, Slovenia / edited by S. Dzeroski, P. Geurts, J. Rousu. - Helsinki: University of Helsinki, Department of Computer Science, 2009. - ISBN 9789521056994. - pp. 25-34. (Paper presented at the 3rd International Workshop on Machine Learning in Systems Biology, held in Ljubljana, Slovenia, in 2009.)
Hierarchical cost-sensitive algorithms for genome-wide gene function prediction
N. Cesa Bianchi (first author)
G. Valentini (last author)
2009
Abstract
In this work we propose new ensemble methods for the hierarchical classification of gene functions. Our methods exploit the hierarchical relationships between the classes in different ways: each ensemble node is trained “locally”, according to its position in the hierarchy; moreover, in the evaluation phase the set of predicted annotations is built so as to minimize a global loss function defined over the hierarchy. We also address the sparsity of annotations by introducing a cost-sensitive parameter that controls the precision-recall trade-off. Experiments with the model organism S. cerevisiae, using the FunCat taxonomy and seven biomolecular data sets, reveal a significant advantage of our techniques over “flat” and cost-insensitive hierarchical ensembles.