Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks

Cappelletti, L.; Petrini, A.; Gliozzo, J.; Casiraghi, E.; Schubach, M.; Kircher, M.; Valentini, G.

doi:10.1007/978-3-030-45385-5_54

The annotation and characterization of tissue-specific cis-regulatory elements (CREs) in non-coding DNA represents an open challenge in computational genomics. Several prior works show that machine learning methods, using epigenetic or spectral features directly extracted from DNA sequences, can predict active promoters and enhancers in specific tissues or cell lines. In particular, very recently deep-learning techniques obtained state-of-the-art results in this challenging computational task. In this study, we provide additional evidence that Feed Forward Neural Networks (FFNN) trained on epigenetic data and one-dimensional convolutional neural networks (CNN) trained on DNA sequence data can successfully predict active regulatory regions in different cell lines. We show that model selection by means of Bayesian optimization applied to both FFNN and CNN models can significantly improve deep neural network performance, by automatically finding models that best fit the data. Further, we show that techniques applied to balance active and non-active regulatory regions in the human genome in training and test data may lead to over-optimistic or poor predictions. We recommend to use actual imbalanced data that was not used to train the models for evaluating their generalization performance.

Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks / L. Cappelletti, A. Petrini, J. Gliozzo, E. Casiraghi, M. Schubach, M. Kircher, G. Valentini (LECTURE NOTES IN COMPUTER SCIENCE). - In: Bioinformatics and Biomedical Engineering / [a cura di] I. Rojas, O. Valenzuela, F. Rojas, L.J. Herrera, F. Ortuño. - Prima edizione. - [s.l] : Springer, 2020. - ISBN 9783030453848. - pp. 600-612 (( Intervento presentato al 8. convegno International Work-Conference on Bioinformatics and Biomedical Engineering tenutosi a Granada nel 2020 [10.1007/978-3-030-45385-5_54].

Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks

L. Cappelletti^Co-primo;A. Petrini^Co-primo;J. Gliozzo;E. Casiraghi;Schubach M.;Kircher M.;G. Valentini^Ultimo

2020

Abstract

The annotation and characterization of tissue-specific cis-regulatory elements (CREs) in non-coding DNA represents an open challenge in computational genomics. Several prior works show that machine learning methods, using epigenetic or spectral features directly extracted from DNA sequences, can predict active promoters and enhancers in specific tissues or cell lines. In particular, very recently deep-learning techniques obtained state-of-the-art results in this challenging computational task. In this study, we provide additional evidence that Feed Forward Neural Networks (FFNN) trained on epigenetic data and one-dimensional convolutional neural networks (CNN) trained on DNA sequence data can successfully predict active regulatory regions in different cell lines. We show that model selection by means of Bayesian optimization applied to both FFNN and CNN models can significantly improve deep neural network performance, by automatically finding models that best fit the data. Further, we show that techniques applied to balance active and non-active regulatory regions in the human genome in training and test data may lead to over-optimistic or poor predictions. We recommend to use actual imbalanced data that was not used to train the models for evaluating their generalization performance.

Scheda breve

Scheda completa

Scheda completa (DC)

	Parole chiave
	
				Deep neural networks; Genomic medicine; Regulatory region prediction; machine learning
			
	Settori scientifico-disciplinari del contributo (sola visualizzazione)
	
				Settore INF/01 - Informatica
			
	Settori scientifico-disciplinari del contributo (validi dal 09/05/2024)
	
				Settore INFO-01/A - Informatica
			
	Data di pubblicazione
	
				2020
			
	DOI
	
				https://dx.doi.org/10.1007/978-3-030-45385-5_54
			
	Tipologia
	
				Book Part (author)
			
	Appare nelle tipologie:
	
				03 - Contributo in volume

File in questo prodotto:

File	Dimensione	Formato
IWBBIO_147.pdf accesso aperto Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore) Licenza: Creative commons Dimensione 1.39 MB Formato Adobe PDF Visualizza/Apri	1.39 MB	Adobe PDF	Visualizza/Apri
Cappelletti2020_Chapter_BayesianOptimizationImprovesTi.pdf accesso riservato Tipologia: Publisher's version/PDF Licenza: Nessuna licenza Dimensione 1.93 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	1.93 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/778429

Citazioni

ND

2

1

ND

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca