IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

A DE NOVO COMPUTATIONAL DISCOVERY PLATFORM FROM RNA TO PROTEIN / R. Albanese ; tutor: L. Calviello ; coordinator: D. Pasini ; internal advisor: F. Nicassio. Dipartimento di Scienze della Salute, 2026. 37th cycle, Academic Year 2024/2025.

A DE NOVO COMPUTATIONAL DISCOVERY PLATFORM FROM RNA TO PROTEIN

R. Albanese

2026

Abstract

Data-driven identification and functional characterization of human transcripts and proteins remain challenging tasks in the post-genomics era. Transcriptional and post-transcriptional regulation mechanisms greatly increase RNA isoform diversity, while their contribution to protein synthesis remains largely unexplored. Moreover, transcriptome composition varies across human cell types, tissues, and conditions. Therefore, there is a great need for unbiased, dataset-specific annotation efforts. In this regard, transcriptomic and proteomic methods can help elucidate transcript functions and detect actively translated Open Reading Frames (ORFs). The main goal of this project is the development of methods for de novo identification of RNAs, ORFs, and proteins directly from the data. We implement a pipeline that couples de novo transcriptome assembly, de novo ORF detection, and proteome characterization using proteogenomic approaches. Furthermore, we devise computational strategies for evaluating de novo detection from RNA to protein. Using our pipeline, we characterize the effects of DUX4 activation in human skeletal muscle cells as a model for facioscapulohumeral muscular dystrophy (FSHD). Our results show that misexpression of DUX4, which encodes an embryonic transcription factor, impairs RNA metabolism by inhibiting nonsense-mediated decay, thus leading to the accumulation of incomplete transcripts and truncated proteins. De novo transcriptome assembly allows detection of several unannotated genes and transcripts, including potential novel DUX4 targets, whereas de novo ORF finding reveals the presence of translated ORFs within novel transcripts. By using a custom protein database and a deep Tandem Mass Tag (TMT)-labeling proteomics dataset, we identify upregulated novel proteins with evidence for RNA expression in patient-derived data.
When further extending the transcriptome annotation with long-read-derived transcripts and genes, we find a modest increase in the number of detected changes, showcasing the power of de novo approaches even with short-read data. Moreover, we analyze how transcript and ORF expression levels, as well as the choice of annotation and protein database, influence downstream analysis. In conclusion, our data analysis strategy allows an improved characterization of the functions of the transcribed genome. We characterize the effects of gene misexpression on RNA metabolism and on the proteome, identify novel targets in a rare disease, and investigate the factors influencing the results of our unbiased analyses from RNA to protein.
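
As a rough illustration of the ORF-detection and custom-database steps named in the abstract, the sketch below scans a transcript for ATG-to-stop ORFs on the forward strand and formats the translated products as FASTA entries. It is a sequence-only simplification, not the method of the thesis: the abstract does not specify the tools or criteria used, and identifying actively translated ORFs would need evidence beyond the sequence itself. The names `find_orfs` and `to_fasta`, the `min_aa` threshold, and the example transcript are hypothetical.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_orfs(seq, min_aa=10):
    """Return sorted (start, end, protein) tuples for forward-strand ORFs.

    An ORF runs from the first ATG after the previous in-frame stop codon
    to the next in-frame stop codon. ORFs without a stop codon are skipped.
    Coordinates are 0-based, end-exclusive, and include the stop codon.
    min_aa is a hypothetical length filter, not a value from the thesis.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start, peptide = None, []
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks ambiguous codons
            if start is None:
                if aa == "M":
                    start, peptide = i, ["M"]
            elif aa == "*":
                if len(peptide) >= min_aa:
                    orfs.append((start, i + 3, "".join(peptide)))
                start, peptide = None, []
            else:
                peptide.append(aa)
    return sorted(orfs)


def to_fasta(transcript_id, orfs):
    """Format ORF proteins as FASTA text for a custom protein database."""
    return "".join(
        f">{transcript_id}_ORF_{start}_{end}\n{protein}\n"
        for start, end, protein in orfs
    )


# Hypothetical transcript: 2 nt of 5' sequence, a 4-residue ORF, 2 nt of 3' sequence.
example = "GG" + "ATG" + "GCT" * 3 + "TAA" + "CC"
example_orfs = find_orfs(example, min_aa=3)
example_fasta = to_fasta("tx1", example_orfs)
```

A FASTA file assembled this way from de novo assembled transcripts could, in principle, be appended to a reference proteome before a mass-spectrometry database search, which is the general idea behind a proteogenomic custom database.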

Defense date: 2026
Scientific-disciplinary sectors of the thesis (valid from 09/05/2024): BIOS-07/A - Biochemistry
Keywords: transcriptomics; translation; proteomics; proteogenomics; bioinformatics
Supervisors and coordinators affiliated with the University: PASINI, DIEGO
Type: Doctoral Thesis
Appears in types: Doctoral theses

Files in this item:

File	Size	Format
phd_unimi_R13460.pdf (open access; description: Doctoral thesis; type: Publisher's version/PDF; license: Creative Commons)	8.8 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1247833