Data-driven identification and functional characterization of human transcripts and proteins remain challenging tasks in the post-genomics era. Transcriptional and post-transcriptional regulation mechanisms greatly increase RNA isoform diversity, while their contribution to protein synthesis remains largely unexplored. Moreover, transcriptome composition varies across human cell types, tissues, and conditions. Therefore, there is a great need for unbiased, dataset-specific annotation efforts. In this regard, transcriptomic and proteomic methods can help elucidate transcript functions and detect actively translated Open Reading Frames (ORFs). The main goal of this project is the development of methods for de novo identification of RNAs, ORFs, and proteins directly from the data. We implement a pipeline that couples de novo transcriptome assembly, de novo ORF detection, and proteome characterization using proteogenomic approaches. Furthermore, we devise computational strategies for evaluating de novo detection from RNA to protein. Using our pipeline, we characterize the effects of DUX4 activation in human skeletal muscle cells as a model for facioscapulohumeral muscular dystrophy (FSHD). Our results show that misexpression of DUX4, which encodes an embryonic transcription factor, impairs RNA metabolism by inhibiting nonsense-mediated decay, thus leading to the accumulation of incomplete transcripts and truncated proteins. De novo transcriptome assembly allows detection of several unannotated genes and transcripts, including potential novel DUX4 targets, whereas de novo ORF finding reveals the presence of translated ORFs within novel transcripts. Using a custom protein database and a deep Tandem Mass Tag (TMT)-labeling proteomics dataset, we identify upregulated novel proteins with evidence for RNA expression in patient-derived data.
When further extending the transcriptome annotation with long-read-derived transcripts and genes, we find only a modest increase in the number of detected changes, showcasing the power of de novo approaches even with short-read data. Moreover, we analyze how transcript and ORF expression levels, as well as the choice of annotation and protein database, influence downstream analysis. In conclusion, our data analysis strategy allows an improved characterization of the functions of the transcribed genome. We characterize the effects of misexpression of a gene on RNA metabolism and on the proteome, we identify novel targets in a rare disease, and we investigate the factors influencing the results of our unbiased analyses from RNA to protein.

A DE NOVO COMPUTATIONAL DISCOVERY PLATFORM FROM RNA TO PROTEIN / R. Albanese ; tutor: L. Calviello ; coordinator: D. Pasini ; internal advisor: F. Nicassio. Dipartimento di Scienze della Salute, 2026. 37th cycle, Academic Year 2024/2025.

A DE NOVO COMPUTATIONAL DISCOVERY PLATFORM FROM RNA TO PROTEIN

R. Albanese
2026

Sector BIOS-07/A - Biochemistry
transcriptomics; translation; proteomics; proteogenomics; bioinformatics
PASINI, DIEGO
Doctoral Thesis
Files in this record:
phd_unimi_R13460.pdf (open access)
Description: Doctoral thesis
Type: Publisher's version/PDF
License: Creative Commons
Size: 8.8 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1247833