DEVELOPMENT AND APPLICATION OF COMPUTATIONAL APPROACHES FOR SINGLE CELL RNA SEQUENCING DATA ANALYSIS / D. Traversa ; supervisor: M. Chiara ; scientific committee progress report: I. Barozzi ; scientific committee progress report: M. Cereda, S. Ricagno ; scientific committee progress report: M. Delledonne ; external evaluator: I. Barozzi ; external evaluator: D. A. Silvestris. Dipartimento di Bioscienze, 2026 Mar 27. 38th cycle, Academic Year 2024/2025.

DEVELOPMENT AND APPLICATION OF COMPUTATIONAL APPROACHES FOR SINGLE CELL RNA SEQUENCING DATA ANALYSIS

D. Traversa
2026

Abstract

During my PhD, I focused on the development and application of bioinformatics methods for the analysis of data produced by single-cell technologies, with particular emphasis on single-cell RNA sequencing (scRNA-seq). A critical challenge in this field is the automatic annotation of cell identity, where no standardized strategy currently exists due to inherent limitations such as data sparsity, low transcript detection, and dependence on heterogeneous reference datasets. To address this gap, I developed SCARLET, a novel probabilistic framework that defines cellular identity as the transcriptional program that most likely explains a cell’s gene expression profile. SCARLET integrates bootstrap-based gene selection with a mutual-information approach to derive transcriptional programs and quantify their similarity, and employs likelihood-based modeling to assign cells to one or multiple plausible identities. Unlike existing methods, SCARLET explicitly accounts for the limitations of scRNA-seq. Benchmarking confirmed its competitive or superior performance relative to state-of-the-art tools. Applied to large-scale datasets encompassing more than 5 million cells, SCARLET reduced redundant labels while preserving annotation concordance, demonstrating both scalability and robustness. Beyond methodological advances, I applied single-cell approaches to two biological contexts. In rice shoot apical meristems, snRNA-seq analysis revealed transcription factor families dynamically regulating floral transition, while underscoring current obstacles in plant single-cell research, including incomplete transcriptome coverage and limited reference resources. In breast cancer, scCROP-seq analysis uncovered mechanisms of tumor heterogeneity and evolvability under estrogen deprivation, identifying both perturbation-specific and shared transcriptional programs that illuminate adaptive survival strategies.
Overall, this work advances computational strategies for cell identity annotation, delivers new biological insights, and highlights the need for improved data quality, multi-omics integration, and broader application of single-cell methods across diverse organisms.
27 March 2026
Sector BIOS-08/A - Molecular biology
Bioinformatics; tool; scRNA-sequencing; cell identity; cell type; programming; scCROP-seq; bioinformatics tool; review; automatic identification of cell types; likelihood; bootstrap; mutual information; single cell technologies; rice meristems; estrogen deprivation; cancer; perturbation; molecular biology
CHIARA, MATTEO
RICAGNO, STEFANO
Doctoral Thesis
Files in this item:
phd_unimi_R13894.pdf
Access: open access
Description: Doctoral thesis
Type: Publisher's version/PDF
License: Creative Commons
Size: 185.96 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1229395