IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

BIOINFORMATIC TOOLS FOR NEXT GENERATION TRANSCRIPTOMICS / G.M. Prazzoli ; tutor: G. Pavesi. DIPARTIMENTO DI BIOSCIENZE, 2015 May 29. 27th cycle, Academic Year 2014. [10.13130/prazzoli-gian-marco_phd2015-05-29].

BIOINFORMATIC TOOLS FOR NEXT GENERATION TRANSCRIPTOMICS

G.M. Prazzoli

2015

Abstract

In the last few years the introduction of novel technologies known as “next-generation sequencing” (NGS) has brought a major step forward in sequencing. These techniques have practically supplanted the conventional Sanger strategies that had been the principal method of sequencing DNA since the late 1970s. Different NGS platforms have been introduced, with the newest using ion-sensitive sensors to detect the incorporation of bases, as opposed to the more commonly used fluorescently labelled nucleotides. Since the first techniques were introduced, both the sequencing runtime and the cost per sequenced base have decreased dramatically, and, at the current state of the art, a complete human genome can be fully sequenced in under 24 hours. On the other hand, the ever-increasing number of short sequences (or reads) yielded per run makes processing the data more challenging from a computational point of view. One of the most prominent and promising fields of application is RNA-Seq, an assay that provides a fast and reliable way to study transcriptomic variability on a whole-genome scale. Generally, in an RNA-Seq experiment, an RNA sample is converted into a cDNA library, which then undergoes several cycles of sequencing with an NGS method of choice. Usually, the resulting sequences are either mapped onto the reference genome or assembled de novo, without the aid of a genomic sequence, to produce a genome-scale transcription map, or transcriptome. The data analyzed in this thesis come from a three-year research project focused on the characterization of tissue- and individual-specific alternative splicing and its regulation. The data consist of several RNA-Seq experiments performed on different human tissues from three healthy individuals. A total of 18 data sets were studied (6 tissues from each of the three individuals), with 3 replicates for each.
The work initially focused on the quantification of mitochondrial DNA and RNA in the six tissues of the three individuals, and on its variability. We then developed a computational method for the identification of tissue- and individual-specific transcripts that is able to perform a multi-sample comparison. The algorithm we implemented employs a statistical test based on a variant of Shannon’s information entropy to identify transcripts whose expression pattern shows a significant bias towards one or more of the samples studied. The results obtained show the method to be robust and efficient: it removes the need for the pairwise comparisons required by currently available algorithms, and it provides a thorough and complete map of the extent of tissue-specificity of gene expression at the level of the single individual.
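
As an illustration of the general idea only, the following minimal sketch scores how strongly a transcript's expression is biased towards a subset of samples using plain Shannon entropy. It is not the variant used in the thesis, and it omits the statistical test entirely; the function names, the normalization to the [0, 1] range, and the example expression values are all assumptions made for this sketch.

```python
import math


def expression_entropy(values):
    """Shannon entropy (in bits) of one transcript's expression across samples.

    `values` holds non-negative expression levels, one per sample
    (hypothetical input; e.g. normalized read counts).
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("transcript has no expression in any sample")
    # Turn expression levels into a probability distribution over samples.
    probs = [v / total for v in values]
    # Samples with zero expression contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def specificity(values):
    """Score in [0, 1]: 0 = uniform across samples, 1 = a single sample only.

    Computed as 1 - H / H_max, where H_max = log2(number of samples).
    This normalization is an assumption of the sketch.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to compare")
    return 1.0 - expression_entropy(values) / math.log2(n)


# Made-up expression profiles over six samples.
uniform = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
single_sample = [0.0, 0.0, 120.0, 0.0, 0.0, 0.0]
two_samples = [50.0, 50.0, 0.0, 0.0, 0.0, 0.0]

for name, profile in [("uniform", uniform),
                      ("single_sample", single_sample),
                      ("two_samples", two_samples)]:
    print(f"{name}: specificity = {specificity(profile):.3f}")
```

Because the score is computed over all samples at once, a single pass per transcript replaces a series of pairwise comparisons; deciding whether a given score is significant would still require a statistical test, which this sketch does not attempt.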

Defense date: 29 May 2015
Scientific-disciplinary sectors of the thesis (display only): BIO/11 - Molecular Biology
Tutors affiliated with the University: PAVESI, GIULIO
Type: Doctoral Thesis
Appears in types: Doctoral theses

Files in this item:

File	Size	Format
phd_unimi_R09625.pdf (open access; complete doctoral thesis including published scientific articles)	18.11 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/275276
