In the last few years, the introduction of novel technologies known as “next-generation sequencing” (NGS) has brought a major step forward in sequencing. These techniques have practically supplanted the conventional Sanger strategies that had been the principal method of sequencing DNA since the late 1970s. Different NGS platforms have been introduced, with the newest using ion-sensitive sensors to detect the incorporation of bases, as opposed to the more commonly used fluorescently labelled nucleotides. Since the first techniques were introduced, both the sequencing runtime and the cost per sequenced base have decreased dramatically, and, at the current state of the art, a complete human genome can be sequenced in under 24 hours. On the other hand, the ever-increasing number of short sequences (or reads) yielded per run makes processing the data more challenging from a computational point of view. One of the most prominent and promising fields of application is RNA-Seq, an assay that provides a fast and reliable way to study transcriptomic variability on a whole-genome scale. Generally, in an RNA-Seq experiment, an RNA sample is converted into a cDNA library, which then undergoes several cycles of sequencing with an NGS method of choice. Usually, the resulting sequences are either mapped to the reference genome or assembled de novo, without the aid of a genomic sequence, to produce a genome-scale transcription map, or transcriptome. The data analyzed in this thesis come from a three-year research project focused on the characterization of tissue- and individual-specific alternative splicing and its regulation. The data consist of several RNA-Seq experiments performed on different human tissues coming from three healthy individuals. A total of 18 sets of data (6 tissues from three individuals with 3 replicates for each) were studied.
The work initially focused on the quantification of mitochondrial DNA and RNA in the six individuals, and its variability. Then, we developed a computational method for the identification of tissue- and individual-specific transcripts that is able to perform a multi-sample comparison. The algorithm we implemented employs a statistical test based on a variant of Shannon’s information entropy to identify transcripts whose expression pattern shows a significant bias towards one or more of the samples studied. The results obtained show the method to be robust and efficient: it removes the need for the pairwise comparisons required by currently available algorithms, and it provides a thorough and complete map of the extent of tissue-specificity of gene expression at the level of the single individual.
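As a rough illustration of the entropy idea mentioned above, the following minimal sketch scores how unevenly a transcript is expressed across samples using plain Shannon entropy. It is not the thesis's method: the abstract does not specify the variant of entropy or the statistical test, so the function names, the normalization by the maximum entropy, and the example expression values are all assumptions made for illustration.

```python
import math


def expression_entropy(expression):
    """Shannon entropy (in bits) of one transcript's expression across samples.

    `expression` is a list of non-negative expression values, one per sample
    (hypothetical input format, assumed for this sketch).
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("transcript has no expression in any sample")
    entropy = 0.0
    for value in expression:
        if value > 0:  # terms with p = 0 contribute nothing to the entropy
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


def specificity_score(expression):
    """Return 1 - H / H_max, an illustrative (assumed) specificity score.

    The score is close to 0 when expression is uniform across samples and
    equal to 1 when the transcript is expressed in a single sample.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two samples")
    return 1.0 - expression_entropy(expression) / math.log2(n)


# Made-up expression values for six samples (not data from the thesis).
uniform = [10.0] * 6
biased = [0.0, 0.0, 120.0, 0.0, 0.0, 0.0]

print(specificity_score(uniform))
print(specificity_score(biased))
```

Because all samples enter a single score, no pairwise comparison between samples is needed; a real analysis would still require a significance test on top of such a score, which this sketch leaves out.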

BIOINFORMATIC TOOLS FOR NEXT GENERATION TRANSCRIPTOMICS / G.M. Prazzoli ; tutor: G. Pavesi. DIPARTIMENTO DI BIOSCIENZE, 2015 May 29. (27th cycle, Academic Year 2014.) [10.13130/prazzoli-gian-marco_phd2015-05-29].

BIOINFORMATIC TOOLS FOR NEXT GENERATION TRANSCRIPTOMICS

G.M. Prazzoli
2015-05-29

Tutor: PAVESI, GIULIO
Sector BIO/11 - Molecular Biology
Doctoral Thesis
Files in this record:
phd_unimi_R09625.pdf
Access: open access
Description: Doctoral thesis, complete with published scientific articles
Type: Complete doctoral thesis
Size: 18.11 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/2434/275276