This thesis is divided into two sections. The first section outlines the scientific work that I accomplished during the last year of my graduate studies, whose goal was to generate a reference genome for the European barn swallow (Hirundo rustica rustica). The barn swallow (Hirundo rustica) is a migratory bird that has been the focus of a large number of ecological, behavioural and genetic studies. To facilitate further population genetics and genomic studies, I generated a high-quality genome for the European subspecies using third-generation Single Molecule Real-Time (SMRT) DNA sequencing from Pacific Biosciences (Menlo Park, California, USA) and optical mapping from Bionano Genomics (San Diego, California, USA). For optical mapping, DNA molecules were labelled both with one of the original Nick, Label, Repair and Stain (NLRS) nickases (enzyme Nb.BssSI) and with the new Direct Label and Stain (DLS) approach (enzyme DLE-1). This allowed me to compare and integrate optical maps derived from both NLRS and DLS technologies. The latter was officially released in February 2018 and avoids nicking and the subsequent cleavage of DNA molecules upon staining. To my knowledge, this is the first genome assembly to incorporate DLS data, and this approach more than doubled the assembly N50 relative to the nickase system. Furthermore, the dual-enzyme hybrid scaffold led to a marginal increase in scaffold N50 and an overall increase in confidence in the scaffolds. After removal of haplotigs, the final assembly is approximately 1.21 Gbp in size, with an N50 value of over 25.95 Mbp. The contiguity achieved represents an improvement of over 650-fold with respect to a previously reported assembly based on paired-end short-read data, and it is well in excess of the contiguity specified for “Platinum genomes” by the Vertebrate Genomes Project. The assembly can therefore constitute a valuable resource for studies of the evolution of avian genomes in general, as well as for population genetics and genomics in the barn swallow, with the potential to accelerate research on barn swallow biology and ecology. This work culminated in a publication that I authored, entitled “SMRT long-read sequencing and Direct Label and Stain optical maps allow the generation of a high-quality genome assembly for the European barn swallow (Hirundo rustica rustica)”, published in the peer-reviewed journal GigaScience (IF 7.5, 2016).

The second section of this thesis presents the methodological work and the conclusions drawn from my work, and that of my collaborators, on the evolutionary origins of Huntington’s Disease (HD), a genetic neurodegenerative disorder. The study was conducted in the Laboratory of Stem Cell Biology and Pharmacology of Neurodegenerative Diseases, directed by Prof. Elena Cattaneo at the University of Milan, where I worked for the first two years of my PhD (and also during my Master’s thesis work) and whose research focuses on the phylogenetic and biological investigation of the HD causative gene. My goal in this study, as part of an ongoing effort in the host laboratory to trace the HD-causing gene throughout evolution, was to reconstruct and understand the evolutionary origins of the CAG repeat embedded in exon 1 of the Htt gene. This goal could be achieved by collecting DNA sequences of orthologous genes to allow a comparative analysis of the differences and similarities between the human sequence and those of other animal species. More specifically, existing sequences could be retrieved from public databases and/or determined directly by sequencing biological samples, obtained through existing or newly established collaborations. Htt exon 1 sequences could then be combined in a multiple alignment, resulting in a detailed picture of Htt exon 1 CAG repeats along the tree of life. The multiple alignment, when subjected to a bioinformatics analysis of selective pressures, could be used to elucidate the evolutionary features of this simple repeat. The study was also made possible by a collaboration between Prof. Cattaneo and my Ph.D. thesis supervisor, Prof. Nicola Saino. At the time of writing, a manuscript is in preparation that reports part of the data from this work together with other data obtained in Prof. Cattaneo’s laboratory.
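For readers unfamiliar with the contiguity figures quoted for the first section, the N50 statistic can be sketched as follows. This is a minimal illustration under the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), not the pipeline used in the thesis. The function name `n50` and the toy lengths are assumptions introduced here for illustration only.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Hypothetical helper: walks the lengths from longest to shortest and
    returns the length at which the running total first reaches half of
    the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached: the running total always hits half


# Toy lengths (illustrative only, not data from the thesis).
toy_lengths = [10, 8, 5, 4, 3]
print(n50(toy_lengths))  # total is 30; 10 + 8 = 18 >= 15, so this prints 8
```

Because N50 depends only on the sorted lengths, the same function applies to contigs or scaffolds; a fold improvement between two assemblies is then simply the ratio of their N50 values.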

THIRD-GENERATION SEQUENCING AND ASSEMBLY OF THE BARN SWALLOW GENOME AND A STUDY ON THE EVOLUTION OF THE HUNTINGTIN GENE / G.P. Formenti ; supervisor and coordinator: N. M. Francesco Saino ; co-supervisor: E. Cattaneo. DIPARTIMENTO DI SCIENZE E POLITICHE AMBIENTALI, 2019 Feb 05. 31st cycle, Academic Year 2018. [10.13130/formenti-giulio-paolo_phd2019-02-05].

THIRD-GENERATION SEQUENCING AND ASSEMBLY OF THE BARN SWALLOW GENOME AND A STUDY ON THE EVOLUTION OF THE HUNTINGTIN GENE

G.P. Formenti
2019

5 Feb 2019
Sector BIO/05 - Zoology
Sector BIO/07 - Ecology
Sector BIO/11 - Molecular Biology
Sector BIO/18 - Genetics
genome; barn swallow; third-generation sequencing; SMRT; long reads; Bionano; DLS; DLE-1; optical maps; single molecule
SAINO, NICOLA MICHELE FRANCESCO
Doctoral Thesis
Files in this record:
phd_unimi_R11215.pdf (complete doctoral thesis; Adobe PDF, 10.8 MB; open access since 24/06/2020)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/611650