GENOMIC EPIDEMIOLOGY OF THE MAIN VARIANTS OF SARS-CoV-2 CIRCULATING IN ITALY BEFORE AND DURING THE OMICRON ERA

Introduction

Since the beginning of the pandemic, SARS-CoV-2 has shown great genomic variability, resulting in the continuous emergence of new variants and making their global monitoring and study a priority. Phylodynamic analysis, which combines the evolutionary, epidemiological, and immunological characteristics that shape a viral phylogeny, has become an increasingly important tool in the molecular surveillance of infections, particularly those caused by emerging viruses. The COVID-19 pandemic presented a unique opportunity to apply such analyses to molecular surveillance, because SARS-CoV-2 genomic sequences have been available since early January 2020 and more than 16 million complete genomes were stored in one of the most widely used databases (GISAID, https://gisaid.org/) as of September 2023. Together with genomic epidemiology, these approaches have made it possible to trace international pandemic flows, reconstruct outbreaks and transmission networks, estimate transmissibility, and identify variants with higher transmissibility or immune escape capacity. This thesis aimed to study the genomic heterogeneity, temporal origin, rate of viral evolution, and population dynamics of the main variants circulating in Italy before the Omicron era (clade 20E.EU1 and the Alpha and Delta variants) and during it (Omicron BA.1, BA.2, and BA.5).

Material and methods

Nasopharyngeal swabs from COVID-19 patients attending more than 70 clinical centers distributed throughout Italy and participating in the collaborative SCIRE group were collected and processed between August 2020 and December 2022.
To study the main variants circulating between summer 2020 and winter 2021 (clade 20E.EU1 and the Alpha and Delta variants), 3 national and 3 international datasets were set up using a total of 609 whole genomes characterized by our collaborative network (20E.EU1, n=269; Alpha, n=164; Delta, n=176). To study the major lineages of the Omicron variant circulating during 2022 (BA.1, BA.2, and BA.5), 3 national datasets were set up using 1,658 whole genomes characterized by the collaborative network (BA.1, n=268; BA.2, n=677; BA.5, n=713). Italian genomes from GISAID (https://gisaid.org) with the same variant and collection period were added to the 6 Italian datasets, and international genomes with the same variant and collection period were added to the 3 international datasets. Mutational analyses were conducted only on the Italian datasets. Phylogenetic analyses were conducted on the 3 international datasets for 20E.EU1, Alpha, and Delta and on the 3 national datasets for Omicron BA.1, BA.2, and BA.5.

Maximum likelihood (ML) trees were estimated using IQ-TREE v. 1.6.12 (http://www.iqtree.org/). The substitution models, selected by the program, were GTR+F+R4 (general time reversible + empirical base frequencies + FreeRate heterogeneity with four rate categories) for clade 20E.EU1 and the Alpha and Delta variants, GTR+F+R3 (three rate categories) for Omicron BA.1 and BA.2, and GTR+F+R6 (six rate categories) for Omicron BA.5. Node support was assessed with 1,000 parametric bootstrap replicates (threshold: ≥60% bootstrap support). Statistically significant clusters (including more than two sequences) were identified in the ML tree with Cluster Picker v.1.2.3.
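The model names above follow the IQ-TREE convention of joining components with "+". As a reading aid, the following minimal sketch splits such a string into its parts. The helper `describe_model` and its output keys are hypothetical and are not part of IQ-TREE; only the three components used in this abstract (GTR, +F, +R with a category count) are handled.

```python
import re

# Hypothetical lookup table: only the substitution model named in the abstract.
SUBSTITUTION_MODELS = {"GTR": "general time reversible"}


def describe_model(model: str) -> dict:
    """Split an IQ-TREE-style model string such as 'GTR+F+R4' into components."""
    parts = model.replace(" ", "").split("+")
    description = {
        "substitution_model": SUBSTITUTION_MODELS.get(parts[0], parts[0]),
        "empirical_base_frequencies": False,
        "rate_categories": None,
    }
    for part in parts[1:]:
        if part == "F":
            # +F: base frequencies counted from the alignment
            description["empirical_base_frequencies"] = True
        else:
            # +Rk: rate heterogeneity across sites with k categories
            match = re.fullmatch(r"R(\d+)", part)
            if match:
                description["rate_categories"] = int(match.group(1))
    return description


for name in ("GTR + F + R4", "GTR+F+R3", "GTR+F+R6"):
    print(name, "->", describe_model(name))
```

The three models differ only in the number of rate categories, which is what the model selection step adjusted between datasets.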
The Cluster Matcher v.1.2.23 program was used to select clusters containing at least one Italian sequence. These were classified as mixed (M) if they included more than one Italian isolate together with non-Italian sequences, as pure Italian (IT) if they included only Italian isolates, and as singleton (S) if they contained a single Italian isolate together with non-Italian sequences. To characterize the epidemiological and evolutionary history of the SARS-CoV-2 variants/lineages circulating in Italy during 2020 and 2021, only clusters in each dataset that included at least 70% Italian sequences and were large enough for the analysis (>10 sequences) were analyzed with the coalescent Bayesian Skyline Plot and birth-death models (BEAST v. 2.7). For the Omicron variants, Italian clusters of 10 or more isolates were selected.

Results

For the variants circulating before the Omicron era, mutations characteristic of the ancestral B.1 and B.1.1 lineages were observed in addition to variant-specific mutations. In about one third of the B.1 lineage sequences, two additional mutations were found in the S gene (A262V and P272L), while the Delta variant showed many additional mutations, especially in the ORF1a region. The international datasets contained 26 clusters with at least one Italian sequence for clade 20E.EU1 (54% mixed, 23% pure Italian, 23% singleton), 40 for Alpha (60% mixed, 37.5% pure Italian, 1 singleton), and 42 for Delta (85.7% mixed, 9.5% singleton, 4.8% pure Italian). The times of the most recent common ancestor (tMRCAs) of these clusters fell between June and September 2020 for clade 20E.EU1, between November 2020 and February 2021 for Alpha, and between March and July 2021 for Delta. For all lineages/variants, the earliest clusters were the largest, the most persistent over time, and frequently mixed.
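The cluster categories counted above (mixed, pure Italian, singleton) and the selection rule for the 2020-2021 phylodynamic analyses follow simple counting rules. These can be sketched as below, assuming each cluster is given as a list of country labels, one per sequence. The function names and the input format are hypothetical and do not reproduce the Cluster Matcher interface.

```python
from typing import List, Optional


def classify_cluster(countries: List[str], home: str = "Italy") -> Optional[str]:
    """Return 'M' (mixed), 'IT' (pure Italian), 'S' (singleton), or None."""
    n_home = sum(1 for country in countries if country == home)
    n_other = len(countries) - n_home
    if n_home == 0:
        return None  # no Italian sequence: cluster is not retained
    if n_other == 0:
        return "IT"  # only Italian isolates
    if n_home == 1:
        return "S"   # one Italian isolate among non-Italian sequences
    return "M"       # more than one Italian isolate plus non-Italian sequences


def eligible_for_phylodynamics(countries: List[str], home: str = "Italy",
                               min_fraction: float = 0.70,
                               min_size_exclusive: int = 10) -> bool:
    """Selection rule stated for 2020-2021: >10 sequences, at least 70% Italian."""
    n_home = sum(1 for country in countries if country == home)
    return (len(countries) > min_size_exclusive
            and n_home / len(countries) >= min_fraction)


example = ["Italy"] * 9 + ["Spain", "France", "Germany"]
print(classify_cluster(example))            # 9 Italian + 3 foreign sequences
print(eligible_for_phylodynamics(example))  # 12 sequences, 75% Italian
```

Under these rules a mixed cluster can still qualify for the Bayesian analyses, provided Italian sequences make up at least 70% of it.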
Isolates from the major Italian islands tended to segregate in clusters more frequently than those from other parts of Italy. The estimated mean evolutionary rates were 3.8×10⁻⁴ substitutions/site/year (s/s/y) (95% HPD: 3.35×10⁻⁴–4.39×10⁻⁴) for clade 20E.EU1, 4.87×10⁻⁴ s/s/y (95% HPD: 4.18×10⁻⁴–5.54×10⁻⁴) for Alpha, and 7.57×10⁻⁴ s/s/y (95% HPD: 6.65×10⁻⁴–8.48×10⁻⁴) for Delta.

The skyline plot of the 20E.EU1 dataset showed a rapid increase in the effective number of infections (Ne) during July 2020, followed by a second increase in October 2020, when the curve reached a plateau. Infections began to decrease in February 2021 and reached their lowest values in April 2021. Consistent with these dynamics, estimates of the effective reproduction number (Re) were greater than 1 from the beginning of 20E.EU1 circulation, peaking around July 2020 (Re=1.12). Re started to decline in autumn 2020, fell to around 1 between December 2020 and January 2021, and dropped below 1 in April 2021.

The effective number of infections due to the Alpha variant increased rapidly in December 2020 and reached a plateau during winter 2021. A decrease in infections was observed from April 2021. Similarly, Re was above 1 from the origin of the Alpha epidemic in autumn 2020, with a peak of about 1.13 in winter 2020–2021. The curve declined between March and April 2021 and reached values below 1 in May 2021.

For the Delta variant, a slow rise was observed in spring 2021, followed by a rapid increase in infections in July 2021. From August, the number of infections reached a plateau that persisted until the end of 2021. Similarly, Re rose above 1 in spring 2021, reaching 1.17, and then gradually decreased to below 1 in the summer of the same year.
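To give the per-site evolutionary rates reported above a more intuitive scale, they can be converted into expected substitutions per genome per year. The sketch below assumes a genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference genome), which is not stated in the abstract. The variable and function names are illustrative.

```python
# Assumption: genome length of the Wuhan-Hu-1 reference (not given in the abstract).
GENOME_LENGTH = 29_903

# Mean rates in substitutions/site/year, as reported for the pre-Omicron datasets.
RATES = {
    "20E.EU1": 3.8e-4,
    "Alpha": 4.87e-4,
    "Delta": 7.57e-4,
}


def substitutions_per_genome_per_year(rate_per_site: float,
                                      genome_length: int = GENOME_LENGTH) -> float:
    """Scale a per-site yearly rate to the whole genome."""
    return rate_per_site * genome_length


for variant, rate in RATES.items():
    per_genome = substitutions_per_genome_per_year(rate)
    print(f"{variant}: {per_genome:.1f} substitutions/genome/year")

# Ratio between the Delta and 20E.EU1 mean rates
print(f"Delta / 20E.EU1: {RATES['Delta'] / RATES['20E.EU1']:.2f}")
```

Under this assumption the means correspond to roughly 11, 15, and 23 substitutions per genome per year for 20E.EU1, Alpha, and Delta, with the Delta mean about twice that of 20E.EU1.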
The Omicron datasets were characterized by numerous sub-lineages with a frequency greater than 1%, with a high proportion of BA.1.17.2 and BA.1.1 in the BA.1 dataset, BA.2.9 in the BA.2 dataset, and BA.5.2 and BA.5.2.1 in the BA.5 dataset. A high number of sub-lineages with a frequency below 1% was also found for each variant. All these variants carried a high number of variant-specific mutations and deletions present in more than 10% of the isolates. Bayesian analyses conducted on 12 (BA.1), 7 (BA.2), and 10 (BA.5) Italian clusters showed tMRCAs between September and November 2021 for BA.1, between November 2021 and January 2022 for BA.2, and between October 2021 and May 2022 for BA.5. Overall, BA.1 and BA.2 clusters were characterized by a high number of isolates from northern Italy. In contrast, 60% of BA.5 clusters included isolates from every part of Italy. The estimated mean evolutionary rates were 4.84×10⁻⁴ s/s/y (95% HPD: 3.76×10⁻⁴–5.98×10⁻⁴) for BA.1, 3.99×10⁻⁴ s/s/y (95% HPD: 2.70×10⁻⁴–5.33×10⁻⁴) for BA.2, and 4.56×10⁻⁴ s/s/y (95% HPD: 3.72×10⁻⁴–5.44×10⁻⁴) for BA.5.

The skyline plot of BA.1 showed an initial, gradual increase in the number of infections between September and November 2021, followed by a further increase in December 2021 and January 2022, when the number of cases stabilized. Re was around 1 at the beginning of the epidemic and increased from September 2021 to a peak of 1.45 in October. From January 2022, Re decreased and stabilized around 1.

For Omicron BA.2, an exponential increase in the number of infections was observed between January and February 2022, followed by a plateau phase.
Re increased to a peak of 1.42 in January-February 2022. A first decrease was recorded between February and March 2022, followed by a further decrease between May and July 2022. Re then remained stable below 1 until the end of August 2022.

For BA.5, an initial increase in the number of infections was observed in January 2021, and a second, exponential increase was recorded in May and June 2022. After a plateau phase was reached in August 2022, the number of cases decreased between October and November of that year. Re remained above 1 from the beginning of the epidemic, peaking in May 2022 (Re=1.28). A slight decrease was found in July 2022, and Re stabilized around 1 from September.

Conclusions

The analyzed SARS-CoV-2 genomes fell into over 100 clusters, with a high frequency of mixed clusters (around 70%) that included strains circulating in different regions of the world, suggesting multiple introductions of these lineages/variants into Italy, probably through international travel. The fact that the earliest clusters were the largest, the most persistent, and the most frequently mixed could be related to pandemic containment measures such as travel restrictions, which were relaxed during summer 2020, reintroduced in autumn with the arrival of variants of concern, and gradually relaxed again from May 2021. Pure Italian clusters, which suggest local circulation of the virus, were more prevalent while restrictive measures were in place and became increasingly less frequent as containment measures were eased. This could also explain why pure Italian clusters were observed more frequently for 20E.EU1 and Alpha than for Delta (<5% of all observed clusters), which circulated in Italy only later, when restrictions had largely been relaxed.
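The aggregate figures in these conclusions (over 100 clusters, around 70% mixed, fewer than 5% pure Italian clusters for Delta) can be cross-checked against the per-variant counts given in the Results. The sketch below rests on two assumptions not stated in the abstract: that the aggregate figures refer to the three pre-Omicron international datasets, and that each reported percentage corresponds to a whole number of clusters. All names are illustrative.

```python
def to_count(total: int, percent: float) -> int:
    """Convert a reported percentage back to a whole number of clusters."""
    return round(total * percent / 100)


# Totals and percentages as reported in the Results; the Alpha singleton
# category is reported as a count (1) rather than a percentage.
clusters = {
    "20E.EU1": {"total": 26,
                "M": to_count(26, 54), "IT": to_count(26, 23), "S": to_count(26, 23)},
    "Alpha": {"total": 40,
              "M": to_count(40, 60), "IT": to_count(40, 37.5), "S": 1},
    "Delta": {"total": 42,
              "M": to_count(42, 85.7), "IT": to_count(42, 4.8), "S": to_count(42, 9.5)},
}

# Each variant's categories should add up to its reported total.
for variant, c in clusters.items():
    assert c["M"] + c["IT"] + c["S"] == c["total"], variant

total_clusters = sum(c["total"] for c in clusters.values())
mixed_clusters = sum(c["M"] for c in clusters.values())
mixed_share = 100 * mixed_clusters / total_clusters

print(f"Total clusters: {total_clusters}")
print(f"Mixed clusters: {mixed_clusters} ({mixed_share:.1f}%)")
for variant, c in clusters.items():
    print(f"{variant}: pure Italian {100 * c['IT'] / c['total']:.1f}%")
```

Under these assumptions the three datasets give 108 clusters, of which 74 (68.5%) are mixed, in line with the figures quoted above.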
Since the beginning of 2022, the Omicron variant has spread rapidly around the world and, with its derived sub-lineages, has remained the dominant variant to date. Omicron evolved rapidly, giving rise to a succession of numerous lineages and sub-lineages with distinctive mutational profiles. This heterogeneity was also found in the analyzed datasets, which contained several sub-lineages with a frequency greater than 1%. Furthermore, the mutational profiles of BA.1, BA.2, and BA.5 showed a high number of mutations, mainly located in the S gene. Phenomena of neutral evolution, such as the "founder effect", made it possible to identify transmission clusters only at the national level; in an international context, the high degree of evolutionary relatedness between strains of a single variant makes it difficult to identify clusters of more closely related sequences that share a recent common ancestor. Higher Re values were observed for these variants than for 20E.EU1, Alpha, and Delta, confirming the higher transmissibility of Omicron. Our estimates are similar to the official ones, except for the peak in January 2022, which probably corresponds to the simultaneous circulation of the Delta and Omicron variants. Overall, the study of infection dynamics showed a positive correlation between the trend in the effective number of infections estimated by the Bayesian Skyline Plot model and the Re curves estimated by the birth-death skyline model. The estimated evolutionary rates were around 4×10⁻⁴ s/s/y for all variants under study except Delta, whose rate was approximately double; this could be due to the longer persistence of this variant, almost one year, and the generation of numerous derived lineages.
Overall, these data provided an accurate description of the epidemic dynamics in our country over a broad period, from August 2020 to December 2022, characterized by a succession of different viral variants.

EPIDEMIOLOGIA GENOMICA DELLE PRINCIPALI VARIANTI DI SARS-COV-2 CIRCOLANTI IN ITALIA PRIMA E DURANTE L'ERA OMICRON / A. Bergna ; coordinator: C. La Vecchia ; tutor: G. Zehender ; external reviewers: M. Ciccozzi, F. Ceccherini-Silberstein. Dipartimento di Scienze Biomediche e Cliniche, 2023. 36th cycle, Academic Year 2023.

EPIDEMIOLOGIA GENOMICA DELLE PRINCIPALI VARIANTI DI SARS-COV-2 CIRCOLANTI IN ITALIA PRIMA E DURANTE L'ERA OMICRON

A. Bergna
2024

9 January 2024
Genomic epidemiology; SARS-CoV-2; Omicron variants
ZEHENDER, GIANGUGLIELMO
LA VECCHIA, CARLO VITANTONIO BATTISTA
Doctoral Thesis
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1022991