Antigenic variation of SARS‐CoV‐2 in response to immune pressure

Abstract Analysis of the bat viruses most closely related to SARS‐CoV‐2 indicated that the virus probably required limited adaptation to spread in humans. Nonetheless, since its introduction in human populations, SARS‐CoV‐2 must have been subject to the selective pressure imposed by the human immune system. We exploited the availability of a large number of high‐quality SARS‐CoV‐2 genomes, as well as of validated epitope predictions, to show that B cell epitopes in the spike glycoprotein (S) and in the nucleocapsid protein (N) have higher diversity than nonepitope positions. Similar results were obtained for other human coronaviruses and for sarbecoviruses sampled in bats. Conversely, in the SARS‐CoV‐2 population, epitopes for CD4+ and CD8+ T cells were not more variable than nonepitope positions. A significant reduction in epitope variability was instead observed for some of the most immunogenic proteins (S, N, ORF8 and ORF3a). Analysis over longer evolutionary time frames indicated that this effect is not due to differential constraints. These data indicate that SARS‐CoV‐2 evolves to elude the host humoral immune response, whereas recognition by T cells is not actively avoided by the virus. However, we also found a trend of lower diversity of T cell epitopes for common cold coronaviruses, indicating that epitope conservation per se is not directly linked to disease severity. We suggest that conservation serves to maintain epitopes that elicit tolerizing T cell responses or induce T cells with regulatory activity.

Sustained human-to-human transmission had led to global spread of the virus, which has now resulted in an unprecedented global sanitary crisis. Although the majority of COVID-19 cases are relatively mild, a significant proportion of patients develop a serious, often fatal illness, characterized by acute respiratory distress syndrome (Wu & McGoogan, 2020). Both viral-induced lung pathology and overactive immune responses are thought to contribute to this disease severity (St John & Rathore, 2020;Vabret et al., 2020).
Ample evidence suggests that coronaviruses can easily cross species barriers and have high zoonotic potential. Indeed, seven coronaviruses are known to infect humans and all of them originated in animals (Cui et al., 2019;Forni et al., 2017;Ye et al., 2020).
Among these, HCoV-OC43, HCoV-HKU1, HCoV-NL63, and HCoV-229E have been circulating for decades in human populations and usually cause limited disease (Bucknall et al., 1972;Forni et al., 2017;Woo et al., 2005). They are thus referred to as "common cold" coronaviruses. Conversely, MERS-CoV and SARS-CoV, whose emergence in the 2000s preceded that of SARS-CoV-2, can cause serious illness and respiratory distress syndrome in a non-negligible proportion of infected individuals (Petrosillo et al., 2020). Like all coronaviruses, these human-infecting viruses have positive-sense, single stranded RNA genomes. Two-thirds of the coronavirus genome are occupied by two large overlapping open reading frames (ORF1a and ORF1b) that are translated into polyproteins. These latter are processed to generate 16 nonstructural proteins (nsp1 to nsp16). The remaining portion of the genome includes ORFs for the structural proteins (spike, envelope, membrane and nucleocapsid) and a variable number of accessory proteins (Cui et al., 2019;Forni et al., 2017).
Analyses of the bat viruses most closely related to SARS-CoV-2 have indicated that, in analogy to SARS-CoV, the virus probably required limited adaptation to gain the ability to infect and spread in humas (Boni et al., 2020;Cagliani et al., 2020).

Nonetheless, since its introduction in human populations, SARS-
CoV-2 must have been subject to the selective pressure imposed by the human immune system. In fact, as with most other viruses, data from COVID-19, SARS and MERS patients indicate that both B and T lymphocytes play a role in controlling infection (Channappanavar et al., 2014;St John & Rathore, 2020;Vabret et al., 2020).
Recent efforts predicted B cell and T cell epitopes in SARS-CoV-2 proteins (Grifoni, Sidney, et al., 2020) and validated such predictions using sera/lymphocytes from convalescent COVID-19 patients (Grifoni, Weiskopf, et al., 2020). These studies, as well as others (Farrera-Soler et al., 2020;Peng et al., 2020;Poh et al., 2020), revealed that the cell-mediated responses against SARS-CoV-2 are not restricted to the nucleocapsid (N) and spike (S) proteins, but rather target both structural and nonstructural viral products. In parallel, analyses of B cell responses in SARS-CoV-2-infected patients showed that the S and N proteins are the major targets of the antibody response and identified specific B cell epitopes in the S protein (Farrera-Soler et al., 2020;Jiang et al., 2020;Poh et al., 2020). We exploited this growing wealth of information to investigate whether, after a few months of sustained transmission, the selective pressure exerted by the human adaptive immune response is already detectable in the SARS-CoV-2 population.
Experimentally identified CD4 + and/or CD8 + T cell epitopes in S, N, M, ORF3a and ORF7a were retrieved from Peng et al. (2020).
Epitopes were defined as being recognized by CD4 + or CD8 + T cells following indications in the original publication .
When this information was not available, epitopes were only included in the overall analysis of T cell epitopes. Experimental B cell epitopes were obtained from two studies that systematically mapped antibody responses against the S protein (Farrera-Soler et al., 2020; Poh et al., 2020).
All protein sequences were retrieved and several filters were applied. Only complete genomes flagged as "high coverage only" and "human" were selected. Positions recommended to be masked by DeMaio and coworkers (https://virol ogical.org/t/maski ng-strat egies -for-sars-cov-2-align ments/ 480, last accessed June 5, 2020) were also removed.
Finally, for each SARS-CoV-2 protein, we selected only strains that had the same length as the protein in the SARS-CoV-2 reference strain (NC_045512), generating a set of at least 23,625 sequences for each ORF. Proteins with <60 amino acids were excluded from the analyses.
The list of GISAID IDs along with the list of laboratories which generated the data is provided in Table S1.
For all the other human coronaviruses, as well as for a set of nonhuman infecting sarbecoviruses, sequences of either complete genomes or single ORFs (i.e., nucleocapsid and spike protein) were retrieved from the National Center for Biotechnology Information database (NCBI, http://www.ncbi.nlm.nih.gov/). For all human coronaviruses, the only filter we applied was host identification as "human". SARS-CoV strains sampled during the second outbreak were excluded from the analyses. NCBI ID identifiers are listed in Tables S2 and S3. Alignments were generated using mafft (Katoh & Standley, 2013).

| Protein variability and statistical analysis
Variability at each amino acid position was estimated using the Shannon's entropy (H) index using the Shannon Entropy-One tool from the HIV database (https://www.hiv.lanl.gov/conte nt/index), with ambiguous character (e.g., gaps) excluded from the analysis. Most positions of analysed viruses are invariable along the alignments, so the distribution of H is zero-inflated. We thus calculated statistical significance by permutations. For each protein, the predicted epitope intervals were collapsed to a single position while nonepitope intervals were left unchanged. After randomly shuffling this collapsed sequence it was expanded back to full length and the difference between shuffled epitope and nonepitope H values was calculated. This procedure was repeated 1,000 times and the proportion of repetitions showing a difference more extreme than D was reported as the p-value. An in-house R script was written and is available as Appendix S1.

| Antigenic variability of SARS-CoV-2 proteins
To analyse B cell epitope diversity in SARS-CoV-2, we randomly selected 10,000 high-quality viral genomes from those available in the GISAID database (as of June 5, 2020) (Elbe & Buckland-Merrett, 2017). Potential epitopes were predicted using IEDB tools, as previously described (Grifoni, Sidney, et al., 2020). Specifically, because they are the major targets of the humoral immune response (Channappanavar et al., 2014;St John & Rathore, 2020;Vabret et al., 2020), we predicted both linear and conformational B epitopes for the S and N proteins, whereas only linear epitopes were predicted for the other viral proteins (Table S4) Variability at each amino acid site of the proteins encoded by SARS-CoV-2 was quantified using Shannon's entropy (H). Specifically, only predicted proteins longer than 60 amino acids were analysed.
Because most positions in SARS-CoV-2 genomes are invariable across the sampled genomes, the distribution of H is zero-inflated, making the use of conventional statistical tests inappropriate (McElduff et al., 2010). We thus calculated statistical significance by permutations, that is by reshuffling epitope positions as amino acid stretches of the same size as the predicted epitopes. This approach also has the advantage of accounting for the possibility that, as a result of locally varying selective constraints, H is not independent among continuous protein positions.
F I G U R E 1 Amino acid variability of the SARS-CoV-2 spike protein. Shannon's entropy (H) values for each amino acid position calculated using 10,000 SARS-CoV-2 spike proteins are shown. B cell predicted epitopes and T cell predicted epitopes are also reported in blue and green, respectively. B cell epitopes identified in the sera of COVID-19 patients (Farrera-Soler et al., 2020;Poh et al., 2020) are also reported in red Using this methodology, we found that, for the N and nsp16 proteins, positions mapping to predicted B cell linear epitopes are significantly more variable than those not mapping to these epitopes. A higher diversity of B cell epitopes was also observed for S, although it did not reach statistical significance ( Figure 2). However, the H distribution for the spike protein includes a clear outlier represented by position 614 (Figure 1). Recent studies have indicated that the D614G variant, which is now prevalent worldwide, enhances viral infectivity Korber et al., 2020;Plante et al., 2020;Yurkovetskiy et al., 2020;Zhang, Jackson, et al., 2020).
Hence, the frequency increase of D614 is unlikely to be primarily related to immune evasion. Moreover, the modulation of resistance to antibodies is thought to be mediated by a change in the exposure of neutralizing epitopes in the receptor-binding domain, rather than by the creation/destruction of an epitope by D614G itself (Plante et al., 2020;Weissman et al., 2020). We thus repeated the analyses after excluding position 614 and we observed that predicted B cell linear epitopes in the spike protein are significantly more variable than nonepitope positions ( Figure 2). The same analysis for B cell conformational epitopes in the N and S proteins indicated a similar trend, although statistical significance was not reached (data not shown). This is probably due to the small number of positions in these epitopes. Overall, these data fit very well with the observation that most humoral immune responses against SARS-CoV-2 and other human coronaviruses are directed against the S and N proteins (Farrera-Soler et al., 2020;Jiang et al., 2020;Poh et al., 2020). These results also support the idea that the selective pressure exerted by the human antibody response is already detectable in the SARS-CoV-2 population.
We next assessed whether epitopes for cell-mediated immune responses are also more variable than nonepitope positions. We F I G U R E 2 Variability of epitope and nonepitope positions among SARS-CoV-2 proteins. Shannon's entropy (H) mean values along with standard errors are shown for all SARS-CoV-2 proteins longer than 60 residues. Epitope positions are shown in dark grey and nonepitopes in light grey. Significant comparisons, calculated by a permutation approach, are indicated with asterisks (*p < .05; **p < .01; ***p < .001). Immunogenic proteins are shown in blue and the length of each protein is reported in the bottom panel thus retrieved predicted CD4 + and CD8 + T cell epitopes from Grifoni, Sidney, et al. (2020). These epitope predictions were shown to be reliable, as they capture a significant proportion of T cell responses in convalescent COVID-19 patients (Grifoni, Weiskopf, et al., 2020).
Analysis of entropy values indicated that CD4 + T cell epitopes are significantly less variable than nonepitope positions for the N and nsp16 proteins (Figure 2). A similar trend was observed for ORF8, E and S, although significance was not reached. Reduced variability was also observed for CD8 + T cell epitopes for the N protein, as well as for ORF3a. Higher variability in epitope positions was observed for nsp8 and nsp14 for CD4 + T cells alone (Figure 2). Because several epitopes for T cells comap with B cell epitopes, which tend to show higher diversity, we compared positions within CD4 + or CD8 + T cell epitopes only (not overlapping with B cell epitopes) with positions not mapping to any of these epitopes. A significant reduction of variability was observed for S, N, ORF8, nsp15 and nsp16, whereas higher diversity was still evident for nsp8 ( Figure 2).
Overall, these data indicate that T cell epitopes in the most immunogenic SARS-CoV-2 proteins (S, N, ORF3a and ORF8; Grifoni, Weiskopf, et al., 2020;Peng et al., 2020) tend to be more conserved than nonepitopes. However, this was not the case for other proteins targeted by T cell responses, namely M, ORF7a, nsp3, nsp4, and nsp6. Qualitatively similar results were obtained when a set of recently described experimental CD4 + and/or CD8 + T cell epitopes in S, N, M, ORF3a and ORF7a were analysed ; Figure S1). The lack of statistical significance in some of these comparisons is probably due to the low number of epitope positions.
Clearly, protein sequence variability is strongly influenced by functional and structural constraints. We thus reasoned that if the observations reported above were secondary to the incidental colocalization of T cell epitopes with more constrained regions, a similar pattern should be observed for H values calculated on an alignment of proteins from other sarbecoviruses. In fact, all these viruses, except SARS-CoV, were sampled from bats. Thus, whereas structural/functional constraints are expected to be maintained across long evolutionary time frames, the pressure exerted by the human cell-mediated immune response is not, given that (in different species) antigen processing within host cells results in the F I G U R E 3 Variability of epitope and nonepitope positions among sarbecoviruses. Shannon's entropy (H) mean values along with standard errors are shown for a set of sarbecovirus ORFs. SARS-CoV-2 epitope positions are shown in dark grey and nonepitopes in light grey. Significant comparisons, calculated by a permutation approach, are indicated with asterisks (*p < .05; ** p < .01; *** p < .001) preferential presentation of diverse viral epitopes to T lymphocytes depending on the MHC gene repertoire and on distinct preferences of the antigen processing pathway (Abduriyim et al., 2019;Burgevin et al., 2008;Hammer et al., 2007;Lu et al., 2019;Wynne et al., 2016).
Conversely, epitopes for antibodies tend to be conserved across species (Tse et al., 2017;Wiehe et al., 2014) and consequently the selective pressure acting on these positions is expected to be constant across time and hosts.
We thus aligned the SARS-CoV-2 reference sequences of proteins showing decreased or increased variability in T cell epitopes with those of 45 sarbecoviruses. Calculation of H indicated a significant difference only for CD4 + T cell epitopes in the N protein.
Conversely, B cell epitopes were more variable than nonepitope positions for the S, N and nsp16 proteins (Figure 3). Overall, these results indicate that the variability within SARS-CoV-2 T cell epitopes is not driven primarily by functional/structural constraints, but probably results from the interaction with the human adaptive immune response.

| Comparison with other human coronaviruses
Given the results above we set out to determine whether the other human coronaviruses show the same tendency of reduced and increased variability at T cell and B cell epitopes, respectively. For these viruses, analyses were restricted to the N and S proteins, as F I G U R E 4 Variability of epitope and nonepitope positions among human coronaviruses. Shannon's entropy (H) mean values along with standard errors are shown for human coronavirus spike and nucleocapsid proteins. Epitope positions are shown in dark grey and nonepitopes in light grey. Significant comparisons, calculated by a permutation approach, are indicated with asterisks (*p < .05; ** p < .01; *** p < .001) they are the most antigenic proteins and because the number of complete viral genomes is relatively limited (Tables S3 and S5). SARS-CoV, the human coronavirus most similar to SARS-CoV-2, caused the first human outbreak in 2002/2003 after a spillover from palm civets, followed by human-to-human transmission chains (Shi & Wang, 2017). A second zoonotic transmission occurred in December 2003 and caused a limited number of cases (Shi & Wang, 2017;Wang et al., 2005). Viral genomes sampled during the second outbreak were not included in the analyses because their evolution occurred in the civet reservoir (Table S2). Four other human coronaviruses, namely HCoV-OC43, HCoV-HKU1 (members of the Embecovirus subgenus), HCoV-229E (Duvinavirus subgenus), and HCoV-NL63 (Setracovirus subgenus), have been transmitting within human populations for at least 70 years (Forni et al., 2017). Thus, all available S and N sequences were included in the analyses (Table S3).
Conversely, MERS-CoV displays limited ability for human-to-human transmission and outbreaks were caused by repeated spillover events from the camel host (Cui et al., 2019). MERS-CoV was therefore excluded from the analyses.

Quantification of sequence variability by calculation of H indi-
cated that B cell epitopes in the S protein are significantly more variable than nonepitopes for SARS-CoV, HCoV-OC43 and HCoV-HKU1 ( Figure 4). Analysis of CD4 + and CD8 + T cell epitopes in these viruses indicated no increased diversity for epitope compared to nonepitope positions, except for the S protein of SARS-CoV for CD4 + T cells.
However, when positions within B cell epitopes were excluded from the analysis, this difference disappeared and T cell epitopes were found to be significantly less variable than nonepitopes for the spike proteins of HCoV-HKU1 and HCoV-OC43, as well as for the N protein of HCoV-229E ( Figure 4). Thus, the lack of antigenic diversity at T cell epitopes is a common feature of human coronaviruses, which instead tend to maintain sequence conservation of such epitopes.

| D ISCUSS I ON
The origin of SARS-CoV-2 remains uncertain and it is presently unknown whether the virus spilled over from a bat or another intermediate host. The hypothesis of a zoonotic origin is strongly supported by multiple lines of evidence, although it cannot be excluded that SARS-CoV-2 was transmitted cryptically in humans before gaining the ability to spread efficiently among people (Andersen et al., 2020;Sironi et al., 2020). Whatever the initial events associated with the early phases of the pandemic, it is clear that circulating SARS-CoV-2 viruses shared a common ancestor at the end of 2019 (van Dorp et al., 2020;. Due to its recent origin, the genetic diversity of the SARS-CoV-2 population remains limited. This is also the result of the relatively low mutation rate of coronaviruses (as compared to other RNA viruses), which encode enzymes with some proofreading ability (Denison et al., 2011;Forni et al., 2017).
Nonetheless, the huge number of transmissions worldwide has allowed thousands of mutations to appear in the viral population and, thanks to enormous international sequencing efforts, more than 25,000 amino acid replacements have currently been reported (http://cov-glue.cvr.gla.ac.uk). Irrespective of the host, most variants are expected to be deleterious for viral fitness, or to have no consequences (Cagliani et al., 2020;van Dorp et al., 2020;Grubaugh et al., 2020). However, a proportion of the replacements may favour the virus and some of these may contribute to adaptation to the human host. In particular, the recent and ongoing evolution of SARS-CoV-2 is expected to be at least partially driven by the selective pressure imposed by the human immune system. Indeed, antigenic drift or immune evasion mutations have been reported for other zoonotic viruses such as Lassa virus (Andersen et al., 2015) and Influenza A virus (Su et al., 2015). The emergence of immune evasion variants was also observed during an outbreak of MERS-CoV in South Korea, when mutations in the spike proteins were positively selected as they facilitated viral escape from neutralizing antibodies, even though the same variants decreased binding to the cellular receptor (Kim et al., 2019;Kim et al., 2019;Kleine-Weber et al., 2019;Rockx et al., 2010). This exemplifies a phenomenon often observed in other viruses, most notably HIV-1 (Liu et al., 2007;Martinez-Picado et al., 2006;Schneidewind et al., 2007Schneidewind et al., , 2008, whereby the virus trades off immune evasion with a fitness cost. As a consequence, immune evasion mutations may be only transiently maintained in viral populations. We therefore decided to quantify epitope variability in terms of entropy, rather than relying on measures based on substitution rates (dN/dS), which were developed for application to variants that go to fixation in different lineages over time (Kryazhimskiy & Plotkin, 2008).

The MERS-CoV mutants responsible for the outbreak in South
Korea also testify to the relevance of the antibody response in coronavirus control and the selective pressure imposed by humoral immunity on the virus (Kim et al., 2019;Kleine-Weber et al., 2019;Rockx et al., 2010). This is probably also the case for SARS-CoV-2, as a recent report indicated that the sera of most COVID-19 convalescent patients have virus-neutralization activities and that antibody titres negatively correlate with viral load (Okba et al., 2020;Vabret et al., 2020;Zhou et al., 2020). Nonetheless, studies on relatively large COVID-19 patient cohorts reported that patients with severe disease display stronger IgG responses than milder cases, and a negative correlation between anti-S antibody titres and lymphocyte counts was reported Vabret et al., 2020;Zhang, Zhou, et al., 2020;Zhao et al., 2020). Consistently, asymptomatic SARS-CoV-2-infected individuals were recently reported to have lower virus-specific IgG levels than COVID-19 patients (Long et al., 2020). These observations raised concerns that humoral responses might not necessarily be protective, but rather pathogenic, either via antibody-dependent enhancement (ADE) or other mechanisms (Cao, 2020;Iwasaki & Yang, 2020;. Clearly, gaining insight into the dynamic interaction between SARS-CoV-2 and the human immune system is of fundamental importance not only to understand COVID-19 immunopathogenesis, but also to inform therapeutic and preventive viral control strategies. We thus exploited the availability of a large number of fully sequenced high-quality SARS-CoV-2 genomes, as well as validated predictions of B cell and T cell epitopes, to investigate whether the selective pressure exerted by the adaptive immune response is detectable in the global SARS-CoV-2 population, and if the virus is evolving to evade it. Results indicated that B cell epitopes in the N and S proteins, which represent the major targets of the antibody response, have higher diversity than nonepitope positions. The same was observed for the spike proteins of HCoV-HKU1, HCoV-OC43 and SARS-CoV, although data on SARS-CoV should be taken with caution as they derive from a relatively small number of sequences sampled over a short time frame. Conversely, no evidence of antibody-mediated selective pressure was evident for HCoV-229E and HCoV-NL63. The reasons underlying these differences are unclear, but recent data on a relatively small population of patients with respiratory disease indicated that the titres of neutralizing antibodies against HCoV-OC43 tend to be higher compared to those against HCoV-229E and HCoV-NL63 (HCoV-HKU1 was not evaluated), suggesting the two latter viruses elicit mainly nonneutralizing responses (Gorse et al., 2020).
B cell epitopes within nsp16 were also found to be variable, although this protein was not reported to be immunogenic (Grifoni, Weiskopf, et al., 2020). However, the antibody response to SARS-CoV-2 has presently been systematically analysed in a relatively small number of patients and most studies focused on structural proteins. It is thus possible that, during infection, antibodies against nsp16 are raised, but they have not yet been detected. An alternative possibility is that B cell epitopes in nsp16, which is highly conserved in SARS-CoV-2 strains (Cagliani et al., 2020), coincide with regions of relatively weaker constraint. This hypothesis is partially supported by the observation that these same positions also display higher diversity when entropy is calculated on an alignment of sarbecovirus nsp16 proteins. More intriguingly, this result may indicate that nsp16, together with S and N, is a target of B cell responses in the bat reservoirs. In fact, as mentioned above, antibody binding sites tend to be conserved across species (Tse et al., 2017;Wiehe et al., 2014) and thus the selective pressure exerted on B cell epitopes is likely to be constant across hosts. Although the immunogenicity of nsp16 remains to be evaluated, these data suggest that SARS-CoV-2 is evolving to elude the host humoral immune response. However, we note that this observation does not necessarily imply that antibodies against SARS-CoV-2 are protective and it does not rule out the possibility that humoral responses contribute to COVID-19 pathogenesis. We should also add that we cannot exclude that the higher diversity observed at B cell epitopes is ultimately the result of epitope regions being more exposed at protein surfaces and less constrained than other regions. However, the fact the higher values of entropy are mainly detected in the B epitope regions of proteins that are strongly targeted by the humoral system speaks against this possibility. Finally, we note that the appearance of within-host transitory mutations in B cell epitopes has previously been observed in other zoonotic, acute viral infections (Andersen et al., 2015).
In COVID-19 patients, antibody titres were found to correlate with the strength of virus-specific T cell responses .
Surprisingly, we found that, in the SARS-CoV-2 population, epitopes for CD4 + and CD8 + T cells are not more variable than nonepitope positions. Conversely, a significant reduction in epitope variability was observed for a subset of viral proteins, in particular for some of the most immunogenic ones (S, N, ORF8 and ORF3a;Grifoni, Weiskopf, et al., 2020;Peng et al., 2020). To check that the result was not due to stronger structural/functional constraints acting on epitope positions, we again used H values calculated on an alignment of sarbecovirus genomes, all of which, except SARS-CoV, were sampled in bats. T cell responses are initiated by the presentation of antigenic epitopes by MHC (major histocompatibility complex) class I and class II molecules. Different mammals have diverse MHC gene repertoires and thus present distinct antigens. In particular, recent data from various bat species have indicated that many MHC class I molecules have a three-or five-amino acid insertion in the peptide binding pocket, resulting in very different presented peptide repertoires compared to the MHC class I molecules of other mammals (Abduriyim et al., 2019;Lu et al., 2019;Ng et al., 2016;Papenfuss et al., 2012;Wynne et al., 2016). Thus, the selective pressure acting on T cell epitopes is probably volatile and not conserved in humans and bats. Analysis of sarbecovirus proteins indicated that, apart from CD4 + T cell epitopes in the N protein, the T cell epitopes predicted in SARS-CoV-2 proteins are not less diverse than nonepitope positions, suggesting that epitope conservation is not simply secondary to structural or functional constraints, but may result from an interaction with human T cell responses. Of course, another possible explanation for this finding is that the prediction tools failed to identify real epitopes. However, we retrieved epitopes from a previous work and the authors validated their predictions using lymphocytes of 20 patients who had recovered from COVID-19 (Grifoni, Sidney, et al., 2020;Grifoni, Weiskopf, et al., 2020). Also, qualitatively similar results were observed with a small set of experimentally identified epitopes. Moreover, if a general artefact linked to epitope prediction had been introduced, we would not expect to observe significant differences and not specifically in the proteins that represent the major targets of T cell responses.
In the case of HIV-1, immune activation probably favours the virus by increasing the rate of CD4 + T cell trans-infection (Sanjuán et al., 2013). Conversely, the mechanisms underlying MTB epitope conservation have not been fully elucidated. A possible explanation is that conserved epitopes generate a decoy immune response and give an advantage to the bacterium. An alternative possibility is that T cell activation results in lung tissue inflammation and damage (cavitary tuberculosis), which favours MTB transmission by aerosols (Coscolla et al., 2015;Lindestam Arlehamn et al., 2015). Although these mechanisms are unlikely to be at play in the case of SARS-CoV-2, a deregulated immune response has been associated with COVID-19 pathogenesis (Hannan et al., 2020). Specifically, recent data indicated that patients recovering from severe COVID-19 have broader and stronger T cell responses compared to mild cases . This was particularly evident for responses against the S, membrane (M), ORF3a and ORF8 proteins .
Although this observation might simply reflect higher viral loads in severe cases, the possibility that the T cell response itself is deleterious cannot be excluded. Moreover, the same authors reported that CD8 + T cells targeting different virus proteins have distinct cytokine profiles, suggesting that the virus can modulate the host immune response to its benefit . Additionally, a post-mortem study on six patients who died from COVID-19 indicated that infection of macrophages can lead to activation-induced T cell death, which may eventually be responsible for lymphocytopenia . However, we also found a trend of lower diversity of T cell epitopes for common cold coronaviruses, indicating that epitope conservation per se is not directly linked to disease severity.
Moreover, other SARS-CoV-2 immunogenic proteins such as M and ORF7 did not show differences in T cell epitope conservation, which was instead observed for nsp16 and nsp15. These latter proteins are not known to be T cell targets (Grifoni, Weiskopf, et al., 2020).
Clearly, further analyses will be required to clarify the significance of T cell epitope conservation in SARS-CoV-2. An interesting possibility is that both for SARS-CoV-2 and for common cold coronaviruses, conservation serves to maintain epitopes that elicit tolerizing T cell responses or induce T cells with regulatory activity. Indeed, we considered T cell epitopes as a whole, but differences exist in terms of variability and, probably, antigenicity. This clearly represents a limitation of this study, but the modest amount of genetic diversity in the SARS-CoV-2 population does not presently allow for analysis of single epitope regions. Moreover, more detailed and robust analyses will certainly require the systematic, experimental definition of T and B cell epitopes in the SARS-CoV-2 proteome.

ACK N OWLED G M ENTS
We gratefully acknowledge the authors, originating and submitting laboratories of the sequences from GISAID's EpiCoV database on which this research is based. This work was supported by the Italian Ministry of Health ("Ricerca Corrente 2019-2020" to M.S., "Ricerca Corrente 2018-2020" to D.F.).

DATA AVA I L A B I L I T Y S TAT E M E N T
Lists of virus accession IDs are reported in Tables S1-S3. Data used for generating Figures 1-4 are reported in Tables S4 and S5. An R script for permutation analysis is reported as Appendix S1.