Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes

Reese, J.T.; Blau, H.; Casiraghi, E.; Bergquist, T.; Loomba, J.J.; Callahan, T.J.; Laraway, B.; Antonescu, C.; Coleman, B.; Gargano, M.; Wilkins, K.J.; Cappelletti, L.; Fontana, T.; Ammar, N.; Antony, B.; Murali, T.M.; Caufield, J.H.; Karlebach, G.; McMurry, J.A.; Williams, A.; Moffitt, R.; Banerjee, J.; Solomonides, A.E.; Davis, H.; Kostka, K.; Valentini, G.; Sahner, D.; Chute, C.G.; Madlock-Brown, C.; Haendel, M.A.; Robinson, P.N.; Spratt, H.; Visweswaran, S.; Flack, J.E.; Yoo, Y.J.; Gabriel, D.; Alexander, G.C.; Mehta, H.B.; Liu, F.; Miller, R.T.; Wong, R.; Hill, E.L.; Thorpe, L.E.; Divers, J.

doi:10.1016/j.ebiom.2022.104413

Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. Funding: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
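The pipeline outlined in the Methods (semantic similarity between patients, a pairwise similarity matrix, unsupervised clustering, and assignment of new patients by maximum similarity) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy ontology and its term names are hypothetical stand-ins for Human Phenotype Ontology terms, and the Resnik-style information content, the best-match average, and the average-linkage clustering are assumptions chosen only to make the idea concrete.

```python
import math
from itertools import combinations

# Hypothetical toy ontology (term -> parents); these are not real HPO identifiers.
PARENTS = {
    "abnormality": [],
    "respiratory": ["abnormality"],
    "neuro": ["abnormality"],
    "dyspnea": ["respiratory"],
    "cough": ["respiratory"],
    "brain_fog": ["neuro"],
    "headache": ["neuro"],
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def information_content(patients):
    """IC(t) = -log(fraction of patients annotated with t or a descendant).
    Terms never observed get IC 0 in this sketch (an assumption)."""
    n = len(patients)
    counts = {t: 0 for t in PARENTS}
    for terms in patients.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    return {t: -math.log(c / n) if c else 0.0 for t, c in counts.items()}

def term_sim(a, b, ic):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(a) & ancestors(b))

def patient_sim(p, q, ic):
    """Symmetric best-match average over two patients' term sets."""
    def one_way(xs, ys):
        return sum(max(term_sim(x, y, ic) for y in ys) for x in xs) / len(xs)
    return 0.5 * (one_way(p, q) + one_way(q, p))

def cluster(patients, ic, k):
    """Average-linkage agglomerative clustering on patient similarity (assumed method)."""
    clusters = [[name] for name in patients]
    def link(a, b):
        total = sum(patient_sim(patients[x], patients[y], ic) for x in a for y in b)
        return total / (len(a) * len(b))
    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

def assign(new_terms, patients, clusters, ic):
    """Index of the cluster holding the original patient most similar to the new one."""
    def best(members):
        return max(patient_sim(new_terms, patients[m], ic) for m in members)
    return max(range(len(clusters)), key=lambda c: best(clusters[c]))

# Hypothetical "original" cohort and one patient from another site.
patients = {
    "p1": {"dyspnea", "cough"},
    "p2": {"dyspnea"},
    "p3": {"brain_fog", "headache"},
    "p4": {"headache"},
}
ic = information_content(patients)
clusters = cluster(patients, ic, k=2)
new_patient_cluster = assign({"cough"}, patients, clusters, ic)
print(clusters, new_patient_cluster)
```

In this toy cohort the new patient shares only respiratory terms with the originals, so the assignment step depends entirely on which original patients carry related terms. A real analysis would use the full ontology and far larger cohorts.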

Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes / J.T. Reese, H. Blau, E. Casiraghi, T. Bergquist, J.J. Loomba, T.J. Callahan, B. Laraway, C. Antonescu, B. Coleman, M. Gargano, K.J. Wilkins, L. Cappelletti, T. Fontana, N. Ammar, B. Antony, T.M. Murali, J.H. Caufield, G. Karlebach, J.A. McMurry, A. Williams, R. Moffitt, J. Banerjee, A.E. Solomonides, H. Davis, K. Kostka, G. Valentini, D. Sahner, C.G. Chute, C. Madlock-Brown, M.A. Haendel, P.N. Robinson, H. Spratt, S. Visweswaran, J.E. Flack, Y.J. Yoo, D. Gabriel, G.C. Alexander, H.B. Mehta, F. Liu, R.T. Miller, R. Wong, E.L. Hill, L.E. Thorpe, J. Divers. - In: EBIOMEDICINE. - ISSN 2352-3964. - 87:(2023 Jan), pp. 104413.1-104413.17. [10.1016/j.ebiom.2022.104413]

Keywords: COVID-19; Human Phenotype Ontology; Long COVID; Machine learning; Precision medicine; Semantic similarity
Scientific-disciplinary sectors: INF/01 - Computer Science; MED/01 - Medical Statistics
Publication date: Jan 2023
Journal (ANCE): EBIOMEDICINE
DOI: https://dx.doi.org/10.1016/j.ebiom.2022.104413
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
1-s2.0-S2352396422005953-main.pdf (open access; publisher's version/PDF)	1.73 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/952718