Predictive models of long COVID

Antony, Blessy; Blau, Hannah; Casiraghi, E.; Loomba, Johanna J.; Callahan, Tiffany J.; Laraway, Bryan J.; Wilkins, Kenneth J.; Antonescu, Corneliu C.; Valentini, G.; Williams, Andrew E.; Robinson, Peter N.; Reese, Justin T.; Murali, T. M.

doi:10.1016/j.ebiom.2023.104777

Background: The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future.

Methods: We used electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C) to predict the incidence of long COVID. We trained two machine learning (ML) models: logistic regression (LR) and random forest (RF). Features used to train the predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using the U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (149,319; 3,295), and outpatients (2,041,260; 13,741).

Findings: LR and RF models yielded median AUROCs of 0.76 and 0.75, respectively. An ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had an average AUROC of 0.75.

Interpretation: ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology.

Funding: NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.
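The cohort counts and the AUROC metric reported in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the `COHORTS` dictionary, and the toy labels and scores are assumptions made for illustration, and only the cohort sizes and case counts come from the abstract. The sketch computes the fraction of each cohort carrying the U09.9 label and shows one standard way to compute AUROC, as the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
# Cohort sizes and long COVID (U09.9) case counts, as reported in the abstract.
# The dictionary layout and all names below are hypothetical.
COHORTS = {
    "all": (2_190_579, 17_036),
    "inpatients": (149_319, 3_295),
    "outpatients": (2_041_260, 13_741),
}


def prevalence(n_patients, n_cases):
    """Return the fraction of a cohort labelled with long COVID."""
    return n_cases / n_patients


def auroc(labels, scores):
    """Compute AUROC by pairwise comparison of positives and negatives.

    Each (positive, negative) pair contributes 1 if the positive has the
    higher score, 0.5 on a tie, and 0 otherwise. This is quadratic in the
    number of samples, so it is only suitable for small illustrations.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 2 positives among 8 patients, with model scores.
TOY_LABELS = [0, 0, 0, 0, 0, 0, 1, 1]
TOY_SCORES = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.5, 0.9]

if __name__ == "__main__":
    for name, (n_patients, n_cases) in COHORTS.items():
        print(f"{name}: {100 * prevalence(n_patients, n_cases):.2f}% labelled")
    print(f"toy AUROC: {auroc(TOY_LABELS, TOY_SCORES):.3f}")
```

The labelled fraction is below 1% in the full cohort, which is one reason a ranking metric such as AUROC is more informative here than raw accuracy: a model that predicts "no long COVID" for every patient would be right more than 99% of the time while being useless.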

Predictive models of long COVID / B. Antony, H. Blau, E. Casiraghi, J.J. Loomba, T.J. Callahan, B.J. Laraway, K.J. Wilkins, C.C. Antonescu, G. Valentini, A.E. Williams, P.N. Robinson, J.T. Reese, T.M. Murali. - In: EBIOMEDICINE. - ISSN 2352-3964. - 96:(2023 Oct), pp. 104777.1-104777.14. [10.1016/j.ebiom.2023.104777]

Keywords: COVID-19; Classification; Cross-site analysis; Explainability; Long COVID

Scientific-disciplinary sectors of the article (display only): Sector INF/01 - Computer Science; Sector MED/01 - Medical Statistics

Publication date: Oct 2023

Journal in ANCE: EBIOMEDICINE

DOI: https://dx.doi.org/10.1016/j.ebiom.2023.104777

URL: https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(23)00343-2/fulltext

Type: Article (author)

Appears in types: 01 - Journal article

Files in this item:

File	Access	Type	Size	Format
eBiomedicine_predictiveModels_of_longCovid.pdf	Open access	Publisher's version/PDF	1.69 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1011108

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca
