Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer

Ravanmehr, V.; Blau, H.; Cappelletti, L.; Fontana, T.; Carmody, L.; Coleman, B.; George, J.; Reese, J.; Joachimiak, M.; Bocci, G.; Hansen, P.; Bult, C.; Rueter, J.; Casiraghi, E.; Valentini, G.; Mungall, C.; Oprea, T.I.; Robinson, P.N.

doi:10.1093/nargab/lqab113

Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.
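The abstract describes building a training set from embedding vectors and clinical trial records, then testing predictions against historical data. The following is a minimal sketch of how such a training set might be assembled with a year cutoff. It is not the authors' implementation. The names `TrialRecord`, `pair_features`, and `build_training_set`, the element-wise difference used to combine the two vectors, the treatment of all unseen pairs as negatives, and the toy 3-dimensional vectors (standing in for the 100-dimensional ones) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialRecord:
    kinase: str      # protein kinase targeted by the trialled inhibitor
    cancer: str      # cancer concept studied in the trial
    start_year: int  # year the trial started


def pair_features(kinase_vec, cancer_vec):
    """Combine two embeddings into one feature vector.

    Assumption: element-wise difference; the abstract does not say how
    kinase and cancer vectors are combined.
    """
    if len(kinase_vec) != len(cancer_vec):
        raise ValueError("embeddings must have the same dimension")
    return [k - c for k, c in zip(kinase_vec, cancer_vec)]


def build_training_set(trials, embeddings, kinases, cancers, cutoff_year):
    """Return (features, labels) using only trials started by cutoff_year.

    Pairs seen in a trial up to the cutoff are labeled 1; every other
    pair is labeled 0 (an assumption made for illustration).
    """
    positives = {
        (t.kinase, t.cancer) for t in trials if t.start_year <= cutoff_year
    }
    features, labels = [], []
    for kinase in kinases:
        for cancer in cancers:
            features.append(pair_features(embeddings[kinase], embeddings[cancer]))
            labels.append(1 if (kinase, cancer) in positives else 0)
    return features, labels


# Hypothetical toy data: placeholder names and made-up values.
embeddings = {
    "PK_A": [0.5, 0.25, 0.0],
    "PK_B": [0.0, 0.5, 0.25],
    "cancer_X": [0.25, 0.25, 0.0],
    "cancer_Y": [0.0, 0.25, 0.25],
}
trials = [
    TrialRecord("PK_A", "cancer_X", 2005),
    TrialRecord("PK_B", "cancer_Y", 2012),
]
X, y = build_training_set(
    trials, embeddings, ["PK_A", "PK_B"], ["cancer_X", "cancer_Y"], cutoff_year=2010
)
```

With a cutoff of 2010, the 2012 trial is excluded from the positives, so a classifier trained on `X` and `y` could later be checked against it. The rows could be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.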

Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer / V. Ravanmehr, H. Blau, L. Cappelletti, T. Fontana, L. Carmody, B. Coleman, J. George, J. Reese, M. Joachimiak, G. Bocci, P. Hansen, C. Bult, J. Rueter, E. Casiraghi, G. Valentini, C. Mungall, T.I. Oprea, P.N. Robinson. - In: NAR GENOMICS AND BIOINFORMATICS. - ISSN 2631-9268. - 3:4(2021 Dec), pp. lqab113.1-lqab113.13. [10.1093/nargab/lqab113]


Ravanmehr, Vida (first author); Blau, Hannah; Cappelletti, L. (software); Fontana, Tommaso; Carmody, Leigh; Coleman, Ben; George, Joshy; Reese, Justin; Joachimiak, Marcin; Bocci, Giovanni; Hansen, Peter; Bult, Carol; Rueter, Jens; Casiraghi, E. (methodology); Valentini, G. (methodology); Mungall, Christopher; Oprea, Tudor I; Robinson, P.N.

2021



Keywords: word embedding; supervised learning; kinases; cancer; bioinformatics
Scientific-disciplinary sectors of the article (display only): Sector INF/01 - Computer Science; Sector BIO/11 - Molecular Biology
Publication date: Dec 2021
Ahead-of-print or print date: 8 Dec 2021
Journal in ANCE: NAR GENOMICS AND BIOINFORMATICS
DOI: https://dx.doi.org/10.1093/nargab/lqab113
Type: Article (author)
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
PubMed2Vec-kinases-cancer-NAR-published.pdf (open access; description: main article; type: Publisher's version/PDF)	1.23 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/887943


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca