Development of in silico methodologies to predict the toxicity of novel proteins in the context of food and feed risk assessment

Palazzolo, L.; Laurenzi, T.; Ben Mariem, O.; Bassan, A.; Guerrini, U.; Eberini, I.

doi:10.2903/sp.efsa.2024.en-9063

This report is the outcome of an EFSA procurement (OC/EFSA/GMO/2021/02 – LOT1) aimed at developing an in silico strategy to predict the toxicity of (novel) proteins. Up-to-date, commercially available tools that predict protein toxicity from primary structure were evaluated for accuracy and usability, using a curated dataset of annotated toxins and non-toxins from UniProt. ToxinPred2 and Toxify emerged as the top performers, showing both high accuracy and suitability for integration into an automated pipeline. Additional bioinformatics methods that provide sequence similarity-based information rather than direct predictions (BLAST, InterPro HMM profiles) were also explored. Converting their outputs into features for machine learning models yielded high prediction accuracy, though there is room for improvement to reduce the risk of overfitting. An Artificial Intelligence (AI)-based consensus pipeline integrating results from ToxinPred2, Toxify, and our machine learning models was developed. This consensus model reached 95% accuracy in distinguishing toxins from non-toxins. Notably, our BLAST-based machine learning model, although comparable to BLAST in overall performance, offers higher sensitivity and specificity across diverse queries than BLAST; it relies on database-based evolutionary relationships, which may significantly limit its applicability to novel or mutated toxins. Structure-based prediction methods are deemed impractical because of their resource intensity and reliance on accurate structural data; AI-driven structure prediction methods, such as Rosetta and AlphaFold, are promising, but they are still under development and may not yet be suitable for the regulatory context.
Recommendations are provided, including: enhancement of the proposed consensus pipeline to create an independent, open-source, user-friendly tool for evaluating the safety of (novel) proteins in food and feed; regular updates of the proposed databases and models; incorporation of 3D structures; and, more generally, validation of AI and machine learning models for regulatory use.
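
To make the terms used in the abstract concrete, the following minimal Python sketch shows one simple way binary calls from several predictors could be combined and scored. It is an illustration only: the abstract does not specify how the report's AI-based consensus is computed, so a plain majority vote is swapped in here as an assumption. The function names, the predictor keys (`toxinpred2`, `toxify`, `blast_ml`) and the toy calls and labels are hypothetical and are not results from the report.

```python
from typing import Dict, List


def majority_vote(calls: Dict[str, bool]) -> bool:
    """Return True (toxin) when more than half of the predictors call 'toxin'.

    Assumption: a simple majority vote stands in for the report's
    AI-based consensus, whose details are not given in the abstract.
    """
    return sum(calls.values()) * 2 > len(calls)


def confusion_metrics(truth: List[bool], predicted: List[bool]) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # toxins called toxin
    tn = sum(1 for t, p in pairs if not t and not p)  # non-toxins called non-toxin
    fp = sum(1 for t, p in pairs if not t and p)      # non-toxins called toxin
    fn = sum(1 for t, p in pairs if t and not p)      # toxins called non-toxin
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy, made-up calls for four proteins (illustrative only, not report data).
per_protein_calls = [
    {"toxinpred2": True, "toxify": True, "blast_ml": False},
    {"toxinpred2": False, "toxify": True, "blast_ml": False},
    {"toxinpred2": True, "toxify": False, "blast_ml": True},
    {"toxinpred2": False, "toxify": False, "blast_ml": False},
]
truth = [True, False, False, False]  # hypothetical annotations

consensus = [majority_vote(calls) for calls in per_protein_calls]
metrics = confusion_metrics(truth, consensus)
print(consensus)
print(metrics)
```

Sensitivity here is the fraction of true toxins that are flagged, and specificity is the fraction of non-toxins that are correctly cleared; reporting both alongside accuracy matters when the two classes are unevenly represented in the dataset.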

Development of in silico methodologies to predict the toxicity of novel proteins in the context of food and feed risk assessment / L. Palazzolo, T. Laurenzi, O. Ben Mariem, A. Bassan, U. Guerrini, I. Eberini. - In: EFSA SUPPORTING PUBLICATIONS. - ISSN 2397-8325. - 21:10(2024 Oct), pp. 9063E.1-9063E.99. [10.2903/sp.efsa.2024.en-9063]


Scientific-disciplinary sectors of the article (valid from 09/05/2024): Sector BIOS-07/A - Biochemistry; Sector PHYS-06/A - Physics for the life sciences, the environment and cultural heritage
Publication date: Oct 2024
Journal in ANCE: EFSA SUPPORTING PUBLICATIONS
DOI: https://dx.doi.org/10.2903/sp.efsa.2024.en-9063
Type: Article (author)
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
EFSA Supporting Publications - 2024 - Palazzolo - Development of in silico methodologies to predict the toxicity of novel.pdf (open access; type: Publisher's version/PDF)	2.44 MB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1114048
