This report is the outcome of an EFSA procurement (OC/EFSA/GMO/2021/02 – LOT1) aimed at developing an in silico strategy to predict the toxicity of (novel) proteins. Up-to-date, commercially available tools that predict protein toxicity from primary structure were evaluated for accuracy and usability, using a curated dataset of annotated toxins and non-toxins from UniProt. ToxinPred2 and Toxify emerged as the top performers, showing both high accuracy and suitability for integration into an automated pipeline. Additional bioinformatics methods that provide sequence similarity-based information rather than direct predictions (BLAST, InterPro HMM profiles) were also explored. Converting their outputs into features for machine learning models yielded high prediction accuracy, although there is room for improvement to reduce the risk of overfitting. An Artificial Intelligence (AI)-based consensus pipeline integrating results from ToxinPred2, Toxify, and our machine learning models was developed. This consensus model reached 95% accuracy in distinguishing toxins from non-toxins. Notably, our BLAST-based machine learning model, although comparable to BLAST in overall performance, offers higher sensitivity and specificity across diverse queries; however, it relies on database-derived evolutionary relationships, which may significantly limit its applicability to novel or mutated toxins. Structure-based prediction methods are deemed impractical because of their resource intensity and reliance on accurate structural data; AI-driven structure prediction methods such as Rosetta and AlphaFold are promising, but they are still under development and may not yet be suitable for the regulatory context.
Recommendations are provided, including enhancement of the proposed consensus pipeline to create an independent, open-source, user-friendly tool for evaluating the safety of (novel) proteins in food and feed; regular updates of the proposed databases and models; incorporation of 3D structures; and, more generally, validation of AI and machine learning models for regulatory use.
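
As an illustration of how a consensus over several toxicity predictors can be formed and scored, the sketch below combines per-tool toxin/non-toxin calls by simple majority vote and computes accuracy, sensitivity and specificity. The abstract does not state how the report's pipeline combines its inputs, so majority voting is an assumption made here for illustration only. The predictor keys (`toxinpred2`, `toxify`, `blast_ml`), the toy calls and the toy labels are hypothetical and are not data from the report.

```python
from typing import Dict, List, Sequence


def majority_vote(calls: Dict[str, bool]) -> bool:
    """Return True (toxin) when strictly more than half of the predictors say toxin."""
    return sum(calls.values()) * 2 > len(calls)


def confusion_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical per-protein calls from three predictors (True = predicted toxin).
toy_calls: List[Dict[str, bool]] = [
    {"toxinpred2": True, "toxify": True, "blast_ml": False},
    {"toxinpred2": False, "toxify": True, "blast_ml": False},
    {"toxinpred2": True, "toxify": False, "blast_ml": True},
    {"toxinpred2": False, "toxify": False, "blast_ml": False},
]
# Hypothetical ground-truth annotations for the same four proteins.
toy_labels: List[bool] = [True, False, False, False]

consensus = [majority_vote(c) for c in toy_calls]
metrics = confusion_metrics(toy_labels, consensus)
print(consensus, metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when toxins and non-toxins are unevenly represented, since accuracy alone can hide a high rate of missed toxins.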

Development of in silico methodologies to predict the toxicity of novel proteins in the context of food and feed risk assessment / L. Palazzolo, T. Laurenzi, O. Ben Mariem, A. Bassan, U. Guerrini, I. Eberini. - In: EFSA SUPPORTING PUBLICATIONS. - ISSN 2397-8325. - 21:10(2024 Oct), pp. 9063E.1-9063E.99. [10.2903/sp.efsa.2024.en-9063]

Development of in silico methodologies to predict the toxicity of novel proteins in the context of food and feed risk assessment

L. Palazzolo (first author); T. Laurenzi; O. Ben Mariem; U. Guerrini; I. Eberini (last author)
2024

Sector BIOS-07/A - Biochemistry
Sector PHYS-06/A - Physics for life sciences, the environment and cultural heritage
Oct-2024
Article (author)
Files in this item:
EFSA Supporting Publications - 2024 - Palazzolo - Development of in silico methodologies to predict the toxicity of novel.pdf
Access: open access
Type: Publisher's version/PDF
Size: 2.44 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1114048