2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project

Bianco, R.; Coluccia, S.; Marinoni, M.; Falcon, A.; Fiori, F.; Serra, G.; Ferraroni, M.; Edefonti, V.; Parpinel, M.

doi:10.3390/nu17132196

Background/Objectives: Deep learning (DL) has shown strong potential in analyzing food images, but few studies have directly predicted mass, energy, and macronutrient content from images. Beyond the importance of high-quality data, differences between country-specific food composition databases (FCDBs) can hinder model generalization.
Methods: We assessed the performance of several standard DL models using four ground truth datasets derived from Nutrition5k, the largest image–nutrition dataset, with ~5000 complex US cafeteria dishes. With a view to developing an Italian dietary assessment tool, these datasets varied by FCDB alignment (Italian vs. US) and data curation (ingredient–mass correction and frame filtering on the test set). We evaluated combinations of four feature extractors [ResNet-50 (R50), ResNet-101 (R101), InceptionV3 (IncV3), and Vision Transformer-B-16 (ViT-B-16)] with two regression networks (2+1 and 2+2), using IncV3_2+2 as the benchmark. Descriptive statistics (percentages of agreement, unweighted Cohen’s kappa, and Bland–Altman plots) and standard regression metrics were used to compare predicted and ground truth nutritional composition. Dishes mispredicted by ≥7 algorithms were analyzed separately.
Results: R50, R101, and ViT-B-16 consistently outperformed the benchmark across all datasets. Specifically, replacing the benchmark with these top algorithms reduced median Mean Absolute Percentage Errors by 6.2% for mass, 6.4% for energy, 12.3% for fat, 33.1% for protein, and 40.2% for carbohydrates. Ingredient–mass correction substantially improved prediction metrics (by 6–42% for the top algorithms), while frame filtering had a more limited effect (<3%). Performance was consistently poor across most models for complex salads, chicken- or egg-based dishes, and Western-inspired breakfasts.
Conclusions: The R101 and ViT-B-16 architectures will be prioritized in future analyses, where ingredient–mass correction and automated frame filtering methods will be considered.
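The comparison statistics named in the Methods can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration and not the authors' code; all function names are hypothetical. Three further choices are assumptions rather than details taken from the source:

- categorizing values by ground-truth tertiles before computing percentage agreement and kappa;
- reading the reported MAPE reductions as relative changes rather than percentage points;
- the absolute-percentage-error threshold used here to call a dish "mispredicted".

```python
import math
import statistics


def mape(y_true, y_pred):
    """Mean absolute percentage error (%), skipping dishes whose ground truth is 0."""
    errors = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(errors) / len(errors)


def relative_reduction(mape_benchmark, mape_candidate):
    """Relative MAPE reduction (%) when replacing the benchmark with a candidate.
    ASSUMPTION: reductions are read as relative changes, not percentage points."""
    return 100.0 * (mape_benchmark - mape_candidate) / mape_benchmark


def to_categories(values, cutpoints):
    """Map each value to an ordinal class 0..len(cutpoints) using fixed cutpoints."""
    return [sum(v > c for c in cutpoints) for v in values]


def agreement_and_kappa(cat_true, cat_pred):
    """Percentage of exact agreement and unweighted Cohen's kappa."""
    n = len(cat_true)
    labels = set(cat_true) | set(cat_pred)
    p_obs = sum(a == b for a, b in zip(cat_true, cat_pred)) / n
    # Chance agreement from the marginal frequencies of each rater.
    p_exp = sum((cat_true.count(c) / n) * (cat_pred.count(c) / n) for c in labels)
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else math.nan
    return 100.0 * p_obs, kappa


def bland_altman(y_true, y_pred):
    """Bias (mean of predicted - true) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, bias - half_width, bias + half_width


def hard_dishes(y_true, preds_by_model, ape_threshold=50.0, min_models=7):
    """Indices of dishes whose APE exceeds ape_threshold for at least min_models models.
    ASSUMPTION: 'mispredicted' is defined by this hypothetical APE threshold."""
    flagged = []
    for i, t in enumerate(y_true):
        if t == 0:
            continue  # APE undefined when the ground truth is 0
        n_bad = sum(100.0 * abs(preds[i] - t) / abs(t) > ape_threshold
                    for preds in preds_by_model.values())
        if n_bad >= min_models:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Toy energy values (kcal) for illustration only; not data from the study.
    true_kcal = [250.0, 480.0, 120.0, 610.0, 330.0, 90.0]
    bench_kcal = [300.0, 400.0, 180.0, 500.0, 360.0, 140.0]
    cand_kcal = [270.0, 450.0, 150.0, 560.0, 340.0, 110.0]
    m_b, m_c = mape(true_kcal, bench_kcal), mape(true_kcal, cand_kcal)
    print(f"MAPE benchmark {m_b:.1f}%, candidate {m_c:.1f}%, "
          f"reduction {relative_reduction(m_b, m_c):.1f}%")
    cuts = statistics.quantiles(true_kcal, n=3)  # ground-truth tertile cutpoints
    print(agreement_and_kappa(to_categories(true_kcal, cuts),
                              to_categories(cand_kcal, cuts)))
    print(bland_altman(true_kcal, cand_kcal))
```

Four feature extractors combined with two regression networks give eight algorithms. With `min_models=7`, `hard_dishes` therefore flags only dishes that almost every model gets wrong. Bland–Altman differences are taken as predicted minus true, so a positive bias indicates overestimation.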

2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project / R. Bianco, S. Coluccia, M. Marinoni, A. Falcon, F. Fiori, G. Serra, M. Ferraroni, V. Edefonti, M. Parpinel. - In: NUTRIENTS. - ISSN 2072-6643. - 17:13(2025 Jul), pp. 2196.1-2196.23. [10.3390/nu17132196]


Bianco, Rachele (first author); S. Coluccia; M. Marinoni; Falcon, Alex; F. Fiori; G. Serra; M. Ferraroni; V. Edefonti (co-last author); Parpinel, Maria (co-last author)

2025

Keywords: 2D prediction of nutritional composition; Nutrition5k; deep learning; dietary assessment tools; energy prediction; food image recognition; frame filtering; macronutrients prediction; mass prediction; portion size correction

Scientific-disciplinary sector of the article (valid from 09/05/2024): Sector MEDS-24/A - Medical statistics

Project title: INDACO: Incorporating Nonadditivity and nonlinearity within the Dietary patterns And Cancer risk association: statistics and machine learning to create novel research Opportunities from dietary assessment to cancer prediction
Acronym: INDACO
Funder: MINISTERO DELL'UNIVERSITA' E DELLA RICERCA (Italian Ministry of University and Research)
Contract no.: 20227YCB5P_001

Publication date: Jul-2025
Ahead-of-print or print date: 30-Jun-2027
Journal in ANCE: NUTRIENTS
DOI: https://dx.doi.org/10.3390/nu17132196
Type: Article (author)
Appears in types: 01 - Journal article

Files in this product:

nutrients-17-02196-v2.pdf (open access). Type: Publisher's version/PDF; License: Creative Commons; Size: 8.9 MB; Format: Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1217256


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca