Benchmarking foundation models as feature extractors for weakly supervised computational pathology

Neidlinger, P.; El Nahhas, O.S.M.; Muti, H.S.; Lenz, T.; Hoffmeister, M.; Brenner, H.; Van Treeck, M.; Langer, R.; Dislich, B.; Behrens, H.M.; Rocken, C.; Foersch, S.; Truhn, D.; Marra, A.; Saldanha, O.L.; Kather, J.N.

doi:10.1038/s41551-025-01516-3

Numerous pathology foundation models have been developed to extract clinically relevant information. However, there is currently limited literature that evaluates these foundation models independently, on external cohorts and clinically relevant tasks, to identify where future models could be improved. Here we benchmark 19 histopathology foundation models on 13 patient cohorts with 6,818 patients and 9,528 slides from lung, colorectal, gastric and breast cancers. The models were evaluated on weakly supervised tasks related to biomarkers, morphological properties and prognostic outcomes. We show that a vision-language foundation model, CONCH, yielded the highest overall performance when compared with vision-only foundation models, with Virchow2 a close second, although CONCH's advantage was less pronounced in low-data scenarios and low-prevalence tasks. The experiments reveal that foundation models trained on distinct cohorts learn complementary features to predict the same label and can be fused to outperform the current state of the art. An ensemble combining CONCH and Virchow2 predictions outperformed the individual models in 55% of tasks, leveraging their complementary strengths in classification scenarios. Moreover, our findings suggest that data diversity outweighs data volume for foundation models.
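The abstract reports that an ensemble of CONCH and Virchow2 predictions beat both individual models in 55% of tasks. This record does not state the fusion rule or the comparison metric, so the following is only a minimal sketch under stated assumptions: late (prediction-level) fusion by averaging each model's per-patient positive-class probability, AUROC as the per-task metric, and a task counted as a "win" when the ensemble's AUROC exceeds both individual models. All function names and the toy scores are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from typing import Sequence


def average_ensemble(probs_a: Sequence[float], probs_b: Sequence[float]) -> list[float]:
    """Hypothetical late fusion: mean of two models' per-patient positive-class probabilities."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same patients in the same order")
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive scores above random negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative patient")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def share_of_tasks_where_ensemble_wins(task_aurocs: dict[str, tuple[float, float, float]]) -> float:
    """Fraction of tasks where the ensemble AUROC (third value) beats both single models."""
    if not task_aurocs:
        raise ValueError("no tasks given")
    wins = sum(1 for a, b, ens in task_aurocs.values() if ens > max(a, b))
    return wins / len(task_aurocs)


if __name__ == "__main__":
    # Toy, made-up scores for four patients (not data from the paper).
    labels = [0, 0, 1, 1]
    model_a = [0.1, 0.6, 0.4, 0.9]  # hypothetical model A
    model_b = [0.5, 0.2, 0.7, 0.3]  # hypothetical model B
    fused = average_ensemble(model_a, model_b)
    print(auroc(labels, model_a), auroc(labels, model_b), auroc(labels, fused))  # prints 0.75 0.75 1.0
```

Averaging probabilities is the simplest fusion that needs no extra training; it helps when the two models make errors on different patients, which is the complementarity the abstract describes. Other fusion strategies (for example, concatenating features before training a single classifier) are equally plausible and may be what the authors used.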

Benchmarking foundation models as feature extractors for weakly supervised computational pathology / P. Neidlinger, O.S.M. El Nahhas, et al. - In: NATURE BIOMEDICAL ENGINEERING. - ISSN 2157-846X. - (2025). [Epub ahead of print] [10.1038/s41551-025-01516-3]




Scientific-disciplinary sectors of the article (valid from 09/05/2024): Sector MEDS-09/A - Medical oncology
			
Projects
Project title: Open Consortium for Decentralized Medical Artificial Intelligence
Acronym: ODELIA
Funder: European Commission
Funding programme: Horizon Europe Framework Programme - HORIZON Research and Innovation Actions
Contract no.: 101057091
								
Project title: Understanding Gene ENvironment Interaction in ALcohol-related hepatocellular carcinoma (GENIAL)
Acronym: GENIAL
Funder: European Commission
Contract no.: 101096312
								
Project title: New directions for deep learning in cancer research through concept explainability and virtual experimentation
Acronym: NADIR
Funder: European Commission
Funding programme: Horizon Europe Framework Programme - European Research Council - HORIZON ERC Grants
Contract no.: 101114631
								
Publication date: 2025
Ahead-of-print or print date: 1-Oct-2025
Journal in ANCE: NATURE BIOMEDICAL ENGINEERING
DOI: https://dx.doi.org/10.1038/s41551-025-01516-3
Type: Article (author)
Appears in publication types: 01 - Journal article

Files in this item:

File	Size	Format
unpaywall-bitstream-174839603.pdf (open access; type: Publisher's version/PDF; license: Creative Commons)	11.33 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1249591


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca