Investigating Vision-Language Models Biometric Capabilities Via Sequence-Based Predictions

Donida Labati, R.; Ferrara, A.; Picascia, S.; Piuri, V.; Rocchetti, E.; Scotti, F.

doi:10.2139/ssrn.5381090

Vision-Language Models (VLMs) have emerged as powerful tools capable of jointly processing visual and textual information, creating opportunities to replace specialized models in domains such as biometrics. However, because this application remains largely underexplored, prevailing evaluation methods rely on closed-answer Multiple-Choice Questions (MCQ) and parse generated text to extract predictions. To provide an evaluation that better aligns with real-world biometric needs, we introduce a protocol that bypasses text generation entirely, producing sequence-based predictions directly from output log-probabilities. This approach enables probability-based scoring, which is essential for computing standard biometric metrics. We apply this method to assess the zero-shot capabilities of the Gemma 3 family on face verification, age/gender estimation, and attribute classification, benchmarking them against specialized systems. Furthermore, we demonstrate that traditional MCQ-based evaluations consistently underestimate VLM performance, whereas our log-probability scoring approach better captures the identity-specific capabilities of VLMs. Our results show that Gemma 3 models achieve strong performance on classification tasks but struggle with regression, highlighting that a robust methodology is critical to accurately assess the true capabilities and limitations of VLMs in biometrics.
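The idea of scoring candidate answers from log-probabilities rather than from generated text can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the candidate set, and the numeric values are hypothetical, and the sketch assumes that per-token log-probabilities for each candidate answer have already been obtained from a model. Each candidate sequence is scored by the sum of its token log-probabilities, and the scores are normalized over the closed candidate set to yield a continuous probability.

```python
import math

def sequence_logprob(token_logprobs):
    """Log-probability of a whole answer sequence: sum of its token log-probs."""
    return sum(token_logprobs)

def candidate_probabilities(candidates):
    """Softmax-normalize sequence log-probs over a closed set of candidate answers.

    `candidates` maps each answer string to the list of log-probabilities the
    model assigned to its tokens (hypothetical input; extracting these values
    from a particular VLM is outside this sketch).
    """
    scores = {answer: sequence_logprob(lps) for answer, lps in candidates.items()}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {a: math.exp(s - max_score) for a, s in scores.items()}
    total = sum(exp_scores.values())
    return {a: e / total for a, e in exp_scores.items()}

# Made-up token log-probs for a face-verification style prompt ("same person?").
example = {
    "yes": [math.log(0.6)],
    "no": [math.log(0.2)],
}
probs = candidate_probabilities(example)
match_score = probs["yes"]  # continuous score that can be thresholded
```

A continuous score of this kind, unlike a parsed text answer, can be swept over thresholds, which is what threshold-based biometric metrics require.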

Investigating Vision-Language Models Biometric Capabilities Via Sequence-Based Predictions / R. Donida Labati, A. Ferrara, S. Picascia, V. Piuri, E. Rocchetti, F. Scotti. - (2025 Aug 06). [10.2139/ssrn.5381090]

	Keywords
				vision-language models; biometrics; zero-shot evaluation

	Scientific-disciplinary sectors of the pre-print (valid from 9 May 2024)
				Sector INFO-01/A - Informatica (Computer Science)

	Pre-print deposit date
				6 Aug 2025

	DOI
				https://dx.doi.org/10.2139/ssrn.5381090

	Pre-print URL
				https://ssrn.com/abstract=5381090

	Appears in types:
				24 - Pre-print

Files in this item:

File	Size	Format
ssrn-5381090.pdf (open access; Type: Pre-print (manuscript submitted to the publisher); License: Publisher)	306.85 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1189257

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca