Active Learning Methodology in LLMs Fine-Tuning

Ceravolo, P.; Mohammadi, F.; Tamborini, M.A.

doi:10.1109/csr61664.2024.10679450

Active learning (AL) presents a valuable approach for fine-tuning large language models (LLMs) by optimizing the selection of training data to enhance model performance. This study introduces a methodology integrating human expertise and synthetic data generation to create robust datasets. Our focus is on addressing gender bias in Italian job advertisements, aiming to improve LLM accuracy in identifying discriminatory language. The methodology involves a multi-step process: constructing a representative seed dataset, expanding it with synthetically generated data, and iteratively refining the model through fine-tuning loops. Preliminary results demonstrate the potential of AL in reducing the annotation workload while maintaining high performance in bias detection tasks. Future work will extend this approach to other discrimination categories and linguistic variations.
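
The multi-step process described in the abstract (seed dataset, synthetic expansion, iterative refinement) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name a selection criterion, so uncertainty sampling is assumed here. `train` is a toy token log-odds classifier standing in for an LLM fine-tuning step, the candidate pool is a hard-coded list standing in for synthetically generated advertisements, and `oracle` is a hypothetical stand-in for the human annotator. All example strings and labels are invented.

```python
import math
from collections import Counter

def train(labeled):
    """Stand-in for a fine-tuning step: Laplace-smoothed token log-odds."""
    pos, neg = Counter(), Counter()
    for text, label in labeled:
        (pos if label == 1 else neg).update(text.lower().split())
    vocab = set(pos) | set(neg)
    return {t: math.log((pos[t] + 1) / (neg[t] + 1)) for t in vocab}

def predict_proba(model, text):
    """Probability that `text` is biased (label 1) under the toy model."""
    score = sum(model.get(t, 0.0) for t in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))

def select_most_uncertain(model, pool, k):
    """Assumed criterion: pick the k items whose probability is closest to 0.5."""
    return sorted(pool, key=lambda t: abs(predict_proba(model, t) - 0.5))[:k]

def active_learning_loop(seed, pool, oracle, rounds=3, batch_size=2):
    """Label the most uncertain pool items each round, then retrain."""
    labeled, pool = list(seed), list(pool)
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        for text in select_most_uncertain(model, pool, batch_size):
            labeled.append((text, oracle(text)))  # human expert labels the item
            pool.remove(text)
        model = train(labeled)  # stand-in for the next fine-tuning pass
    return model, labeled

# Invented toy data: 1 = gendered wording, 0 = neutral wording.
seed = [("cercasi segretaria", 1), ("cercasi personale di segreteria", 0)]
gold = {  # hypothetical expert judgments for the synthetic candidates
    "cercasi commesso": 1,
    "cercasi personale di vendita": 0,
    "cercasi cameriera": 1,
    "cercasi personale di sala": 0,
}
model, labeled = active_learning_loop(seed, list(gold), gold.__getitem__,
                                      rounds=2, batch_size=2)
```

In this sketch only the items selected in each round are sent to the annotator, which is where the reduction in annotation workload mentioned in the abstract would come from; with a real LLM, `train` and `predict_proba` would be replaced by fine-tuning and inference calls.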

Active Learning Methodology in LLMs Fine-Tuning / P. Ceravolo, F. Mohammadi, M.A. Tamborini (PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE (CSR)). - In: Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR) / edited by S. Shiaeles, N. Kolokotronis, E. Bellini. - [s.l.] : Institute of Electrical and Electronics Engineers Inc., 2024. - ISBN 979-8-3503-7536-7. - pp. 743-749 (IEEE International Conference on Cyber Security and Resilience, CSR, held in London in 2024) [10.1109/csr61664.2024.10679450].



Scientific-disciplinary sectors of the contribution (valid from 9 May 2024): Sector INFO-01/A - Informatics
Project title: MUSA - Multilayered Urban Sustainability Action
Acronym: MUSA
Funder: MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
Publication date: 2024
Organizations associated with the conference: IEEE Systems, Man, and Cybernetics Society (SMC)
DOI: https://dx.doi.org/10.1109/csr61664.2024.10679450
Type: Book Part (author)
Appears in types: 03 - Contribution in volume

Files in this item:

File	Size	Format
Active_Learning_Methodology_in_LLMs_Fine-Tuning.pdf (restricted access; description: Conference Paper; type: Publisher's version/PDF)	826.59 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1119050
