Active learning (AL) presents a valuable approach for fine-tuning large language models (LLMs) by optimizing the selection of training data to enhance model performance. This study introduces a methodology integrating human expertise and synthetic data generation to create robust datasets. Our focus is on addressing gender bias in Italian job advertisements, aiming to improve LLM accuracy in identifying discriminatory language. The methodology involves a multi-step process: constructing a representative seed dataset, expanding it with synthetically generated data, and iteratively refining the model through fine-tuning loops. Preliminary results demonstrate the potential of AL in reducing the annotation workload while maintaining high performance in bias detection tasks. Future work will extend this approach to other discrimination categories and linguistic variations.
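
The abstract describes the fine-tuning loop only at a high level. As a rough illustration, the sketch below shows one common way such a loop can be organised: pool-based active learning with uncertainty sampling. It is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name a selection strategy, the word-count scorer in `train` merely stands in for LLM fine-tuning, and `oracle`, `active_learning_loop`, and the toy Italian phrases and labels are all hypothetical.

```python
from collections import Counter
from typing import Callable


def train(labeled: list[tuple[str, int]]) -> dict[str, float]:
    """Toy stand-in for fine-tuning (assumption): smoothed P(label=1 | word)."""
    pos, tot = Counter(), Counter()
    for text, label in labeled:
        for word in set(text.lower().split()):
            tot[word] += 1
            pos[word] += label
    return {w: (pos[w] + 1) / (tot[w] + 2) for w in tot}


def predict_proba(model: dict[str, float], text: str) -> float:
    """Mean word score; unseen words count as 0.5 (maximally uncertain)."""
    words = text.lower().split()
    if not words:
        return 0.5
    return sum(model.get(w, 0.5) for w in words) / len(words)


def uncertainty(p: float) -> float:
    """1.0 when p == 0.5, falling to 0.0 when p is 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def active_learning_loop(
    seed: list[tuple[str, int]],
    pool: list[str],
    oracle: Callable[[str], int],
    rounds: int,
    batch_size: int,
) -> tuple[dict[str, float], list[tuple[str, int]]]:
    """Repeatedly label the most uncertain pool items and retrain."""
    labeled = list(seed)
    pool = list(pool)  # may mix real and synthetic examples
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Most uncertain examples first.
        pool.sort(key=lambda t: uncertainty(predict_proba(model, t)), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        # The oracle plays the role of the human annotator (assumption).
        labeled += [(text, oracle(text)) for text in batch]
        model = train(labeled)
    return model, labeled


# Hypothetical toy data: 1 = gendered wording, 0 = neutral wording.
seed = [("cercasi cameriera giovane", 1), ("cercasi personale di sala", 0)]
pool = [
    "cercasi segretaria",
    "cercasi addetto o addetta alle vendite",
    "cameriera bella presenza",
    "personale qualificato",
]


def oracle(text: str) -> int:
    return int("cameriera" in text or "segretaria" in text)


model, labeled = active_learning_loop(seed, pool, oracle, rounds=2, batch_size=1)
```

Only `rounds * batch_size` pool items are sent to the oracle, which is how a loop of this kind limits annotation effort. In a real setting `train` would be replaced by an actual fine-tuning step and `oracle` by human review.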

Active Learning Methodology in LLMs Fine-Tuning / P. Ceravolo, F. Mohammadi, M.A. Tamborini (PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE (CSR)). - In: Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR) / edited by S. Shiaeles, N. Kolokotronis, E. Bellini. - [s.l] : Institute of Electrical and Electronics Engineers Inc., 2024. - ISBN 979-8-3503-7536-7. - pp. 743-749 (IEEE International Conference on Cyber Security and Resilience, CSR, held in London in 2024) [10.1109/csr61664.2024.10679450].

Active Learning Methodology in LLMs Fine-Tuning

P. Ceravolo (first author); F. Mohammadi (second author); M.A. Tamborini (last author)
2024

English
Sector INFO-01/A - Computer Science
Conference contribution
Yes, but type not specified
Scientific publication
Goal 5: Gender equality
   MUSA - Multilayered Urban Sustainability Action
   MUSA
   MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR)
Editors: S. Shiaeles, N. Kolokotronis, E. Bellini
Publisher: Institute of Electrical and Electronics Engineers Inc.
Year: 2024
Pages: 743-749 (7 pages)
ISBN: 979-8-3503-7536-7
Volume with international circulation
Conference: IEEE International Conference on Cyber Security and Resilience, CSR (London, 2024)
IEEE Systems, Man, and Cybernetics Society (SMC)
International conference
Research products::03 - Contribution in volume
Files in this item:
Active_Learning_Methodology_in_LLMs_Fine-Tuning.pdf (restricted access)
Description: Conference Paper
Type: Publisher's version/PDF
Size: 826.59 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1119050
Citations
  • PMC: not available
  • Scopus: 3
  • ISI: 1
  • OpenAlex: not available