Using a comprehensive list of job titles, we propose a framework to automatically generate job descriptions in Italian. This synthetic data is then used in a Large Language Model to detect inclusive language in job postings. Finally, we compare the results of this synthetic dataset with real data. Our study demonstrates that the data format and prompting method significantly impact performance. Additionally, we identify limitations and key considerations for unifying synthetic data with real data for fine-tuning purposes. We also propose improvements to the framework and provide guidelines for effectively integrating these two types of data. The novelty of our work is generating and integrating synthetic data due to the scarcity of annotated Italian job descriptions, thereby improving the training of Large Language Models (LLMs) tailored specifically for Italian.

Synthetic Data for Identifying Inclusive Language (Case Study: Job Descriptions in Italian) / T. Romano, F. Mohammadi, P. Ceravolo (PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE (CSR)). - In: Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR) / [edited by] S. Shiaeles, N. Kolokotronis, E. Bellini. - [s.l] : IEEE, 2024 Sep. - ISBN 979-8-3503-7536-7. - pp. 737-742 (conference: IEEE International Conference on Cyber Security and Resilience, CSR, held in London in 2024) [10.1109/csr61664.2024.10679398].

Synthetic Data for Identifying Inclusive Language (Case Study: Job Descriptions in Italian)

T. Romano (first author); F. Mohammadi (second author); P. Ceravolo (last author)
2024

No
English
Sector INFO-01/A - Computer Science
Conference contribution
Yes, but type not specified
Scientific publication
   MUSA - Multilayered Urban Sustainability Action
   MUSA
   Ministero dell'Università e della Ricerca (Italian Ministry of University and Research)
Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR)
S. Shiaeles, N. Kolokotronis, E. Bellini
IEEE
Sep 2024
737
742
6
979-8-3503-7536-7
Internationally distributed volume
No
IEEE International Conference on Cyber Security and Resilience, CSR
London
2024
IEEE Systems, Man, and Cybernetics Society (SMC)
International conference
crossref
I agree
T. Romano, F. Mohammadi, P. Ceravolo
Book Part (author)
reserved
273
info:eu-repo/semantics/bookPart
3
Research products::03 - Book contribution
Files in this item:
File: Synthetic_Data_for_Identifying_Inclusive_Language_Case_Study_Job_Descriptions_in_Italian.pdf
Access: restricted
Description: Conference Paper
Type: Publisher's version/PDF
Size: 871.5 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1119051
Citations
  • PubMed Central: N/A
  • Scopus: 0
  • Web of Science: 0
  • OpenAlex: N/A