Agent for Anti-Discriminatory Language (AAL) A Human-Centered AI Approach to Discriminatory Speech in Italian / M.A. Tamborini, F. Mohammadi, P. Ceravolo, S. Maghool, M.E. D'Amico, C. Siccardi. - In: IEEE-CH International Conference on Cyber Humanities. - [s.l.] : Institute of Electrical and Electronics Engineers (IEEE), 2026 Jan 20. - ISBN 979-8-3315-1435-8. - pp. 1-6. (International Conference on Cyber Humanities (IEEE-CH), Florence, 2025) [10.1109/ieee-ch65308.2025.11279699].

Agent for Anti-Discriminatory Language (AAL) A Human-Centered AI Approach to Discriminatory Speech in Italian

M.A. Tamborini (first author); F. Mohammadi (second author); P. Ceravolo; S. Maghool; M.E. D'Amico (second-to-last author); C. Siccardi (last author)
2026

Abstract

This paper presents the Agent for Anti-Discriminatory Language (AAL), a legally-informed large language model designed to detect discriminatory, stereotypical, and intersectional language in Italian social media. By embedding anti-discrimination legal principles from European and Italian law into the model's behavior—via instruction tuning and expert-guided active learning—we investigate the feasibility of normatively aligned classification in high-stakes digital discourse. Using an interdisciplinary approach, we integrate structured legal knowledge, expert-annotated examples, and staged model feedback. Evaluation results demonstrate high precision in identifying overt hate speech and stereotypes, as well as the ability to generate legal justifications. However, challenges remain in identifying implicit and intersectional bias, particularly when lexical cues are weak or the social context is complex. We discuss the implications for trustworthy AI, calibrating model confidence, and integrating legal reasoning into multilingual generative systems.
Keywords: discrimination detection; large language models; intersectionality; legal AI; active learning; hate speech; Italian social media discourse; explainable AI; human-centered AI; fairness
Scientific sector: INFO-01/A - Computer Science (Informatica)
20 Jan 2026
Institute of Electrical and Electronics Engineers (IEEE)
https://ieeexplore.ieee.org/abstract/document/11279699
Book Part (author)
Files in this record:
Agent_for_Anti-Discriminatory_Language_AAL_A_Human-Centered_AI_Approach_to_Discriminatory_Speech_in_Italian.pdf
Access: restricted
Type: Publisher's version/PDF
License: none
Size: 1.52 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1222135