The rapid expansion of generative AI and large language models (LLMs) has transformed how individuals and institutions communicate online. While these systems offer powerful capabilities for understanding and generating natural language, their ability to reliably detect linguistic patterns depends critically on the quality, completeness, and representativeness of their training data. As LLMs become increasingly central to content moderation and language classification tasks, there is a growing need for robust and reproducible pipelines grounded in well-constructed datasets. However, many existing approaches lack systematic data-centric frameworks, leading to models that may fail to generalise or that perpetuate biases present in incomplete or unbalanced training data. This dissertation investigates how data-centric methodologies can be used to construct robust pipelines for language classification, with a particular focus on harmful language detection in Italian. Rather than prioritising algorithmic modifications alone, the research develops an end-to-end pipeline encompassing data collection, dataset curation, synthetic data generation, LLM-based annotation via a proposed "General Data Framework" (GDF), followed by data selection and quality assessment, supervised fine-tuning, and multi-dimensional model evaluation. A central contribution of this thesis is addressing the challenge of limited human-validated annotations for language detection tasks. Through systematic experimentation across two distinct tasks, we propose two complementary solutions, each tailored to a different data constraint scenario. The first solution targets settings where annotation quality is low: rather than relying on costly human labelling, we leverage a teacher–student framework in which an LLM generates weakly supervised labels that are then used to fine-tune a smaller, open-source model. 
We demonstrate that carefully curated LLM-generated labels can effectively transfer task-specific knowledge: the smaller open-source student model retains detection performance comparable to that of the teacher model while substantially reducing the computational and deployment costs of harmful language detection systems. The second solution addresses settings where annotation quality is high but quantity is insufficient: starting from a small seed dataset of human-validated examples, we augment the training data with synthetically generated instances and fine-tune on the combined corpus. Applied to the task of detecting non-inclusive language in job descriptions, this approach yields meaningful performance gains despite the scarcity of the original labelled data. Taken together, these two solutions define the core methodological contribution of the thesis: a principled, data-centric framework that begins from a suboptimal starting point (whether low-quality labels or too few annotations), adopts a targeted strategy to overcome the associated limitations, and achieves reliable, reproducible results in both cases. Methodologically, the dissertation draws on insights from natural language processing (NLP), machine learning (ML), and ethical AI. It introduces a reproducible evaluation framework that integrates standard performance metrics with fairness, bias, and trustworthiness assessments, augmented by text-complexity features that enable semantic-level analysis beyond conventional aggregate metrics. Building on this foundation, the work improves the performance of language models on downstream tasks through LLM-annotated fine-tuning and knowledge distillation into smaller, robust classification systems. The proposed pipeline is validated through real-world case studies, including harmful content detection on social media and inclusive language evaluation in job descriptions, demonstrating that well-curated data and rigorous evaluation are essential for responsible and effective LLM-based language classification. 
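The teacher–student idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `teacher_label` is a hypothetical keyword rule standing in for the LLM teacher, the confidence threshold `min_conf` is arbitrary, and a tiny naive Bayes classifier is swapped in for the fine-tuned open-source student model. It is not the pipeline used in the thesis, and the example texts are in English only for readability.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for the teacher LLM. A real pipeline would prompt an
# LLM here; this keyword rule only mimics the (label, confidence) interface.
HARMFUL_CUES = {"idiot", "stupid", "hate"}

def teacher_label(text):
    tokens = text.lower().split()
    hits = sum(tok in HARMFUL_CUES for tok in tokens)
    if hits:
        return "harmful", min(1.0, 0.6 + 0.2 * hits)
    # Very short texts get a low confidence so that curation drops them.
    return "neutral", (0.9 if len(tokens) > 2 else 0.5)

def curate(texts, min_conf=0.7):
    """Keep only weak labels whose teacher confidence reaches min_conf."""
    kept = []
    for text in texts:
        label, conf = teacher_label(text)
        if conf >= min_conf:
            kept.append((text, label))
    return kept

class NaiveBayesStudent:
    """Tiny multinomial naive Bayes model trained on curated weak labels.

    Assumption: this replaces the fine-tuned student purely for illustration.
    """

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            # Add-one smoothing over the shared vocabulary.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in text.lower().split():
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

unlabelled = [
    "you are an idiot",
    "i hate stupid people",
    "thanks for the helpful answer",
    "see you at the meeting tomorrow",
    "ok",
]
weak = curate(unlabelled)            # the low-confidence "ok" is filtered out
student = NaiveBayesStudent().fit(weak)
print(student.predict("you idiot"))
```

The design point is the curation step between teacher and student: only weak labels that pass a quality filter reach the student, so label noise is controlled before training rather than after.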
The outputs and contributions of this thesis can benefit a range of stakeholders. From a research perspective, it provides a reproducible framework for implementing systems that detect harmful language efficiently and at lower cost through the teacher–student approach. Researchers can also use the GDF introduced in this thesis as a guide for data collection or enrichment, while benefiting from deeper evaluation methods that incorporate text-complexity features. From a technical perspective, the results can support content moderators, HR tech platforms, and policymakers: the fine-tuned models can be used as internal tools for monitoring content published on media platforms or as moderation tools within hiring systems. For policymakers, the findings offer insights into the types of language that require more attention, helping guide educational interventions and initiatives to raise awareness about stereotypes or non-inclusive language in society. Regarding adaptability, the GDF is language-agnostic and can be applied to any task requiring structured data collection or annotation enrichment. The overall methodological workflow makes no language-specific assumptions and transfers directly to other domains or languages. The teacher–student approach embedded within the pipeline of Task 1 can likewise be extended to other domains where large-model annotation is more cost-effective than human labelling, such as medical misinformation detection, legal document classification, or financial sentiment analysis. Finally, the synthetic data generation workflow used for Task 2, based on masking and controlled replacement, can be adopted wherever masking and replacement strategies are appropriate, for example in anonymising personal data, generating augmented examples for sentiment analysis, or expanding low-resource datasets in specialised domains such as legal or medical texts.
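As a rough illustration of what augmentation by masking and controlled replacement can look like, the sketch below masks lexicon terms in a seed sentence and fills each mask from a fixed list of alternatives. The lexicon `REPLACEMENTS`, the `[MASK]` token, the labels, and the English example are all assumptions introduced here; the abstract does not specify the resources or the replacement model used in the thesis.

```python
import re

# Hypothetical lexicon: gender-coded term -> controlled list of alternatives.
REPLACEMENTS = {
    "salesman": ["salesperson", "sales representative"],
    "chairman": ["chairperson", "chair"],
    "manpower": ["workforce", "staff"],
}
MASK = "[MASK]"
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, REPLACEMENTS)) + r")\b", re.IGNORECASE
)

def mask_terms(text):
    """Replace every lexicon term with MASK; return masked text and the terms removed."""
    removed = []

    def _sub(match):
        removed.append(match.group(0).lower())
        return MASK

    return PATTERN.sub(_sub, text), removed

def generate_variants(text):
    """Return the seed example followed by synthetic variants built by filling the masks.

    Labels are illustrative assumptions. The original casing of a replaced
    term is not restored in this sketch.
    """
    masked, removed = mask_terms(text)
    if not removed:
        return []  # nothing to mask, so nothing to augment
    examples = [(text, "non-inclusive")]
    n_variants = min(len(REPLACEMENTS[term]) for term in removed)
    for i in range(n_variants):
        variant = masked
        for term in removed:
            # Fill masks left to right with the i-th alternative for each term.
            variant = variant.replace(MASK, REPLACEMENTS[term][i], 1)
        examples.append((variant, "inclusive"))
    return examples

seed = "We need a salesman to report to the chairman."
examples = generate_variants(seed)
for sentence, label in examples:
    print(label, "|", sentence)
```

Restricting replacements to a fixed list is what makes the substitution "controlled" in this sketch: each synthetic instance differs from its seed only at the masked positions, so the assigned label stays traceable to a specific edit.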
USING ARTIFICIAL INTELLIGENCE FOR DETECTING DISCRIMINATORY LANGUAGE / F. Mohammadi; supervisor: P. Ceravolo; co-supervisor: V. Bellandi; coordinator: R. Sassi. Dipartimento di Informatica Giovanni Degli Antoni, 29 Apr 2026. 38th cycle, Academic Year 2025/2026.
| File | Description | Type | Licence | Size | Format | Availability |
|---|---|---|---|---|---|---|
| phd_unimi_R14170.pdf | Full Thesis | Publisher's version/PDF | Creative Commons | 6.93 MB | Adobe PDF | Embargoed until 09/12/2026 |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.




