Understanding the inner workings of Large Language Models (LLMs) is crucial for ensuring safer development practices and fostering trust in their predictions, particularly in sensitive applications. Causal Mediation Analysis (CMA) is a causality framework well suited to this setting: it provides a mechanistic interpretation of the behaviour of LLM components and assesses a specific type of knowledge in the model (e.g. the presence of gender bias). This study discusses the challenges and potential pathways in applying CMA to open the black box of LLMs. Through three exemplary case studies from the literature, we show the unique insights CMA can provide, and we elaborate on the inherent challenges and opportunities of the approach. The challenges range from the influence of model architecture on prompt viability to the difficulty of ensuring metric comparability across studies. The opportunities lie in dissecting LLMs’ knowledge by extracting the specific domains of knowledge activated during processing. Our discussion aims to give a comprehensive view of CMA, focusing on the aspects researchers need in order to craft effective CMA experiments tailored to interpretability objectives.

Causal Mediation Analysis for Interpreting Large Language Models / E. Rocchetti, A. Ferrara (CEUR WORKSHOP PROCEEDINGS). - In: SEBD 2024 : Symposium on Advanced Database Systems 2024 / [edited by] M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, M. Sanguinetti, A. Pellicani, F. Motta. - [s.l] : CEUR-WS, 2024. - pp. 585-594 (SEBD 2024 Symposium on Advanced Database Systems 2024, held in Villasimius in 2024).

Causal Mediation Analysis for Interpreting Large Language Models

E. Rocchetti
First
A. Ferrara
Second
2024

No
English
LLM; interpretability; causality; causal mediation analysis
Sector INFO-01/A - Computer Science
Conference contribution
Anonymous experts
Scientific publication
SEBD 2024 : Symposium on Advanced Database Systems 2024
M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, M. Sanguinetti, A. Pellicani, F. Motta
CEUR-WS
2024
585
594
10
3741
Internationally distributed volume
SEBD 2024 Symposium on Advanced Database Systems 2024
Villasimius
2024
International conference
https://ceur-ws.org/Vol-3741/paper39.pdf
scopus
I agree
E. Rocchetti, A. Ferrara
Book Part (author)
open
273
info:eu-repo/semantics/bookPart
2
Research products::03 - Contribution in volume
Files in this product:
File Size Format
CEUR Vol 3741 Paper 39.pdf

open access

Type: Publisher's version/PDF
Size 561.81 kB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1144896
Citations
  • Scopus 0
  • OpenAlex ND