Causal Mediation Analysis for Interpreting Large Language Models / E. Rocchetti, A. Ferrara. In: SEBD 2024: Symposium on Advanced Database Systems / edited by M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, M. Sanguinetti, A. Pellicani, F. Motta. CEUR-WS (CEUR Workshop Proceedings), 2024, pp. 585-594. Presented at SEBD 2024, held in Villasimius in 2024.

Causal Mediation Analysis for Interpreting Large Language Models

E. Rocchetti (first author); A. Ferrara (second author)
2024

Abstract

Being able to understand the inner workings of Large Language Models (LLMs) is crucial for ensuring safer development practices and fostering trust in their predictions, particularly in sensitive applications. Causal Mediation Analysis (CMA) is a causality framework well suited to this scenario, as it provides a mechanistic interpretation of the behaviour of LLM components and assesses a specific type of knowledge in the model (e.g., the presence of gender bias). This study discusses the challenges and potential pathways in applying CMA to open up LLMs' black boxes. Through three exemplary case studies from the literature, we show the unique insights CMA can provide, and we elaborate on the inherent challenges and opportunities this approach presents. The challenges range from the influence of model architecture on prompt viability to the complexities of ensuring metric comparability across studies. Conversely, the opportunities lie in dissecting LLMs' knowledge by extracting the specific domains of knowledge activated during processing. Our discussion aims to provide a comprehensive view of CMA, focusing on essential aspects to equip researchers with the knowledge necessary for crafting effective CMA experiments tailored to interpretability objectives.
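
To make the mechanics concrete, the sketch below illustrates the kind of intervention CMA builds on: compare the model's output under a base prompt and a counterfactual prompt (total effect), then replay the base prompt while patching a single component's activation to its counterfactual value (indirect effect through that mediator). This is a minimal sketch only: the model choice (GPT-2), the profession-to-pronoun prompts, the patched component (a layer-5 MLP), and the relative effect formulation are illustrative assumptions, not the paper's exact experimental protocol.

# Minimal CMA-style intervention sketch (assumptions noted above).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def candidate_odds(prompt):
    # Ratio p(" she") / p(" he") for the next token after the prompt.
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    probs = logits.softmax(-1)
    she, he = tok.encode(" she")[0], tok.encode(" he")[0]
    return (probs[she] / probs[he]).item()

base = "The nurse said that"             # u: base input
counterfactual = "The doctor said that"  # u': set-profession intervention

# Total effect (one common relative formulation): how much the input
# intervention alone shifts the output quantity of interest.
y_base = candidate_odds(base)
total_effect = candidate_odds(counterfactual) / y_base - 1.0

# Indirect effect through one mediator (here: the layer-5 MLP output).
# Step 1: run the counterfactual and cache the mediator's activation.
# Step 2: replay the base input with that activation patched in.
cache = {}
mediator = model.transformer.h[5].mlp

def save_hook(module, inputs, output):
    cache["act"] = output.detach()

def patch_hook(module, inputs, output):
    patched = output.clone()
    patched[:, -1] = cache["act"][:, -1]  # patch the last position only
    return patched

handle = mediator.register_forward_hook(save_hook)
candidate_odds(counterfactual)  # forward pass fills the cache
handle.remove()

handle = mediator.register_forward_hook(patch_hook)
indirect_effect = candidate_odds(base) / y_base - 1.0
handle.remove()

print(f"TE = {total_effect:.3f}, IE(layer-5 MLP) = {indirect_effect:.3f}")

Sweeping the mediator over layers, heads, or neurons and repeating this patch-and-measure loop yields the per-component effect maps that CMA studies report.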
Keywords: LLM; interpretability; causality; causal mediation analysis
Field: INFO-01/A - Computer Science
https://ceur-ws.org/Vol-3741/paper39.pdf
Publication type: Book Part (author)
File: CEUR Vol 3741 Paper 39.pdf - Publisher's version/PDF, open access, Adobe PDF, 561.81 kB

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1144896
Citations: Scopus 0