IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

The process by which Large Language Models (LLMs) acquire complex capabilities during training remains a key open question in mechanistic interpretability. This project investigates whether these learning dynamics can be characterized through the lens of Complex Network Theory (CNT). I introduce a novel methodology to represent a Transformer-based LLM as a directed, weighted graph where nodes are the model's computational components (attention heads and MLPs) and edges represent causal influence, measured via an intervention-based ablation technique. By tracking the evolution of this component-graph across 143 training checkpoints of the Pythia-14M model on a canonical induction task, I analyze a suite of graph-theoretic metrics. The results reveal that the network's structure evolves through distinct phases of exploration, consolidation, and refinement. Specifically, I identify the emergence of a stable hierarchy of information spreader components and a dynamic set of information gatherer components, whose roles reconfigure at key learning junctures. This work demonstrates that a component-level network perspective offers a powerful macroscopic lens for visualizing and understanding the self-organizing principles that drive the formation of functional circuits in LLMs.

Modeling Transformers as complex networks to analyze learning dynamics / E. Rocchetti. - (2025 Sep 18). [10.48550/arXiv.2509.15269]

Modeling Transformers as complex networks to analyze learning dynamics

E. Rocchetti^Primo

2025

Abstract

The process by which Large Language Models (LLMs) acquire complex capabilities during training remains a key open question in mechanistic interpretability. This project investigates whether these learning dynamics can be characterized through the lens of Complex Network Theory (CNT). I introduce a novel methodology to represent a Transformer-based LLM as a directed, weighted graph where nodes are the model's computational components (attention heads and MLPs) and edges represent causal influence, measured via an intervention-based ablation technique. By tracking the evolution of this component-graph across 143 training checkpoints of the Pythia-14M model on a canonical induction task, I analyze a suite of graph-theoretic metrics. The results reveal that the network's structure evolves through distinct phases of exploration, consolidation, and refinement. Specifically, I identify the emergence of a stable hierarchy of information spreader components and a dynamic set of information gatherer components, whose roles reconfigure at key learning junctures. This work demonstrates that a component-level network perspective offers a powerful macroscopic lens for visualizing and understanding the self-organizing principles that drive the formation of functional circuits in LLMs.

Scheda breve

Scheda completa

Scheda completa (DC)

	Parole chiave
	
				large language models; mechanistic interpretability; complex networks
			
	Settori scientifico-disciplinari del pre-print (validi dal 09/05/2024)
	
				Settore INFO-01/A - Informatica
			
	Data di depostio del pre-print
	
				18-set-2025
			
	DOI
	
				https://dx.doi.org/10.48550/arXiv.2509.15269
			
	URL del pre-print
	
				http://arxiv.org/abs/2509.15269v1
			
	Appare nelle tipologie:
	
				24 - Pre-print

File in questo prodotto:

File	Dimensione	Formato
arXiv 2509.15269.pdf accesso aperto Descrizione: Modeling Transformers as complex networks to analyze learning dynamics Tipologia: Pre-print (manoscritto inviato all'editore) Licenza: Creative commons Dimensione 3.45 MB Formato Adobe PDF Visualizza/Apri	3.45 MB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1189260

Citazioni

ND

ND

ND

ND

social impact