How Instruction-Tuning Imparts Length Control: A Cross-Lingual Mechanistic Analysis / E. Rocchetti, A. Ferrara. - (2025 Sep 02). [10.48550/arXiv.2509.02075]

How Instruction-Tuning Imparts Length Control: A Cross-Lingual Mechanistic Analysis

E. Rocchetti (first author); A. Ferrara (last author)
2025

Abstract

Adhering to explicit length constraints, such as generating text with a precise word count, remains a significant challenge for Large Language Models (LLMs). This study investigates the differences between foundation models and their instruction-tuned (IT) counterparts on length-controlled text generation in English and Italian. We analyze both performance and internal component contributions using Cumulative Weighted Attribution, a metric derived from Direct Logit Attribution. Our findings reveal that instruction-tuning substantially improves length control, primarily by specializing components in deeper model layers. Specifically, attention heads in the later layers of IT models show increasingly positive contributions, particularly in English. In Italian, while attention contributions are more attenuated, final-layer MLPs exhibit a stronger positive role, suggesting a compensatory mechanism. These results indicate that instruction-tuning reconfigures later layers for task adherence, with component-level strategies potentially adapting to linguistic context.
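For context, the sketch below illustrates the general idea behind Direct Logit Attribution (DLA), from which the paper's metric is derived: because a transformer's final logits are (approximately) linear in the residual stream, a target token's logit decomposes into per-component contributions from individual attention heads and MLPs. This is a minimal NumPy sketch under assumed shapes; the function names, the linear treatment of the final LayerNorm, and the cumulative weighting shown here are illustrative assumptions, not the paper's actual definition of Cumulative Weighted Attribution.

    import numpy as np

    def direct_logit_attribution(component_writes, W_U, target_token_id):
        # component_writes: (n_components, d_model); each row is the vector a
        # single attention head or MLP adds to the residual stream at the
        # final position (assumed to already account for the final LayerNorm,
        # treated as linear, as is common in DLA).
        # W_U: (d_model, vocab_size) unembedding matrix.
        # Returns (n_components,): each component's direct contribution to
        # the target token's logit, exploiting the linearity of the logit
        # in the residual stream.
        return component_writes @ W_U[:, target_token_id]

    def cumulative_weighted_attribution(per_step_attributions, weights):
        # One plausible reading of a "cumulative weighted" variant of DLA:
        # accumulate each component's per-step attribution over generation
        # steps under a step-level weighting. `weights` is a hypothetical
        # placeholder; the paper's exact weighting scheme is not shown here.
        # per_step_attributions: (n_steps, n_components); weights: (n_steps,)
        return (weights[:, None] * per_step_attributions).sum(axis=0)

In practice, the per-component residual-stream writes are typically collected with activation hooks (for example via a library such as TransformerLens) rather than computed by hand.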
Keywords: large language models; mechanistic interpretability; constrained text generation
Disciplinary sector: INFO-01/A - Computer Science
2 Sep 2025
http://arxiv.org/abs/2509.02075v1
Files in this record:

File: arXiv 2509.02075.pdf (open access)
Description: How Instruction-Tuning Imparts Length Control: A Cross-Lingual Mechanistic Analysis
Type: Pre-print (manuscript submitted to the publisher)
License: Creative Commons
Size: 458.21 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1189259