Unveiling Transformer Perception by Exploring Input Manifolds / A. Benfenati, A. Ferrara, A. Marta, D. Riva, E. Rocchetti. - (2024 Oct 08). [10.48550/arXiv.2410.06019]

Unveiling Transformer Perception by Exploring Input Manifolds

A. Benfenati; A. Ferrara; A. Marta; D. Riva; E. Rocchetti
2024

Abstract

This paper introduces a general method for exploring equivalence classes in the input space of Transformer models. The approach rests on a mathematical theory that describes the internal layers of a Transformer architecture as sequential deformations of the input manifold. Using the eigendecomposition of the pullback, through the Jacobian of the model, of the distance metric defined on the output space, we reconstruct equivalence classes in the input space and navigate across them. We illustrate how this method can be used to investigate how a Transformer sees the input space, facilitating local and task-agnostic explainability in Computer Vision and Natural Language Processing tasks.
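
As a rough illustration of the idea summarized in the abstract, the sketch below builds a pullback metric G = J^T g J for a toy map and steps along its near-zero eigenvector, a direction in which the output does not change to first order. Everything here is an assumption made for illustration and not the authors' implementation: `toy_model` is a hypothetical stand-in for a network (not a Transformer), the output metric g is taken to be Euclidean, the Jacobian is approximated by finite differences rather than automatic differentiation, and the helper names are invented.

```python
import numpy as np

def toy_model(x):
    # Hypothetical stand-in for a network: a map from R^3 to R^2.
    W = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, -1.0]])
    return np.tanh(W @ x)

def jacobian_fd(f, x, eps=1e-6):
    # Central finite-difference Jacobian of f at x, shape (m, n).
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def pullback_metric(f, x, g_out=None):
    # G = J^T g J: the output-space metric pulled back to the input space.
    J = jacobian_fd(f, x)
    if g_out is None:
        g_out = np.eye(J.shape[0])  # assumption: Euclidean output metric
    return J.T @ g_out @ J

def null_direction(f, x):
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # direction along which the output changes least.
    eigvals, eigvecs = np.linalg.eigh(pullback_metric(f, x))
    return eigvals, eigvecs[:, 0]

def walk_equivalence_class(f, x0, step=1e-2, n_steps=100):
    # Repeatedly step along the smallest-eigenvalue direction.
    x = np.asarray(x0, dtype=float)
    prev = None
    for _ in range(n_steps):
        _, v = null_direction(f, x)
        if prev is not None and v @ prev < 0:
            v = -v  # eigenvector signs are arbitrary; keep one orientation
        x = x + step * v
        prev = v
    return x

x0 = np.array([0.3, -0.2, 0.5])
x1 = walk_equivalence_class(toy_model, x0)
print("input moved by:", np.linalg.norm(x1 - x0))
print("output changed by:", np.linalg.norm(toy_model(x1) - toy_model(x0)))
```

In this toy setting the input moves a finite distance while the output stays essentially fixed, which is the sense in which the two inputs belong to the same equivalence class. Stepping along eigenvectors with large eigenvalues instead would change the output and move to a different class.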
transformers interpretability; input space exploration; geometric deep learning
Sector INFO-01/A - Computer Science
Sector MATH-02/B - Geometry
Sector MATH-05/A - Numerical Analysis
8 Oct 2024
https://arxiv.org/abs/2410.06019
Files in this record:
File: 2410.06019v1.pdf
Access: open access
Description: v1
Type: Pre-print (manuscript submitted to the publisher)
Size: 832.76 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1122101