Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalization / S. Ciceri, L. Cassani, M. Osella, P. Rotondo, F. Valle, M. Gherardi. - In: NATURE MACHINE INTELLIGENCE. - ISSN 2522-5839. - 6:1(2024), pp. 40-47. [10.1038/s42256-023-00772-9]

Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalization

P. Rotondo; M. Gherardi
Author position: last
2024

Abstract

To achieve near-zero training error in a classification problem, the layers of a feed-forward network have to disentangle the manifolds of data points with different labels to facilitate the discrimination. However, excessive class separation can lead to overfitting because good generalization requires learning invariant features, which involve some level of entanglement. We report on numerical experiments showing how the optimization dynamics finds representations that balance these opposing tendencies with a non-monotonic trend. After a fast segregation phase, a slower rearrangement (conserved across datasets and architectures) increases the class entanglement. The training error at the inversion is stable under subsampling and across network initializations and optimizers, which characterizes it as a property solely of the data structure and (very weakly) of the architecture. The inversion is the manifestation of tradeoffs elicited by well-defined and maximally stable elements of the training set called 'stragglers', which are particularly influential for generalization.

Feed-forward neural networks have become powerful tools in machine learning, but their behaviour during optimization is still not well understood. Ciceri and colleagues find that during optimization, class representations first separate and then rejoin, prompted by specific elements of the training set.
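The non-monotonic separation dynamics described in the abstract can be probed with a simple measurement along the training trajectory. Below is a minimal sketch (not the authors' code): it trains a small network on synthetic two-class data and tracks a separation ratio in a hidden layer. The two-Gaussian data, the architecture, the hyperparameters and the metric itself are illustrative assumptions; the ratio is only a proxy for the entanglement measures studied in the paper.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic two-class data: overlapping Gaussian blobs in 20 dimensions
# (a stand-in for a real dataset; chosen only to make the script self-contained).
n, d = 512, 20
X = torch.cat([torch.randn(n, d) - 0.5, torch.randn(n, d) + 0.5])
y = torch.cat([torch.zeros(n, dtype=torch.long), torch.ones(n, dtype=torch.long)])

model = nn.Sequential(
    nn.Linear(d, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),  # the hidden representation we probe
    nn.Linear(64, 2),
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

def separation(h, y):
    # Distance between class centroids, normalized by the mean
    # within-class spread: an illustrative proxy for class entanglement
    # (low values = entangled classes, high values = segregated classes).
    h0, h1 = h[y == 0], h[y == 1]
    between = (h0.mean(0) - h1.mean(0)).norm()
    within = 0.5 * (h0.std(0).norm() + h1.std(0).norm())
    return (between / within).item()

probe = model[:4]  # forward pass up to and including the second ReLU

for epoch in range(201):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    if epoch % 20 == 0:
        with torch.no_grad():
            print(f"epoch {epoch:3d}  loss {loss.item():.3f}  "
                  f"separation {separation(probe(X), y):.3f}")

On the real datasets studied in the paper, a curve of this kind first rises (fast segregation) and then partly falls back (the inversion); on a toy example the effect may or may not appear, so the sketch should be read as measurement scaffolding only.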
Sector FIS/02 - Theoretical Physics, Mathematical Models and Methods
Sector INF/01 - Computer Science
Funding: FELLINI - FELLowship for Innovation at INFN, European Commission, Horizon 2020 Framework Programme, grant no. 754496
2024
Article (author)
Files in this record:

inversion_dynamics_of_class_manifolds.pdf
Access: open access
Type: Pre-print (manuscript submitted to the publisher)
Size: 4.4 MB
Format: Adobe PDF

s42256-023-00772-9.pdf
Access: restricted access
Description: Article
Type: Publisher's version/PDF
Size: 3.18 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1033158
Citations
  • PubMed Central: not available
  • Scopus: 1
  • Web of Science: 0