
Feature Learning in Finite-Width Bayesian Deep Linear Networks with Multiple Outputs and Convolutional Layers / F. Bassetti, M. Gherardi, A. Ingrosso, M. Pastore, P. Rotondo. - In: JOURNAL OF MACHINE LEARNING RESEARCH. - ISSN 1533-7928. - 26:(2025), pp. 88.1-88.35.

Feature Learning in Finite-Width Bayesian Deep Linear Networks with Multiple Outputs and Convolutional Layers

M. Gherardi (second author); M. Pastore (penultimate author); P. Rotondo (last author)
2025

Abstract

Deep linear networks have been extensively studied, as they provide simplified models of deep learning. However, little is known in the case of finite-width architectures with multiple outputs and convolutional layers. In this manuscript, we provide rigorous results for the statistics of functions implemented by the aforementioned class of networks, thus moving closer to a complete characterization of feature learning in the Bayesian setting. Our results include: (i) an exact and elementary non-asymptotic integral representation for the joint prior distribution over the outputs, given in terms of a mixture of Gaussians; (ii) an analytical formula for the posterior distribution in the case of squared error loss function (Gaussian likelihood); (iii) a quantitative description of the feature learning infinite-width regime, using large deviation theory. From a physical perspective, deep architectures with multiple outputs or convolutional layers represent different manifestations of kernel shape
Sector PHYS-02/A - Theoretical physics of fundamental interactions, models, mathematical methods and applications
Article (author)
Files in this item:
24-1158.pdf (open access)
Type: Publisher's version/PDF
License: Creative Commons
Size: 447.38 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1226588