A statistical mechanics framework for Bayesian deep neural networks beyond the infinite-width limit

Pacelli, R.; Ariosto, S.; Pastore, M.; Ginelli, F.; Gherardi, M.; Rotondo, P.

doi:10.1038/s42256-023-00767-6

Despite the practical success of deep neural networks, a comprehensive theoretical framework that can predict practically relevant scores, such as the test accuracy, from knowledge of the training data is currently lacking. Huge simplifications arise in the infinite-width limit, in which the number of units Nℓ in each hidden layer (ℓ = 1, …, L, where L is the depth of the network) far exceeds the number P of training examples. This idealization, however, blatantly departs from the reality of deep learning practice. Here we use the toolset of statistical mechanics to overcome these limitations and derive an approximate partition function for fully connected deep neural architectures, which encodes information on the trained models. The computation holds in the thermodynamic limit, where both Nℓ and P are large and their ratio αℓ = P/Nℓ is finite. This advance allows us to obtain: (1) a closed formula for the generalization error associated with a regression task in a one-hidden-layer network with finite α₁; (2) an approximate expression of the partition function for deep architectures (via an effective action that depends on a finite number of order parameters); and (3) a link between deep neural networks in the proportional asymptotic limit and Student’s t-processes.
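
As a rough illustration of the regime the abstract describes, the following sketch computes the ratios αℓ = P/Nℓ for a fully connected architecture. The function name `load_ratios`, the dataset size, and the layer widths are hypothetical choices made for illustration; they do not come from the paper.

```python
from typing import List


def load_ratios(num_examples: int, hidden_widths: List[int]) -> List[float]:
    """Return alpha_l = P / N_l for each hidden layer l = 1, ..., L.

    num_examples is P (number of training examples) and hidden_widths
    holds N_1, ..., N_L (units per hidden layer). Both names are
    assumptions of this sketch, not notation taken from the paper's code.
    """
    if num_examples <= 0:
        raise ValueError("num_examples (P) must be positive")
    if any(width <= 0 for width in hidden_widths):
        raise ValueError("every hidden width N_l must be positive")
    return [num_examples / width for width in hidden_widths]


# Hypothetical example: P = 1000 training examples, L = 3 hidden layers.
alphas = load_ratios(1000, [500, 1000, 4000])
for layer, alpha in enumerate(alphas, start=1):
    # alpha near 0 corresponds to the infinite-width idealization (N_l >> P);
    # alpha of order 1 is the finite-ratio regime the abstract refers to.
    print(f"layer {layer}: alpha = {alpha:.2f}")
```

With these illustrative numbers the three ratios are 2.00, 1.00 and 0.25, so none of the layers is wide enough relative to P for the infinite-width idealization to be an obvious fit.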

A statistical mechanics framework for Bayesian deep neural networks beyond the infinite-width limit / R. Pacelli, S. Ariosto, M. Pastore, F. Ginelli, M. Gherardi, P. Rotondo. - In: NATURE MACHINE INTELLIGENCE. - ISSN 2522-5839. - 5:12(2023 Dec 18), pp. 1497-1507. [10.1038/s42256-023-00767-6]



International co-authors: Yes
Article language: English
Scientific-disciplinary sectors of the article (display only): FIS/02 - Theoretical Physics, Mathematical Models and Methods
Type: Article
Peer review: Anonymous experts
Publication classification: Scientific publication
Project title: FELLowship for Innovation at INFN
Project acronym: FELLINI
Funder: European Commission
Funding programme: Horizon 2020 Framework Programme
Contract no.: 754496
Publication date: 18 Dec 2023
Journal (ANCE): NATURE MACHINE INTELLIGENCE
Publisher: Nature Publishing Group
Volume or year: 5
Issue: 12
First page: 1497
Last page: 1507
Number of pages: 11
Publication status: Published
Journal relevance: Journal of international relevance
DOI: https://dx.doi.org/10.1038/s42256-023-00767-6
Source database: orcid
ISI identifier: WOS:001188617300016
SCOPUS identifier: 2-s2.0-85180132196
OpenAlex identifier: W4389879609
Adherence to the University Open Access policy: Adheres
Typology: info:eu-repo/semantics/article
Fulltext: partially_open
Typology: Research products::01 - Journal article
Number of authors: 6
Faculty site typology: 262
Typology: Article (author)
Impact factor: Journal with Impact Factor
All authors: R. Pacelli, S. Ariosto, M. Pastore, F. Ginelli, M. Gherardi, P. Rotondo
Appears in typologies: 01 - Journal article

Files in this item:

File	Access	Type	Size	Format
s42256-023-00767-6.pdf	Restricted access	Publisher's version/PDF	1.97 MB	Adobe PDF
2209.04882.pdf	Open access	Pre-print (manuscript submitted to the publisher)	1.64 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1031546


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca