IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

By Tying Embeddings You Are Assuming the Distributional Hypothesis / F. Bertolotti, W. Cazzola (Proceedings of Machine Learning Research). - In: ICML'24 / edited by R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, F. Berkenkamp. - [s.l.] : PMLR, Jul 2024. - pp. 3584-3610. Presented at the 41st International Conference on Machine Learning, Vienna (Austria), July 21-27, 2024 [10.5555/3692070.3692213].

By Tying Embeddings You Are Assuming the Distributional Hypothesis

F. Bertolotti (first author); W. Cazzola (last author)

2024

Abstract

In this work, we analyze, both theoretically and empirically, the effect of tied input-output embeddings, a popular technique that reduces model size while often improving training. Interestingly, we found that this technique is connected to the distributional hypothesis of Harris (1954), often summarized by the famous quote from Firth (1957): “a word is characterized by the company it keeps”. Specifically, our findings indicate that words (or, more broadly, symbols) with similar semantics tend to be encoded in similar input embeddings, while words that appear in similar contexts are encoded in similar output embeddings (thus explaining the semantic space arising in the input and output embeddings of foundation language models). As a consequence, tying the input and output embeddings is encouraged only when the distributional hypothesis holds for the underlying data. These results also provide insight into the embeddings of foundation language models (which are known to be semantically organized). Further, we complement the theoretical findings with several experiments supporting the claims.
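As a rough illustration of what "tied input-output embeddings" means, the following sketch compares an untied and a tied setup in a toy model. It is not the authors' code or experimental setup: the function names, the vocabulary size, the embedding dimension, and the choice of a model with no layers between the input lookup and the output scoring are all assumptions made here for brevity.

```python
import random

def make_embedding(vocab_size, dim, seed):
    """Return a vocab_size x dim matrix of random floats (hypothetical init)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def logits(token_id, input_emb, output_emb):
    """Score every vocabulary item: dot product of the input vector of
    token_id with each row of the output embedding matrix."""
    h = input_emb[token_id]
    return [sum(a * b for a, b in zip(h, row)) for row in output_emb]

def count_parameters(input_emb, output_emb):
    """Count scalars, counting a shared (tied) matrix only once."""
    n = sum(len(row) for row in input_emb)
    if output_emb is not input_emb:
        n += sum(len(row) for row in output_emb)
    return n

VOCAB, DIM = 5, 3  # hypothetical toy sizes

# Untied: two independent matrices, one for reading tokens, one for scoring them.
E_in = make_embedding(VOCAB, DIM, seed=0)
E_out = make_embedding(VOCAB, DIM, seed=1)
untied_params = count_parameters(E_in, E_out)

# Tied: the same matrix object plays both roles.
E = make_embedding(VOCAB, DIM, seed=0)
tied_params = count_parameters(E, E)

# In this layer-free toy model, tying makes the score of "j given i"
# equal to the score of "i given j", since both are the dot product E[i] . E[j].
tied_scores = [logits(i, E, E) for i in range(VOCAB)]
```

In this toy setting, tying halves the number of embedding parameters and forces a single vector per symbol to serve both as its input representation and as its output representation. Real language models place further layers between the two, so the symmetry shown here holds only for the simplified sketch.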

Scientific-disciplinary sectors of the contribution (valid from 09/05/2024): Sector INFO-01/A - Informatica (Computer Science)
Project title: Typeful Language Adaptation for Dynamic, Interacting and Evolving Systems
Acronym: T-LADIES
Funder: MINISTERO DELL'ISTRUZIONE E DEL MERITO
Contract no.: 2020TL3X8X_001
Publication date: Jul 2024
DOI: https://dx.doi.org/10.5555/3692070.3692213
Type: Book Part (author)
Appears in types: 03 - Contribution in volume

Files in this item:

File	Access	Type	License	Size	Format
icaml24-published.pdf	Open access	Publisher's version/PDF	Creative Commons	1.54 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1231755
