Efficient and Compact Representations of Deep Neural Networks via Entropy Coding

Giosuè Cataldo Marinò; F. Furia; D. Malchiodi; M. Frasca

doi:10.1109/ACCESS.2023.3317293

Matrix operations are central to many Machine Learning techniques, in particular Deep Neural Networks (DNNs), where the core of any inference is a sequence of dot product operations. An increasingly pressing problem is how to engineer their storage and operations efficiently. In this article we propose two new lossless compression schemes for real-valued matrices that support efficient vector-matrix multiplication in the compressed format and are specifically suited to DNN compression. Building on several recent studies that use weight pruning and quantization to reduce the complexity of DNN inference, our schemes are expressly designed to benefit from both, that is, from input matrices characterized by low entropy. In particular, our solutions take advantage of the depth of the model: the deeper the model, the higher the efficiency. Moreover, we derive space upper bounds for both variants in terms of the source entropy. Experiments show that our tools compare favourably, in terms of energy and space efficiency, with state-of-the-art matrix compression approaches, including Compressed Linear Algebra (CLA) and Compressed Shared Elements Row (CSER), the latter explicitly proposed in the context of DNN compression.
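This record does not describe the two proposed schemes in detail, so the following is only an illustrative sketch of the two ideas the abstract relies on, not the authors' method. It assumes a small, hypothetical pruned and quantized weight matrix `W`, measures its zeroth-order empirical entropy (the quantity the space bounds are expressed in), and computes a vector-matrix product by grouping equal weights so that each distinct nonzero value in a column costs a single multiplication. The function names `empirical_entropy` and `vector_matrix_product_shared` are made up for this example.

```python
import math
from collections import Counter, defaultdict


def empirical_entropy(matrix):
    """Zeroth-order empirical entropy, in bits per entry, of the matrix values."""
    values = [w for row in matrix for w in row]
    n = len(values)
    counts = Counter(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def vector_matrix_product_shared(x, matrix):
    """Compute x @ matrix, grouping equal weights within each column.

    Inputs that share the same weight are summed first, so every distinct
    nonzero weight in a column needs one multiplication; zeros are skipped.
    """
    n_cols = len(matrix[0])
    out = []
    for j in range(n_cols):
        sums = defaultdict(float)
        for i, row in enumerate(matrix):
            w = row[j]
            if w != 0.0:
                sums[w] += x[i]
        out.append(sum(w * s for w, s in sums.items()))
    return out


# Hypothetical 4x4 weight matrix after pruning (many zeros) and
# quantization (only two distinct nonzero values).
W = [
    [0.5, 0.0, -0.5, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, -0.5],
    [0.0, 0.0, 0.5, 0.5],
]
x = [1.0, 2.0, 3.0, 4.0]

H = empirical_entropy(W)
y = vector_matrix_product_shared(x, W)
print(f"entropy: {H:.3f} bits/entry (max for 3 symbols: {math.log2(3):.3f})")
print("x @ W =", y)
```

With only three distinct values and a majority of zeros, the entropy of `W` falls below the `log2(3)` bits per entry that a uniform distribution over three symbols would require, which is the kind of low-entropy input an entropy coder can exploit.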

Efficient and Compact Representations of Deep Neural Networks via Entropy Coding / G. Cataldo Marinò, F. Furia, D. Malchiodi, M. Frasca. - In: IEEE ACCESS. - ISSN 2169-3536. - 11:(2023 Oct 03), pp. 106103-106125. [10.1109/ACCESS.2023.3317293]

International co-authors: No
Article language: English
Keywords: Neural network compression; space-conscious data structures; weight pruning; weight quantization; source coding; sparse matrices
Scientific-disciplinary sector of the article (display only): Sector INF/01 - Informatica (Computer Science)
Type: Article
Peer review: Anonymous experts
Classification by research type: Applied research
Publication classification: Scientific publication
Project title: Multi-criteria optimized data structures: from compressed indexes to learned indexes, and beyond
Funder: Ministero dell'Istruzione e del Merito
Contract no.: 2017WR7SHH_004
Publication date: 3 Oct 2023
Journal in ANCE: IEEE ACCESS
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Volume: 11
First page: 106103
Last page: 106125
Number of pages: 23
Publication status: Published
Journal relevance: Journal of international relevance
DOI: https://dx.doi.org/10.1109/ACCESS.2023.3317293
URL: https://ieeexplore.ieee.org/document/10255645
Source database: orcid; crossref
ISI identifier: WOS:001081381400001
SCOPUS identifier: 2-s2.0-85173056201
Adherence to the University Open Access policy: Yes
Type: info:eu-repo/semantics/article
Fulltext: open
Type: Research products::01 - Journal article
Number of authors: 4
Faculty site type: 262
Type: Article (author)
Impact factor: Journal with Impact Factor
All authors: G. Cataldo Marinò, F. Furia, D. Malchiodi, M. Frasca
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
Efficient_and_Compact_Representations_of_Deep_Neural_Networks_via_Entropy_Coding.pdf (open access; type: Publisher's version/PDF)	2.43 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1012789

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca