Matrix operations are central to many machine learning techniques, in particular Deep Neural Networks (DNNs), where the core of inference is a sequence of dot product operations. An increasingly pressing problem is how to store such matrices and operate on them efficiently. In this article we propose two new lossless compression schemes for real-valued matrices that support efficient vector-matrix multiplication in the compressed format and are specifically suited to DNN compression. Building on several recent studies that use weight pruning and quantization to reduce the complexity of DNN inference, our schemes are expressly designed to benefit from both, that is, from input matrices characterized by low entropy. In particular, our solutions take advantage of the depth of the model: the deeper the model, the higher the efficiency. Moreover, we derive space upper bounds for both variants in terms of the source entropy. Experiments show that our tools compare favourably, in terms of energy and space efficiency, against state-of-the-art matrix compression approaches, including Compressed Linear Algebra (CLA) and Compressed Shared Elements Row (CSER), the latter explicitly proposed in the context of DNN compression.
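As a rough illustration of why pruned and quantized matrices lend themselves to entropy coding, the sketch below Huffman-codes a small toy matrix and computes a vector-matrix product while decoding the bitstream entry by entry. It is not the pair of schemes proposed in the article: plain Huffman coding is used here as a stand-in, and the pruning threshold, quantization step, and all function names (`huffman_code`, `encode`, `vec_mat_from_bits`, `empirical_entropy`) are assumptions made only for this example.

```python
import heapq
import math
from collections import Counter

import numpy as np


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct value
        return {next(iter(freq)): "0"}
    # Heap items: (weight, tie_breaker, {symbol: partial codeword})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(matrix, code):
    """Concatenate the codewords of the entries in row-major order."""
    return "".join(code[v] for v in matrix.ravel().tolist())


def vec_mat_from_bits(x, bits, code, shape):
    """Compute x @ W while decoding W entry by entry from the bitstring."""
    decode = {c: s for s, c in code.items()}
    _, n_cols = shape
    out = np.zeros(n_cols)
    idx, word = 0, ""
    for b in bits:
        word += b
        if word in decode:  # prefix-free code: first match is a full codeword
            r, c = divmod(idx, n_cols)
            out[c] += x[r] * decode[word]
            idx, word = idx + 1, ""
    return out


def empirical_entropy(matrix):
    """Empirical entropy (bits per entry) of the matrix values."""
    counts = Counter(matrix.ravel().tolist())
    n = matrix.size
    return -sum(k / n * math.log2(k / n) for k in counts.values())


# Toy weight matrix: the threshold and step below are arbitrary assumptions.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 6))
W[np.abs(W) < 0.8] = 0.0   # magnitude pruning
W = np.round(W * 2) / 2    # uniform quantization to multiples of 0.5
x = rng.normal(size=8)

code = huffman_code(W.ravel().tolist())
bits = encode(W, code)
y = vec_mat_from_bits(x, bits, code, W.shape)
H = empirical_entropy(W)

print(f"entropy lower bound: {H * W.size:.1f} bits")
print(f"Huffman-coded size:  {len(bits)} bits (codebook not included)")
print(f"dense float64 size:  {W.size * 64} bits")
```

The fewer distinct values survive pruning and quantization, the lower the empirical entropy and the shorter the coded stream. The product is accumulated without ever materializing the dense matrix, which is the general idea behind multiplying in the compressed format.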

Efficient and Compact Representations of Deep Neural Networks via Entropy Coding / G. Cataldo Marinò, F. Furia, D. Malchiodi, M. Frasca. - In: IEEE ACCESS. - ISSN 2169-3536. - 11:(2023 Oct 03), pp. 106103-106125. [10.1109/ACCESS.2023.3317293]

Efficient and Compact Representations of Deep Neural Networks via Entropy Coding

G. Cataldo Marinò (first); F. Furia (second); D. Malchiodi (penultimate); M. Frasca (last)
2023

Language: English
Keywords: Neural network compression; space-conscious data structures; weight pruning; weight quantization; source coding; sparse matrices
Sector INF/01 - Computer Science
Article
Anonymous expert reviewers
Applied research
Scientific publication
   Multi-criteria optimized data structures: from compressed indexes to learned indexes, and beyond
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2017WR7SHH_004
3 Oct 2023
Institute of Electrical and Electronics Engineers (IEEE)
Volume 11, pp. 106103-106125 (23 pages)
Published
Journal of international relevance
https://ieeexplore.ieee.org/document/10255645
Journal with Impact Factor
Files in this record:
Efficient_and_Compact_Representations_of_Deep_Neural_Networks_via_Entropy_Coding.pdf
Access: open access
Type: Publisher's version/PDF
Size: 2.43 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1012789
Citations
  • PMC: N/A
  • Scopus: 2
  • ISI (Web of Science): 0
  • OpenAlex: N/A