Features Disentanglement For Explainable Convolutional Neural Networks / P. Coscia, A. Genovese, F. Scotti, V. Piuri. In: 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, 27 September 2024. IEEE, pp. 514-520. ISBN 979-8-3503-4939-9. DOI: 10.1109/icip51287.2024.10647568.
Features Disentanglement For Explainable Convolutional Neural Networks
P. Coscia; A. Genovese; F. Scotti; V. Piuri
2024
Abstract
Explainable methods for understanding deep neural networks are currently employed for many visual tasks and provide valuable insights into their decisions. While post-hoc visual explanations offer easily understandable human cues behind neural networks’ decision-making processes, comparing their outcomes remains challenging. Furthermore, balancing the performance-explainability trade-off can be time-consuming and requires deep domain knowledge. In this regard, we propose a novel auxiliary module, built upon convolutional encoders, which acts on the final layers of convolutional neural networks (CNNs) to learn orthogonal feature maps with greater discriminative and explanatory power. This module is trained via a disentanglement loss that specifically aims to decouple the object from the background in the input image. To quantitatively assess its impact on standard CNNs and compare the quality of the resulting visual explanations, we employ metrics designed for semantic segmentation tasks. These metrics rely on bounding-box annotations that may accompany image classification (or recognition) datasets, allowing us to compare ground-truth and predicted regions. Finally, given their positive influence on vision tasks, we explore various self-supervised pre-training strategies and assess their effectiveness under the considered metrics.
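The abstract gives no implementation details, so the following is only a minimal, hypothetical PyTorch sketch of the general idea it describes: a small convolutional encoder attached to the final feature maps of a CNN, with an orthogonality penalty between feature maps and an object/background separation term driven by a binary mask rasterized from bounding-box annotations. All names (`AuxiliaryDisentangler`, `orthogonality_loss`, `object_background_loss`) and the exact form of the losses are assumptions for illustration, not the authors' released code.

```python
# Hypothetical sketch; layer sizes and loss definitions are assumptions,
# not the method published in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxiliaryDisentangler(nn.Module):
    """Small convolutional encoder applied to the final feature maps of a CNN."""

    def __init__(self, in_channels: int, out_channels: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) activations from the backbone's last conv block
        return self.encoder(feats)


def orthogonality_loss(maps: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise correlation between flattened feature maps."""
    b, c, h, w = maps.shape
    flat = F.normalize(maps.reshape(b, c, h * w), dim=-1)
    gram = torch.bmm(flat, flat.transpose(1, 2))  # (B, C, C) cosine similarities
    off_diag = gram - torch.eye(c, device=maps.device)
    return off_diag.pow(2).mean()


def object_background_loss(maps: torch.Tensor, obj_mask: torch.Tensor) -> torch.Tensor:
    """Reward activation inside the (downsampled) object box and suppress it
    on the background; obj_mask is a binary (B, 1, H, W) mask derived from
    bounding-box annotations."""
    act = maps.abs().mean(dim=1, keepdim=True)
    inside = (act * obj_mask).sum() / obj_mask.sum().clamp(min=1)
    outside = (act * (1 - obj_mask)).sum() / (1 - obj_mask).sum().clamp(min=1)
    return outside - inside  # lower when the object region dominates
```

In such a setup, the two terms would typically be combined with the standard classification loss as a weighted sum; the weighting and any masking details are design choices not specified in the abstract.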
| File | Type | Access | Size | Format |
|---|---|---|---|---|
| icip24.pdf | Post-print / accepted manuscript (version accepted by the publisher) | Open access | 10.18 MB | Adobe PDF |
| Features_Disentanglement_For_Explainable_Convolutional_Neural_Networks.pdf | Publisher's version/PDF | Restricted access (request a copy) | 8.3 MB | Adobe PDF |