Seeing Beyond: Unlocking Image Emotion with Contextual Depths / F. Cozzi, A. D'Eusanio, G. Boccignone (LECTURE NOTES IN COMPUTER SCIENCE). - In: Image Analysis and Processing – ICIAP 2025 Workshops / [edited by] E. Rodolà, F. Galasso, I. Masi. - [s.l.] : Springer Science and Business Media Deutschland GmbH, 2026. - ISBN 9783032113160. - pp. 29-40 (Workshops and competitions hosted by the 23rd International Conference on Image Analysis and Processing, ICIAP 2025: Rome, 15-19 September 2025) [10.1007/978-3-032-11317-7_3].

Seeing Beyond: Unlocking Image Emotion with Contextual Depths

G. Boccignone
Last author
2026

Abstract

Can a machine grasp the range of emotions an image evokes? We study how contextual information can improve the recognition of emotions evoked by images. We start from EmoSet, a dataset already annotated with the emotions its images evoke, and extend it with both a concise and an extended textual description for each image. These descriptions expose semantic and emotional nuances that may remain hidden in the image alone. Fusing the resulting text embeddings with visual features to train a baseline model yields a nearly 5% boost in accuracy on the manually annotated subset of our dataset. That this improvement is achieved with relatively straightforward contextual additions shows that even simple forms of contextual enrichment can meaningfully contribute to emotion classification and, more broadly, underlines the importance of multimodal inputs for understanding the affective content of images.
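
As a rough illustration of the fusion step mentioned in the abstract, the sketch below concatenates a visual feature vector with a text embedding and passes the result to a linear softmax classifier. Everything in it is an assumption made for illustration: the abstract does not specify the fusion operator, the encoders, the feature dimensions, the classifier, or the number of classes, and the names `fuse` and `LinearClassifier`, as well as the toy vectors, are hypothetical.

```python
import math
import random


def fuse(visual_features, text_embedding):
    """Fuse two modalities by concatenation (assumed; the abstract names no operator)."""
    return list(visual_features) + list(text_embedding)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Hypothetical baseline head: one linear layer followed by softmax."""

    def __init__(self, input_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Small random weights; a real model would learn these from labeled data.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
                        for _ in range(num_classes)]
        self.biases = [0.0] * num_classes

    def predict_proba(self, fused):
        logits = [sum(w * x for w, x in zip(row, fused)) + b
                  for row, b in zip(self.weights, self.biases)]
        return softmax(logits)


# Toy stand-ins for encoder outputs; real features would come from an
# image encoder and from a text encoder applied to the image description.
visual = [0.2, -0.5, 0.1, 0.9]
text = [0.7, 0.3, -0.4]

fused = fuse(visual, text)
# num_classes=8 is an assumed class count, not taken from the abstract.
model = LinearClassifier(input_dim=len(fused), num_classes=8)
probs = model.predict_proba(fused)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

The classifier here is untrained, so `predicted_class` carries no meaning; the point is only the data flow, in which the text embedding widens the input that the classifier sees alongside the visual features.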
Affective Computing; Evoked Emotion Recognition; Multimodal Learning
Academic discipline IINF-05/A - Information Processing Systems
2026
International Association for Pattern Recognition
Book Part (author)
Files in this record:
2025_ICIAP-SeeingBeyond-Emotiva.pdf

Restricted access

Type: Publisher's version/PDF
License: No license
Size: 2.96 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1213656