Seeing Beyond: Unlocking Image Emotion with Contextual Depths / F. Cozzi, A. D'Eusanio, G. Boccignone. - In: Image Analysis and Processing – ICIAP 2025 Workshops / edited by E. Rodolà, F. Galasso, I. Masi. - (Lecture Notes in Computer Science). - [s.l] : Springer Science and Business Media Deutschland GmbH, 2026. - ISBN 9783032113160. - pp. 29-40. Presented at the workshops and competitions hosted by the 23rd International Conference on Image Analysis and Processing (ICIAP 2025), Rome, 15-19 September 2025. DOI: 10.1007/978-3-032-11317-7_3.
Seeing Beyond: Unlocking Image Emotion with Contextual Depths
G. Boccignone (last author)
2026
Abstract
Can a machine grasp the emotions an image evokes? We investigate how contextual information can improve the recognition of emotions in images. We start from EmoSet, a dataset already annotated with the emotions that images evoke, and extend it with both concise and extended textual descriptions of each image. These descriptions surface semantic and emotional nuances that may remain hidden in the image alone. Fusing the resulting text embeddings with visual features to train a baseline model yields a gain of nearly 5% in accuracy on the manually annotated subset of our dataset. That this improvement is achieved with relatively simple contextual additions shows that even basic contextual enrichment can contribute meaningfully to emotion classification and, more broadly, underscores the importance of multimodal inputs for understanding the affective content of images.

File: 2025_ICIAP-SeeingBeyond-Emotiva.pdf (publisher's version/PDF, Adobe PDF, 2.96 MB; restricted access; no license)
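The abstract describes fusing text embeddings with visual features to train a baseline classifier, but it does not say how the fusion is done. The sketch below illustrates one common option, fusion by concatenation followed by a linear softmax head. The concatenation strategy, the `fuse` and `LinearEmotionClassifier` names, the feature sizes, and the number of emotion classes are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random

# Assumption: eight emotion classes; the abstract does not state the label set.
NUM_EMOTIONS = 8


def fuse(visual_features, text_embedding):
    """Fuse two modalities by concatenation: [visual ; text] (an assumed strategy)."""
    return list(visual_features) + list(text_embedding)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearEmotionClassifier:
    """Hypothetical baseline head: one linear layer followed by softmax."""

    def __init__(self, input_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Small random weights; a real model would learn these from data.
        self.weights = [
            [rng.gauss(0.0, 0.01) for _ in range(input_dim)]
            for _ in range(num_classes)
        ]
        self.biases = [0.0] * num_classes

    def predict_proba(self, fused):
        logits = [
            sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        return softmax(logits)


# Toy vectors standing in for image-encoder and text-encoder outputs.
visual = [0.2, -0.1, 0.5, 0.0]
text = [0.3, 0.7, -0.4]

fused = fuse(visual, text)
clf = LinearEmotionClassifier(input_dim=len(fused), num_classes=NUM_EMOTIONS)
probs = clf.predict_proba(fused)
```

With untrained weights the output is close to uniform. The point is only the data flow: each image contributes one visual vector and one text vector, and the classifier sees their concatenation.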
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.




