The Cow of Rembrandt Analyzing Artistic Prompt Interpretation in Text-to-Image Models

Ferrara, A.; Picascia, S.; Rocchetti, E.

doi:10.1109/MLSP62443.2025.11204333

Text-to-image diffusion models have demonstrated remarkable capabilities in generating artistic content by learning from billions of images, including popular artworks. However, the fundamental question of how these models internally represent concepts, such as content and style in paintings, remains unexplored. Traditional computer vision assumes content and style are orthogonal, but diffusion models receive no explicit guidance about this distinction during training. In this work, we investigate how transformer-based text-to-image diffusion models encode content and style concepts when generating artworks. We leverage cross-attention heatmaps to attribute pixels in generated images to specific prompt tokens, enabling us to isolate image regions influenced by content-describing versus style-describing tokens. Our findings reveal that diffusion models demonstrate varying degrees of content-style separation depending on the specific artistic prompt and style requested. In many cases, content tokens primarily influence object-related regions while style tokens affect background and texture areas, suggesting an emergent understanding of the content-style distinction. These insights contribute to our understanding of how large-scale generative models internally represent complex artistic concepts without explicit supervision. We share the code and dataset, together with an exploratory tool for visualizing attention maps at https://github.com/umilISLab/artistic-prompt-interpretation.
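The abstract's attribution idea can be illustrated with a minimal sketch. It assumes that one cross-attention heatmap per prompt token has already been extracted from the model; the function names, the min-max normalization, the 0.5 threshold, the intersection-over-union score, and the toy 2x2 heatmaps are all illustrative assumptions, not the authors' implementation (which is in the linked repository).

```python
from typing import Dict, List

Heatmap = List[List[float]]  # H x W grid of attention weights for one token
Mask = List[List[bool]]


def aggregate(heatmaps: Dict[str, Heatmap], tokens: List[str]) -> Heatmap:
    """Average the heatmaps of the given tokens, cell by cell."""
    maps = [heatmaps[t] for t in tokens]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(w)]
            for i in range(h)]


def binarize(heatmap: Heatmap, threshold: float = 0.5) -> Mask:
    """Min-max normalize the heatmap, then keep cells at or above threshold."""
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span >= threshold for v in row] for row in heatmap]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two masks; 0.0 if both are empty."""
    pairs = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    inter = sum(1 for x, y in pairs if x and y)
    union = sum(1 for x, y in pairs if x or y)
    return inter / union if union else 0.0


# Hypothetical heatmaps for the prompt "a cow in the style of Rembrandt".
heatmaps = {
    "cow": [[0.9, 0.8], [0.1, 0.0]],        # content token
    "Rembrandt": [[0.1, 0.2], [0.7, 0.9]],  # style token
}

content_mask = binarize(aggregate(heatmaps, ["cow"]))
style_mask = binarize(aggregate(heatmaps, ["Rembrandt"]))
overlap = iou(content_mask, style_mask)
```

Under these assumptions, a low overlap means the content and style tokens attend to largely different image regions, while a high overlap means they are entangled.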

The Cow of Rembrandt Analyzing Artistic Prompt Interpretation in Text-to-Image Models / A. Ferrara, S. Picascia, E. Rocchetti (IEEE International Workshop on Machine Learning for Signal Processing). - In: 2025 IEEE 35th International Workshop on Machine Learning for Signal Processing (MLSP). [s.l.] : IEEE, 2025 Oct 24. - ISBN 979-8-3315-7029-3. - pp. 1-6. (Paper presented at the 35th IEEE International Workshop on Machine Learning for Signal Processing, held in Istanbul in 2025) [10.1109/MLSP62443.2025.11204333].

Keywords: text-to-image generation; diffusion models; cross-attention analysis; content-style disentanglement
Scientific-disciplinary sectors of the contribution (valid from 9 May 2024): Sector INFO-01/A - Informatica (Computer Science)
Publication date: 24 Oct 2025
DOI: https://dx.doi.org/10.1109/MLSP62443.2025.11204333
Type: Book Part (author)
Appears in types: 03 - Contribution in volume

Files in this item:

File	Access	Type	License	Size	Format
2507.23313v1_compressed.pdf	Open access	Pre-print (manuscript submitted to the publisher)	Creative Commons	497.08 kB	Adobe PDF
The_Cow_of_Rembrandt_Analyzing_Artistic_Prompt_Interpretation_in_Text-to-Image_Models.pdf	Restricted access	Post-print, accepted manuscript, etc. (version accepted by the publisher)	No license	2.87 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1189255

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca