The path towards making visual art more accessible is rapidly evolving. In the museum context, audio description (AD) is a crucial tool for providing access to diverse audiences by intersemiotically translating visual content into verbal descriptions. However, the widespread adoption of ADs in museums remains limited due to financial constraints, resource shortages and few professionals in the field. The Generative Pre-trained Transformer (GPT) series, developed by OpenAI, has revolutionized the field of natural language processing (NLP) and artificial intelligence (AI). These Large Language Models (LLMs) show exceptional performance across a wide range of NLP tasks, including language translation, image description, text summarization, and question answering. In particular, GPT-4 and, by extension GPT Pro, models provide enhanced functionalities and customized options for creating sophisticated AI applications, including personalized bots. The present study builds upon a previous research that analyzed the accessibility level of a corpus of artworks ADs produced by prompting different LLMs (ChatGPT 3.5, Google Gemini, and Copilot), specifically addressing necessities of a visually challenged and/or blind audience. In this prior research, primary discursive structure, lexical and textual characteristics were examined through the use of a text analysis software called Sketch Engine and AD guidelines. The findings revealed both positive aspects and shortcomings; notably, these models often prioritized engaging narratives over delivering a deep and precise analysis of the artwork’s elements (Dini et al., forthcoming). To achieve higher quality audio descriptions (ADs), the present study aims to create a customized chatbot, originally trained with AD guidelines, utilizing the previously mentioned functionality of GPT Pro. Initially, a comparison of AD guidelines for museums produced by various entities and organizations was carried out, resulting in a comprehensive final list. For clarity, the list was then organized into three macro-categories: macrostructure, microstructure, and multimodalities. Unlike the previous study, the guidelines in this case are used to configure the GPT bot in advance, ensuring that the model consistently follows these instructions. The bot employs the “chain-of-thought” prompting principle, allowing users to engage in a conversation with it. The artworks ADs provided by the bot will be analysed and compared with the original ADs created by authorised organisations (museums and/or associations) in order to assess the impact of the prior training on promoting accessibility and carefully crafted narrative. As AI language models continue to advance, this study seek to evaluate if their integration can be a valuable support in the translation process, bridging the gap between technology and the humanities, and providing direction for future studies in the field, especially regarding the evolving role of the translator.

Customizing AI Language Models for the Production of Quality Artwork Audio Descriptions / S. Dini, L.A. Ludovico, M.J. Valero Gisbert. ((Intervento presentato al 11. convegno Media for All tenutosi a Hong Kong nel 2025.

Customizing AI Language Models for the Production of Quality Artwork Audio Descriptions

L.A. Ludovico;M.J. Valero Gisbert
2025

Abstract

The path towards making visual art more accessible is rapidly evolving. In the museum context, audio description (AD) is a crucial tool for providing access to diverse audiences by intersemiotically translating visual content into verbal descriptions. However, the widespread adoption of ADs in museums remains limited due to financial constraints, resource shortages and few professionals in the field. The Generative Pre-trained Transformer (GPT) series, developed by OpenAI, has revolutionized the field of natural language processing (NLP) and artificial intelligence (AI). These Large Language Models (LLMs) show exceptional performance across a wide range of NLP tasks, including language translation, image description, text summarization, and question answering. In particular, GPT-4 and, by extension GPT Pro, models provide enhanced functionalities and customized options for creating sophisticated AI applications, including personalized bots. The present study builds upon a previous research that analyzed the accessibility level of a corpus of artworks ADs produced by prompting different LLMs (ChatGPT 3.5, Google Gemini, and Copilot), specifically addressing necessities of a visually challenged and/or blind audience. In this prior research, primary discursive structure, lexical and textual characteristics were examined through the use of a text analysis software called Sketch Engine and AD guidelines. The findings revealed both positive aspects and shortcomings; notably, these models often prioritized engaging narratives over delivering a deep and precise analysis of the artwork’s elements (Dini et al., forthcoming). To achieve higher quality audio descriptions (ADs), the present study aims to create a customized chatbot, originally trained with AD guidelines, utilizing the previously mentioned functionality of GPT Pro. Initially, a comparison of AD guidelines for museums produced by various entities and organizations was carried out, resulting in a comprehensive final list. For clarity, the list was then organized into three macro-categories: macrostructure, microstructure, and multimodalities. Unlike the previous study, the guidelines in this case are used to configure the GPT bot in advance, ensuring that the model consistently follows these instructions. The bot employs the “chain-of-thought” prompting principle, allowing users to engage in a conversation with it. The artworks ADs provided by the bot will be analysed and compared with the original ADs created by authorised organisations (museums and/or associations) in order to assess the impact of the prior training on promoting accessibility and carefully crafted narrative. As AI language models continue to advance, this study seek to evaluate if their integration can be a valuable support in the translation process, bridging the gap between technology and the humanities, and providing direction for future studies in the field, especially regarding the evolving role of the translator.
29-mag-2025
Settore INFO-01/A - Informatica
https://www.m4all11.org/
Customizing AI Language Models for the Production of Quality Artwork Audio Descriptions / S. Dini, L.A. Ludovico, M.J. Valero Gisbert. ((Intervento presentato al 11. convegno Media for All tenutosi a Hong Kong nel 2025.
Conference Object
File in questo prodotto:
File Dimensione Formato  
Abstract Media 4 All.pdf

accesso aperto

Descrizione: pdf
Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Licenza: Creative commons
Dimensione 117.25 kB
Formato Adobe PDF
117.25 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1174379
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact