TRoTR: A Framework for Evaluating the Recontextualization of Text / F. Periti, P. Cassotti, S. Montanelli, N. Tahmasebi, D. Schlechtweg. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2024, pp. 13972-13990. ISBN 979-8-89176-164-3. Conference held in Miami, 2024. DOI: 10.18653/v1/2024.emnlp-main.774.
TRoTR: A Framework for Evaluating the Recontextualization of Text
F. Periti; S. Montanelli
2024
Abstract
Current approaches for detecting text reuse do not focus on recontextualization, i.e., how the new context(s) of a reused text differ from its original context(s). In this paper, we propose a novel framework called TRoTR that relies on the notion of topic relatedness for evaluating the diachronic change of context in which text is reused. TRoTR includes two NLP tasks: TRiC and TRaC. TRiC is designed to evaluate the topic relatedness between a pair of recontextualizations. TRaC is designed to evaluate the overall topic variation within a set of recontextualizations. We also provide a curated TRoTR benchmark of biblical text reuse, human-annotated with topic relatedness. The benchmark exhibits an inter-annotator agreement of .811. We evaluate multiple established SBERT models on the TRoTR tasks and find that they exhibit greater sensitivity to textual similarity than to topic relatedness. Our experiments show that fine-tuning these models can mitigate this sensitivity.
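As a point of reference for the kind of baseline evaluated in the paper, the sketch below scores the topic relatedness of a pair of recontextualizations with an off-the-shelf SBERT model (the sentence-transformers library) via cosine similarity. The model name and the example contexts are illustrative assumptions, not the authors' evaluation code; as the abstract notes, such out-of-the-box scores tend to track surface textual similarity more than topic relatedness, which is what fine-tuning is reported to mitigate.

```python
# Minimal sketch (not the authors' code): scoring a TRiC-style pair with an
# off-the-shelf SBERT model. The model name and example contexts are
# illustrative assumptions; the task compares the contexts in which a reused
# target text appears, not the target text itself.
from sentence_transformers import SentenceTransformer, util

# Two recontextualizations of the same reused (biblical) quotation.
context_a = ("'Love your neighbour as yourself' -- a reminder to volunteer "
             "at the local food bank this weekend.")
context_b = ("'Love your neighbour as yourself' -- quoted in a debate about "
             "immigration policy.")

# Encode the full contexts and take cosine similarity as a (naive) proxy
# for topic relatedness between the two recontextualizations.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([context_a, context_b], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()

print(f"Predicted relatedness: {score:.3f}")
```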
| File | Type | Size | Format | Access |
|---|---|---|---|---|
| 2024.emnlp-main.774.pdf | Publisher's version/PDF | 379.93 kB | Adobe PDF | Open access |