
Assessing Document Sanitization for Controlled Information Release and Retrieval in Data Marketplaces / L. Cassani, G. Livraga, M. Viviani (LECTURE NOTES IN COMPUTER SCIENCE). - In: Experimental IR Meets Multilinguality, Multimodality, and Interaction / edited by L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, G.M. Di Nunzio, L. Soulier, P. Galuščáková, A. García Seco de Herrera, G. Faggioli, N. Ferro. - [s.l.] : Springer, 2024 Sep. - ISBN 9783031717352. - pp. 88-99 (Paper presented at the 15th International Conference of the Cross-Language Evaluation Forum for European Languages (CLEF 2024), held in Grenoble in 2024) [10.1007/978-3-031-71736-9_4].

Assessing Document Sanitization for Controlled Information Release and Retrieval in Data Marketplaces

G. Livraga
2024

Abstract

This study provides insights into addressing data confidentiality concerns while enhancing document retrieval effectiveness in Data Marketplaces, which, in this study, consist of unstructured textual documents. Through a semi-automatic sanitization process that combines token masking with text summarization, optionally complemented by Coreference Resolution, the proposed solution mitigates the risk of inferring confidential information while maintaining search performance. Experimental results show encouraging improvements in both respects over baseline solutions.
Keywords: Text Sanitization; Confidentiality; Text Summarization; Coreference Resolution; Information Retrieval; Data Marketplaces
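The abstract names two building blocks, token masking and text summarization. As a minimal sketch, assuming the data owner supplies the list of confidential terms (our reading of "semi-automatic", not confirmed by this record), the hypothetical helpers `mask_tokens`, `summarize`, and `sanitize` below replace whole-word matches with a placeholder and then keep the highest-scoring sentences using a naive frequency-based extractive summarizer. The summarizer is a simple swapped-in technique for illustration, not the authors' pipeline, and the mask-then-summarize order is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical placeholder; the record does not say which mask token the authors use.
MASK = "[MASK]"


def mask_tokens(text: str, sensitive_terms: list[str], mask: str = MASK) -> str:
    """Replace whole-word, case-insensitive occurrences of each sensitive term with `mask`."""
    # Mask longer terms first so "Acme Corp" is replaced before a shorter term like "Acme".
    # Note: \b assumes each term starts and ends with a word character.
    for term in sorted(sensitive_terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, mask, text, flags=re.IGNORECASE)
    return text


def _content_words(sentence: str, mask: str = MASK) -> list[str]:
    """Lowercased word tokens, with the mask placeholder dropped so it cannot boost scores."""
    tokens = re.findall(re.escape(mask) + r"|\w+", sentence)
    return [t.lower() for t in tokens if t != mask]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Naive extractive summary: keep the sentences with the highest average word frequency."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_content_words(text))

    def score(sentence: str) -> float:
        words = _content_words(sentence)
        return sum(freq[w] for w in words) / (len(words) or 1)

    # sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


def sanitize(text: str, sensitive_terms: list[str], max_sentences: int = 2) -> str:
    """Mask first, so the summarizer never sees the confidential terms, then summarize."""
    return summarize(mask_tokens(text, sensitive_terms), max_sentences)
```

For example, `sanitize("Alice Smith works at Acme Corp. Acme Corp plans a merger with Beta Ltd. The weather was nice.", ["Alice Smith", "Acme Corp", "Beta Ltd"], max_sentences=1)` returns `"[MASK] works at [MASK]."`. A fixed term list misses indirect mentions such as pronouns or "the company", which is presumably the gap the optional Coreference Resolution step targets.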
Sector INF/01 - Computer Science
Sector INFO-01/A - Computer Science
   Green responsibLe privACy preservIng dAta operaTIONs
   GLACIATION
   EUROPEAN COMMISSION

   KURAMi: Knowledge-based, explainable User empowerment in Releasing private data and Assessing Misinformation in online environments
   KURAMI
   MINISTRY OF UNIVERSITY AND RESEARCH (MUR)
   20225WTRFN_003

   SEcurity and RIghts in the CyberSpace (SERICS)
   SERICS
   MINISTRY OF UNIVERSITY AND RESEARCH (MUR)
   Identification code PE00000014
Sep-2024
Book Part (author)
Files in this item:
clv-clef2024.pdf (restricted access)
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
Size: 534.98 kB
Format: Adobe PDF

978-3-031-71736-9_4.pdf (restricted access)
Type: Publisher's version/PDF
Size: 634.08 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1118826