Assessing Document Sanitization for Controlled Information Release and Retrieval in Data Marketplaces / L. Cassani, G. Livraga, M. Viviani (Lecture Notes in Computer Science). - In: Experimental IR Meets Multilinguality, Multimodality, and Interaction / edited by L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, G.M. Di Nunzio, L. Soulier, P. Galuščáková, A. García Seco de Herrera, G. Faggioli, N. Ferro. - [s.l.] : Springer, 2024 Sep. - ISBN 9783031717352. - pp. 88-99. (Paper presented at the 15th International Conference of the Cross-Language Evaluation Forum for European Languages, held in Grenoble in 2024) [10.1007/978-3-031-71736-9_4].
Assessing Document Sanitization for Controlled Information Release and Retrieval in Data Marketplaces
L. Cassani, G. Livraga, M. Viviani
2024
Abstract
This study provides insights into addressing data confidentiality concerns and enhancing document retrieval effectiveness in Data Marketplaces, which in this study consist of unstructured textual documents. Through a semi-automatic sanitization process that combines token masking with text summarization, optionally complemented by coreference resolution, the proposed solution mitigates the risk of confidential information being inferred while maintaining search performance. Experimental results show encouraging improvements in both respects over baseline solutions.
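The abstract names token masking as the core sanitization step. Purely as an illustration, the Python sketch below shows generic dictionary-based token masking. It assumes the confidential terms are already known (for instance, flagged by the data owner in a semi-automatic process). The function name `mask_tokens` and the `[MASKED]` placeholder are hypothetical and not taken from the paper. The sketch does not model the summarization or coreference resolution steps, and it is not the authors' implementation.

```python
import re
from typing import Iterable

# Hypothetical placeholder token; the paper's actual mask is not specified here.
MASK = "[MASKED]"


def mask_tokens(text: str, confidential_terms: Iterable[str], mask: str = MASK) -> str:
    """Replace whole-word, case-insensitive occurrences of each confidential term with `mask`.

    Assumes each term starts and ends with a word character, so that the
    \\b word-boundary anchors behave as intended.
    """
    # Drop empty strings (they would match everywhere), deduplicate, and try
    # longer terms first so "Acme Corp" wins over a shorter term like "Acme".
    terms = sorted({t for t in confidential_terms if t}, key=len, reverse=True)
    if not terms:
        return text
    # re.escape treats each term literally; \b stops "Acme Corp" from
    # matching inside "Acme Corporation".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(mask, text)


if __name__ == "__main__":
    doc = "Acme Corp plans to acquire Beta Ltd. It expects approval in May."
    print(mask_tokens(doc, ["Acme Corp", "Beta Ltd"]))
    # prints: [MASKED] plans to acquire [MASKED]. It expects approval in May.
```

A purely lexical mask like this leaves later mentions such as "It" or "the company" untouched even though they still point to the masked entity. That gap is presumably the kind of leak a coreference resolution step is meant to reduce.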
File | Type | Size | Format | Access
---|---|---|---|---
clv-clef2024.pdf | Post-print / accepted manuscript (version accepted by the publisher) | 534.98 kB | Adobe PDF | Restricted
978-3-031-71736-9_4.pdf | Publisher's version/PDF | 634.08 kB | Adobe PDF | Restricted
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.