A Semantic Schema-Based Catalog for Identifying Joinable Columns via LLMs / E. Cavalleri, M. Castagna, M. Mesiti (LECTURE NOTES IN COMPUTER SCIENCE). - In: Flexible Query Answering Systems / edited by G. De Tré, S. Sotirov, J. Kacprzyk, G. Psaila, G. Smits, T. Andreasen, G. Bordogna, H. Legind Larsen. - [s.l] : Springer, 2025 Sep 08. - ISBN 9783032056061. - pp. 206-218 (Paper presented at the 16th FQAS conference, held in Burgas in 2025) [10.1007/978-3-032-05607-8_20].
A Semantic Schema-Based Catalog for Identifying Joinable Columns via LLMs
E. Cavalleri (first author); M. Castagna; M. Mesiti (last author)
2025
Abstract
Joinable table discovery consists of the identification of tabular datasets that can be joined with a given query dataset. The use of contextual information associated with the datasets and columns (tailored to the kinds of analyses the user intends to carry out) is seldom considered in the approaches proposed so far. In this paper, the generation of semantic task-oriented schema-based catalogs that facilitate the identification of joinable columns is proposed. By identifying a schema diagram that outlines the classes and relationship types for a certain kind of analysis, datasets are semantically annotated, and annotations are used to generate the catalog. The catalog, represented as a property graph, can then be leveraged for visual exploration, query formulation, and identification of joinable datasets useful for a specific analysis. The approach leverages the availability of metadata about datasets and their columns, combined with general-purpose large language models (LLMs). Initial experiments suggest that our approach is both practical and efficient, yielding promising results in terms of both accuracy and usability.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| fqas2025.pdf | Restricted access | Post-print, accepted manuscript, etc. (version accepted by the publisher) | No license | 4.31 MB | Adobe PDF |




