Spreadsheets are often used as a simple way of representing tabular data. However, since they impose no restrictions on table structure or content, their automatic processing and their integration with other information sources are particularly hard problems. Many table understanding approaches have been proposed for extracting data from tables and transforming them into meaningful information, but these approaches require the table contents to exhibit some regularity. Starting from CSV spreadsheets that contain values of different types as well as errors, in this paper we introduce an approach for inferring the types of columns in CSV tables based on multi-label classification. With this approach, each column of the table can be associated with a simple datatype (such as integer, float, or text), a domain-specific one (such as the name of a municipality or an address), or a “union” of types (which takes into account the frequency of the corresponding values). Since the automatically inferred types might not be accurate, graphical interfaces have been developed to support the user in fixing mistakes. Finally, experimental results are reported on real spreadsheets obtained from a debt collection agency.
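The abstract does not detail the classifier, so the following is only a minimal sketch of the “union of types” idea under loudly stated assumptions. Each value is labelled by simple hand-written rules, a hypothetical stand-in for the paper's multi-label classifier. Domain-specific types are recognised by lookup in a user-supplied dictionary (the `domains` parameter is hypothetical). Types whose share falls below a hypothetical `min_share` threshold are dropped from the union.

```python
import re
from collections import Counter

# Hypothetical rule-based detectors; NOT the multi-label classifier from the paper.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def value_type(value, domains=None):
    """Label a single cell with one type (assumed label set, for illustration)."""
    v = value.strip()
    if v == "":
        return "empty"
    # Domain-specific types: assumed to come from user-supplied name lists.
    for domain, names in (domains or {}).items():
        if v.casefold() in {n.casefold() for n in names}:
            return domain
    if INT_RE.fullmatch(v):
        return "integer"
    if FLOAT_RE.fullmatch(v):
        return "float"
    return "text"


def union_type(values, min_share=0.1, domains=None):
    """Return (type, share) pairs ordered by frequency, ignoring empty cells.

    Types whose share of non-empty cells is below min_share are treated as
    noise and left out of the union.
    """
    counts = Counter(value_type(v, domains) for v in values)
    counts.pop("empty", None)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(t, c / total) for t, c in counts.most_common() if c / total >= min_share]


if __name__ == "__main__":
    column = ["12", "7", "3.5", "n/a", "42", ""]
    print(union_type(column))
```

For the sample column, three of the five non-empty cells are integers, so `integer` dominates with a share of 0.6, while `float` and `text` (the stray `n/a`) each take 0.2. A mixed result like this is the kind of inferred type that a user might then confirm or correct through a graphical interface.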
Semi-automatic Column Type Inference for CSV Table Understanding / S. Bonfitto, L. Cappelletti, F. Trovato, G. Valentini, M. Mesiti (LECTURE NOTES IN ARTIFICIAL INTELLIGENCE). - In: SOFSEM 2021: Theory and Practice of Computer Science / edited by T. Bureš, R. Dondi, J. Gamper, G. Guerrini, T. Jurdziński, C. Pahl, F. Sikora, P.W.H. Wong. - [s.l.] : Springer, 2021. - ISBN 9783030677305. - pp. 535-549 (Paper presented at the 47th International Conference on Current Trends in Theory and Practice of Computer Science, held in Bolzano in 2021.)
|Title:||Semi-automatic Column Type Inference for CSV Table Understanding|
MESITI, MARCO (Corresponding)
|Keywords:||Table understanding; Type inference; GUI; CSVs|
|Scientific-Disciplinary Sector:||Sector INF/01 - Computer Science|
|Publication date:||2021|
|Digital Object Identifier (DOI):||http://dx.doi.org/10.1007/978-3-030-67731-2_39|
|Type:||Book Part (author)|
|Appears in types:||03 - Contribution in volume|
Files in this item:
|Bonfitto2021_Chapter_Semi-automaticColumnTypeInfere.pdf||Publisher's version/PDF||Administrator|