CHAMALEON: Framework to improve Data Wrangling with Complex Data / A. Valencia Parra, A.J. Varela Vaca, M.T. Gómez López, P. Ceravolo. - In: ICIS 2019. - [s.l.] : Association for Information Systems (AIS) Electronic Library, 2019. - ISBN 978-0-9966831-9-7. - pp. 1-17 (Paper presented at the 40th International Conference on Information Systems, held in Munich (Germany), December 15th through 18th, 2019).

CHAMALEON: Framework to improve Data Wrangling with Complex Data

P. Ceravolo (last author)
2019

Abstract

Data transformation and schema reconciliation are relevant topics in industry due to the incorporation of data-intensive business processes in organizations. As the number of data sources grows, so does the complexity of the data, leading to complex and nested data schemata. Novel approaches are now being employed in academia and industry to assist non-expert users in transforming, integrating, and improving the quality of datasets (i.e., data wrangling). However, support for transforming semi-structured complex data is lacking. This article surveys the state of the art by identifying and analyzing the most relevant solutions from academia and industry for transforming this type of data. In addition, we propose a Domain-Specific Language (DSL) to support the transformation of complex data as a first approach to enhancing data wrangling processes. We also develop a framework that implements the DSL and evaluate it in a real-world case study.
Keywords: Data Wrangling; Complex Data; Data Transformation; Semi-structured Data; Data Preparation
Sector INF/01 - Computer Science
2019
https://aisel.aisnet.org/icis2019/data_science/data_science/16/
Book Part (author)
Files in this item:
File: CHAMALEON_FrameworktoimproveDataWranglingwithComplexData.pdf
Access: restricted
Type: Publisher's version/PDF
Size: 663.9 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/961882
Citations
  • PMC: not available
  • Scopus: 8
  • ISI: not available