Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, and trace clustering. These methods are usually applied as preprocessing steps and are responsible for mapping complex event data into a numerical feature space. Most papers choose existing encoding methods arbitrarily or employ a strategy based on expert domain knowledge. Moreover, existing methods are typically used with their default parameters, without evaluating other options. This practice can lead to several drawbacks, such as suboptimal performance and unfair comparisons with the state of the art. Therefore, this work aims to provide a comprehensive survey and benchmark on event log encoding by comparing 27 methods of different natures in terms of expressivity, scalability, correlation, and domain agnosticism. To the best of our knowledge, this is the most comprehensive study so far focusing on trace encoding in process mining. It contributes to a more mature awareness of the role of trace encoding in process mining pipelines and sheds light on issues, concerns, and future research directions regarding the use of encoding methods to bridge the gap between machine learning models and process mining.

Trace encoding in process mining: A survey and benchmarking / G.M. Tavares, R.S. Oyamada, S. Barbon, P. Ceravolo. - In: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE. - ISSN 0952-1976. - 126:(2023), pp. 107028.1-107028.23. [10.1016/j.engappai.2023.107028]

Trace encoding in process mining: A survey and benchmarking

R.S. Oyamada; P. Ceravolo
2023

Anomaly detection; Encoding methods; Process mining
Sector INF/01 - Informatica (Computer Science)
Article (author)
Files in this item:
1-s2.0-S0952197623012125-main.pdf (open access; type: Publisher's version/PDF; size: 2.69 MB; format: Adobe PDF)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1006201