Selecting Optimal Trace Clustering Pipelines with Meta-learning / G. Marques Tavares, S. Barbon Junior, E. Damiani, P. Ceravolo (LECTURE NOTES IN COMPUTER SCIENCE). - In: Intelligent Systems / edited by J.C. Xavier-Junior, R. Araújo Rios. - [s.l.] : Springer Science and Business, 2022. - ISBN 9783031216855. - pp. 150-164 (Paper presented at the 11th Brazilian Conference on Intelligent Systems, held in Campinas in 2022) [10.1007/978-3-031-21686-2_11].

Selecting Optimal Trace Clustering Pipelines with Meta-learning

G. Marques Tavares (first author);
E. Damiani (penultimate author);
P. Ceravolo (last author)
2022

Abstract

Trace clustering has been extensively used to discover aspects of the data from event logs. Process Mining techniques guide the identification of sub-logs by grouping traces with similar behaviors, producing more understandable models and improving conformance indicators. Nevertheless, little attention has been paid to the relationship among event log properties, the pipeline of encoding and clustering algorithms, and the quality of the obtained outcome. The present study contributes to the understanding of these relationships and provides a way to automatically select a proper combination of algorithms for clustering a given event log. We propose a Meta-Learning framework to recommend the most suitable pipeline for trace clustering, which encompasses the encoding method, the clustering algorithm, and the algorithm's hyperparameters. Our experiments were conducted using a thousand event logs, four encoding techniques, and three clustering methods. Results indicate that our framework sheds light on the trace clustering problem and can assist users in choosing the best pipeline for their environment.
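The abstract describes the framework only at a high level. As a rough illustration of the general meta-learning idea (describe an event log by meta-features, then reuse the pipeline that performed best on the most similar previously evaluated logs), the following Python sketch may help. It is not the authors' method: the four meta-features, the names `MetaExample`, `meta_features` and `recommend`, the pipeline labels, and the use of a simple 1-nearest-neighbour recommender in place of whatever meta-model the paper trains are all hypothetical assumptions.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean

# Assumption: an event log is modelled as a list of traces, each trace being
# a tuple of activity names in execution order.
EventLog = list[tuple[str, ...]]


@dataclass(frozen=True)
class MetaExample:
    """One row of a hypothetical meta-database: the meta-features of a
    previously evaluated log and the label of the pipeline that scored best."""
    features: tuple[float, ...]
    best_pipeline: str


def meta_features(log: EventLog) -> tuple[float, ...]:
    """Hypothetical meta-features: number of traces, number of distinct
    activities, mean trace length, and ratio of distinct variants to traces."""
    if not log:
        raise ValueError("event log must contain at least one trace")
    n_traces = len(log)
    activities = {activity for trace in log for activity in trace}
    variants = set(log)  # identical activity sequences form one variant
    return (
        float(n_traces),
        float(len(activities)),
        float(mean(len(trace) for trace in log)),
        len(variants) / n_traces,
    )


def recommend(log: EventLog, meta_db: list[MetaExample]) -> str:
    """Return the best pipeline of the most similar log in meta_db
    (1-nearest neighbour on min-max scaled meta-features)."""
    if not meta_db:
        raise ValueError("meta-database is empty")
    query = meta_features(log)
    dims = range(len(query))
    lows = [min(ex.features[i] for ex in meta_db) for i in dims]
    highs = [max(ex.features[i] for ex in meta_db) for i in dims]

    def scale(vec: tuple[float, ...]) -> list[float]:
        # Min-max scaling so that large counts (e.g. number of traces) do not
        # dominate the distance; features constant across meta_db map to 0.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(vec, lows, highs)]

    q = scale(query)

    def distance(ex: MetaExample) -> float:
        # Euclidean distance between scaled meta-feature vectors.
        return sqrt(sum((a - b) ** 2 for a, b in zip(q, scale(ex.features))))

    return min(meta_db, key=distance).best_pipeline


if __name__ == "__main__":
    # Placeholder meta-database: values and pipeline labels are invented
    # for illustration and are NOT results from the paper.
    meta_db = [
        MetaExample((100.0, 5.0, 4.0, 0.1), "encoding_A + kmeans(k=3)"),
        MetaExample((2000.0, 40.0, 25.0, 0.9), "encoding_B + agglomerative(k=8)"),
    ]
    log = [("a", "b", "c"), ("a", "b", "c"), ("a", "c", "b", "d")]
    print(recommend(log, meta_db))  # prints: encoding_A + kmeans(k=3)
```

In a real setting, each meta-database row would come from running every candidate pipeline (encoding, clustering algorithm, hyperparameters) on a training log and recording the one with the best quality score; nearest-neighbour lookup is used here only because it keeps the sketch short and dependency-free.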
Keywords: Meta-learning; Pipeline design; Process mining; Recommendation; Trace clustering
Scientific sector: INF/01 - Computer Science
2022
Book Part (author)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/954753
Citations
  • PubMed Central: N/A
  • Scopus: 5
  • Web of Science: 5