The Complexity of Why-Provenance for Datalog Queries

Calautti, M.; Livshits, E.; Pieris, A.; Schneider, M.

doi:10.1145/3651146

Datalog is a powerful rule-based language that allows us to express complex recursive queries and has found numerous applications over the years. Explaining why a result of a Datalog query is obtained is an essential task towards explainable and transparent data-intensive applications that rely on Datalog. A standard way of explaining a query result is the so-called why-provenance, which provides information about the witnesses to a query result in the form of subsets of the input database that as a whole can be used to derive that result. To our surprise, despite the fact that the notion of why-provenance for Datalog queries has been around for decades and intensively studied, its computational complexity remains unexplored. Our goal is to fill this gap in the why-provenance literature. Towards this end, we pinpoint the data complexity of why-provenance for Datalog queries and key subclasses thereof. The takeaway of our work is that why-provenance for recursive queries, even if the recursion is limited to be linear, is an intractable problem, whereas for non-recursive queries it is highly tractable.
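To make the notion of a witness concrete, here is a minimal Python sketch. It assumes the common proof-tree reading of why-provenance: a set of database facts is a witness for a result if it is exactly the set of leaves of some proof tree for that result. The sketch fixes one linear recursive program, the transitive-closure query `path(x,y) :- edge(x,y).` and `path(x,z) :- edge(x,y), path(y,z).`, for which a proof tree of `path(s,t)` corresponds to a walk from `s` to `t`, so its leaves are the edges along that walk. The functions `why_provenance_tc` and `is_why_witness` are hypothetical illustrations, not the authors' method; the search enumerates (node, used-edge-set) states by brute force and is exponential in the number of edges.

```python
from collections import deque
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]


def why_provenance_tc(edges: Set[Edge], source: str, target: str) -> Set[FrozenSet[Edge]]:
    """Hypothetical helper: why-provenance of path(source, target) for the
    transitive-closure program, i.e. every edge set that is exactly the set
    of edges traversed by some walk (of length >= 1) from source to target."""
    start = (source, frozenset())
    seen = {start}
    queue = deque([start])
    witnesses: Set[FrozenSet[Edge]] = set()
    while queue:
        node, used = queue.popleft()
        for u, v in edges:
            if u != node:
                continue  # only edges leaving the current node extend the walk
            state = (v, used | {(u, v)})
            if v == target:
                # The walk may stop here: its edges are the leaves of a proof tree.
                witnesses.add(state[1])
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return witnesses


def is_why_witness(edges: Set[Edge], source: str, target: str, candidate: Set[Edge]) -> bool:
    """Hypothetical decision procedure: is `candidate` in the why-provenance?
    Only candidate edges may be used, and the walk must use all of them."""
    if not candidate <= edges:
        return False
    return frozenset(candidate) in why_provenance_tc(set(candidate), source, target)


if __name__ == "__main__":
    db = {("a", "b"), ("b", "c"), ("a", "c"), ("b", "a")}
    for w in sorted(why_provenance_tc(db, "a", "c"), key=lambda s: (len(s), sorted(s))):
        print(sorted(w))
    print(is_why_witness(db, "a", "c", {("a", "b"), ("b", "c")}))  # True
    print(is_why_witness(db, "a", "c", {("a", "c"), ("b", "c")}))  # False
```

On this four-edge database, `path(a,c)` has four witnesses, including the non-minimal `{(a,b), (b,a), (a,c)}` obtained by going around the cycle, while `{(a,c), (b,c)}` is not a witness: `c` has no outgoing edges, so no single walk can use both edges into `c`.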

The Complexity of Why-Provenance for Datalog Queries / M. Calautti, E. Livshits, A. Pieris, M. Schneider. - In: PROCEEDINGS OF THE ACM ON MANAGEMENT OF DATA. - ISSN 2836-6573. - 2:2(2024 May), pp. 1-16. [10.1145/3651146]



Keywords: Theory of computation; Theory and algorithms for application domains; Database theory; Data provenance; Datalog; Provenance
Scientific-disciplinary sectors of the article (valid from 09/05/2024): Sector INFO-01/A - Computer Science
Publication date: May 2024
Ahead-of-print or print date: 14 May 2024
Journal in ANCE: PROCEEDINGS OF THE ACM ON MANAGEMENT OF DATA
DOI: https://dx.doi.org/10.1145/3651146
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
3651146.pdf (restricted access; type: Publisher's version/PDF)	928.26 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1157380


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca