Datalog is a well-established rule-based language that allows us to express complex recursive queries. As for every other query language, explaining why a result to a Datalog query is obtained is an essential task towards explainable and transparent query evaluation. A standard way of explaining a query result is the so-called why-provenance, which provides information about the witnesses to a query result in the form of subsets of the input database that as a whole can be used to derive that result. The data complexity of why-provenance for Datalog queries has been recently studied and it was shown to be intractable, namely NP-complete even if the recursion is linear. An interesting question that comes up is whether we can ensure tractable data complexity by considering a slightly less informative notion of provenance, in particular, whyminimal-provenance that keeps only subset-minimal witnesses to a query result. Another interesting question is whether we can adopt a more informative notion of provenance than why-provenance, in particular, whymultiplicity-provenance that also keeps track of how many times a certain fact occurs in a witness to a query result, without paying a price in data complexity. This work provides definitive answers to the above questions: (i) whyminimal-provenance ensures tractable data complexity, and (ii) whymultiplicity-provenance can be adopted without paying a price in complexity, apart from one surprising case where the data complexity becomes PSPACE-complete.

Below and Above Why-Provenance for Datalog Queries / M. Calautti, E. Livshits, A. Pieris, M. Schneider. - In: PROCEEDINGS OF THE ACM ON MANAGEMENT OF DATA. - ISSN 2836-6573. - 2:5(2024 Nov 07), pp. 211.1-211.21. [10.1145/3695829]

Below and Above Why-Provenance for Datalog Queries

M. Calautti
Primo
;
2024

Abstract

Datalog is a well-established rule-based language that allows us to express complex recursive queries. As for every other query language, explaining why a result to a Datalog query is obtained is an essential task towards explainable and transparent query evaluation. A standard way of explaining a query result is the so-called why-provenance, which provides information about the witnesses to a query result in the form of subsets of the input database that as a whole can be used to derive that result. The data complexity of why-provenance for Datalog queries has been recently studied and it was shown to be intractable, namely NP-complete even if the recursion is linear. An interesting question that comes up is whether we can ensure tractable data complexity by considering a slightly less informative notion of provenance, in particular, whyminimal-provenance that keeps only subset-minimal witnesses to a query result. Another interesting question is whether we can adopt a more informative notion of provenance than why-provenance, in particular, whymultiplicity-provenance that also keeps track of how many times a certain fact occurs in a witness to a query result, without paying a price in data complexity. This work provides definitive answers to the above questions: (i) whyminimal-provenance ensures tractable data complexity, and (ii) whymultiplicity-provenance can be adopted without paying a price in complexity, apart from one surprising case where the data complexity becomes PSPACE-complete.
Settore INFO-01/A - Informatica
7-nov-2024
Article (author)
File in questo prodotto:
File Dimensione Formato  
3695829.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 929.96 kB
Formato Adobe PDF
929.96 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1157382
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact