Objective Bayesian Nets for Integrating Consistent Datasets / J. Landes, J. Williamson. - In: THE JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH. - ISSN 1076-9757. - 74:(2022), pp. 393-458. [10.1613/jair.1.13363]

Objective Bayesian Nets for Integrating Consistent Datasets

J. Landes, J. Williamson
2022

Abstract

This paper addresses a data integration problem: given several mutually consistent datasets each of which measures a subset of the variables of interest, how can one construct a probabilistic model that fits the data and gives reasonable answers to questions which are under-determined by the data? Here we show how to obtain a Bayesian network model which represents the unique probability function that agrees with the probability distributions measured by the datasets and otherwise has maximum entropy. We provide a general algorithm, OBN-cDS, which offers substantial efficiency savings over the standard brute-force approach to determining the maximum entropy probability function. Furthermore, we develop modifications to the general algorithm which enable further efficiency savings but which are only applicable in particular situations. We show that there are circumstances in which one can obtain the model (i) directly from the data; (ii) by solving algebraic problems; and (iii) by solving relatively simple independent optimisation problems.
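
As a minimal illustration of the maximum entropy idea in the abstract, the sketch below takes two hypothetical datasets, one measuring (A, B) and one measuring (B, C), that agree on the shared variable B. In this simple chain-shaped case the maximum entropy joint distribution is the one that makes A and C independent given B, so it can be written down straight from the measured distributions. This is not the paper's OBN-cDS algorithm; the numbers, variable names, and the helper functions `entropy` and `perturb` are all assumptions made for the example.

```python
from itertools import product
from math import log

# Hypothetical distributions measured by two consistent datasets:
# one over (A, B), one over (B, C). They agree on the shared variable B.
p_ab = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
p_bc = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.45, (1, 1): 0.15}

# Marginal of B, computed from the (A, B) dataset.
p_b = {b: sum(p_ab[a, b] for a in (0, 1)) for b in (0, 1)}

# Candidate maximum entropy joint: A and C independent given B,
# i.e. P(a, b, c) = P(a, b) * P(b, c) / P(b).
joint = {
    (a, b, c): p_ab[a, b] * p_bc[b, c] / p_b[b]
    for a, b, c in product((0, 1), repeat=3)
}


def entropy(p):
    """Shannon entropy (in nats) of a distribution given as a dict."""
    return -sum(v * log(v) for v in p.values() if v > 0)


def perturb(p, eps):
    """Shift mass while leaving the (A, B) and (B, C) marginals unchanged."""
    sign = {0: 1, 1: -1}
    return {
        (a, b, c): v + (eps * sign[a] * sign[c] if b == 0 else 0.0)
        for (a, b, c), v in p.items()
    }


# Another joint distribution that fits both datasets equally well.
other = perturb(joint, 0.01)
print(entropy(joint), entropy(other))
```

Both `joint` and `other` reproduce the two measured distributions, but `joint` has the higher entropy, which is the sense in which it is the least committal answer to questions the data leave open.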
Sector M-FIL/02 - Logic and Philosophy of Science
Sector INF/01 - Computer Science
Sector MAT/01 - Mathematical Logic
Article (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/937846