Objective Bayesian Nets for Integrating Consistent Datasets / J. Landes, J. Williamson. - In: THE JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH. - ISSN 1076-9757. - 74:(2022), pp. 393-458. [10.1613/jair.1.13363]
Objective Bayesian Nets for Integrating Consistent Datasets
J. Landes, J. Williamson
2022
Abstract
This paper addresses a data integration problem: given several mutually consistent datasets, each of which measures a subset of the variables of interest, how can one construct a probabilistic model that fits the data and gives reasonable answers to questions which are under-determined by the data? Here we show how to obtain a Bayesian network model which represents the unique probability function that agrees with the probability distributions measured by the datasets and otherwise has maximum entropy. We provide a general algorithm, OBN-cDS, which offers substantial efficiency savings over the standard brute-force approach to determining the maximum entropy probability function. Furthermore, we develop modifications to the general algorithm which enable further efficiency savings but which are only applicable in particular situations. We show that there are circumstances in which one can obtain the model (i) directly from the data; (ii) by solving algebraic problems; and (iii) by solving relatively simple independent optimisation problems.
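The maximum entropy idea in the abstract can be illustrated on a toy case. The sketch below is not the paper's OBN-cDS algorithm. It uses iterative proportional fitting (IPF), a standard generic procedure that, when started from the uniform distribution over a finite space, converges to the maximum entropy distribution matching the given marginals. This makes it closer in spirit to a brute-force computation. The variables A, B, C, the probabilities, and the helper names `marginal`, `ipf_maxent` and `entropy` are all hypothetical. With one dataset over (A, B) and another over (B, C), the maximum entropy joint is known to take the closed form P(a,b,c) = P(a,b)P(b,c)/P(b). This is a Bayesian network A → B → C whose tables can be read directly off the two datasets.

```python
from itertools import product
from math import isclose, log

# Hypothetical data: dataset 1 measures (A, B), dataset 2 measures (B, C).
# Both imply P(B=0) = 0.6 and P(B=1) = 0.4, so the datasets are consistent.
P_AB = {(0, 0): 0.42, (0, 1): 0.10, (1, 0): 0.18, (1, 1): 0.30}
P_BC = {(0, 0): 0.36, (0, 1): 0.24, (1, 0): 0.08, (1, 1): 0.32}


def marginal(q, idx):
    """Sum a joint distribution q (dict: full assignment -> prob) onto positions idx."""
    m = {}
    for x, p in q.items():
        key = tuple(x[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return m


def ipf_maxent(constraints, n_vars, sweeps=200, tol=1e-12):
    """Iterative proportional fitting over binary variables, started from uniform.

    constraints: list of (idx, target) pairs; target maps assignments of the
    variables at positions idx to their measured probabilities.
    """
    states = list(product((0, 1), repeat=n_vars))
    q = {x: 1.0 / len(states) for x in states}
    for _ in range(sweeps):
        change = 0.0
        for idx, target in constraints:
            current = marginal(q, idx)  # marginal before this rescaling step
            for x in states:
                key = tuple(x[i] for i in idx)
                new = q[x] * target[key] / current[key] if current[key] > 0 else 0.0
                change = max(change, abs(new - q[x]))
                q[x] = new
        if change < tol:  # negligible change over a full sweep: marginals matched
            break
    return q


def entropy(q):
    """Shannon entropy in nats."""
    return -sum(p * log(p) for p in q.values() if p > 0)


q = ipf_maxent([((0, 1), P_AB), ((1, 2), P_BC)], n_vars=3)

# Closed form for this chain: P(a,b,c) = P(a,b) P(b,c) / P(b), i.e. a
# Bayesian network A -> B -> C built directly from the two datasets.
P_B = {b: P_AB[(0, b)] + P_AB[(1, b)] for b in (0, 1)}
closed = {(a, b, c): P_AB[(a, b)] * P_BC[(b, c)] / P_B[b]
          for a, b, c in product((0, 1), repeat=3)}

# A different joint with the same (A,B) and (B,C) marginals, in which
# A and C are dependent given B = 0 (each +d/-d pair cancels in both marginals).
d = 0.02
other = dict(closed)
for x, s in {(0, 0, 0): d, (1, 0, 1): d, (0, 0, 1): -d, (1, 0, 0): -d}.items():
    other[x] += s

print(max(abs(q[x] - closed[x]) for x in closed))
print(entropy(q), entropy(other))
```

For this chain structure, a single IPF sweep already reaches the closed form. The `other` distribution fits both datasets equally well but has lower entropy, because it adds dependence between A and C that the data do not support. The paper's own algorithm and its specialised variants are not reproduced here.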