Time-completeness trade-offs in record linkage using Adaptive Query Processing

Lengu, R.; Missier, P.; Fernandes, A.; Guerrini, G.; Mesiti, M.

doi:10.1145/1516360.1516458

Applications that involve data integration among multiple sources often require a preliminary step of data reconciliation in order to ensure that tuples match correctly across the sources. In dynamic settings such as data mashups, however, traditional offline data reconciliation techniques that require prior availability of the data may not be applicable. The alternative, performing similarity joins at query time, is computationally expensive, while ignoring the mismatch problem altogether leads to an incomplete integration. In this paper we make the assumption that, in some dynamic integration scenarios, users may agree to trade the completeness of a join result in return for a faster computation. We explore the consequences of this assumption by proposing a novel, hybrid join algorithm that involves a combination of exact and approximate join operators, managed using adaptive query processing techniques. The algorithm is optimistic: it can switch between physical join operators multiple times throughout query processing, but it only resorts to approximate join operators when there is statistical evidence that result completeness is compromised. Our experiments show that sensible savings in join execution time can be achieved in practice, at the expense of a modest reduction in result completeness. Copyright 2009 ACM.

Time-completeness trade-offs in record linkage using Adaptive Query Processing / R. LENGU, P. MISSIER, A. FERNANDES, G. GUERRINI, M. MESITI - In: 12th International Conference on Extending Database Technology (EDBT)[s.l] : ACM Digital Library, 2009. - pp. 851-861 [10.1145/1516360.1516458]

Time-completeness trade-offs in record linkage using Adaptive Query Processing

R. LENGU;P. MISSIER;A. FERNANDES;G. GUERRINI;M. Mesiti^Ultimo

2009

Abstract

Applications that involve data integration among multiple sources often require a preliminary step of data reconciliation in order to ensure that tuples match correctly across the sources. In dynamic settings such as data mashups, however, traditional offline data reconciliation techniques that require prior availability of the data may not be applicable. The alternative, performing similarity joins at query time, is computationally expensive, while ignoring the mismatch problem altogether leads to an incomplete integration. In this paper we make the assumption that, in some dynamic integration scenarios, users may agree to trade the completeness of a join result in return for a faster computation. We explore the consequences of this assumption by proposing a novel, hybrid join algorithm that involves a combination of exact and approximate join operators, managed using adaptive query processing techniques. The algorithm is optimistic: it can switch between physical join operators multiple times throughout query processing, but it only resorts to approximate join operators when there is statistical evidence that result completeness is compromised. Our experiments show that sensible savings in join execution time can be achieved in practice, at the expense of a modest reduction in result completeness. Copyright 2009 ACM.

Scheda breve

Scheda completa

Scheda completa (DC)

	Settori scientifico-disciplinari del contributo (sola visualizzazione)
	
				Settore INF/01 - Informatica
			
	Data di pubblicazione
	
				2009
			
	DOI
	
				https://dx.doi.org/10.1145/1516360.1516458
			
	Tipologia
	
				Book Part (author)
			
	Appare nelle tipologie:
	
				03 - Contributo in volume

File in questo prodotto:

File	Dimensione	Formato
lengu.pdf accesso aperto Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore) Dimensione 1.27 MB Formato Adobe PDF Visualizza/Apri	1.27 MB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/202320

Citazioni

ND

5

ND

ND

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca