The concept of heterogeneity is very important in XML data management, since many common applications must deal with large and complex collections which do not conform to a schema. Heterogeneity in XML collections can be present at many different levels (textual and structural) and needs to be addressed from several perspectives. This paper contributes a formal characterization of heterogeneity in XML collections based on information-theoretic considerations. We show how it can be applied in some important use cases, and we demonstrate its effectiveness by using it to analyze a number of relevant XML collections and retrieval approaches found in the literature. We show that a large space of highly heterogeneous collections has not been adequately addressed by these approaches.

An Entropy-Based Characterization of the Heterogeneity of XML Collections / I. Sanz, M. MESITI, G. Guerrini, L. R. BERLANGA - In: International Workshop on Database and Expert Systems Applications[s.l] : IEEE Computer Society, 2008. - pp. 238-242

An Entropy-Based Characterization of the Heterogeneity of XML Collections

M. MESITI
Secondo
;
2008

Abstract

The concept of heterogeneity is very important in XML data management, since many common applications must deal with large and complex collections which do not conform to a schema. Heterogeneity in XML collections can be present at many different levels (textual and structural) and needs to be addressed from several perspectives. This paper contributes a formal characterization of heterogeneity in XML collections based on information-theoretic considerations. We show how it can be applied in some important use cases, and we demonstrate its effectiveness by using it to analyze a number of relevant XML collections and retrieval approaches found in the literature. We show that a large space of highly heterogeneous collections has not been adequately addressed by these approaches.
Settore INF/01 - Informatica
2008
Book Part (author)
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/202314
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 2
social impact