Heterogeneity is a central concern in XML data management, since many common applications must deal with large and complex collections that do not conform to a schema. Heterogeneity in XML collections can be present at different levels (textual and structural) and needs to be addressed from several perspectives. This paper contributes a formal characterization of heterogeneity in XML collections based on information-theoretic considerations. We show how it can be applied in some important use cases, and we demonstrate its effectiveness by using it to analyze a number of relevant XML collections and retrieval approaches found in the literature. We show that a large space of highly heterogeneous collections has not been adequately addressed by these approaches.
|Title:||An Entropy-Based Characterization of the Heterogeneity of XML Collections|
MESITI, MARCO (Second author)
|Scientific Disciplinary Sector:||Sector INF/01 - Computer Science|
|Publication date:||2008|
|Digital Object Identifier (DOI):||10.1109/DEXA.2008.55|
|Type:||Book Part (author)|
|Appears in types:||03 - Contribution in volume|