In this paper, we describe a technique for extracting patterns from an XML data flow; then, we show how such patterns can be developed into an ontology of classes. We also discuss the impact of different fuzzy representation techniques for XML data on the outcome of our procedure. One might wonder why all this is needed, since the semantics of XML data could in principle be satisfactorily represented via the ComplexTypes of their associated XML schemata. Unfortunately, standard XML schema definitions need to cover a wide repertoire of possible attributes. For this reason, optional elements are widely used, which decreases the expressiveness of XML schemata as descriptors of the content of single instances. Our approach relies on comparing fuzzy encodings of XML fragments. This comparison allows us to define “typical” sets of attributes, which we consider hints to possible meaningful classes. Then, we evaluate the fuzzy overlap between candidate cluster heads in order to define a tentative class hierarchy. Our fuzzy modelling assumes that a domain expert has associated an importance degree in the [0, 1] interval with each vocabulary element (i.e. tag name). As we shall see in the remainder of the paper, this burden is not excessive, since the importance assessment only needs to be carried out once, by looking at the schema. At run time, each incoming XML fragment is mapped into a fuzzy set whose elements are the tag names. The membership of each element is computed by aggregating the vocabulary importance values of the tags lying on the path from it to the root.
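The run-time encoding described above can be illustrated with a minimal Python sketch. The abstract does not say which aggregation operator or which overlap measure is used, so the choices here are assumptions: `min` as the default aggregation along the path to the root, and a Jaccard-style ratio for the overlap of two fuzzy sets. The function names `fuzzy_encode` and `fuzzy_overlap`, the `default` importance for unlisted tags, and the rule of keeping the highest degree when a tag name occurs more than once are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def fuzzy_encode(
    xml_text: str,
    importance: Dict[str, float],
    aggregate: Callable[[List[float]], float] = min,  # assumed operator
    default: float = 1.0,  # assumed importance for tags the expert did not rate
) -> Dict[str, float]:
    """Map an XML fragment to a fuzzy set {tag name: membership degree}."""
    root = ET.fromstring(xml_text)
    memberships: Dict[str, float] = {}

    def visit(node: ET.Element, path_values: List[float]) -> None:
        # Importance values of all tags from the root down to this node.
        values = path_values + [importance.get(node.tag, default)]
        degree = aggregate(values)
        # Assumption: a repeated tag name keeps its highest degree.
        memberships[node.tag] = max(memberships.get(node.tag, 0.0), degree)
        for child in node:
            visit(child, values)

    visit(root, [])
    return memberships


def fuzzy_overlap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Jaccard-style overlap: sum of minima over sum of maxima (assumed measure)."""
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    denominator = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator else 0.0
```

Passing a different `aggregate` callable, for example a product or an average over the path values, makes it possible to compare how alternative fuzzy representations change the resulting memberships.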
|Title:||Mining class hierarchies from XML data : representation techniques|
|Internal authors:||CERAVOLO, PAOLO (First)|
DAMIANI, ERNESTO (Second)
|Scientific disciplinary sector:||Sector INF/01 - Computer Science|
|Publication date:||2006|
|Digital Object Identifier (DOI):||10.1007/3-540-31182-3_36|
|Type:||Book Part (author)|
|Appears in types:||03 - Book chapter|