
Mining class hierarchies from XML data : representation techniques / P. Ceravolo, E. Damiani (ADVANCES IN SOFT COMPUTING). - In: Computational intelligence, theory and applications : international conference 8. fuzzy days in Dortmund, Germany, Sept. 29-Oct. 01, 2004 : proceedings / [edited by] B. Reusch. - Berlin : Springer, 2006. - ISBN 9783540228073. - pp. 385-396 [10.1007/3-540-31182-3_36]

Mining class hierarchies from XML data : representation techniques

P. Ceravolo (first author); E. Damiani (second author)
2006

Abstract

In this paper, we describe a technique for extracting patterns from an XML data flow; then, we show how such patterns can be developed into an ontology of classes. We also discuss the impact of different fuzzy representation techniques for XML data on the outcome of our procedure. One might wonder why all this is needed, since the semantics of XML data could in principle be satisfactorily represented via the ComplexTypes of their associated XML schemata. Unfortunately, standard XML schema definitions need to cover a wide repertoire of possible attributes. For this reason, optional elements are widely used, which decreases the expressiveness of XML schemata as descriptors of the content of single instances. Our approach relies on comparing fuzzy encodings of XML fragments. This comparison will allow us to define “typical” sets of attributes, which we shall consider hints to possibly meaningful classes. Then, we shall evaluate the fuzzy overlap between candidate cluster heads in order to define a tentative class hierarchy. Our fuzzy modelling assumes that a domain expert has assigned an importance degree in the [0, 1] interval to each vocabulary element (i.e., each tag name). As we shall see in the remainder of the paper, this burden is not excessive, since the importance assessment only needs to be carried out once, by looking at the schema. At run time, each incoming XML fragment is mapped into a fuzzy set whose elements are the tag names [3]. The membership degree of each element is computed by aggregating the vocabulary importance values of the tags lying on the path from that element to the root.
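The abstract does not spell out the aggregation operator or the overlap measure, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. The importance table `IMPORTANCE`, the helper names `fuzzy_encode`, `similarity`, `inclusion` and `is_candidate_superclass`, the product aggregation along the path, the default degree for unlisted tags, the max rule for repeated tags, and the 0.9 threshold are all hypothetical. The sketch maps an XML fragment to a fuzzy set over tag names, then compares two such sets with sigma-count based similarity and inclusion degrees, reading strong one-way inclusion as a hint that one candidate class generalizes the other.

```python
import xml.etree.ElementTree as ET
from math import prod

# Hypothetical importance degrees in [0, 1], assigned once by a domain expert from the schema.
IMPORTANCE = {"book": 1.0, "title": 0.9, "author": 0.8, "name": 0.6, "isbn": 0.5, "price": 0.3}


def fuzzy_encode(xml_text, importance, aggregate=prod, default=0.1):
    """Map an XML fragment to a fuzzy set {tag name: membership degree}.

    Assumption: an element's degree is the aggregate (product by default) of the
    importance values on the path from the root down to that element; tags missing
    from the table get `default`; a repeated tag keeps its highest degree (max = fuzzy union).
    """
    root = ET.fromstring(xml_text)
    fuzzy_set = {}

    def visit(node, path_weights):
        weights = path_weights + [importance.get(node.tag, default)]
        degree = aggregate(weights)
        fuzzy_set[node.tag] = max(fuzzy_set.get(node.tag, 0.0), degree)
        for child in node:
            visit(child, weights)

    visit(root, [])
    return fuzzy_set


def similarity(a, b):
    """Fuzzy Jaccard similarity: sum of pointwise minima over sum of pointwise maxima."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 1.0


def inclusion(a, b):
    """Degree to which fuzzy set a is contained in b: sigma-count of (a AND b) over sigma-count of a."""
    total = sum(a.values())
    if total == 0:
        return 1.0
    return sum(min(v, b.get(k, 0.0)) for k, v in a.items()) / total


def is_candidate_superclass(a, b, threshold=0.9):
    """Hypothetical rule: a generalizes b if a is (almost) included in b but not vice versa."""
    return inclusion(a, b) >= threshold and inclusion(b, a) < threshold


basic = fuzzy_encode("<book><title/><author><name/></author></book>", IMPORTANCE)
full = fuzzy_encode("<book><title/><author><name/></author><isbn/><price/></book>", IMPORTANCE)
print(basic)
print(similarity(basic, full), inclusion(basic, full), inclusion(full, basic))
print(is_candidate_superclass(basic, full))
```

Here every attribute of the `basic` fragment also appears in `full` with the same degree, so `inclusion(basic, full)` is 1.0 while the reverse is lower, and `basic` is flagged as a candidate superclass. Passing `aggregate=min` instead of the default product yields a different encoding of the same fragment; swapping such operators is one simple way to explore how the choice of fuzzy representation affects the outcome.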
Field: INF/01 - Computer Science
Book Part (author)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/42000
Citations: PMC n/a; Scopus 0; ISI 1