Measuring the structural similarity between an XML document and a DTD has many relevant applications that range from document classification and approximate structural queries on XML documents to selective dissemination of XML documents and document protection. The problem is harder than measuring structural similarity among documents, because a DTD can be considered as a generator of documents. Thus, the problem is to evaluate the similarity between a document and a set of documents. An effective structural similarity measure should face different requirements that range from considering the presence and absence of required elements, as well as the structure and level of themissing and extra elements to vocabulary discrepancies due to the use of synonymous or syntactically similar tags. In the paper, starting from these requirements, we provide a definition of the measure and present an algorithm for matching a document against a DTD to obtain their structural similarity. Finally, experimental results to assess the effectiveness of the approach are presented.

Measuring the Structural Similarity among XML Documents and DTDs / E. Bertino, G. Guerrini, M. Mesiti. - In: JOURNAL OF INTELLIGENT INFORMATION SYSTEMS. - ISSN 0925-9902. - 30:1(2008 Feb), pp. 55-92.

Measuring the Structural Similarity among XML Documents and DTDs

M. Mesiti
Ultimo
2008

Abstract

Measuring the structural similarity between an XML document and a DTD has many relevant applications that range from document classification and approximate structural queries on XML documents to selective dissemination of XML documents and document protection. The problem is harder than measuring structural similarity among documents, because a DTD can be considered as a generator of documents. Thus, the problem is to evaluate the similarity between a document and a set of documents. An effective structural similarity measure should face different requirements that range from considering the presence and absence of required elements, as well as the structure and level of themissing and extra elements to vocabulary discrepancies due to the use of synonymous or syntactically similar tags. In the paper, starting from these requirements, we provide a definition of the measure and present an algorithm for matching a document against a DTD to obtain their structural similarity. Finally, experimental results to assess the effectiveness of the approach are presented.
H.3.2 Information storage; H.3.3 Information search and retrieval; H.3.5.f XML/XSL/RDF; I.5.3.b Similarity measure
Settore INF/01 - Informatica
feb-2008
Article (author)
File in questo prodotto:
File Dimensione Formato  
MeasuringTheStructural.pdf

accesso aperto

Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione 703.36 kB
Formato Adobe PDF
703.36 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/26556
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 23
  • ???jsp.display-item.citation.isi??? 12
social impact