IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

An Overview of Similarity Measures for Clustering XML Documents / Giovanna Guerrini, Marco Mesiti, Ismael Sanz. - In: Web data management practices : emerging techniques and technologies / [edited by] Athena Vakali, George Pallis. - Hershey, PA : Idea Group Publishing, 2006. - ISBN 1599042990. - pp. 56-78 [10.4018/978-1-59904-228-2]

An Overview of Similarity Measures for Clustering XML Documents

Giovanna Guerrini; Marco Mesiti; Ismael Sanz

2006

Abstract

The volume and heterogeneity of XML documents on the Web call for clustering techniques that group similar documents together. Documents can be grouped according to their content, their structure, and the links within and among them. For instance, grouping documents with similar structures has applications in information extraction, heterogeneous data integration, personalized content delivery, access control definition, Web site structural analysis, and the comparison of RNA secondary structures. Many approaches have been proposed for evaluating structural and content similarity between tree-based and vector-based representations of XML documents, and link-based similarity approaches developed for Web data clustering have been adapted to XML documents. This chapter discusses and compares the most relevant similarity measures and their use in XML document clustering.

Keywords: XML, data mining, web-based applications, retrieval

Scientific-disciplinary sectors of the contribution (display only): Settore INF/01 - Informatica (Computer Science)

Publication date: 2006

DOI: https://dx.doi.org/10.4018/978-1-59904-228-2

Type: Book Part (author)

Appears in types: 03 - Contributo in volume (contribution in a volume)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/25567
