
Tag Similarity in Folksonomies / H. Mousselly Sergieh, E. Egyed Zsigmond, G. Gianini, M. Doller, H. Kosch, J. Pinon. - In: Proceedings of the XXXI INFORSID congress. - First edition. - [s.l] : INFORSID, 2013. - ISBN 9781632662354. - pp. 277-291 (Paper presented at the 31st INFORSID conference, held in Paris in 2013).

Tag Similarity in Folksonomies

G. Gianini
2013

Abstract

Folksonomies, collections of user-contributed tags, have proved effective in reducing the inherent semantic gap when retrieving web content. To make the best use of folksonomies, tag clustering has been proposed to address the problems caused by free-style user tagging, such as lexical variations, tag split, and multilingualism. In this paper, we propose a novel approach for identifying similar tags in folksonomies. It is based on the idea that the most frequent tags in a folksonomy can be used to identify groups of semantically related tags. For this purpose, frequent tags are identified and their co-occurrence statistics are used to create a probability distribution for each tag. The frequent tags are then clustered based on the distance between their co-occurrence probability distributions. Next, probability distributions for the less frequent tags are generated based on their co-occurrence with the clusters of most frequent tags. Finally, similar tags are identified by calculating the distance between the corresponding probability distributions. To that end, we propose an extension of the Jensen-Shannon Divergence that is sensitive to the size of the sample from which the co-occurrence probability distributions are calculated. We evaluated our approach by applying it to folksonomies obtained from Flickr. Additionally, we compared our results with those produced by a traditional method for tag clustering. This baseline method identifies similar tags by calculating the cosine similarity between the co-occurrence vectors of the tags. The evaluation shows promising results and highlights the advantage of our approach.
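
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard Jensen-Shannon Divergence (the size-sensitive extension is not specified in this record), and it skips the clustering step by building each tag's distribution directly over the frequent tags. The sample posts, the list of frequent tags, and all function names are hypothetical.

```python
import math
from collections import Counter

def cooccurrence_counts(posts, frequent):
    """Count, for every tag, how often it shares a post with each frequent tag."""
    counts = {}
    for tags in posts:
        unique = set(tags)
        for tag in unique:
            row = counts.setdefault(tag, Counter())
            for other in unique:
                if other != tag and other in frequent:
                    row[other] += 1
    return counts

def to_vector(row, frequent):
    """Raw co-occurrence vector, ordered like the list of frequent tags."""
    return [row[f] for f in frequent]

def to_distribution(row, frequent):
    """Normalize a co-occurrence vector into a probability distribution."""
    vec = to_vector(row, frequent)
    total = sum(vec)
    if total == 0:
        raise ValueError("tag never co-occurs with a frequent tag")
    return [v / total for v in vec]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Standard (unextended) Jensen-Shannon Divergence, in [0, 1] for base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def cosine_similarity(u, v):
    """Baseline measure: cosine of the angle between co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy data: each post is the list of tags on one photo.
posts = [
    ["paris", "eiffel", "tower", "france"],
    ["paris", "eiffeltower", "france"],
    ["paris", "eiffel", "france"],
    ["beach", "sea", "sun"],
    ["beach", "sea", "sand"],
]
frequent = ["paris", "france", "beach", "sea"]  # assumed chosen by a frequency threshold

counts = cooccurrence_counts(posts, frequent)
p_eiffel = to_distribution(counts["eiffel"], frequent)
p_eiffeltower = to_distribution(counts["eiffeltower"], frequent)
p_sand = to_distribution(counts["sand"], frequent)

jsd_variants = jensen_shannon(p_eiffel, p_eiffeltower)  # lexical variants of one concept
jsd_unrelated = jensen_shannon(p_eiffel, p_sand)        # tags from unrelated contexts
cos_variants = cosine_similarity(
    to_vector(counts["eiffel"], frequent),
    to_vector(counts["eiffeltower"], frequent),
)
```

In this toy data, "eiffel" and "eiffeltower" co-occur with the same frequent tags in the same proportions, so their divergence is small, while "eiffel" and "sand" share no frequent neighbours and sit at the maximum. Note that the standard divergence ignores that "eiffeltower" was observed in only one post, which is the kind of sample-size effect the proposed extension is meant to account for.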
Folksonomies; Tag Similarity; Tag Clustering; Semantic Web
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
2013
Book Part (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/526750