Folksonomies, collections of user-contributed tags, have proved effective at reducing the inherent semantic gap when retrieving web content. To make the best use of folksonomies, tag clustering has been proposed to address the problems caused by free-style user tagging, such as lexical variation, tag splitting, and multilingualism. In this paper, we propose a novel approach for identifying similar tags in folksonomies. It is based on the idea that the most frequent tags in a folksonomy can be used to identify groups of semantically related tags. For this purpose, frequent tags are identified and their co-occurrence statistics are used to create a probability distribution for each tag. The frequent tags are then clustered based on the distance between their co-occurrence probability distributions. Next, probability distributions for the less frequent tags are generated from their co-occurrence with the clusters of most frequent tags. Finally, similar tags are identified by calculating the distance between the corresponding probability distributions. To that end, we propose an extension of the Jensen-Shannon divergence that is sensitive to the size of the sample from which the co-occurrence probability distributions are calculated. We evaluated our approach by applying it to folksonomies obtained from Flickr. Additionally, we compared our results with those produced by a traditional tag-clustering method. This baseline method identifies similar tags by calculating the cosine similarity between the co-occurrence vectors of the tags. The evaluation shows promising results and highlights the advantage of our approach.
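As a rough illustration of the distance computation described in the abstract, the following Python sketch builds co-occurrence probability distributions for less frequent tags and compares them with the standard Jensen-Shannon divergence, alongside the cosine-similarity baseline. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy photo data, the frequency threshold of 3, and all function names are hypothetical; the clustering of frequent tags is skipped, so each frequent tag acts as its own cluster; and the sample-size-sensitive extension of the divergence is not reproduced, because the abstract does not specify it.

```python
import math
from collections import Counter


def cooccurrence_rows(tagged_items, reference):
    """Count, for every tag, how often it co-occurs with each reference tag."""
    rows = {}
    for tags in tagged_items:
        present = set(tags)
        for tag in present:
            row = rows.setdefault(tag, Counter())
            for ref in reference:
                if ref in present and ref != tag:
                    row[ref] += 1
    return rows


def normalise(row, reference):
    """Turn a row of co-occurrence counts into a probability distribution."""
    total = sum(row[ref] for ref in reference)
    if total == 0:
        return None  # the tag never co-occurs with a reference tag
    return [row[ref] / total for ref in reference]


def kl_divergence(p, m):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jensen_shannon(p, q):
    """Standard Jensen-Shannon divergence (not the paper's extension)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(u, v):
    """Cosine similarity between two co-occurrence vectors (the baseline)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy folksonomy: each inner list holds the tags of one photo.
photos = [
    ["paris", "france", "eiffel"],
    ["paris", "france", "toureiffel"],
    ["paris", "eiffel"],
    ["beach", "sea", "sand"],
    ["beach", "sea", "sunset"],
    ["paris", "france", "louvre"],
    ["beach", "sand"],
]

# Assumption: a tag is "frequent" if it appears on at least 3 photos.
tag_counts = Counter(tag for tags in photos for tag in set(tags))
reference = sorted(tag for tag, n in tag_counts.items() if n >= 3)

rows = cooccurrence_rows(photos, reference)
vectors = {t: [rows[t][ref] for ref in reference] for t in rows if t not in reference}
dist = {t: normalise(rows[t], reference) for t in vectors}
dist = {t: d for t, d in dist.items() if d is not None}

for a, b in [("eiffel", "toureiffel"), ("eiffel", "sand"), ("sea", "sand")]:
    jsd = jensen_shannon(dist[a], dist[b])
    cos = cosine_similarity(vectors[a], vectors[b])
    print(f"{a} / {b}: JSD = {jsd:.3f}, cosine = {cos:.3f}")
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (distributions with disjoint support), so a lower divergence indicates more similar tags, whereas for the cosine baseline a higher value does.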
Tag Similarity in Folksonomies / H. Mousselly Sergieh, E. Egyed Zsigmond, G. Gianini, M. Doller, H. Kosch, J. Pinon. - In: Proceedings of the XXXI INFORSID congress. - First edition. - [s.l] : INFORSID, 2013. - ISBN 9781632662354. - pp. 277-291. (Paper presented at the 31st INFORSID congress, held in Paris in 2013.)
Tag Similarity in Folksonomies
G. Gianini
2013
| File | Access | Type | Size | Format |
|---|---|---|---|---|
| 2013_5_1 Mousselly Sergieh.pdf | Restricted access | Post-print, accepted manuscript, etc. (version accepted by the publisher) | 1.41 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.




