Unsupervised cue-words discovery for tag-sense disambiguation: comparing dissimilarity metrics / M. Legesse, G. Gianini, D. Teferi, H. Mousselly Sergieh, D. Coquil, E. Egyed Zsigmond - In: MEDES '15 : proceedings. - First edition. - [s.l] : ACM, 2015. - ISBN 9781450334808. - pp. 24-28. (Paper presented at the 7th International Conference on Management of computational and collective intElligence in Digital EcoSystems, held in Caraguatatuba in 2015) [10.1145/2857218.2857222].

Unsupervised cue-words discovery for tag-sense disambiguation: comparing dissimilarity metrics

G. Gianini (second author); 2015

Abstract

Although tagging simplifies resource browsing and retrieval, it suffers from several issues, among them redundancy and ambiguity. In this work we focus on the problem of resolving tag word-sense ambiguity within a typical semi-automatic tagging procedure. In that procedure a user proposes a tag for a resource; if the tag is found to be related to more than one context, she is provided with two or more cues among which to choose, so as to remove the tag ambiguity. The key phases of such a disambiguation procedure are ambiguous tag detection and cue discovery. Both should rely on effective word-to-context relatedness metrics. Among the most effective relatedness metrics are those defined on the basis of a feature vector representation of the words. In this work we compare different word-to-context relatedness metrics in terms of their effectiveness within the disambiguation process. We propose to use a metric derived from a Maximum Likelihood estimator of the Jensen-Shannon Divergence among feature-count histograms, and we show that such a metric performs better, in terms of output quality, than both the Jensen-Shannon and the Symmetrized Kullback-Leibler divergence between histograms. We study the relative gain in quality within the task of unsupervised cue discovery by using a synthetic language corpus.
Tagging; disambiguation; semantic relatedness; dissimilarity metrics; Jensen-Shannon divergence
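
As an illustration of the two baseline dissimilarity measures named in the abstract, the following is a minimal sketch that computes the Jensen-Shannon divergence and a symmetrized Kullback-Leibler divergence between two feature-count histograms. It is not taken from the paper: the function names, the example counts, the base-2 logarithm, and the choice of defining the symmetrized divergence as the sum of the two directed divergences are all assumptions made for illustration. The Maximum Likelihood estimator-derived metric that the paper proposes is not specified in the abstract and is not reproduced here.

```python
import math


def normalize(counts):
    """Turn a histogram of non-negative counts into relative frequencies."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Directed Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing; if q_i == 0 while p_i > 0
    the divergence is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


def symmetrized_kl(p, q):
    """Symmetrized KL, assumed here to be KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: mean KL of p and q from their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical feature counts for a word and for a context (illustrative only).
word_counts = [4, 0, 1, 3]
context_counts = [2, 2, 2, 2]

p = normalize(word_counts)
q = normalize(context_counts)

print("Jensen-Shannon divergence:", jensen_shannon(p, q))
print("Symmetrized KL divergence:", symmetrized_kl(p, q))
```

In this example the word histogram has an empty bin where the context histogram does not, so the symmetrized Kullback-Leibler divergence is infinite, while the Jensen-Shannon divergence stays finite (with base-2 logarithms it is bounded by 1).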
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
2015
IFSP Federal Institute of São Paulo
The French Chapter of ACM Special Interest Group on Applied Computing
Book Part (author)
Files in this record:
2015 - Meshesha Legesse - Unsupervised cue-words discovery for tag-sense disambiguation - comparing dissimilarity metrics - PUBLISHED VERSION.pdf
Access: restricted
Description: Main article
Type: Publisher's version/PDF
Size: 213.73 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/373803
Citations
  • Scopus 2