Understanding the correlation between two different scores for the same set of items is a common problem in graph analysis and information retrieval. The most commonly used statistics that quantifies this correlation is Kendall's tau; however, the standard definition fails to capture that discordances between items with high rank are more important than those between items with low rank. Recently, a new measure of correlation based on average precision has been proposed to solve this problem, but like many alternative proposals in the literature it assumes that there are no ties in the scores. This is a major deficiency in a number of contexts, and in particular when comparing centrality scores on large graphs, as the obvious baseline, indegree, has a very large number of ties in social networks and web graphs. We propose to extend Kendall's definition in a natural way to take into account weights in the presence of ties. We prove a number of interesting mathematical properties of our generalization and describe an O(n log n) algorithm for its computation. We also validate the usefulness of our weighted measure of correlation using experimental data on social networks and web graphs.

A weighted correlation index for rankings with ties / S. Vigna - In: WWW '15 : proceedingsPrima edizione. - [s.l] : ACM New York, 2015. - ISBN 9781450334693. - pp. 1166-1176 (( Intervento presentato al 24. convegno World Wide Web [10.1145/2736277.2741088].

A weighted correlation index for rankings with ties

S. Vigna
Primo
2015

Abstract

Understanding the correlation between two different scores for the same set of items is a common problem in graph analysis and information retrieval. The most commonly used statistics that quantifies this correlation is Kendall's tau; however, the standard definition fails to capture that discordances between items with high rank are more important than those between items with low rank. Recently, a new measure of correlation based on average precision has been proposed to solve this problem, but like many alternative proposals in the literature it assumes that there are no ties in the scores. This is a major deficiency in a number of contexts, and in particular when comparing centrality scores on large graphs, as the obvious baseline, indegree, has a very large number of ties in social networks and web graphs. We propose to extend Kendall's definition in a natural way to take into account weights in the presence of ties. We prove a number of interesting mathematical properties of our generalization and describe an O(n log n) algorithm for its computation. We also validate the usefulness of our weighted measure of correlation using experimental data on social networks and web graphs.
kendall tau; correlation; weights
Settore INF/01 - Informatica
   New tools and Algorithms for Direct NEtwork analysis
   NADINE
   EUROPEAN COMMISSION
   FP7
   288956
2015
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
p1166.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 885.95 kB
Formato Adobe PDF
885.95 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/372429
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 56
  • ???jsp.display-item.citation.isi??? 43
social impact