Ten Years of Open Wikipedia Ranking / P. Boldi, F. Furia, S. Vigna - In: WWW '25: Companion / edited by G. Long, M. Blumenstein. - [s.l.] : ACM, 2025. - ISBN 979-8-4007-1331-6. - pp. 883-887 (conference: The ACM on Web Conference, held in Sydney in 2025) [10.1145/3701716.3715510].

Ten Years of Open Wikipedia Ranking

P. Boldi; F. Furia; S. Vigna
2025

Abstract

The Open Wikipedia Ranking is an open dataset published yearly, containing the ranking of Wikipedia pages with respect to centrality measures computed on the whole Wikipedia graph for that year. In this paper, ten years after its start, we report some details, results, and anecdotal observations on this dataset. The goal of the Open Wikipedia Ranking is to provide a completely open and reproducible ranking of Wikipedia pages based on indegree, PageRank, harmonic centrality, and page views; the Wikipedia graphs themselves are also made available by the Laboratory for Web Algorithmics. What characterizes the Open Wikipedia Ranking is that the entire graph construction and ranking process is meticulously documented and reproducible. All computations are based on open-source Java software and algorithms from the literature. Thus, the reason for a page's centrality score can be traced back exactly to structural properties of the graph.
Wikipedia; ranking; open data
Sector INFO-01/A - Computer Science
   SEcurity and RIghts in the CyberSpace (SERICS)
   SERICS
   MINISTERO DELL'UNIVERSITÀ E DELLA RICERCA
   identification code PE00000014
2025
Book Part (author)
Files in this item:
File | Access | Type | Size | Format
main.pdf | restricted access | Pre-print (manuscript submitted to the publisher) | 602.72 kB | Adobe PDF
3701716.3715510.pdf | open access | Publisher's version/PDF | 1.04 MB | Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1166397