Temporal evolution of the UK web / I. Bordino, P. Boldi, D. Donato, M. Santini, S. Vigna. - In: Proceedings [of the] IEEE International Conference on Data Mining Workshops: ICDM Workshops 2008: 15-19 December 2008, Pisa, Italy / [edited by] F. Bonchi [et al.]. - Los Alamitos: IEEE Computer Society, 2008. - ISBN 9780769535036. - pp. 909-918. (IEEE International Conference on Data Mining Workshops, held in Pisa, Italy, in 2008) [10.1109/ICDMW.2008.88].

Temporal evolution of the UK web

P. Boldi (second author); M. Santini (penultimate author); S. Vigna (last author)
2008

Abstract

Recently, a new temporal dataset has been made public: it consists of a series of twelve 100-million-page snapshots of the .uk domain. The Web graphs of the twelve snapshots have been merged into a single time-aware graph that provides constant-time access to temporal information. In this paper we present the first statistical analysis performed on this graph, with the goal of checking whether the information contained in the graph is reliable (i.e., whether it depends essentially on the appearance and disappearance of pages and links, or on the crawler's behaviour). We perform a number of tests that show that the graph is indeed reliable, and provide the first public data on the evolution of the Web obtained at a large scale and with a significant diversity in the sites considered.
Keywords: Temporal evolution; Web characterization; Web evolution
Sector INF/01 - Computer Science
2008
IEEE
Book Part (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/54618
Citations
  • Scopus 18