Since the first investigations on web graph compression, it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database, for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic (or coordinate-free) orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they concluded that URL ordering has many advantages in spite of a small penalty in compression. In this paper we approach this issue in a more systematic way, testing some old orderings and proposing some new ones. Our experiments are made in the WebGraph framework, and show that the compression technique and the structure of the graph can produce significantly different results. In particular, we show that for the transpose web graph URL ordering is significantly less effective, and that some new orderings combining host information and Gray/lexicographic orderings outperform all previous methods. In particular, in some large transposed graphs they yield the quite incredible compression rate of 1 bit per link.

Permuting web graphs / P. Boldi, M. Santini, S. Vigna - In: Algorithms and models for the web-graph 6th international workshop, WAW 2009, Barcelona, Spain, February 12-13, 2009 : proceedings / [a cura di] K. Avrachenkov, D. Donato, and N. Litvak. - Berlin : Springer, 2009. - ISBN 9783540959946. - pp. 116-126 (( convegno Algorithms and Models for the Web-Graph, 6th International Workshop, WAW 2009 tenutosi a Barcellona nel 2009 [10.1007/978-3-540-95995-3_10].

Permuting web graphs

P. Boldi
Primo
;
M. Santini
Secondo
;
S. Vigna
Ultimo
2009

Abstract

Since the first investigations on web graph compression, it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database, for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic (or coordinate-free) orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they concluded that URL ordering has many advantages in spite of a small penalty in compression. In this paper we approach this issue in a more systematic way, testing some old orderings and proposing some new ones. Our experiments are made in the WebGraph framework, and show that the compression technique and the structure of the graph can produce significantly different results. In particular, we show that for the transpose web graph URL ordering is significantly less effective, and that some new orderings combining host information and Gray/lexicographic orderings outperform all previous methods. In particular, in some large transposed graphs they yield the quite incredible compression rate of 1 bit per link.
Settore INF/01 - Informatica
2009
Book Part (author)
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/55070
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 22
  • ???jsp.display-item.citation.isi??? 15
social impact