The WebGraph framework I : compression techniques / P. Boldi, S. Vigna. - In: Proceedings of the 13th international conference on World Wide Web : 2004, New York, NY, USA, May 17-20, 2004. - New York : ACM Press, 2004. - ISBN 158113844X. - pp. 595-602. (Paper presented at the 13th International World Wide Web Conference, held in New York in 2004) [10.1145/988672.988752].

The WebGraph framework I : compression techniques

P. Boldi (first author); S. Vigna (last author)
2004

Abstract

Studying web graphs is often difficult due to their large size. Recently, several proposals have been published describing techniques that make it possible to store a web graph in memory in limited space by exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims to make it easy to manipulate large web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other). WebGraph can compress the WebBase graph (118 Mnodes, 1 Glinks) to as little as 3.08 bits per link, and its transposed version to as little as 2.89 bits per link.
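The abstract names the two techniques without describing them. As a rough illustration only, assume that intervalisation means storing runs of consecutive successor IDs as (start, length) pairs, and that referentiation means describing one node's successor list as a copy mask over an earlier node's list plus the leftover extras. Under those assumptions, a Python sketch might look like the following. The function names, the `min_len` threshold and the sample lists are hypothetical and are not taken from the WebGraph code base.

```python
from typing import List, Tuple


def split_intervals(successors: List[int], min_len: int = 3
                    ) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split a sorted ID list into (start, length) intervals and residuals.

    Runs of consecutive IDs at least `min_len` long become intervals;
    everything else is kept as an individual residual. (Assumed scheme.)
    """
    intervals: List[Tuple[int, int]] = []
    residuals: List[int] = []
    i, n = 0, len(successors)
    while i < n:
        j = i
        # Extend the run while the next ID is exactly one larger.
        while j + 1 < n and successors[j + 1] == successors[j] + 1:
            j += 1
        run = j - i + 1
        if run >= min_len:
            intervals.append((successors[i], run))
        else:
            residuals.extend(successors[i:j + 1])
        i = j + 1
    return intervals, residuals


def reference_encode(current: List[int], reference: List[int]
                     ) -> Tuple[List[int], List[int]]:
    """Describe `current` as a copy mask over `reference` plus extras."""
    cur, ref = set(current), set(reference)
    copy_mask = [1 if x in cur else 0 for x in reference]
    extras = [x for x in current if x not in ref]
    return copy_mask, extras


def reference_decode(copy_mask: List[int], extras: List[int],
                     reference: List[int]) -> List[int]:
    """Rebuild the successor list from the mask, the extras and the reference."""
    copied = [x for x, bit in zip(reference, copy_mask) if bit]
    return sorted(copied + extras)


# Hypothetical successor lists of two nearby nodes.
ref_list = [13, 15, 16, 17, 18, 19, 23, 24, 203]
cur_list = [15, 16, 17, 22, 23, 24, 315, 316, 317, 3041]

mask, extras = reference_encode(cur_list, ref_list)
intervals, residuals = split_intervals(extras)
print(mask, intervals, residuals)
```

In this sketch only the copy mask, one interval and two residuals need to be stored for the second list, and `reference_decode(mask, extras, ref_list)` recovers it exactly. A real encoder would also write these values with compact variable-length integer codes, which this sketch leaves out.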
Compression; Web graph
Academic discipline INF/01 - Informatica (Computer Science)
2004
Book Part (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/142632
Citations
  • Scopus 930